
POPSIMSPG

Alex Bettinardi edited this page Jan 9, 2025 · 1 revision

Introduction

This wiki describes the implementation of PopulationSim in the Oregon Statewide Transportation/Land Use Integrated Model (SWIM-TLUMIP) and instructions for running the module within the SWIM-TLUMIP modeling system.

The previous Synthetic Population Generator (SPG) modules, SPG1 and SPG2, were replaced with PopulationSim in 2020; the new modules are named POPSIMSPG1 and POPSIMSPG2, respectively. POPSIMSPG1 generates a synthetic population with controls specified only at the regional level. This produces a frequency distribution of households by household size, which is needed by the Activity Allocation (AA) component of SWIM. POPSIMSPG2 runs with controls at both the alpha zone level and the regional level to generate a detailed synthetic population for use by the Person Transport (PT) and Transport Supply (TS) modules. By default, SWIM runs POPSIMSPG2 only in transport years (years in which PT and TS are run).

PopulationSim follows the steps below to create synthetic households and persons.

  • Initial Seed Balancing

  • Meta Control Factoring (step is parallelized for POPSIMSPG2)

  • Final Seed Balancing

  • Integerize Final Seed Weights

  • Expand Households
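The balancing steps above adjust seed household weights until they match the control totals. The sketch below is only a rough illustration of that idea. It implements classic iterative proportional fitting (IPF) on a 0/1 incidence table and is not PopulationSim's balancer, which also handles control importance weights, relaxation factors and count-valued incidence. The function name and data layout are hypothetical.

```python
def balance_weights(incidence, weights, targets, max_iter=100, tol=1e-9):
    """Classic IPF on a 0/1 incidence table (simplified illustration).

    incidence[i][j] is 1 if household i counts toward control j, else 0.
    weights are the initial (seed) household weights; targets are control totals.
    """
    w = [float(x) for x in weights]
    n_households = len(w)
    for _ in range(max_iter):
        max_change = 0.0
        for j, target in enumerate(targets):
            # Current weighted total for control j.
            current = sum(w[i] for i in range(n_households) if incidence[i][j])
            if current == 0:
                continue  # no seed household can contribute to this control
            factor = target / current
            max_change = max(max_change, abs(factor - 1.0))
            # Scale only the households that belong to control j.
            for i in range(n_households):
                if incidence[i][j]:
                    w[i] *= factor
        if max_change < tol:
            break  # every control already matched on this pass
    return w
```

Consider four seed household types: small with no workers, small with workers, large with no workers, and large with workers. Start all weights at 1. Use targets of 60 small and 40 large households, and 30 zero-worker and 70 working households. Under these inputs the weights converge to 18, 42, 12 and 28, up to floating-point rounding.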

Details on the PopulationSim software can be found at https://rsginc.github.io/populationsim/index.html. Further details on the previous population synthesizer used in SWIM (SPG) can be found on the retired SPG wiki page. Note, however, that the SPG wiki page was never fully populated or described in a meaningful way.

Implementation

PopulationSim is run automatically from the Model Orchestrator, which calls a batch file that runs PopulationSim for SPG1 or SPG2 depending on the argument passed to the Python code. For more information on how to run PopulationSim manually, see below.

Each POPSIMSPG module calls multiple Python functions to implement PopulationSim. The Python functions for the two modules are as follows:

runSPG1:

  • createDirectories : This module creates the directories that store PopulationSim inputs and outputs for each scenario year. The module requires the PopulationSimBase directory, which is included in the latest repository. PopulationSimBase contains the settings files for SPG1 and SPG2, and the module copies them into each scenario year’s directory. This is needed because the controls change each year while the settings stay the same over time. The static files are therefore stored once in PopulationSimBase and copied into the directory of every scenario year. A PopulationSim directory is created inside the outputs directory of each scenario year, with sub-directories including configs, data and output. These directories contain the standard PopulationSim inputs and outputs. The outputs are then post-processed to create SPG format outputs for SWIM.

  • createSeed : This module creates the seed data for running PopulationSim from the PUMS (Public Use Microdata Sample) household and person data for the states of Oregon, California and Washington. It selects records for PUMAs within the SWIM halo from the complete data set to form the seed data for the SWIM PopulationSim module.

  • copySeeds : This module copies the seeds created for the first scenario year to subsequent years. The seeds need not be created in every iteration as they remain static over time.

  • spg1Controls : This module creates the control data for POPSIMSPG1. It uses multiple inputs:
    • the Jobs to Workers Factor
    • the Workers per Household marginals
    • the household size distribution
    • the NED population and employment forecasts
    • the household seed file
  These inputs are a mix of model inputs and outputs of a previous module (NED). The tables below list the locations of each module's input and output files.

  • run_spg1 : This module executes the PopulationSim program with the seed and controls created above.

  • spg1PostProcess : This module processes the standard PopulationSim outputs and generates SPG format outputs for use by downstream SWIM modules. For example, this process converts synthetic_households.csv to HouseholdsByHHCategory.csv which is used by AA and SPG2.
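The following is a minimal sketch of the createDirectories step described above. It assumes PopulationSimBase holds one sub-folder per module (SPG1, SPG2), each with a configs folder inside. That layout and the function name are assumptions for illustration, not taken from the SWIM code. The `dirs_exist_ok` argument of `shutil.copytree` requires Python 3.8 or later.

```python
import shutil
from pathlib import Path

# Standard PopulationSim sub-directories created for each module.
SUBDIRS = ("configs", "data", "output")


def create_directories(base_dir, year_output_dir, modules=("SPG1", "SPG2")):
    """Hypothetical sketch: build outputs/<year>/PopulationSim/<module>/... and
    copy the static settings from PopulationSimBase/<module>/configs."""
    base_dir = Path(base_dir)
    popsim_dir = Path(year_output_dir) / "PopulationSim"
    for module in modules:
        module_dir = popsim_dir / module
        for sub in SUBDIRS:
            (module_dir / sub).mkdir(parents=True, exist_ok=True)
        # Settings do not change between years, so copy them from the base folder.
        src_configs = base_dir / module / "configs"
        if src_configs.is_dir():
            shutil.copytree(src_configs, module_dir / "configs", dirs_exist_ok=True)
    return popsim_dir
```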

runSPG2:

  • createDirectories : This module creates the directories that store PopulationSim inputs and outputs for each scenario year. A PopulationSim directory is created inside the outputs directory of each scenario year, with sub-directories including configs, data and output. These directories contain the standard PopulationSim inputs and outputs. The outputs are then post-processed to create SPG format outputs for SWIM.
  • copySeeds : This module copies the seeds created for the first scenario year to subsequent years.
  • spg2Controls : This module creates the control data for POPSIMSPG2. It uses SPG1 outputs such as synthetic_households.csv, synthetic_persons.csv and HouseholdsByHHCategory.csv, as well as laborDollarProduction.csv and ActivityLocations2.csv, which are outputs of the AA module. SPG2 derives its regional controls from the SPG1 regional controls file and uses the puma_beta_alpha_xwalk.csv file to combine data from different geographies. The tables below list the locations of each module's input and output files.
  • spg2Settings : This module updates the values of two settings in the settings.yaml file in the configs folder.
    • num_processes: This setting defines the number of processors to use when running the module in parallel mode. The value for this setting is updated based on spg2.num.processors defined in globalTemplate.properties file.
    • MAX_BALANCE_ITERATIONS_SIMULTANEOUS: This setting defines the maximum number of iterations the sub_balancer component of the module uses to reach the optimal solution. Its value is updated based on spg2.max.iterations defined in the globalTemplate.properties file.
  • run_spg2 : This module executes the PopulationSim program with the seed and controls created above.
  • spg2PostProcess : This module processes the standard PopulationSim outputs and generates SPG format outputs for use by downstream SWIM modules. The standard PopulationSim outputs are synthetic_households.csv and synthetic_persons.csv, each containing the fields specified by the user. Their SPG format equivalents, SynPopH.csv and SynPopP.csv, use different fields and field names, so they are created during the post-processing step. This step also creates the SynPopTAZSummary.csv file, which summarizes the synthetic population by TAZ.
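The call order listed above for runSPG1 and runSPG2 can be sketched as a small dispatcher that picks a step list from the module argument. This is a hypothetical illustration: the STEPS table and run_module are not the SWIM code, and the real functions are replaced by stubs.

```python
import sys

# Step order as listed for runSPG1 and runSPG2 on this page.
STEPS = {
    "spg1": ["createDirectories", "createSeed", "copySeeds",
             "spg1Controls", "run_spg1", "spg1PostProcess"],
    "spg2": ["createDirectories", "copySeeds", "spg2Controls",
             "spg2Settings", "run_spg2", "spg2PostProcess"],
}


def run_module(module, functions):
    """Call each step of `module` in order; `functions` maps step name -> callable."""
    try:
        steps = STEPS[module.lower()]
    except KeyError:
        raise ValueError(f"unknown module {module!r}; expected spg1 or spg2") from None
    for name in steps:
        functions[name]()
    return steps


if __name__ == "__main__":
    # Stubs stand in for the real step functions; default to spg1 if no argument.
    stubs = {name: (lambda n=name: print("running", n))
             for steps in STEPS.values() for name in steps}
    run_module(sys.argv[1] if len(sys.argv) > 1 else "spg1", stubs)
```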

The text above somewhat buries two important properties in globalTemplate.properties. Review and update them each time SWIM is moved to a new machine. These properties were added when PopulationSim was set up to run in parallel to speed up processing:

## SPG2 related properties
spg2.num.processors = 2
spg2.max.iterations = 1000
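The following is a rough sketch of what spg2Settings does with these two properties: it reads them from the properties text and overwrites the matching keys in settings.yaml. It assumes the settings are top-level `key: value` lines. It also uses a simplified `key = value` properties parser; Java properties additionally allow `:` separators and line continuations, which are not handled here. The function names are hypothetical.

```python
import re

# globalTemplate.properties key -> settings.yaml key (per the spg2Settings description).
SETTING_MAP = {
    "spg2.num.processors": "num_processes",
    "spg2.max.iterations": "MAX_BALANCE_ITERATIONS_SIMULTANEOUS",
}


def read_properties(text):
    """Parse simple `key = value` lines, skipping blank lines and #/! comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def update_settings(yaml_text, props):
    """Overwrite (or append) the mapped top-level keys in settings.yaml text."""
    for prop_key, yaml_key in SETTING_MAP.items():
        if prop_key not in props:
            continue
        replacement = f"{yaml_key}: {props[prop_key]}"
        pattern = re.compile(rf"^{re.escape(yaml_key)}:.*$", re.MULTILINE)
        # A lambda avoids backslash/group interpretation in the replacement.
        yaml_text, n = pattern.subn(lambda m: replacement, yaml_text)
        if n == 0:
            yaml_text = yaml_text.rstrip("\n") + "\n" + replacement + "\n"
    return yaml_text
```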

The following tables describe the PopulationSim inputs and outputs. Each table lists the input or output file, describes it, and indicates its location. The Modules column lists the SWIM modules that consume the file (inputs) or create it (outputs). A few key files are described in more detail. For more information on the data used to set up PopulationSim and how the software works, see the following sections.

Inputs:

Input File Description Address Modules
Properties File Module-specific properties from the current model run. All functions in the PopulationSim script use these files to retrieve input/output file locations and other model properties. /outputs/tyear/popsimspg1.properties
/outputs/tyear/popsimspg2.properties
createDirectories
createSeed
spg1Controls
run_spg1
spg1PostProcess
spg2Controls
spg2Settings
spg2PostProcess
PUMS Household (h) and Person (p) data for Oregon (41), California (06) and Washington (53). Input household and person data by PUMA downloaded from the American Community Survey website. root/census/psam_h06.csv
root/census/psam_h41.csv
root/census/psam_h53.csv
root/census/psam_p06.csv
root/census/psam_p41.csv
root/census/psam_p53.csv
createSeed
PUMA to Alpha-Beta xwalk Xwalk file created by SWIM_VISUM_Main.py outputs/t20/puma_beta_alpha_xwalk.csv createSeed
spg2Controls
spg2PostProcess
pums_to_split_industry xwalk Correspondence between PUMS industry and occupation codes and split industries inputs/parameters/pums_to_split_industry.csv createSeed
ACS occupation categories Updated ACS occupation categories including missing codes inputs/parameters/acs_occupation_2005_2009_forPopSim.csv createSeed
Jobs to Workers Factor Jobs to workers factor file inputs/parameters/JobsToWorkersFactor.csv spg1Controls
Workers per HH Workers per household distribution file inputs/parameters/workersPerHouseholdMarginalxYEAR.csv spg1Controls
HH by HH Size Distribution Household Size distribution from the input PUMS data. inputs/parameters/hh_dist.csv spg1Controls
NED Employment Forecast of the total amounts of production by activity, used to update TechnologyOptionsW and ActivityTotalW (see Working Files) /outputs/tyear/activity_forecast.csv spg1Controls
NED Population Population forecast from the NED module /outputs/tyear/population_forecast.csv spg1Controls
Household Seed Seed household file inputs/parameters/PopulationSimBase/data/seed_households.csv spg1Controls
Controls Configuration Control file for running populationsim, contains information about household and person level control variables for generating synthetic population outputs/tyear/PopulationSim/SPG1/configs/controls.csv
outputs/tyear/PopulationSim/SPG2/configs/controls.csv
run_spg1
run_spg2
PopulationSim Settings Software settings for PopulationSim outputs/tyear/PopulationSim/SPG1/configs/settings.yaml
outputs/tyear/PopulationSim/SPG2/configs/settings.yaml
run_spg1
spg2Settings
run_spg2
PopulationSim Logging Configuration PopulationSim logging configuration file outputs/tyear/PopulationSim/SPG1/configs/logging.yaml
outputs/tyear/PopulationSim/SPG2/configs/logging.yaml
run_spg1
run_spg2
SPG1 Synthetic Households Synthetic Households file outputs/tyear/PopulationSim/SPG1/output/synthetic_households.csv spg1PostProcess
spg2Controls
SPG1 Synthetic Persons Synthetic Persons file outputs/tyear/PopulationSim/SPG1/output/synthetic_persons.csv spg2Controls
HH by HH Category SPG1 modelwide summary of households by category /outputs/tyear/HouseholdsByHHCategory.csv spg2Controls
Activity Locations2 Activity quantities by TAZ (alpha zone) /outputs/tyear/ActivityLocations2.csv spg2Controls
Labor Dollar Production Total amount of each type of labor produced by each household category in each zone /outputs/tyear/laborDollarProduction.csv spg2Controls
SPG1 Region Control Region Level control for SPG1 outputs/tyear/PopulationSim/SPG1/data/control_region.csv spg2Controls
SPG2 Synthetic Households Synthetic Households file outputs/tyear/PopulationSim/SPG2/output/synthetic_households.csv spg2PostProcess
SPG2 Synthetic Persons Synthetic Persons file outputs/tyear/PopulationSim/SPG2/output/synthetic_persons.csv spg2PostProcess

A brief description of some important input files follows.

ACS Occupation Categories File:

acs_occupation_2005_2009_forPopSim.csv lists all Census occupations and maps each one to a PECAS occupation label. The file is used to look up the occupation field in the PUMS seed data and assign it to a PECAS occupation. The file originates from the initial SPG implementation. However, the original file (acs_occupation_2005_2009.csv) was missing many occupation categories that exist in the current ACS PUMS data. It was therefore modified so that each missing occupation code receives the PECAS occupation label of the closest occupation code in the original file.

JobsToWorkersFactor File:

The NED model produces both jobs (employment) and workers (persons), reflecting the mismatch between US Bureau of Economic Analysis job data and Census worker data. This file contains factors to convert jobs to workers by SPG sector. NED forecasted jobs are multiplied by these factors to calculate workers by sector, to account for mismatches between industry categories between the two datasets, as well as workers working multiple jobs (see workers by split industry, below).
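The jobs-to-workers conversion is a per-sector multiplication. The following is a minimal sketch, assuming jobs and factors are keyed by the same sector labels. The function name is hypothetical.

```python
def workers_by_sector(jobs, factors):
    """Convert NED jobs to workers: workers = jobs * jobs-to-workers factor."""
    missing = sorted(set(jobs) - set(factors))
    if missing:
        # Every forecast sector needs a factor; fail loudly rather than drop jobs.
        raise KeyError(f"no jobs-to-workers factor for sectors: {missing}")
    return {sector: count * factors[sector] for sector, count in jobs.items()}
```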

WorkersPerHouseholdMarginalsxYEAR File:

This file is created from census data. The current file has workers per household records from 1990 to 2000. Households are grouped into worker-count categories 0, 1, 2, 3, 4 and 5.4 (the average number of workers in 5+ worker households).

Outputs:

Output File Description Address Modules
Seed Households Seed household file outputs/tyear/PopulationSim/SPG/data/seed_households.csv createSeed
Seed Persons Seed persons file outputs/tyear/PopulationSim/SPG/data/seed_persons.csv createSeed
HH by HH Size Distribution Household size distribution from the input PUMS data inputs/parameters/hh_dist.csv createSeed
SPG1 Subseed Control Controls at the subseed (sub-PUMA) geography. PopulationSim requires at least one geography smaller than the seed geography to run, so this is a placeholder geography with a value of 1 for all entries. outputs/tyear/PopulationSim/SPG1/data/control_subseed.csv spg1Controls
SPG1 Region Control Region Level control for SPG1 outputs/tyear/PopulationSim/SPG1/data/control_region.csv spg1Controls
SPG1 Geo xwalk xwalk between various geographies in seed and control data outputs/tyear/PopulationSim/SPG1/data/geo_cross_walk.csv spg1Controls
SPG1 Synthetic Households Synthetic Households file outputs/tyear/PopulationSim/SPG1/output/synthetic_households.csv run_spg1
SPG1 Synthetic Persons Synthetic Persons file outputs/tyear/PopulationSim/SPG1/output/synthetic_persons.csv run_spg1
HH by HH Category SPG1 modelwide summary of households by category /outputs/tyear/HouseholdsByHHCategory.csv spg1PostProcess
SPG2 Alpha Zone Control Alpha zone level controls for SPG2 outputs/tyear/PopulationSim/SPG2/data/control_alpha.csv spg2Controls
SPG2 Region Control Region level controls for SPG2 outputs/tyear/PopulationSim/SPG2/data/control_region.csv spg2Controls
SPG2 Geo xwalk xwalk between various geographies in seed and control data outputs/tyear/PopulationSim/SPG2/data/geo_cross_walk.csv spg2Controls
SPG2 Settings Settings file for the SPG2 module outputs/tyear/PopulationSim/SPG2/configs/settings.yaml spg2Settings
SPG2 Synthetic Households Synthetic Households file outputs/tyear/PopulationSim/SPG2/output/synthetic_households.csv run_spg2
SPG2 Synthetic Persons Synthetic Persons file outputs/tyear/PopulationSim/SPG2/output/synthetic_persons.csv run_spg2
SPG2 SynPopH SWIM format synthetic households outputs/tyear/SynPopH.csv spg2PostProcess
SPG2 SynPopP SWIM format synthetic persons outputs/tyear/SynPopP.csv spg2PostProcess
SynPop TAZ Summary SWIM format TAZ summary of synthetic population outputs/tyear/SynPop_Taz_Summary.csv spg2PostProcess

Output Files:

This section lists the output files and defines their fields. Note that the PopulationSim software creates synthetic_households.csv and synthetic_persons.csv. These files are then reformatted for input to PT and named SynPopH.csv and SynPopP.csv.

filename: spg2_synthetic_households.csv

Field Description Values
NP Number of persons in household 0 to max number of persons
BLD Number of units in structure 01 .Mobile home or trailer
02 .One-family house detached
03 .One-family house attached
04 .2 Apartments
05 .3-4 Apartments
06 .5-9 Apartments
07 .10-19 Apartments
08 .20-49 Apartments
09 .50 or more apartments
10 .Boat, RV, van, etc.
NWESR Number of workers 0 .N/A (less than 16 years old)
1 .Civilian employed, at work
2 .Civilian employed, with a job but not at work
3 .Unemployed
4 .Armed forces, at work
5 .Armed forces, with a job but not at work
6 .Not in labor force
VEH Number of vehicles 0 .No vehicles
1 .1 vehicle
2 .2 vehicles
3 .3 vehicles
4 .4 vehicles
5 .5 vehicles
6 .6 or more vehicles
hh_id Household ID 1 to max ID
HHINC2009 Household income in dollars ($2009) -999 to max income
AZONE Alpha zone 1 to max zone number

filename: spg2_synthetic_persons.csv

Field Description Values
per_num Person ID
AGEP Age 00 .Under 1 year
01..99 .1 to 99 years (Top-coded***)
SEX Gender 1 .Male
2 .Female
hh_id Household ID
ESR Work status 0 .N/A (less than 16 years old)
1 .Civilian employed, at work
2 .Civilian employed, with a job but not at work
3 .Unemployed
4 .Armed forces, at work
5 .Armed forces, with a job but not at work
6 .Not in labor force
SCH School enrollment b .N/A (less than 3 years old)
1 .No, has not attended in the last 3 months
2 .Yes, public school or public college
3 .Yes, private school or college or home school
INDP Census Industry code Industry recode for 2013 and later based on 2012 IND codes, see https://www2.census.gov/programs-surveys/acs/tech_docs/pums/data_dict/PUMSDataDict13.txt
OCCP Census Occupation code Occupation recode for 2012 and later based on 2010 OCC codes, see https://www2.census.gov/programs-surveys/acs/tech_docs/pums/data_dict/PUMSDataDict13.txt
Occupation PECAS Occupation ID Occupation categories consistent with AA
occupationLabel PECAS Occupation label Labels for occupation categories
split_industry_id PECAS Split industry ID Split industry codes consistent with AA
split_industry PECAS Split industry label Labels for split industry codes

filename: SynPopH.csv

Field Description Values
HHID Household ID
Persons Number of persons in household 0 to max persons
UNITS1 Number of units in structure 01 .Mobile home or trailer
02 .One-family house detached
03 .One-family house attached
04 .2 Apartments
05 .3-4 Apartments
06 .5-9 Apartments
07 .10-19 Apartments
08 .20-49 Apartments
09 .50 or more apartments
10 .Boat, RV, van, etc.
AUTOS Number of autos owned 0 .No vehicles
1 .1 vehicle
2 .2 vehicles
3 .3 vehicles
4 .4 vehicles
5 .5 vehicles
6 .6 or more vehicles
RHHINC Household income in dollars ($2009) -999 to max income
AZONE Alpha zone 1 to max zone number

filename: SynPopP.csv

Field Description Values
HH_ID Household ID Household ID
PERS_ID Person ID Person ID
AGE Age 00 .Under 1 year
01..99 .1 to 99 years (Top-coded***)
SEX Gender 1 .Male
2 .Female
RLABOR Worker status 0 .N/A (less than 16 years old)
1 .Civilian employed, at work
2 .Civilian employed, with a job but not at work
3 .Unemployed
4 .Armed forces, at work
5 .Armed forces, with a job but not at work
6 .Not in labor force
SCHOOL Student status 0 .N/A (less than 3 years old)
1 .No, has not attended in the last 3 months
2 .Yes, public school or public college
3 .Yes, private school or college or home school
INDUSTRY Industry code Industry recode for 2013 and later based on 2012 IND codes, see https://www2.census.gov/programs-surveys/acs/tech_docs/pums/data_dict/PUMSDataDict13.txt
OCCUP Occupation Occupation recode for 2012 and later based on 2010 OCC codes, see https://www2.census.gov/programs-surveys/acs/tech_docs/pums/data_dict/PUMSDataDict13.txt
SW_UNSPLIT_IND Unsplit Industry ID Split industry codes consistent with AA
SW_OCCUP Split Occupation ID Occupation index from ACS
SW_SPLIT_IND Split Industry ID Split industry codes consistent with AA

Note: SW_UNSPLIT_IND and SW_SPLIT_IND were identical fields in the older SPG implementation, so they were kept identical in the PopulationSim implementation.

Data Preparation And PopulationSim Controls

This section describes the data preparation performed for implementing PopulationSim in SWIM and the controls used for SPG1 and SPG2.

Seed Data

PopulationSim uses seed data consisting of households and persons to generate the synthetic population for a given region. SWIM uses ACS PUMS 2013-2017 data as the seed household and seed person data. Data were downloaded for the states of Oregon, Washington and California, concatenated, and filtered to the PUMAs within the SWIM region. The filtered data contain 31 Oregon PUMAs, 9 Washington PUMAs and 1 California PUMA. The filtered household and person data were then processed further to create the missing control variables. A ‘Category’ field was created in the household seed file to hold the category of each household by size and income. In the person seed file, each worker was assigned a split_industry_id based on their Industry and Occupation IDs. The following paragraphs describe how these variables are created.
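The concatenate-and-filter step can be sketched with the standard csv module. PUMA codes are only unique within a state, so the sketch keys on the (ST, PUMA) pair from the PUMS files. This page does not show how SWIM stores its list of in-region PUMAs (for example, in puma_beta_alpha_xwalk.csv), so the swim_pumas set and the function name are assumptions.

```python
import csv


def filter_seed(pums_files, swim_pumas):
    """Concatenate PUMS rows from several open CSV files (one per state) and keep
    rows whose (ST, PUMA) pair is in the hypothetical swim_pumas set."""
    kept = []
    for f in pums_files:
        for row in csv.DictReader(f):
            # int() drops zero padding, e.g. "06" -> 6 and "00100" -> 100.
            key = (int(row["ST"]), int(row["PUMA"]))
            if key in swim_pumas:
                kept.append(row)
    return kept
```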

Household ‘Category’ Variable

SPG2 uses households by household size and income as a control variable at the alpha zone level, so the seed household data must contain this category field. Household size is divided into 2 categories (1to2 and 3plus), and household income is divided into 9 categories with varying income ranges. These two variables are combined to create 18 household categories by size and income. The HINCP field was first multiplied by adjustment factors to convert all records (2013-2017) to 2017 dollars. It was then multiplied by a second factor to convert 2017 dollars to 2009 dollars.

Table 1 HOUSEHOLD CATEGORY VARIABLE DEFINITIONS

VARIABLE HOUSEHOLD SIZE HOUSEHOLD INCOME ($)
HH0to8k1to2 1 to 2 0 to 8000
HH0to8k3plus 3 or more 0 to 8000
HH8to15k1to2 1 to 2 8000 to 15000
HH8to15k3plus 3 or more 8000 to 15000
HH15to23k1to2 1 to 2 15000 to 23000
HH15to23k3plus 3 or more 15000 to 23000
HH23to32k1to2 1 to 2 23000 to 32000
HH23to32k3plus 3 or more 23000 to 32000
HH32to46k1to2 1 to 2 32000 to 46000
HH32to46k3plus 3 or more 32000 to 46000
HH46to61k1to2 1 to 2 46000 to 61000
HH46to61k3plus 3 or more 46000 to 61000
HH61to76k1to2 1 to 2 61000 to 76000
HH61to76k3plus 3 or more 61000 to 76000
HH76to106k1to2 1 to 2 76000 to 106000
HH76to106k3plus 3 or more 76000 to 106000
HH106kUp1to2 1 to 2 More than 106000
HH106kUp3plus 3 or more More than 106000
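The Table 1 categories can be assigned with a simple lookup. Table 1 does not say which end of each income range is inclusive. This sketch assumes each range includes its lower bound and excludes its upper bound. It also places negative incomes in the lowest bin. The dollar-conversion factors are parameters because their values are not given here. Function names are hypothetical.

```python
from bisect import bisect_right

# Upper bounds of the first eight Table 1 income ranges, in 2009 dollars.
INCOME_BREAKS = [8000, 15000, 23000, 32000, 46000, 61000, 76000, 106000]
INCOME_LABELS = ["0to8k", "8to15k", "15to23k", "23to32k", "32to46k",
                 "46to61k", "61to76k", "76to106k", "106kUp"]


def to_2009_dollars(hincp, adj_to_2017, factor_2017_to_2009):
    """Convert a PUMS HINCP value to 2009 dollars (factor values are inputs)."""
    return hincp * adj_to_2017 * factor_2017_to_2009


def household_category(num_persons, income_2009):
    """Return the Table 1 category name, e.g. 'HH32to46k3plus'."""
    if num_persons < 1:
        raise ValueError("seed households must have at least one person")
    size = "1to2" if num_persons <= 2 else "3plus"
    # bisect_right puts an income equal to a break into the higher range.
    income = INCOME_LABELS[bisect_right(INCOME_BREAKS, income_2009)]
    return f"HH{income}{size}"
```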

‘split industry ID’ variable

In SWIM, the demand for labor in a given simulation year drives the number of households generated in the region. The model attempts to generate the right kinds of workers by controlling the synthetic population on a combination of workers' industry and occupation. SWIM also attempts to control the number and location of labor by considering the type of land use a worker might work in. For example, workers in the construction industry could be tradespeople, designers, or persons engaged in marketing, accounting, or several other occupations. Each has a different likelihood of working on a construction site, in an office, and so on. To address this, the New Economic Development (NED) module splits construction (CNST) into main, nres (non-residential), offc (office), res (residential), and othr (other) categories. The office category is further restricted to office land-use types, for example CNST_offc_off versus CNST_main_xxx.

SPG1 and SPG2 use ‘Workers by Split Industry’ as a region level control, which requires the person seed data to contain the split_industry_id field. A single industry and occupation combination can map to multiple split industries. This step therefore draws a random number for each worker and assigns a split industry ID according to an input distribution of split industries. The table below shows an example of how the assignment works. The pums_to_split_industry.csv file provides the proportions of different jobs within an industry, shown in the ‘proportion’ field. The ‘draw’ field holds the random number drawn for each worker. The cumulative proportion in the ‘cumprop’ field is the upper bound of each split industry's range. Shifting it down one row gives the lower bound, shown in the ‘prev_cumprop’ field. The worker is assigned to the split industry whose range contains the random draw, identified by the ‘select’ field.

Table 2 Split Industry Assignment To Workers

INDP OCCP split_ind_id split_industry proportion draw cumprop prev_cumprop select
770 6600 8 CNST_main_xxx 0.1176 0.0773 0.1176 0.0000 1
770 6600 9 CNST_nres_xxx 0.2941 0.0773 0.4118 0.1176 0
770 6600 10 CNST_othr_xxx 0.4705 0.0773 0.8824 0.4118 0
770 6600 11 CNST_res_xxx 0.1176 0.0773 1.0000 0.8824 0
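The cumulative-proportion lookup in Table 2 can be sketched as follows. The Table 2 proportions sum to 0.9998 because of rounding, so the sketch normalizes them to make the last upper bound exactly 1.0. A binary search then finds the range that contains the draw. Function and variable names are hypothetical.

```python
import random
from bisect import bisect_right
from itertools import accumulate

# Table 2 rows for INDP 770 / OCCP 6600: (split_ind_id, proportion).
TABLE2 = [(8, 0.1176), (9, 0.2941), (10, 0.4705), (11, 0.1176)]


def assign_split_industry(options, draw):
    """Return the split_ind_id whose range [prev_cumprop, cumprop) contains draw.

    options: list of (split_ind_id, proportion) for one INDP/OCCP pair.
    draw: uniform random number in [0, 1).
    """
    ids = [sid for sid, _ in options]
    cumprop = list(accumulate(p for _, p in options))
    total = cumprop[-1]
    cumprop = [c / total for c in cumprop]  # normalize so the last bound is 1.0
    idx = bisect_right(cumprop, draw)  # first range whose upper bound exceeds draw
    return ids[min(idx, len(ids) - 1)]  # guard against draw == 1.0


if __name__ == "__main__":
    rng = random.Random(2024)  # fixed seed so the draw is reproducible
    print(assign_split_industry(TABLE2, rng.random()))
```

With the Table 2 values, a draw of 0.0773 returns split industry 8 (CNST_main_xxx), which matches the ‘select’ column.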

The modified seed data contains these new fields, ‘Category’ in the household seed and ‘split_industry_id’ in the person seed.

Once the seed data are created, they are fixed for all model runs. The code checks whether the SEED household data file exists. If it does not, the code automatically creates the seed data from the input PUMS data, which takes about 5-6 minutes. If it does, the existing SEED data are used for the current run.
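Because the seed data are fixed for all model runs once created, the check amounts to building the file only when it is missing. In this hypothetical sketch, build_seed stands in for the slow PUMS processing step.

```python
from pathlib import Path


def get_seed(seed_path, build_seed):
    """Return the seed file path, calling build_seed(path) only if it is missing."""
    seed_path = Path(seed_path)
    if not seed_path.exists():
        seed_path.parent.mkdir(parents=True, exist_ok=True)
        build_seed(seed_path)  # slow step: create the seed from the input PUMS data
    return seed_path
```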

Controls

SPG1 Controls

SPG1 is run at the regional level, so the synthetic population is generated for only one geography: the region. This is a problem because PopulationSim requires at least one sub-seed geography, i.e., a geography smaller than the seed geography (PUMA). The workaround is a set of placeholder sub-seed and seed geography IDs that allow PopulationSim to run. The PUMA field in the seed file is not used to identify the seed geography. Instead, a new field named ‘SEED’ was created and set to 1 for all records in the seed data, which tells PopulationSim that the entire region contains only one seed geography. In the geo_cross_walk.csv file, a placeholder crosswalk links the subseed, seed and region geographies, all set to 1.
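The following is a sketch of the placeholder geography described above: every seed household gets SEED = 1, and the crosswalk has a single row that ties subseed, seed and region together. The column names SUBSEED, SEED and REGION are assumed from the geography names used on this page, and the function name is hypothetical.

```python
import csv


def write_spg1_geography(seed_households, crosswalk_file):
    """Mark every seed household as SEED 1 and write a one-row crosswalk."""
    for hh in seed_households:
        hh["SEED"] = 1  # single placeholder seed geography instead of PUMA
    writer = csv.writer(crosswalk_file)
    writer.writerow(["SUBSEED", "SEED", "REGION"])
    writer.writerow([1, 1, 1])  # one subseed inside one seed inside one region
    return seed_households
```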

Controls for SPG1 were created at the sub-seed level, although in practice they are region level controls, because PopulationSim requires at least one sub-seed level control to run. A region level control is also required; the total population of the region serves as this control. The controls for SPG1 are summarized below, each with a description of how it is calculated and a table listing the controls.

Total Households

The total number of households is obtained by dividing the total NED employment by the average number of workers per household. The average is calculated from the workers per household marginals distribution.

Target Geography Household or Person Control Importance Control Field
num_hh SUBSEED households 1000000000 HHS

Total Population

Total population is calculated by multiplying the total households calculated above by the average household size, which is obtained from the households by household size distribution created from census data.

Target Geography Household or Person Control Importance Control Field
TOTAL_POP REGION persons 100000 POPULATION
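The two totals above chain together. Total households come from NED employment and the average number of workers per household. Total population then comes from total households and the average household size. The following is a minimal sketch, assuming each marginal is given as a mapping from a representative value to a household count. An example representative value is 5.4 for the 5+ worker group. The caller must supply an analogous representative size for the largest household-size group. Function names are hypothetical.

```python
def weighted_mean(counts):
    """Mean of a marginal given as {representative value: household count}."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("marginal has no households")
    return sum(value * n for value, n in counts.items()) / total


def total_households(ned_employment, hh_by_workers):
    # Households = employment / average workers per household.
    return ned_employment / weighted_mean(hh_by_workers)


def total_population(total_hh, hh_by_size):
    # Population = households * average household size.
    return total_hh * weighted_mean(hh_by_size)
```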

Households by household size

The household size distribution obtained from census data is scaled up or down to match the total households calculated from NED employment, producing the household size controls.

Target Geography Household or Person Control Importance Control Field
hh_size_1 SUBSEED households 5000 HH_SIZE1
hh_size_2 SUBSEED households 5000 HH_SIZE2
hh_size_3 SUBSEED households 5000 HH_SIZE3
hh_size_4 SUBSEED households 5000 HH_SIZE4M
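Scaling a distribution "up or down" to a new total applies one proportional factor to every category. The following is a minimal sketch; the function name is hypothetical.

```python
def scale_to_total(distribution, target_total):
    """Scale every category by one factor so the categories sum to target_total."""
    current = sum(distribution.values())
    if current <= 0:
        raise ValueError("distribution must have a positive total")
    factor = target_total / current
    return {category: value * factor for category, value in distribution.items()}
```

The page describes the same proportional scaling for households by number of workers and for population by age group.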

Households by number of workers

Households by number of workers are obtained from the workers per household marginals file and scaled up or down to match the total households.

Target Geography Household or Person Control Importance Control Field
hh_wrks_0 SUBSEED households 5000 HH_WRKR0
hh_wrks_1 SUBSEED households 5000 HH_WRKR1
hh_wrks_2 SUBSEED households 5000 HH_WRKR2
hh_wrks_3 SUBSEED households 5000 HH_WRKR3
hh_wrks_4 SUBSEED households 5000 HH_WRKR4
hh_wrks_5m SUBSEED households 5000 HH_WRKR5M

Population by age group

Population by age group is obtained from the population forecast file and scaled up/down to match the total population calculated above.

Target Geography Household or Person Control Importance Control Field
person_age0to4 SUBSEED persons 10000 P_AGE0
person_age5to9 SUBSEED persons 10000 P_AGE5
person_age10to14 SUBSEED persons 10000 P_AGE10
person_age15to19 SUBSEED persons 10000 P_AGE15
person_age20to24 SUBSEED persons 10000 P_AGE20
person_age25to29 SUBSEED persons 10000 P_AGE25
person_age30to34 SUBSEED persons 10000 P_AGE30
person_age35to39 SUBSEED persons 10000 P_AGE35
person_age40to44 SUBSEED persons 10000 P_AGE40
person_age45to49 SUBSEED persons 10000 P_AGE45
person_age50to54 SUBSEED persons 10000 P_AGE50
person_age55to59 SUBSEED persons 10000 P_AGE55
person_age60to64 SUBSEED persons 10000 P_AGE60
person_age65to69 SUBSEED persons 10000 P_AGE65
person_age70to74 SUBSEED persons 10000 P_AGE70
person_age75to79 SUBSEED persons 10000 P_AGE75
person_age80to84 SUBSEED persons 10000 P_AGE80
person_age85m SUBSEED persons 10000 P_AGE85

Workers by split industry

'Workers by split industry' is derived from 'employment by split industry' in the activity_forecast.csv file. Employment by split industry is multiplied by the respective jobs-to-workers factor (see the JobsToWorkersFactor file, above) to obtain the number of workers by split industry.

Target Geography Household or Person Control Importance Control Field
FIRE_fnin_off SUBSEED persons 100000 FIRE_fnin_off
INFO_info_off_li SUBSEED persons 100000 INFO_info_off_li
ENGY_elec_hi SUBSEED persons 100000 ENGY_elec_hi
GOV_admn_gov SUBSEED persons 100000 GOV_admn_gov
HOSP_acc_acc SUBSEED persons 100000 HOSP_acc_acc
HLTH_hosp_hosp SUBSEED persons 100000 HLTH_hosp_hosp
RES_agmin_ag SUBSEED persons 100000 RES_agmin_ag
ENGY_ptrl_hi SUBSEED persons 100000 ENGY_ptrl_hi
RET_stor_off SUBSEED persons 100000 RET_stor_off
SERV_stor_ret SUBSEED persons 100000 SERV_stor_ret
RES_forst_log SUBSEED persons 100000 RES_forst_log
RET_nstor_off SUBSEED persons 100000 RET_nstor_off
ENGY_offc_off SUBSEED persons 100000 ENGY_offc_off
RET_stor_ret SUBSEED persons 100000 RET_stor_ret
MFG_hvtw_li SUBSEED persons 100000 MFG_hvtw_li
GOV_offc_off SUBSEED persons 100000 GOV_offc_off
RET_auto_ret SUBSEED persons 100000 RET_auto_ret
MFG_htec_hi SUBSEED persons 100000 MFG_htec_hi
SERV_home_xxx SUBSEED persons 100000 SERV_home_xxx
MFG_hvtw_hi SUBSEED persons 100000 MFG_hvtw_hi
HLTH_othr_off_li SUBSEED persons 100000 HLTH_othr_off_li
ENGY_ngas_hi SUBSEED persons 100000 ENGY_ngas_hi
ENT_ent_ret SUBSEED persons 100000 ENT_ent_ret
MFG_lvtw_hi SUBSEED persons 100000 MFG_lvtw_hi
MFG_htec_li SUBSEED persons 100000 MFG_htec_li
UTL_othr_off SUBSEED persons 100000 UTL_othr_off
HIED_hied_off_inst SUBSEED persons 100000 HIED_hied_off_inst
CNST_offc_off SUBSEED persons 100000 CNST_offc_off
FIRE_real_off SUBSEED persons 100000 FIRE_real_off
HLTH_care_inst SUBSEED persons 100000 HLTH_care_inst
CNST_othr_xxx SUBSEED persons 100000 CNST_othr_xxx
SERV_nonp_off_inst SUBSEED persons 100000 SERV_nonp_off_inst
UTL_othr_off_li SUBSEED persons 100000 UTL_othr_off_li
TRNS_trns_ware SUBSEED persons 100000 TRNS_trns_ware
SERV_bus_off SUBSEED persons 100000 SERV_bus_off
MFG_offc_off SUBSEED persons 100000 MFG_offc_off
SERV_tech_off SUBSEED persons 100000 SERV_tech_off
CNST_res_xxx SUBSEED persons 100000 CNST_res_xxx
K12_k12_k12 SUBSEED persons 100000 K12_k12_k12
WHSL_offc_off SUBSEED persons 100000 WHSL_offc_off
CNST_main_xxx SUBSEED persons 100000 CNST_main_xxx
K12_k12_off SUBSEED persons 100000 K12_k12_off
RES_offc_off SUBSEED persons 100000 RES_offc_off
TRNS_trns_off SUBSEED persons 100000 TRNS_trns_off
HOSP_eat_ret_acc SUBSEED persons 100000 HOSP_eat_ret_acc
MFG_food_li SUBSEED persons 100000 MFG_food_li
INFO_info_off SUBSEED persons 100000 INFO_info_off
MFG_wdppr_hi SUBSEED persons 100000 MFG_wdppr_hi
SERV_site_li SUBSEED persons 100000 SERV_site_li
WHSL_whsl_ware SUBSEED persons 100000 WHSL_whsl_ware
MFG_food_hi SUBSEED persons 100000 MFG_food_hi
CNST_nres_xxx SUBSEED persons 100000 CNST_nres_xxx

SPG2 Controls

SPG2 is run with controls at both the alpha zone level and the region level. The region level controls are the same as in SPG1, and two new controls are added at the alpha zone level: households by size and income category, and workers by occupation. Households by size and income category are obtained from an AA output, the ActivityLocations2.csv file. Workers by occupation at the alpha zone level are derived from the total labor dollar production by alpha zone, occupation, and household category in the laborDollarProduction.csv file (also an AA output). First, the labor dollars are aggregated across all alpha zones to get region level labor dollars by occupation and household category. Second, region level workers by occupation and household category are tabulated from the SPG1 output. Region level labor dollars are divided by region level workers to get labor dollars per worker by occupation and household category. Alpha zone level labor dollars are then divided by this rate to obtain alpha zone level workers by occupation and household category. Finally, the resulting worker counts are summed across household categories to obtain alpha zone workers by occupation. The two alpha zone controls are described below, each with an explanation of how it is calculated and a table listing its control targets.
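The worker derivation above can be written as a short set of formulas. The symbols are introduced here only for illustration: $D_{z,o,h}$ is the AA labor dollars in alpha zone $z$ for occupation $o$ and household category $h$, $W_{o,h}$ is the region level SPG1 worker count, and $r_{o,h}$ is the labor dollars per worker.

```latex
\begin{aligned}
r_{o,h}   &= \frac{\sum_{z} D_{z,o,h}}{W_{o,h}} \\
W_{z,o,h} &= \frac{D_{z,o,h}}{r_{o,h}} \\
W_{z,o}   &= \sum_{h} W_{z,o,h}
\end{aligned}
```

One consequence, assuming $W_{o,h} > 0$ and $\sum_z D_{z,o,h} > 0$: summing the middle line over zones gives $\sum_z W_{z,o,h} = W_{o,h}$, so the alpha zone worker controls add back up to the SPG1 regional totals for each occupation and household category.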

Households by Size and Income (Alpha)

Households by size and income category are retrieved from the ActivityLocations2.csv file, which is an output of the Activity Allocation (AA) module. Because AA uses the SPG1 outputs, taking these controls from AA keeps the controls consistent between SPG1 and SPG2.

Target Geography Seed Table Importance Control Field
HH0to8k1to2 AZONE households 5000 HH0to8k1to2
HH0to8k3plus AZONE households 5000 HH0to8k3plus
HH106kUp1to2 AZONE households 5000 HH106kUp1to2
HH106kUp3plus AZONE households 5000 HH106kUp3plus
HH15to23k1to2 AZONE households 5000 HH15to23k1to2
HH15to23k3plus AZONE households 5000 HH15to23k3plus
HH23to32k1to2 AZONE households 5000 HH23to32k1to2
HH23to32k3plus AZONE households 5000 HH23to32k3plus
HH32to46k1to2 AZONE households 5000 HH32to46k1to2
HH32to46k3plus AZONE households 5000 HH32to46k3plus
HH46to61k1to2 AZONE households 5000 HH46to61k1to2
HH46to61k3plus AZONE households 5000 HH46to61k3plus
HH61to76k1to2 AZONE households 5000 HH61to76k1to2
HH61to76k3plus AZONE households 5000 HH61to76k3plus
HH76to106k1to2 AZONE households 5000 HH76to106k1to2
HH76to106k3plus AZONE households 5000 HH76to106k3plus
HH8to15k1to2 AZONE households 5000 HH8to15k1to2
HH8to15k3plus AZONE households 5000 HH8to15k3plus
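The 18 targets above are every combination of nine income bands and two household size groups. The sketch below builds that list from the naming pattern and pivots long-format AA records into per-zone household controls. It is a hypothetical illustration, not the SWIM code: the function name hh_controls_by_zone, the assumption that AA activity names match the control names, and the Activity, ZoneNumber, and Quantity column names are all assumptions to check against the real ActivityLocations2.csv.

```python
import csv
import io
from collections import defaultdict

# Income bands and size groups encoded in the control names in the table above.
INCOME_BANDS = ["0to8k", "8to15k", "15to23k", "23to32k", "32to46k",
                "46to61k", "61to76k", "76to106k", "106kUp"]
SIZE_GROUPS = ["1to2", "3plus"]
HH_CONTROLS = [f"HH{inc}{size}" for inc in INCOME_BANDS for size in SIZE_GROUPS]


def hh_controls_by_zone(rows):
    """Pivot long-format AA records into {zone: {control_name: households}}.

    Assumes each row has Activity, ZoneNumber and Quantity fields (hypothetical
    layout). Rows whose Activity is not one of the 18 household controls are
    skipped; zones get 0.0 for any household control they do not report.
    """
    wanted = set(HH_CONTROLS)
    controls = defaultdict(lambda: dict.fromkeys(HH_CONTROLS, 0.0))
    for row in rows:
        activity = row["Activity"]
        if activity in wanted:
            controls[int(row["ZoneNumber"])][activity] += float(row["Quantity"])
    return dict(controls)


# Small made-up sample in the assumed layout.
sample = io.StringIO(
    "Activity,ZoneNumber,Quantity\n"
    "HH0to8k1to2,101,12.5\n"
    "HH106kUp3plus,101,40\n"
    "OtherActivity,101,3\n"
    "HH0to8k1to2,102,7\n"
)
result = hh_controls_by_zone(csv.DictReader(sample))
```

Keeping every zone's dictionary fully populated with all 18 targets means each zone row in the resulting control table has the same columns, with zeros where AA reported no households.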

Workers by Occupation (Alpha)

The total labor dollars by zone output from AA are read in from the file set by the spg.labor.dollars.by.zone property (spg.labor.dollars.by.zone = /scenario/outputs/tYEAR/laborDollarProduction.csv). These labor dollars are by occupation, and the occupations have been mapped to the PUMS worker occupation categories.

a. These labor dollars are summed across the region and divided by workers by occupation and household category (SPG1 output) to calculate labor dollars per worker by occupation and household category.

b. Alpha zone workers by occupation and household category are calculated by dividing the alpha zone labor dollars by occupation and household category by the dollars per worker calculated in step a.

c. The household categories are then collapsed (summed), and workers by occupation and alpha zone are retained as the control.
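Steps a through c can be sketched with plain dictionaries. This is a minimal illustration, not the SWIM implementation; the function name alpha_zone_workers, the dictionary layouts, and the sample numbers are hypothetical.

```python
from collections import defaultdict


def alpha_zone_workers(labor_dollars, region_workers):
    """Convert alpha zone labor dollars into worker controls by occupation.

    labor_dollars:  {(zone, occupation, hh_category): dollars}   (AA output)
    region_workers: {(occupation, hh_category): workers}         (SPG1 output)
    Returns {(zone, occupation): workers} with hh_category collapsed.
    """
    # Step a: regional labor dollars per worker for each occupation/category.
    region_dollars = defaultdict(float)
    for (zone, occ, hh), dollars in labor_dollars.items():
        region_dollars[(occ, hh)] += dollars
    rate = {key: region_dollars[key] / workers
            for key, workers in region_workers.items() if workers > 0}

    # Steps b and c: divide zone dollars by the rate, then sum over hh_category.
    # Pairs without a usable (positive) rate are skipped.
    zone_workers = defaultdict(float)
    for (zone, occ, hh), dollars in labor_dollars.items():
        if rate.get((occ, hh), 0.0) > 0:
            zone_workers[(zone, occ)] += dollars / rate[(occ, hh)]
    return dict(zone_workers)


# Made-up example: two zones, two occupations, two household categories.
dollars = {
    (101, "A1-Mgmt Bus", "HH0to8k1to2"): 30000.0,
    (102, "A1-Mgmt Bus", "HH0to8k1to2"): 10000.0,
    (101, "A1-Mgmt Bus", "HH106kUp3plus"): 20000.0,
    (101, "C3-Clerical", "HH0to8k1to2"): 5000.0,
}
workers = {
    ("A1-Mgmt Bus", "HH0to8k1to2"): 4.0,
    ("A1-Mgmt Bus", "HH106kUp3plus"): 1.0,
    ("C3-Clerical", "HH0to8k1to2"): 1.0,
}
result = alpha_zone_workers(dollars, workers)
```

Occupation and household category pairs with no SPG1 workers, or with zero regional labor dollars, are skipped in this sketch because no per-worker rate can be computed for them; how SWIM handles that case is not described here. In the example, the zone totals for A1-Mgmt Bus (4.0 + 1.0) equal the five regional workers in that occupation.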

Target Geography Seed Table Importance Control Field
A1_Mgmt_Bus AZONE persons 10000 A1-Mgmt Bus
B1_Prof_Specialty AZONE persons 10000 B1-Prof Specialty
B2_Education AZONE persons 10000 B2-Education
B3_Health AZONE persons 10000 B3-Health
B4_Technical_Unskilled AZONE persons 10000 B4-Technical Unskilled
C1_Sales_Clerical_Professionals AZONE persons 10000 C1-Sales Clerical Professionals
C2_Sales_Service AZONE persons 10000 C2-Sales Service
C3_Clerical AZONE persons 10000 C3-Clerical
C4_Sales_Clerical_Unskilled AZONE persons 10000 C4-Sales Clerical Unskilled
D1_Production_Specialists AZONE persons 10000 D1-Production Specialists
D2_MaintConstRepair_Specialists AZONE persons 10000 D2-MaintConstRepair Specialists
D3_ProtectTrans_Specialists AZONE persons 10000 D3-ProtectTrans Specialists
D4_Blue_Collar_Unskilled AZONE persons 10000 D4-Blue Collar Unskilled
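The Target and Control Field columns above differ only in punctuation. A small helper such as the hypothetical control_field_for below can derive one from the other, which may help when writing AZONE control data whose column headers must match the Control Field entries. The mapping rule is inferred from the rows of this table, not taken from the SWIM code.

```python
def control_field_for(target):
    """Map a target name like 'A1_Mgmt_Bus' to its control field 'A1-Mgmt Bus'.

    Rule inferred from the table above: the first underscore becomes a hyphen
    and any remaining underscores become spaces.
    """
    code, _, label = target.partition("_")
    return f"{code}-{label.replace('_', ' ')}"


# Check the rule against a few rows of the table.
for target in ["A1_Mgmt_Bus", "B4_Technical_Unskilled", "D2_MaintConstRepair_Specialists"]:
    print(target, "->", control_field_for(target))
```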

How to Run Manually:

The primary file for running PopulationSim for SWIM is PopulationSimSPG.py. This Python script contains multiple methods that run sequentially to implement PopulationSim. It takes two arguments: the properties file for the module being run (popsimspg1.properties or popsimspg2.properties) and the matching run mode (runSPG1 or runSPG2). Sample commands to run the module independently of the SWIM model are as follows –

(path)/python PopulationSimSPG.py (path)/popsimspg1.properties runSPG1

or

(path)/python PopulationSimSPG.py (path)/popsimspg2.properties runSPG2
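When scripting manual runs, the argument list can be built and checked before launching. The sketch below is hypothetical (popsim_command, run_popsim, and the RUN_MODES pairing are not part of SWIM) and assumes the properties files keep the names used in the commands above; it rejects an unknown mode or a mismatched pair such as popsimspg1.properties with runSPG2.

```python
import subprocess
from pathlib import Path

# Pairing of run mode and properties file name, as in the commands above.
RUN_MODES = {"runSPG1": "popsimspg1.properties", "runSPG2": "popsimspg2.properties"}


def popsim_command(python_exe, script, properties, mode):
    """Build the argument list for one manual POPSIMSPG run.

    Raises ValueError on an unknown mode or a properties file name that does
    not match the mode.
    """
    if mode not in RUN_MODES:
        raise ValueError(f"mode must be one of {sorted(RUN_MODES)}, got {mode!r}")
    if Path(properties).name.lower() != RUN_MODES[mode]:
        raise ValueError(f"{mode} expects {RUN_MODES[mode]}, got {properties}")
    return [str(python_exe), str(script), str(properties), mode]


def run_popsim(python_exe, script, properties, mode):
    """Run the module; raises subprocess.CalledProcessError on a non-zero exit."""
    subprocess.run(popsim_command(python_exe, script, properties, mode), check=True)
```

Passing a list rather than a single command string avoids shell quoting problems with paths. The python_exe argument should be the Python interpreter bundled with SWIM, as in the example path in the steps below.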

To manually run PopulationSim for SWIM, do the following:

  • Edit the tsteps CSV file so that POPSIMSPG1 or POPSIMSPG2 is run
  • Run build_run.bat to build the SWIM properties file for POPSIMSPG1 or POPSIMSPG2
  • Run the command above, for example:
E:/tlumip/_test/root/model/lib/swimpy/python.exe E:/tlumip/_test/root/scenario/model/code/populationsimspg.py E:/tlumip/_test/root/scenario/outputs/t20/popsimspg2.properties runSPG2
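After build_run.bat has generated the properties file, a quick check before launching is to confirm that input paths such as spg.labor.dollars.by.zone point to existing files. The sketch below assumes simple key = value lines, like the property quoted in the workers section; parse_properties and missing_inputs are hypothetical helpers, and if the built file stores paths relative to a root directory they would need to be joined to that root first.

```python
from pathlib import Path


def parse_properties(text):
    """Parse simple 'key = value' lines into a dict.

    Blank lines and lines starting with '#' or '!' are skipped. This is a
    simplification of the Java .properties format: line continuations,
    escapes and ':' separators are not handled.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def missing_inputs(properties_path, keys):
    """Return the given keys whose value is absent or not an existing file."""
    props = parse_properties(Path(properties_path).read_text())
    return [k for k in keys if k not in props or not Path(props[k]).is_file()]
```

For example, missing_inputs applied to the popsimspg2.properties path with ["spg.labor.dollars.by.zone"] returns an empty list when the AA labor dollar file is in place.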