POPSIMSPG
This wiki describes the implementation of PopulationSim in the Oregon Statewide Transportation/Land Use Integrated Model (SWIM-TLUMIP) and instructions for running the module within the SWIM-TLUMIP modeling system.
The previous Synthetic Population Generator (SPG) modules, SPG1 and SPG2, were replaced with PopulationSim as of 2020. The replacements are named POPSIMSPG1 and POPSIMSPG2 respectively. POPSIMSPG1 generates a synthetic population with controls specified only at the regional level. This produces a frequency distribution of households by household size, which is necessary for the Activity Allocation (AA) component of SWIM. POPSIMSPG2 runs with controls at both the alpha zone level and the regional level to generate a detailed synthetic population for use by the Person Transport (PT) and Transit Assignment (TR) modules. The default setup for SWIM only runs POPSIMSPG2 in transport years (years in which PT and TS are run).
PopulationSim follows the steps below to create synthetic households and persons.
- Initial Seed Balancing
- Meta Control Factoring (step is parallelized for POPSIMSPG2)
- Final Seed Balancing
- Integerize Final Seed Weights
- Expand Households
Details on the PopulationSim software can be found at https://rsginc.github.io/populationsim/index.html. Further details on the previous population synthesizer used in SWIM (SPG) can be reviewed on the retired SPG wiki page, although that page was never fully populated or described in a meaningful way.
PopulationSim is run automatically from the Model Orchestrator. It calls a batch file that runs PopulationSim for SPG1 or SPG2 depending on the argument sent to the Python code. For more information on how to run PopulationSim manually, see below.
Each of the POPSIMSPG modules calls multiple Python functions to implement PopulationSim. The Python functions for the two modules are as follows:
runSPG1:
- createDirectories : This module creates the directories that store PopulationSim inputs and outputs for each scenario year. It requires the PopulationSimBase directory, which is included in the latest repository and contains the settings files for SPG1 and SPG2. Because the controls change from year to year while the settings remain the same over time, these static files are stored in PopulationSimBase and copied to each scenario year's directory by the module itself. A PopulationSim directory is created inside the outputs directory of each scenario year. The PopulationSim directory also contains sub-directories including config, data and output. These directories contain the standard inputs and outputs for PopulationSim. These outputs are then post-processed to create SPG format outputs for SWIM.
- createSeed : This module creates the seed data for running PopulationSim. The seed data is created using the PUMS (Public Use Microdata Sample) household and person data for the states of Oregon, California and Washington. PUMAs within the SWIM halo are filtered from the complete data set to obtain seed data for the SWIM PopulationSim module.
- copySeeds : This module copies the seeds created for the first scenario year to subsequent years. The seeds need not be created in every iteration as they remain static over time.
- spg1Controls : This module creates the control data for POPSIMSPG1. This module utilizes multiple inputs including Jobs to Workers Factor, Workers per Household marginals, household size distribution, NED population and employment forecast and the household seed file. These inputs are a mix of model inputs, and outputs of previous modules (NED). The locations of input and output files of each module are listed in the tables below.
- run_spg1 : This module executes the PopulationSim program with the seed and control created above.
- spg1PostProcess : This module processes the standard PopulationSim outputs and generates SPG format outputs for use by downstream SWIM modules. For example, this process converts synthetic_households.csv to HouseholdsByHHCategory.csv which is used by AA and SPG2.
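The directory setup described for createDirectories can be illustrated with a minimal sketch. This is not the SWIM code: the function name, its arguments, and the assumed layout (`PopulationSim/SPG1` and `PopulationSim/SPG2`, each with `configs`, `data` and `output`, inferred from the file locations listed in the tables below) are assumptions made for illustration.

```python
import shutil
from pathlib import Path


def create_popsim_directories(base_dir, scenario_outputs_dir):
    """Copy the static PopulationSimBase tree into one scenario year's outputs.

    Both arguments are hypothetical; the real module reads its locations
    from the popsimspg properties files.
    """
    target = Path(scenario_outputs_dir) / "PopulationSim"
    # dirs_exist_ok (Python 3.8+) lets a rerun refresh the copied settings.
    shutil.copytree(base_dir, target, dirs_exist_ok=True)
    # Ensure the standard sub-directories exist even if the base tree lacks them.
    for module in ("SPG1", "SPG2"):
        for sub in ("configs", "data", "output"):
            (target / module / sub).mkdir(parents=True, exist_ok=True)
    return target
```

Copying the whole base tree keeps the settings identical across scenario years, while the year-specific controls are written into each year's own `data` directory afterwards.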
runSPG2:
- createDirectories : This module creates the necessary directories to store PopulationSim inputs and outputs for each scenario year. A PopulationSim directory gets created inside the outputs directory of each scenario year. The PopulationSim directory also contains sub-directories including config, data and output. These directories contain the standard inputs and outputs for PopulationSim. These outputs are then post-processed to create SPG format outputs for SWIM.
- copySeeds : This module copies the seeds created for the first scenario year to subsequent years.
- spg2Controls : This module creates the control data for POPSIMSPG2. This module utilizes multiple inputs including SPG1 outputs such as synthetic_households.csv, synthetic_persons.csv and HouseholdsByHHCategory.csv. It also uses laborDollarProduction.csv and ActivityLocations2.csv which are outputs of the AA module. SPG2 derives regional controls from SPG1 regional controls file and utilizes the puma_beta_alpha_xwalk.csv file to combine data from different geographies. The locations of input and output files of each module are listed in the tables below.
- spg2Settings : This module updates the values of two settings in the settings.yaml file in the configs folder.
  - num_processes: This setting defines the number of processors to use when running the module in parallel mode. The value for this setting is updated based on spg2.num.processors defined in globalTemplate.properties file.
  - MAX_BALANCE_ITERATIONS_SIMULTANEOUS: This setting defines the maximum number of iterations used by the sub_balancer component of the module to reach the optimal solution. The value for this setting is updated based on spg2.max.iterations defined in globalTemplate.properties file.
- run_spg2 : This module executes the PopulationSim program with the seed and control created above.
- spg2PostProcess : This module processes the standard PopulationSim outputs and generates SPG format outputs for use by downstream SWIM modules. The standard PopulationSim output includes synthetic_households.csv and synthetic_persons.csv, both of which have a number of fields indicated by the user. The SPG format versions of these outputs are SynPopH.csv and SynPopP.csv, which have different fields and field names and are therefore created during the post-processing step. The post-processing step also creates the SynPopTAZSummary.csv file, which summarizes the synthetic population by TAZ.
The above text somewhat hides two important properties in globalTemplate.properties that should be reviewed and updated each time SWIM is moved onto a new machine. These properties were added when PopulationSim was threaded to speed up processing:

    ## SPG2 related properties
    spg2.num.processors = 2
    spg2.max.iterations = 1000
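As a rough illustration of how spg2Settings might carry these two properties into settings.yaml, the sketch below parses the properties text and rewrites the matching setting lines. The function names are hypothetical, and the line-based rewrite is an assumption chosen to keep the example free of third-party dependencies. The actual module may well use a YAML library instead.

```python
import re


def read_properties(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def update_settings(settings_text, props):
    """Rewrite the two SPG2 settings in the YAML text; other lines are untouched."""
    mapping = {
        "num_processes": props["spg2.num.processors"],
        "MAX_BALANCE_ITERATIONS_SIMULTANEOUS": props["spg2.max.iterations"],
    }
    for name, value in mapping.items():
        # Match "name: <anything>" on a single line and keep the "name: " prefix.
        pattern = re.compile(rf"^({re.escape(name)}[ \t]*:[ \t]*).*$", re.MULTILINE)
        settings_text = pattern.sub(lambda m, v=value: m.group(1) + v, settings_text)
    return settings_text
```

For example, `update_settings(open(path).read(), read_properties(open(props_path).read()))` would return the updated settings text, where `path` and `props_path` are placeholders for the scenario's settings.yaml and the properties file.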
The following tables describe PopulationSim inputs and outputs. Each table lists the input or output file, describes the file, indicates the location of the file, and lists the SWIM modules that consume it (inputs) or create it (outputs). A few key files are described in more detail. For more information on the data used to set up PopulationSim and how the software works, see the following sections.
Input File | Description | Address | Modules |
---|---|---|---|
Properties File | Module-specific properties from current model run. All functions in the PopulationSim script use these files to retrieve input/output file locations and other model properties. | /outputs/tyear/popsimspg1.properties /outputs/tyear/popsimspg2.properties | createDirectories createSeed spg1Controls run_spg1 spg1PostProcess spg2Controls spg2Settings spg2PostProcess |
PUMS Household (h) and Person (p) data for Oregon (41), California (06) and Washington (53) | Input household and person data by PUMA downloaded from the American Community Survey website. | root/census/psam_h06.csv root/census/psam_h41.csv root/census/psam_h53.csv root/census/psam_p06.csv root/census/psam_p41.csv root/census/psam_p53.csv | createSeed |
PUMA to Alpha-Beta xwalk | Xwalk file created by SWIM_VISUM_Main.py | outputs/t20/puma_beta_alpha_xwalk.csv | createSeed spg2Controls spg2PostProcess |
pums_to_split_industry xwalk | Correspondence from PUMS occupation and industry codes to split industries | inputs/parameters/pums_to_split_industry.csv | createSeed |
ACS occupation categories | Updated ACS occupation categories including missing codes | inputs/parameters/acs_occupation_2005_2009_forPopSim.csv | createSeed |
Jobs to Workers Factor | Jobs to workers factor file | inputs/parameters/JobsToWorkersFactor.csv | spg1Controls |
Workers per HH | Workers per household distribution file | inputs/parameters/workersPerHouseholdMarginalxYEAR.csv | spg1Controls |
HH by HH Size Distribution | Household Size distribution from the input PUMS data. | inputs/parameters/hh_dist.csv | spg1Controls |
NED Employment | Forecast of the total amounts of production by activity, used to update TechnologyOptionsW and ActivityTotalW (see Working Files) | /outputs/tyear/activity_forecast.csv | spg1Controls |
NED Population | Population forecast from the NED module | /outputs/tyear/population_forecast.csv | spg1Controls |
Household Seed | Seed household file | inputs/parameters/PopulationSimBase/data/seed_households.csv | spg1Controls |
Controls Configuration | Control file for running PopulationSim, contains information about household and person level control variables for generating synthetic population | outputs/tyear/PopulationSim/SPG1/configs/controls.csv outputs/tyear/PopulationSim/SPG2/configs/controls.csv | run_spg1 run_spg2 |
PopulationSim Settings | Software settings for PopulationSim | outputs/tyear/PopulationSim/SPG1/configs/settings.yaml outputs/tyear/PopulationSim/SPG2/configs/settings.yaml | run_spg1 spg2Settings run_spg2 |
PopulationSim Log | PopulationSim logging configuration file | outputs/tyear/PopulationSim/SPG1/configs/logging.yaml outputs/tyear/PopulationSim/SPG2/configs/logging.yaml | run_spg1 run_spg2 |
SPG1 Synthetic Households | Synthetic Households file | outputs/tyear/PopulationSim/SPG1/output/synthetic_households.csv | spg1PostProcess |
SPG1 Synthetic Households | Synthetic Households file | outputs/tyear/PopulationSim/SPG1/output/synthetic_households.csv | spg2Controls |
SPG1 Synthetic Persons | Synthetic Persons file | outputs/tyear/PopulationSim/SPG1/output/synthetic_persons.csv | spg2Controls |
HH by HH Category | SPG1 modelwide summary of households by category | /outputs/tyear/HouseholdsByHHCategory.csv | spg2Controls |
Activity Locations2 | Activity quantities by TAZ (alpha zone) | /outputs/tyear/ActivityLocations2.csv | spg2Controls |
Labor Dollar Production | Total amount of each type of labor produced by each household category in each zone | /outputs/tyear/laborDollarProduction.csv | spg2Controls |
SPG1 Region Control | Region Level control for SPG1 | outputs/tyear/PopulationSim/SPG1/data/control_region.csv | spg2Controls |
SPG2 Synthetic Households | Synthetic Households file | outputs/tyear/PopulationSim/SPG2/output/synthetic_households.csv | spg2PostProcess |
SPG2 Synthetic Persons | Synthetic Persons file | outputs/tyear/PopulationSim/SPG2/output/synthetic_persons.csv | spg2PostProcess |
A brief description of some important input files follows.
ACS Occupation Categories File:
acs_occupation_2005_2009_forPopSim.csv lists all occupations from Census and maps them to a PECAS occupation label. The file is used to lookup the occupation field from PUMS seed data, and assign it to a PECAS occupation. The origin of the file is from the initial SPG implementation. However, the original file (acs_occupation_2005_2009.csv) was missing many occupation categories that exist in the current ACS PUMS data, so it was modified to include occupation categories that were missing by finding the closest occupation code in the original file, and the PECAS occupation label for that code was assigned to the missing occupation.
JobsToWorkersFactor File:
The NED model produces both jobs (employment) and workers (persons), reflecting the mismatch between US Bureau of Economic Analysis job data and Census worker data. This file contains factors to convert jobs to workers by SPG sector. NED forecasted jobs are multiplied by these factors to calculate workers by sector, to account for mismatches between industry categories between the two datasets, as well as workers working multiple jobs (see workers by split industry, below).
WorkersPerHouseholdMarginalsxYEAR File:
This file is created using census data. The current file has workers per household records from 1990 to 2000. Households are grouped by number of worker categories such as 0, 1, 2, 3, 4 and 5.4 (average number of workers in 5+ worker households).
Output File | Description | Address | Modules |
---|---|---|---|
Seed Households | Seed household file | outputs/tyear/PopulationSim/SPG/data/seed_households.csv | createSeed |
Seed Persons | Seed persons file | outputs/tyear/PopulationSim/SPG/data/seed_persons.csv | createSeed |
HH by HH Size Distribution | Household size distribution from the input PUMS data | inputs/parameters/hh_dist.csv | createSeed |
SPG1 Subseed Control | Controls at the subseed (sub-PUMA) geography. PopulationSim requires at least one geography smaller than the seed geography to run. This geography is fake and has a value of 1 for all entries. | outputs/tyear/PopulationSim/SPG1/data/control_subseed.csv | spg1Controls |
SPG1 Region Control | Region Level control for SPG1 | outputs/tyear/PopulationSim/SPG1/data/control_region.csv | spg1Controls |
SPG1 Geo xwalk | xwalk between various geographies in seed and control data | outputs/tyear/PopulationSim/SPG1/data/geo_cross_walk.csv | spg1Controls |
SPG1 Synthetic Households | Synthetic Households file | outputs/tyear/PopulationSim/SPG1/output/synthetic_households.csv | run_spg1 |
SPG1 Synthetic Persons | Synthetic Persons file | outputs/tyear/PopulationSim/SPG1/output/synthetic_persons.csv | run_spg1 |
HH by HH Category | SPG1 modelwide summary of households by category | /outputs/tyear/HouseholdsByHHCategory.csv | spg1PostProcess |
SPG2 Alpha Zone Control | Alpha zone level controls for SPG2 | outputs/tyear/PopulationSim/SPG2/data/control_alpha.csv | spg2Controls |
SPG2 Region Control | Region level controls for SPG2 | outputs/tyear/PopulationSim/SPG2/data/control_region.csv | spg2Controls |
SPG2 Geo xwalk | xwalk between various geographies in seed and control data | outputs/tyear/PopulationSim/SPG2/data/geo_cross_walk.csv | spg2Controls |
SPG2 settings | Setting file for SPG2 module | outputs/tyear/PopulationSim/SPG2/configs/settings.yaml | spg2Settings |
SPG2 Synthetic Households | Synthetic Households file | outputs/tyear/PopulationSim/SPG2/output/synthetic_households.csv | run_spg2 |
SPG2 Synthetic Persons | Synthetic Persons file | outputs/tyear/PopulationSim/SPG2/output/synthetic_persons.csv | run_spg2 |
SPG2 SynPopH | SWIM format synthetic households | outputs/tyear/SynPopH.csv | spg2PostProcess |
SPG2 SynPopP | SWIM format synthetic persons | outputs/tyear/SynPopP.csv | spg2PostProcess |
SynPop TAZ Summary | SWIM format TAZ summary of synthetic population | outputs/tyear/SynPop_Taz_Summary.csv | spg2PostProcess |
This section lists the output files and defines the fields within them. Note that the PopulationSim software creates synthetic_households.csv and synthetic_persons.csv. These files are then re-formatted for input to PT and named SynPopH.csv and SynPopP.csv.

Fields in synthetic_households.csv (standard PopulationSim output):
Field | Description | Values |
---|---|---|
NP | Number of persons in household | 0 to max number of persons |
BLD | Number of units in structure | 01 .Mobile home or trailer 02 .One-family house detached 03 .One-family house attached 04 .2 Apartments 05 .3-4 Apartments 06 .5-9 Apartments 07 .10-19 Apartments 08 .20-49 Apartments 09 .50 or more apartments 10 .Boat, RV, van, etc. |
NWESR | Number of workers | 0 .N/A (less than 16 years old) 1 .Civilian employed, at work 2 .Civilian employed, with a job but not at work 3 .Unemployed 4 .Armed forces, at work 5 .Armed forces, with a job but not at work 6 .Not in labor force |
VEH | Number of vehicles | 0 .No vehicles 1 .1 vehicle 2 .2 vehicles 3 .3 vehicles 4 .4 vehicles 5 .5 vehicles 6 .6 or more vehicles |
hh_id | Household ID | 1 to max ID |
HHINC2009 | Household income in dollars ($2009) | -999 to max income |
AZONE | Alpha zone | 1 to max zone number |

Fields in synthetic_persons.csv (standard PopulationSim output):
Field | Description | Values |
---|---|---|
per_num | Person ID | |
AGEP | Age | 00 .Under 1 year 01..99 .1 to 99 years (Top-coded***) |
SEX | Gender | 1 .Male 2 .Female |
Hh_id | Household ID | |
ESR | Work status | 0 .N/A (less than 16 years old) 1 .Civilian employed, at work 2 .Civilian employed, with a job but not at work 3 .Unemployed 4 .Armed forces, at work 5 .Armed forces, with a job but not at work 6 .Not in labor force |
SCH | School enrollment | b .N/A (less than 3 years old) 1 .No, has not attended in the last 3 months 2 .Yes, public school or public college 3 .Yes, private school or college or home school |
INDP | Census Industry code | Industry recode for 2013 and later based on 2012 IND codes, see https://www2.census.gov/programs-surveys/acs/tech_docs/pums/data_dict/PUMSDataDict13.txt |
OCCP | Census Occupation code | Occupation recode for 2012 and later based on 2010 OCC codes, see https://www2.census.gov/programs-surveys/acs/tech_docs/pums/data_dict/PUMSDataDict13.txt |
Occupation | PECAS Occupation ID | Occupation categories consistent with AA |
occupationLabel | PECAS Occupation label | Labels for occupation categories |
split_industry_id | PECAS Split industry ID | Split industry codes consistent with AA |
split_industry | PECAS Split industry label | Labels for split industry codes |

Fields in SynPopH.csv (SPG format):
Field | Description | Values |
---|---|---|
HHID | Household ID | |
Persons | Number of persons in household | 0 to max persons |
UNITS1 | Number of units in structure | 01 .Mobile home or trailer 02 .One-family house detached 03 .One-family house attached 04 .2 Apartments 05 .3-4 Apartments 06 .5-9 Apartments 07 .10-19 Apartments 08 .20-49 Apartments 09 .50 or more apartments 10 .Boat, RV, van, etc. |
AUTOS | Number of autos owned | 0 .No vehicles 1 .1 vehicle 2 .2 vehicles 3 .3 vehicles 4 .4 vehicles 5 .5 vehicles 6 .6 or more vehicles |
RHHINC | Household income in dollars ($2009) | -999 to max income |
AZONE | Alpha zone | 1 to max zone number |

Fields in SynPopP.csv (SPG format):
Field | Description | Values |
---|---|---|
HH_ID | Household ID | Household ID |
PERS_ID | Person ID | Person ID |
AGE | Age | 00 .Under 1 year 01..99 .1 to 99 years (Top-coded***) |
SEX | Gender | 1 .Male 2 .Female |
RLABOR | Worker status | 0 .N/A (less than 16 years old) 1 .Civilian employed, at work 2 .Civilian employed, with a job but not at work 3 .Unemployed 4 .Armed forces, at work 5 .Armed forces, with a job but not at work 6 .Not in labor force |
SCHOOL | Student status | 0 .N/A (less than 3 years old) 1 .No, has not attended in the last 3 months 2 .Yes, public school or public college 3 .Yes, private school or college or home school |
INDUSTRY | Industry code | Industry recode for 2013 and later based on 2012 IND codes, see https://www2.census.gov/programs-surveys/acs/tech_docs/pums/data_dict/PUMSDataDict13.txt |
OCCUP | Occupation | Occupation recode for 2012 and later based on 2010 OCC codes, see https://www2.census.gov/programs-surveys/acs/tech_docs/pums/data_dict/PUMSDataDict13.txt |
SW_UNSPLIT_IND | Unsplit Industry ID | Split industry codes consistent with AA |
SW_OCCUP | Split Occupation ID | Occupation index from ACS |
SW_SPLIT_IND | Split Industry ID | Split industry codes consistent with AA |
Note: SW_UNSPLIT_IND and SW_SPLIT_IND were exactly the same field in the older SPG implementation so they were kept exactly the same in the PopulationSim implementation.
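The household re-formatting step can be pictured as a column rename. The sketch below is only an illustration: the mapping is an assumption inferred by pairing rows with matching descriptions in the household field tables above, and the function name is hypothetical. Fields without an obvious SPG format counterpart (such as NWESR) are simply dropped here.

```python
# Hypothetical mapping from PopulationSim household fields to SPG format fields,
# inferred from matching descriptions in the field tables (not taken from the code).
HOUSEHOLD_FIELD_MAP = {
    "hh_id": "HHID",
    "NP": "Persons",
    "BLD": "UNITS1",
    "VEH": "AUTOS",
    "HHINC2009": "RHHINC",
    "AZONE": "AZONE",
}


def to_synpoph(rows):
    """Rename and reorder household records (dicts) into the SPG field layout."""
    return [
        {new: row[old] for old, new in HOUSEHOLD_FIELD_MAP.items()}
        for row in rows
    ]
```

Rows read with `csv.DictReader` from synthetic_households.csv could be passed through `to_synpoph` and written back out with `csv.DictWriter`.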
This section describes the data preparation performed for implementing PopulationSim in SWIM and the controls used for SPG1 and SPG2.
PopulationSim uses seed data consisting of households and persons to generate the synthetic population for a given region. We use ACS PUMS 2013-2017 data as the seed household and seed person data. Data was downloaded for the states of Oregon, Washington and California, concatenated together and filtered for the PUMAs that are within the SWIM region. The filtered data contains 31 Oregon PUMAs, 9 Washington PUMAs and 1 California PUMA. The filtered household and person data were processed further to create the missing control variables. A ‘Category’ field was created in the household seed file which contains the category of the household by size and income. In the person seed file, each worker was assigned a split_industry_id based on their Industry and Occupation IDs. The creation of these variables is described in detail in the following paragraphs.
SPG2 uses households by household size and income as a control variable at the alpha zone level, hence the seed household data must contain this category field. Household size is divided into 2 categories (1to2 and 3plus) and household income into 9 categories with varying ranges of income. These two variables are then combined to create 18 categories of households by size and income. It should be noted that the HINCP field was first multiplied by adjustment factors to convert all records (2013-2017) to 2017 dollars, and then by another factor to convert 2017 dollars to 2009 dollars.
VARIABLE | HOUSEHOLD SIZE | HOUSEHOLD INCOME ($) |
---|---|---|
HH0to8k1to2 | 1 to 2 | 0 to 8000 |
HH0to8k3plus | 3 or more | 0 to 8000 |
HH8to15k1to2 | 1 to 2 | 8000 to 15000 |
HH8to15k3plus | 3 or more | 8000 to 15000 |
HH15to23k1to2 | 1 to 2 | 15000 to 23000 |
HH15to23k3plus | 3 or more | 15000 to 23000 |
HH23to32k1to2 | 1 to 2 | 23000 to 32000 |
HH23to32k3plus | 3 or more | 23000 to 32000 |
HH32to46k1to2 | 1 to 2 | 32000 to 46000 |
HH32to46k3plus | 3 or more | 32000 to 46000 |
HH46to61k1to2 | 1 to 2 | 46000 to 61000 |
HH46to61k3plus | 3 or more | 46000 to 61000 |
HH61to76k1to2 | 1 to 2 | 61000 to 76000 |
HH61to76k3plus | 3 or more | 61000 to 76000 |
HH76to106k1to2 | 1 to 2 | 76000 to 106000 |
HH76to106k3plus | 3 or more | 76000 to 106000 |
HH106kUp1to2 | 1 to 2 | More than 106000 |
HH106kUp3plus | 3 or more | More than 106000 |
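A minimal sketch of how a seed household could be assigned to one of these 18 categories is given below. The function and constant names are hypothetical. The treatment of boundary values is an assumption: an income exactly on a boundary is placed in the lower bin, which is consistent with the top bin being labeled "More than 106000". Negative incomes fall into the lowest bin.

```python
import bisect

# Upper bounds of the first eight income bins, in 2009 dollars (from the table above).
INCOME_UPPER_BOUNDS = [8000, 15000, 23000, 32000, 46000, 61000, 76000, 106000]
INCOME_LABELS = ["0to8k", "8to15k", "15to23k", "23to32k", "32to46k",
                 "46to61k", "61to76k", "76to106k", "106kUp"]


def household_category(persons, income_2009):
    """Return the size-by-income category name for one household."""
    # bisect_left puts an income equal to a bound into the lower bin (assumption).
    income_label = INCOME_LABELS[bisect.bisect_left(INCOME_UPPER_BOUNDS, income_2009)]
    size_label = "1to2" if persons <= 2 else "3plus"
    return f"HH{income_label}{size_label}"
```

For instance, a four-person household with an income of 30,000 (in 2009 dollars) would be labeled `HH23to32k3plus`.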
In SWIM, the number of households generated in the region is driven by the demand for labor in any given simulation year. The model attempts to generate the right kinds of workers by controlling the synthetic population on a combination of workers' industry and occupation. Furthermore, SWIM attempts to control the number and location of labor by also considering the type of land use that a worker might work in. For example, workers in the construction industry could be tradesmen, designers, or persons engaged in marketing, accounting, or several other occupations. Each has a different likelihood of working on a construction site, working in an office, etc. To address this issue, the New Economic Development (NED) module uses five definitions of construction (CNST), differentiated as main, nres (non-residential), offc (office), res (residential), and othr (other). Office is further identified as only available to work in office land-use types, for example CNST_main_xxx versus CNST_offc_off.
SPG1 and SPG2 use ‘Workers by Split Industry’ as a region level control, which requires the person seed data to contain the split_industry_id field. Since the same industry and occupation id can be split into multiple split_industries, this step draws a random number for a worker, and assigns the split industry id according to an input distribution of split industries. The table below provides an example of how the assignment is done. The pums_to_split_industry.csv file provides proportions of different jobs within an industry and is denoted by the ‘proportion’ field in the table. The random number drawn for each worker is denoted by the ‘draw’ field. Cumulative proportion is calculated to create ranges of proportions needed for assignment and is denoted by ‘cumprop’ field. This field is shifted down one row to create the range of proportions and is denoted by the ‘prev_cumprop’ field. The worker is assigned to the split industry whose range contains the random draw and is identified in the ‘select’ field in the table.
INDP | OCCP | split_ind_id | split_industry | proportion | draw | cumprop | prev_cumprop | select |
---|---|---|---|---|---|---|---|---|
770 | 6600 | 8 | CNST_main_xxx | 0.1176 | 0.0773 | 0.1176 | 0.0000 | 1 |
770 | 6600 | 9 | CNST_nres_xxx | 0.2941 | 0.0773 | 0.4118 | 0.1176 | 0 |
770 | 6600 | 10 | CNST_othr_xxx | 0.4705 | 0.0773 | 0.8824 | 0.4118 | 0 |
770 | 6600 | 11 | CNST_res_xxx | 0.1176 | 0.0773 | 1.0000 | 0.8824 | 0 |
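The selection logic in the table can be sketched as follows. The function name and the data layout are assumptions for illustration, as is the half-open interval test (`prev_cumprop <= draw < cumprop`). The source does not state how a draw that lands exactly on a boundary is handled.

```python
import random
from itertools import accumulate


def pick_split_industry(options, draw):
    """Select a split industry for one worker.

    options: list of (split_industry, proportion) pairs for the worker's
             INDP/OCCP combination, in file order.
    draw:    random number in [0, 1) drawn for the worker.
    """
    prev_cumprop = 0.0
    cumprops = accumulate(proportion for _, proportion in options)
    for (name, _), cumprop in zip(options, cumprops):
        if prev_cumprop <= draw < cumprop:
            return name
        prev_cumprop = cumprop
    # Guard for draws above the final cumprop when proportions sum to slightly under 1.
    return options[-1][0]


# The example rows from the table above.
options = [
    ("CNST_main_xxx", 0.1176),
    ("CNST_nres_xxx", 0.2941),
    ("CNST_othr_xxx", 0.4705),
    ("CNST_res_xxx", 0.1176),
]
selected = pick_split_industry(options, 0.0773)  # the draw shown in the table

# In practice each worker would get a fresh draw, e.g. from a seeded generator.
rng = random.Random(42)
another = pick_split_industry(options, rng.random())
```

With the draw of 0.0773 from the table, the first range (0.0000 to 0.1176) contains the draw, so `selected` is `CNST_main_xxx`, matching the `select` column.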
The modified seed data contains these new fields, ‘Category’ in the household seed and ‘split_industry_id’ in the person seed.
Once the seed data is created, it is fixed for all model runs. The code currently checks whether the seed household data file exists. If it does not, the code automatically creates the seed data from the input PUMS data, which takes about 5-6 minutes to execute. If it does exist, the code reuses the existing seed data for the current run.
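A small sketch of this create-once behavior is shown below, under the assumption that an existing seed file is reused unchanged. The function name and the `build_seed` callback are hypothetical stand-ins for the createSeed processing.

```python
from pathlib import Path


def ensure_seed(seed_path, build_seed):
    """Create the seed file only when it is missing.

    build_seed is a caller-supplied function (hypothetical here) that writes
    the seed file from the PUMS inputs. Returns True if the seed was built.
    """
    seed_path = Path(seed_path)
    if seed_path.exists():
        return False  # seed data is fixed across runs, so reuse it
    seed_path.parent.mkdir(parents=True, exist_ok=True)
    build_seed(seed_path)
    return True
```

Skipping the rebuild avoids repeating the PUMS processing on every scenario year.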
SPG1 is run at the regional level, meaning the synthetic population is generated for only one geography, i.e. the region. This is a problem for PopulationSim, which requires at least one sub-seed geography, i.e. a geography smaller than the seed geography (PUMA). The problem was avoided by creating a set of fake sub-seed and seed geography ids to allow PopulationSim to run. Instead of using the PUMA field in the seed file to identify seed geography, a new field named ‘SEED’ was created and the value was set to 1 for all records in the seed data. This tells PopulationSim that there is only one geography in the entire region. In the geo_cross_walk.csv file, a fake cross walk was created between the subseed, seed and region level geographies and all of them were set to 1.
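The single-geography workaround could look roughly like the sketch below. The column names `SUBSEED`, `SEED` and `REGION` follow the geography names used on this page, but the exact headers in geo_cross_walk.csv, as well as the function names, are assumptions.

```python
import csv
import io


def add_seed_field(seed_rows):
    """Tag every seed record (a dict) with the single fake seed geography."""
    return [dict(row, SEED=1) for row in seed_rows]


def fake_geo_cross_walk():
    """Return CSV text for a one-row cross walk with every geography set to 1."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["SUBSEED", "SEED", "REGION"])  # assumed column names
    writer.writerow([1, 1, 1])
    return buffer.getvalue()
```

Because every record maps to the same ids, all sub-seed controls effectively act as region level controls.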
Controls for SPG1 were created at the sub-seed level, although in practice these are region level controls. This was done because PopulationSim requires at least one sub-seed level control to run. A region level control is also required, and the total population of the region was used for it. In summary, the controls for SPG1 are as follows, along with a description of how these are calculated and a table listing all the controls for SPG1.
The total number of households is obtained by dividing the total NED employment by average number of workers per household. The latter is calculated using the workers by household marginals distribution.
Target | Geography | Household or Person Control | Importance | Control Field |
---|---|---|---|---|
num_hh | SUBSEED | households | 1000000000 | HHS |
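The household total can be illustrated with a short calculation. The worker values per category (0, 1, 2, 3, 4 and 5.4) come from the description of the workers per household file above. The function names are hypothetical, and any numbers passed in would be placeholders rather than SWIM data.

```python
# Workers represented by each category; 5.4 is the average for 5+ worker households.
WORKERS_PER_CATEGORY = [0, 1, 2, 3, 4, 5.4]


def average_workers_per_household(households_by_workers):
    """Weighted average workers per household from the marginal distribution.

    households_by_workers: household counts (or shares) for the six
    worker categories, in the same order as WORKERS_PER_CATEGORY.
    """
    total = sum(households_by_workers)
    weighted = sum(w * h for w, h in zip(WORKERS_PER_CATEGORY, households_by_workers))
    return weighted / total


def total_households(total_employment, households_by_workers):
    """Total households = total NED employment / average workers per household."""
    return total_employment / average_workers_per_household(households_by_workers)
```

For example, with household shares of 0.25, 0.5 and 0.25 in the 0, 1 and 2 worker categories (and none above), the average is 1.0 worker per household, so an employment total of 1000 would give 1000 households.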
Total population is calculated by multiplying the total households calculated above by the average household size, which is obtained from the households by household size distribution created from census data.
Target | Geography | Household or Person Control | Importance | Control Field |
---|---|---|---|---|
TOTAL_POP | REGION | persons | 100000 | POPULATION |
Households by household size obtained from census data is scaled up/down to match the total households calculated using NED employment to create household size controls.
Target | Geography | Household or Person Control | Importance | Control Field |
---|---|---|---|---|
hh_size_1 | SUBSEED | households | 5000 | HH_SIZE1 |
hh_size_2 | SUBSEED | households | 5000 | HH_SIZE2 |
hh_size_3 | SUBSEED | households | 5000 | HH_SIZE3 |
hh_size_4 | SUBSEED | households | 5000 | HH_SIZE4M |
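Scaling a census distribution to a target total is a simple proportional adjustment, sketched below. The function name is hypothetical, and any counts passed in would be placeholders rather than values from hh_dist.csv.

```python
def scale_to_total(counts, target_total):
    """Scale a distribution so it sums to target_total, preserving its shares."""
    current_total = sum(counts.values())
    return {key: value * target_total / current_total for key, value in counts.items()}
```

For example, `scale_to_total({"HH_SIZE1": 10, "HH_SIZE2": 30}, 80)` would keep the 25/75 split and return 20 and 60 households. The scaled values are generally not integers, since PopulationSim treats them as control targets rather than final counts.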
Households by number of workers is obtained from the workers per household marginals file and scaled up/down to match the total households.
Target | Geography | Household or Person Control | Importance | Control Field |
---|---|---|---|---|
hh_wrks_0 | SUBSEED | households | 5000 | HH_WRKR0 |
hh_wrks_1 | SUBSEED | households | 5000 | HH_WRKR1 |
hh_wrks_2 | SUBSEED | households | 5000 | HH_WRKR2 |
hh_wrks_3 | SUBSEED | households | 5000 | HH_WRKR3 |
hh_wrks_4 | SUBSEED | households | 5000 | HH_WRKR4 |
hh_wrks_5m | SUBSEED | households | 5000 | HH_WRKR5M |
Population by age group is obtained from the population forecast file and scaled up/down to match the total population calculated above.
Target | Geography | Household or Person Control | Importance | Control Field |
---|---|---|---|---|
person_age0to4 | SUBSEED | persons | 10000 | P_AGE0 |
person_age5to9 | SUBSEED | persons | 10000 | P_AGE5 |
person_age10to14 | SUBSEED | persons | 10000 | P_AGE10 |
person_age15to19 | SUBSEED | persons | 10000 | P_AGE15 |
person_age20to24 | SUBSEED | persons | 10000 | P_AGE20 |
person_age25to29 | SUBSEED | persons | 10000 | P_AGE25 |
person_age30to34 | SUBSEED | persons | 10000 | P_AGE30 |
person_age35to39 | SUBSEED | persons | 10000 | P_AGE35 |
person_age40to44 | SUBSEED | persons | 10000 | P_AGE40 |
person_age45to49 | SUBSEED | persons | 10000 | P_AGE45 |
person_age50to54 | SUBSEED | persons | 10000 | P_AGE50 |
person_age55to59 | SUBSEED | persons | 10000 | P_AGE55 |
person_age60to64 | SUBSEED | persons | 10000 | P_AGE60 |
person_age65to69 | SUBSEED | persons | 10000 | P_AGE65 |
person_age70to74 | SUBSEED | persons | 10000 | P_AGE70 |
person_age75to79 | SUBSEED | persons | 10000 | P_AGE75 |
person_age80to84 | SUBSEED | persons | 10000 | P_AGE80 |
person_age85m | SUBSEED | persons | 10000 | P_AGE85 |
'Workers by split industry' is derived from 'employment by split industry' in the activity_forecast.csv file. 'Employment by split industry' in the activity_forecast.csv file is multiplied by the respective jobs_to_workers_factor (see jobs to workers file, above) to obtain the number of workers by split industry.
Target | Geography | Household or Person Control | Importance | Control Field |
---|---|---|---|---|
FIRE_fnin_off | SUBSEED | persons | 100000 | FIRE_fnin_off |
INFO_info_off_li | SUBSEED | persons | 100000 | INFO_info_off_li |
ENGY_elec_hi | SUBSEED | persons | 100000 | ENGY_elec_hi |
GOV_admn_gov | SUBSEED | persons | 100000 | GOV_admn_gov |
HOSP_acc_acc | SUBSEED | persons | 100000 | HOSP_acc_acc |
HLTH_hosp_hosp | SUBSEED | persons | 100000 | HLTH_hosp_hosp |
RES_agmin_ag | SUBSEED | persons | 100000 | RES_agmin_ag |
ENGY_ptrl_hi | SUBSEED | persons | 100000 | ENGY_ptrl_hi |
RET_stor_off | SUBSEED | persons | 100000 | RET_stor_off |
SERV_stor_ret | SUBSEED | persons | 100000 | SERV_stor_ret |
RES_forst_log | SUBSEED | persons | 100000 | RES_forst_log |
RET_nstor_off | SUBSEED | persons | 100000 | RET_nstor_off |
ENGY_offc_off | SUBSEED | persons | 100000 | ENGY_offc_off |
RET_stor_ret | SUBSEED | persons | 100000 | RET_stor_ret |
MFG_hvtw_li | SUBSEED | persons | 100000 | MFG_hvtw_li |
GOV_offc_off | SUBSEED | persons | 100000 | GOV_offc_off |
RET_auto_ret | SUBSEED | persons | 100000 | RET_auto_ret |
MFG_htec_hi | SUBSEED | persons | 100000 | MFG_htec_hi |
SERV_home_xxx | SUBSEED | persons | 100000 | SERV_home_xxx |
MFG_hvtw_hi | SUBSEED | persons | 100000 | MFG_hvtw_hi |
HLTH_othr_off_li | SUBSEED | persons | 100000 | HLTH_othr_off_li |
ENGY_ngas_hi | SUBSEED | persons | 100000 | ENGY_ngas_hi |
ENT_ent_ret | SUBSEED | persons | 100000 | ENT_ent_ret |
MFG_lvtw_hi | SUBSEED | persons | 100000 | MFG_lvtw_hi |
MFG_htec_li | SUBSEED | persons | 100000 | MFG_htec_li |
UTL_othr_off | SUBSEED | persons | 100000 | UTL_othr_off |
HIED_hied_off_inst | SUBSEED | persons | 100000 | HIED_hied_off_inst |
CNST_offc_off | SUBSEED | persons | 100000 | CNST_offc_off |
FIRE_real_off | SUBSEED | persons | 100000 | FIRE_real_off |
HLTH_care_inst | SUBSEED | persons | 100000 | HLTH_care_inst |
CNST_othr_xxx | SUBSEED | persons | 100000 | CNST_othr_xxx |
SERV_nonp_off_inst | SUBSEED | persons | 100000 | SERV_nonp_off_inst |
UTL_othr_off_li | SUBSEED | persons | 100000 | UTL_othr_off_li |
TRNS_trns_ware | SUBSEED | persons | 100000 | TRNS_trns_ware |
SERV_bus_off | SUBSEED | persons | 100000 | SERV_bus_off |
MFG_offc_off | SUBSEED | persons | 100000 | MFG_offc_off |
SERV_tech_off | SUBSEED | persons | 100000 | SERV_tech_off |
CNST_res_xxx | SUBSEED | persons | 100000 | CNST_res_xxx |
K12_k12_k12 | SUBSEED | persons | 100000 | K12_k12_k12 |
WHSL_offc_off | SUBSEED | persons | 100000 | WHSL_offc_off |
CNST_main_xxx | SUBSEED | persons | 100000 | CNST_main_xxx |
K12_k12_off | SUBSEED | persons | 100000 | K12_k12_off |
RES_offc_off | SUBSEED | persons | 100000 | RES_offc_off |
TRNS_trns_off | SUBSEED | persons | 100000 | TRNS_trns_off |
HOSP_eat_ret_acc | SUBSEED | persons | 100000 | HOSP_eat_ret_acc |
MFG_food_li | SUBSEED | persons | 100000 | MFG_food_li |
INFO_info_off | SUBSEED | persons | 100000 | INFO_info_off |
MFG_wdppr_hi | SUBSEED | persons | 100000 | MFG_wdppr_hi |
SERV_site_li | SUBSEED | persons | 100000 | SERV_site_li |
WHSL_whsl_ware | SUBSEED | persons | 100000 | WHSL_whsl_ware |
MFG_food_hi | SUBSEED | persons | 100000 | MFG_food_hi |
CNST_nres_xxx | SUBSEED | persons | 100000 | CNST_nres_xxx |
SPG2 is run with controls at both the alpha zone level and the region level. The region level controls are the same as in SPG1, and two controls are added at the alpha zone level: households by size and income category, and workers by occupation. Households by size and income category are obtained from an AA output, the ActivityLocations2.csv file. Workers by occupation at the alpha zone level are derived from the total labor dollar production by alpha zone and household category in the laborDollarProduction.csv file (also an AA output). First, the labor dollars in the file are aggregated across all alpha zones to get region level labor dollars by occupation and household category. Second, region level workers by occupation and household category are tabulated from the SPG1 output. Third, region level labor dollars are divided by region level workers to get labor dollars per worker by occupation and household category. Fourth, alpha zone level labor dollars are divided by this rate to obtain alpha zone level workers by occupation and household category. Finally, the resulting worker counts are summed across all household categories to obtain alpha zone workers by occupation. The SPG2 controls are summarized below, with a description of how each is calculated and a table listing the controls.
Households by size and income category are retrieved from the ActivityLocations2.csv file, which is an output of the Activity Allocation (AA) module. Since AA uses SPG1 outputs, taking these controls from the AA outputs ensures that controls are consistent across SPG1 and SPG2.
Target | Geography | Seed Table | Importance | Control Field |
---|---|---|---|---|
HH0to8k1to2 | AZONE | households | 5000 | HH0to8k1to2 |
HH0to8k3plus | AZONE | households | 5000 | HH0to8k3plus |
HH106kUp1to2 | AZONE | households | 5000 | HH106kUp1to2 |
HH106kUp3plus | AZONE | households | 5000 | HH106kUp3plus |
HH15to23k1to2 | AZONE | households | 5000 | HH15to23k1to2 |
HH15to23k3plus | AZONE | households | 5000 | HH15to23k3plus |
HH23to32k1to2 | AZONE | households | 5000 | HH23to32k1to2 |
HH23to32k3plus | AZONE | households | 5000 | HH23to32k3plus |
HH32to46k1to2 | AZONE | households | 5000 | HH32to46k1to2 |
HH32to46k3plus | AZONE | households | 5000 | HH32to46k3plus |
HH46to61k1to2 | AZONE | households | 5000 | HH46to61k1to2 |
HH46to61k3plus | AZONE | households | 5000 | HH46to61k3plus |
HH61to76k1to2 | AZONE | households | 5000 | HH61to76k1to2 |
HH61to76k3plus | AZONE | households | 5000 | HH61to76k3plus |
HH76to106k1to2 | AZONE | households | 5000 | HH76to106k1to2 |
HH76to106k3plus | AZONE | households | 5000 | HH76to106k3plus |
HH8to15k1to2 | AZONE | households | 5000 | HH8to15k1to2 |
HH8to15k3plus | AZONE | households | 5000 | HH8to15k3plus |
The total labor dollars by zone output from AA are read in (spg.labor.dollars.by.zone = /scenario/outputs/tYEAR/laborDollarProduction.csv). These are by occupation and have been mapped to PUMS workers by occupation.
a. The labor dollars are summed across the region and divided by region level workers by occupation and household category (SPG1 output) to calculate labor dollars per worker.
b. Alpha zone workers by occupation and household category are calculated by dividing the alpha zone level labor dollars by occupation and household category by the dollars per worker calculated in step a.
c. The household category dimension is collapsed by summing over it, and workers by occupation and alpha zone are retained as the control.
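
Steps a to c can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in PopulationSimSPG.py: the function name, the dictionary keys, the household category labels, and the numbers are hypothetical, and skipping cells with zero region level workers is an assumption about how such cells might be handled.

```python
from collections import defaultdict


def azone_workers_by_occupation(zone_dollars, region_workers):
    """zone_dollars: {(azone, occupation, hh_category): labor dollars}
    region_workers: {(occupation, hh_category): workers}, as from SPG1 output.
    Returns {(azone, occupation): workers}.
    """
    # Step a: sum labor dollars across the region, then divide by workers.
    region_dollars = defaultdict(float)
    for (azone, occ, hh), dollars in zone_dollars.items():
        region_dollars[(occ, hh)] += dollars
    dollars_per_worker = {
        key: total / region_workers[key]
        for key, total in region_dollars.items()
        if region_workers.get(key, 0) > 0  # assumption: skip empty cells
    }

    # Step b: zone dollars / dollars per worker gives zone workers.
    # Step c: accumulate over household categories.
    controls = defaultdict(float)
    for (azone, occ, hh), dollars in zone_dollars.items():
        rate = dollars_per_worker.get((occ, hh))
        if rate:
            controls[(azone, occ)] += dollars / rate
    return dict(controls)


# Made-up inputs: two alpha zones, one occupation, two household categories.
zone_dollars = {
    (1, "B3-Health", "HH0to8k1to2"): 100.0,
    (2, "B3-Health", "HH0to8k1to2"): 300.0,
    (1, "B3-Health", "HH106kUp1to2"): 600.0,
    (2, "B3-Health", "HH106kUp1to2"): 200.0,
}
region_workers = {
    ("B3-Health", "HH0to8k1to2"): 8.0,
    ("B3-Health", "HH106kUp1to2"): 4.0,
}
controls = azone_workers_by_occupation(zone_dollars, region_workers)
```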
Target | Geography | Seed Table | Importance | Control Field |
---|---|---|---|---|
A1_Mgmt_Bus | AZONE | persons | 10000 | A1-Mgmt Bus |
B1_Prof_Specialty | AZONE | persons | 10000 | B1-Prof Specialty |
B2_Education | AZONE | persons | 10000 | B2-Education |
B3_Health | AZONE | persons | 10000 | B3-Health |
B4_Technical_Unskilled | AZONE | persons | 10000 | B4-Technical Unskilled |
C1_Sales_Clerical_Professionals | AZONE | persons | 10000 | C1-Sales Clerical Professionals |
C2_Sales_Service | AZONE | persons | 10000 | C2-Sales Service |
C3_Clerical | AZONE | persons | 10000 | C3-Clerical |
C4_Sales_Clerical_Unskilled | AZONE | persons | 10000 | C4-Sales Clerical Unskilled |
D1_Production_Specialists | AZONE | persons | 10000 | D1-Production Specialists |
D2_MaintConstRepair_Specialists | AZONE | persons | 10000 | D2-MaintConstRepair Specialists |
D3_ProtectTrans_Specialists | AZONE | persons | 10000 | D3-ProtectTrans Specialists |
D4_Blue_Collar_Unskilled | AZONE | persons | 10000 | D4-Blue Collar Unskilled |
The primary file for running PopulationSim for SWIM is PopulationSimSPG.py. This Python script contains multiple methods that run sequentially to implement PopulationSim. The script takes two arguments: the properties file for the POPSIMSPG module being run (popsimspg1.properties or popsimspg2.properties) and the matching run mode (runSPG1 or runSPG2). A sample command to run the module independently of the SWIM model is as follows:
(path)/python PopulationSimSPG.py (path)/popsimspg1.properties runSPG1
or
(path)/python PopulationSimSPG.py (path)/popsimspg2.properties runSPG2
To run PopulationSim for SWIM manually, do the following:
- Edit the tsteps csv file to run POPSIMSPG1 or POPSIMSPG2 to create a SWIM properties file
- Run build_run.bat to build the SWIM POPSIMSPG1 or POPSIMSPG2 properties file
- Run the command above, for example:
E:/tlumip/_test/root/model/lib/swimpy/python.exe
E:/tlumip/_test/root/scenario/model/code/populationsimspg.py
E:/tlumip/_test/root/scenario/outputs/t20/popsimspg2.properties runSPG2
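
For scripted runs, the same command can be assembled and launched from Python. The sketch below is illustrative only: build_popsim_command and run_popsim are hypothetical helper names, not part of SWIM, and the paths are the example paths shown above. The only facts taken from this page are the argument order and the two run modes.

```python
import subprocess

# The two run modes named on this page.
RUN_MODES = {"runSPG1", "runSPG2"}


def build_popsim_command(python_exe, script, properties_file, run_mode):
    """Return the argument list: python, script, properties file, run mode."""
    if run_mode not in RUN_MODES:
        raise ValueError(f"run_mode must be one of {sorted(RUN_MODES)}")
    return [python_exe, script, properties_file, run_mode]


def run_popsim(command):
    """Launch the run; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(command, check=True)


cmd = build_popsim_command(
    "E:/tlumip/_test/root/model/lib/swimpy/python.exe",
    "E:/tlumip/_test/root/scenario/model/code/populationsimspg.py",
    "E:/tlumip/_test/root/scenario/outputs/t20/popsimspg2.properties",
    "runSPG2",
)
# run_popsim(cmd)  # uncomment on a machine with SWIM installed at these paths
```

Passing the command as a list rather than a single string avoids shell quoting problems if an installation path contains spaces.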
SWIM-TLUMIP Model User Guide, version 2.5