POPSIMSPG

Introduction

This wiki describes the implementation of PopulationSim in the Oregon Statewide Transportation/Land Use Integrated Model (SWIM-TLUMIP) and instructions for running the module within the SWIM-TLUMIP modeling system.

The previous Synthetic Population Generator (SPG) modules, SPG1 and SPG2, were replaced with PopulationSim (as of 2020); the replacement modules are named POPSIMSPG1 and POPSIMSPG2, respectively. POPSIMSPG1 generates a synthetic population with controls specified only at the regional level. This produces a frequency distribution of households by household size, which is necessary for the Activity Allocation (AA) component of SWIM. POPSIMSPG2 runs with controls at both the alpha zone level and the regional level to generate a detailed synthetic population for use by the Person Transport (PT) and Transit Assignment (TR) modules. The default setup for SWIM only runs POPSIMSPG2 in transport years (years in which PT and TS are run).

PopulationSim follows the steps below to create synthetic households and persons.

Initial Seed Balancing
Meta Control Factoring (step is parallelized for POPSIMSPG2)
Final Seed Balancing
Integerize Final Seed Weights
Expand Households

Details on the PopulationSim software can be found at https://rsginc.github.io/populationsim/index.html. Further details on the previous population synthesizer used in SWIM (SPG) can be reviewed on the retired SPG wiki page, although that page was never fully populated.

Implementation

PopulationSim is run automatically from the Model Orchestrator. It calls a batch file that runs PopulationSim for SPG1 or SPG2 depending on the argument sent to the Python code. For more information on how to run PopulationSim manually, see below.

Each of the POPSIMSPG modules calls multiple Python functions to implement PopulationSim. The Python functions for the two modules are as follows:

runSPG1:

createDirectories : This module creates the necessary directories to store PopulationSim inputs and outputs for each scenario year. PopulationSimBase directory is required for the module to run and is included in the latest repository. This directory contains the settings files for SPG1 and SPG2 which are copied over to each scenario year’s directory by the module itself. This is needed as controls for each year change but the settings remain the same over time. Hence the static files are stored in the PopulationSimBase directory which are then copied to the scenario year directory for every scenario year. A PopulationSim directory gets created inside the outputs directory of each scenario year. The PopulationSim directory also contains sub-directories including config, data and output. These directories contain the standard inputs and outputs for PopulationSim. These outputs are then post-processed to create SPG format outputs for SWIM.
createSeed : This module creates the seed data for running PopulationSim. The seed data is created using the PUMS (Public Use Microdata Sample) household and person data for the states of Oregon, California and Washington. PUMAs within the SWIM halo are filtered from the complete data set to obtain seed data for the SWIM PopulationSim module.
copySeeds : This module copies the seeds created for the first scenario year to subsequent years. The seeds need not be created in every iteration as they remain static over time.
spg1Controls : This module creates the control data for POPSIMSPG1. This module utilizes multiple inputs including Jobs to Workers Factor, Workers per Household marginals, household size distribution, NED population and employment forecast and the household seed file. These inputs are a mix of model inputs, and outputs of previous modules (NED). The locations of input and output files of each module are listed in the tables below.
run_spg1 : This module executes the PopulationSim program with the seed and control created above.
spg1PostProcess : This module processes the standard PopulationSim outputs and generates SPG format outputs for use by downstream SWIM modules. For example, this process converts synthetic_households.csv to HouseholdsByHHCategory.csv which is used by AA and SPG2.

runSPG2:

createDirectories : This module creates the necessary directories to store PopulationSim inputs and outputs for each scenario year. A PopulationSim directory gets created inside the outputs directory of each scenario year. The PopulationSim directory also contains sub-directories including config, data and output. These directories contain the standard inputs and outputs for PopulationSim. These outputs are then post-processed to create SPG format outputs for SWIM.
copySeeds : This module copies the seeds created for the first scenario year to subsequent years.
spg2Controls : This module creates the control data for POPSIMSPG2. This module utilizes multiple inputs including SPG1 outputs such as synthetic_households.csv, synthetic_persons.csv and HouseholdsByHHCategory.csv. It also uses laborDollarProduction.csv and ActivityLocations2.csv which are outputs of the AA module. SPG2 derives regional controls from SPG1 regional controls file and utilizes the puma_beta_alpha_xwalk.csv file to combine data from different geographies. The locations of input and output files of each module are listed in the tables below.
spg2Settings : This module updates the values of two settings in the settings.yaml file in the configs folder.
- num_processes: This setting defines the number of processors to use when running the module in parallel mode. The value for this setting is updated based on spg2.num.processors defined in the globalTemplate.properties file.
- MAX_BALANCE_ITERATIONS_SIMULTANEOUS: This setting defines the maximum number of iterations used by the sub_balancer component of the module to reach the optimal solution. The value for this setting is updated based on spg2.max.iterations defined in the globalTemplate.properties file.
run_spg2 : This module executes the PopulationSim program with the seed and control created above.
spg2PostProcess : This module processes the standard PopulationSim outputs and generates SPG format outputs for use by downstream SWIM modules. The standard PopulationSim output includes synthetic_households.csv and synthetic_persons.csv, both of which have a number of fields indicated by the user. The SPG format of these outputs are SynPopH.csv and SynPopP.csv, which have different fields and field names, hence they are created during the post-processing step. The post-processing step also creates the SynPopTAZSummary.csv file which summarizes the synthetic population by TAZ.

The text above mentions, but does not highlight, two important properties in globalTemplate.properties that should be reviewed and updated each time SWIM is moved onto a new machine. These properties were added when PopulationSim was threaded to speed up processing:

## SPG2 related properties
spg2.num.processors = 2
spg2.max.iterations = 1000
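
As an illustration of what spg2Settings is described as doing, the following is a minimal sketch that reads the two properties above and writes their values into the matching settings.yaml keys. The helper names (read_properties, update_setting) and the starting settings values are hypothetical, and the line-based replacement is an assumption made to keep the sketch free of third-party libraries; the actual module may work differently.

```python
import re


def read_properties(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def update_setting(yaml_text, key, value):
    """Replace the value on a top-level 'key: value' line; other lines are untouched."""
    pattern = re.compile(rf"^{re.escape(key)}\s*:.*$", flags=re.MULTILINE)
    new_text, count = pattern.subn(f"{key}: {value}", yaml_text)
    if count == 0:
        raise KeyError(f"{key} not found in settings text")
    return new_text


properties_text = """
## SPG2 related properties
spg2.num.processors = 2
spg2.max.iterations = 1000
"""

# Hypothetical excerpt of settings.yaml; the real file holds many more keys.
settings_text = "num_processes: 1\nMAX_BALANCE_ITERATIONS_SIMULTANEOUS: 10000\n"

props = read_properties(properties_text)
settings_text = update_setting(
    settings_text, "num_processes", int(props["spg2.num.processors"])
)
settings_text = update_setting(
    settings_text,
    "MAX_BALANCE_ITERATIONS_SIMULTANEOUS",
    int(props["spg2.max.iterations"]),
)
print(settings_text)
```

When moving SWIM to a new machine, a reasonable starting point is to compare spg2.num.processors against the number of cores actually available before running.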

The following tables describe PopulationSim inputs and outputs. Each table lists the input or output file, describes the file, indicates the location of the file, and lists the SWIM modules that are responsible for its creation (outputs) or consumption (inputs). A few key files are described in more detail. For more information on the data used to set up PopulationSim and how the software works, see the following sections.

Inputs:

Input File	Description	Address	Modules
Properties File	Module-specific properties from current model run. All functions in the PopulationSim script use these files to retrieve input/output file locations and other model properties.	/outputs/tyear/popsimspg1.properties /outputs/tyear/popsimspg2.properties	createDirectories createSeed spg1Controls run_spg1 spg1PostProcess spg2Controls spg2Settings spg2PostProcess
PUMS Household (h) and Person (p) data for Oregon (41), California (06) and Washington (53).	Input household and person data by PUMA downloaded from the American Community Survey website.	root/census/psam_h06.csv root/census/psam_h41.csv root/census/psam_h53.csv root/census/psam_p06.csv root/census/psam_p41.csv root/census/psam_p53.csv	createSeed
PUMA to Alpha-Beta xwalk	Xwalk file created by SWIM_VISUM_Main.py	outputs/t20/puma_beta_alpha_xwalk.csv	createSeed spg2Controls spg2PostProcess
pums_to_split_industry xwalk	Correspondence between pums occupation and industry codes to split industries	inputs/parameters/pums_to_split_industry.csv	createSeed
ACS occupation categories	Updated ACS occupation categories including missing codes	inputs/parameters/acs_occupation_2005_2009_forPopSim.csv	createSeed
Jobs to Workers Factor	Jobs to workers factor file	inputs/parameters/JobsToWorkersFactor.csv	spg1Controls
Workers per HH	Workers per household distribution file	inputs/parameters/workersPerHouseholdMarginalxYEAR.csv	spg1Controls
HH by HH Size Distribution	Household Size distribution from the input PUMS data.	inputs/parameters/hh_dist.csv	spg1Controls
NED Employment	Forecast of the total amounts of production by activity, used to update TechnologyOptionsW and ActivityTotalW (see [Working Files][Working Files])	/outputs/tyear/activity_forecast.csv	spg1Controls
NED Population	Population forecast from the NED module	/outputs/tyear/population_forecast.csv	spg1Controls
Household Seed	Seed household file	inputs/parameters/PopulationSimBase/data/seed_households.csv	spg1Controls
Controls Configuration	Control file for running populationsim, contains information about household and person level control variables for generating synthetic population	outputs/tyear/PopulationSim/SPG1/configs/controls.csv outputs/tyear/PopulationSim/SPG2/configs/controls.csv	run_spg1 run_spg2
PopulationSim Settings	Software settings for PopulationSim	outputs/tyear/PopulationSim/SPG1/configs/settings.yaml outputs/tyear/PopulationSim/SPG2/configs/settings.yaml	run_spg1 spg2Settings run_spg2
PopulationSim Log	PopulationSim log file	outputs/tyear/PopulationSim/SPG1/configs/logging.yaml outputs/tyear/PopulationSim/SPG2/configs/logging.yaml	run_spg1 run_spg2
SPG1 Synthetic Households	Synthetic Households file	outputs/tyear/PopulationSim/SPG1/output/synthetic_households.csv	spg1PostProcess
SPG1 Synthetic Households	Synthetic Households file	outputs/tyear/PopulationSim/SPG1/output/synthetic_households.csv	spg2Controls
SPG1 Synthetic Persons	Synthetic Persons file	outputs/tyear/PopulationSim/SPG1/output/synthetic_persons.csv	spg2Controls
HH by HH Category	SPG1 modelwide summary of households by category	/outputs/tyear/HouseholdsByHHCategory.csv	spg2Controls
Activity Locations2	Activity quantities by TAZ (alpha zone)	/outputs/tyear/ActivityLocations2.csv	spg2Controls
Labor Dollar Production	Total amount of each type of labor produced by each household category in each zone	/outputs/tyear/laborDollarProduction.csv	spg2Controls
SPG1 Region Control	Region Level control for SPG1	outputs/tyear/PopulationSim/SPG1/data/control_region.csv	spg2Controls
SPG2 Synthetic Households	Synthetic Households file	outputs/tyear/PopulationSim/SPG2/output/synthetic_households.csv	spg2PostProcess
SPG2 Synthetic Persons	Synthetic Persons file	outputs/tyear/PopulationSim/SPG2/output/synthetic_persons.csv	spg2PostProcess

A brief description of some important input files follows.

ACS Occupation Categories File:

acs_occupation_2005_2009_forPopSim.csv lists all Census occupations and maps each to a PECAS occupation label. The file is used to look up the occupation field from the PUMS seed data and assign it to a PECAS occupation. The file originates from the initial SPG implementation. However, the original file (acs_occupation_2005_2009.csv) was missing many occupation categories that exist in the current ACS PUMS data. Each missing occupation was added by finding the closest occupation code in the original file and assigning that code's PECAS occupation label to the missing occupation.

JobsToWorkersFactor File:

The NED model produces both jobs (employment) and workers (persons), reflecting the mismatch between US Bureau of Economic Analysis job data and Census worker data. This file contains factors to convert jobs to workers by SPG sector. NED forecasted jobs are multiplied by these factors to calculate workers by sector. The factors account for differences in industry categories between the two datasets, as well as for workers holding multiple jobs (see workers by split industry, below).

WorkersPerHouseholdMarginalsxYEAR File:

This file is created using census data. The current file has workers per household records from 1990 to 2000. Households are grouped by number of worker categories such as 0, 1, 2, 3, 4 and 5.4 (average number of workers in 5+ worker households).

Outputs:

Output File	Description	Address	Modules
Seed Households	Seed household file	outputs/tyear/PopulationSim/SPG/data/seed_households.csv	createSeed
Seed Persons	Seed persons file	outputs/tyear/PopulationSim/SPG/data/seed_persons.csv	createSeed
HH by HH Size Distribution	Household size distribution from the input PUMS data	inputs/parameters/hh_dist.csv	createSeed
SPG1 Subseed Control	Controls at the subseed (sub-PUMA) geography. PopulationSim requires at least one geography smaller than the seed geography to run. This geography is fake and has a value of 1 for all entries.	outputs/tyear/PopulationSim/SPG1/data/control_subseed.csv	spg1Controls
SPG1 Region Control	Region Level control for SPG1	outputs/tyear/PopulationSim/SPG1/data/control_region.csv	spg1Controls
SPG1 Geo xwalk	xwalk between various geographies in seed and control data	outputs/tyear/PopulationSim/SPG1/data/geo_cross_walk.csv	spg1Controls
SPG1 Synthetic Households	Synthetic Households file	outputs/tyear/PopulationSim/SPG1/output/synthetic_households.csv	run_spg1
SPG1 Synthetic Persons	Synthetic Persons file	outputs/tyear/PopulationSim/SPG1/output/synthetic_persons.csv	run_spg1
HH by HH Category	SPG1 modelwide summary of households by category	/outputs/tyear/HouseholdsByHHCategory.csv	spg1PostProcess
SPG2 Alpha Zone Control	Alpha zone level controls for SPG2	outputs/tyear/PopulationSim/SPG2/data/control_alpha.csv	spg2Controls
SPG2 Region Control	Region level controls for SPG2	outputs/tyear/PopulationSim/SPG2/data/control_region.csv	spg2Controls
SPG2 Geo xwalk	xwalk between various geographies in seed and control data	outputs/tyear/PopulationSim/SPG2/data/geo_cross_walk.csv	spg2Controls
SPG2 settings	Setting file for SPG2 module	outputs/tyear/PopulationSim/SPG2/configs/settings.yaml	spg2Settings
SPG2 Synthetic Households	Synthetic Households file	outputs/tyear/PopulationSim/SPG2/output/synthetic_households.csv	run_spg2
SPG2 Synthetic Persons	Synthetic Persons file	outputs/tyear/PopulationSim/SPG2/output/synthetic_persons.csv	run_spg2
SPG2 SynPopH	SWIM format synthetic households	outputs/tyear/SynPopH.csv	spg2PostProcess
SPG2 SynPopP	SWIM format synthetic persons	outputs/tyear/SynPopP.csv	spg2PostProcess
SynPop TAZ Summary	SWIM format TAZ summary of synthetic population	outputs/tyear/SynPop_Taz_Summary.csv	spg2PostProcess

Output Files:

This section lists the output files and defines the fields within them. Note that the PopulationSim software creates synthetic_households.csv and synthetic_persons.csv. These files are then re-formatted for input to PT and named SynPopH.csv and SynPopP.csv.

filename: spg2_synthetic_households.csv

Field	Description	Values
NP	Number of persons in household	0 to max number of persons
BLD	Number of units in structure	01 .Mobile home or trailer 02 .One-family house detached 03 .One-family house attached 04 .2 Apartments 05 .3-4 Apartments 06 .5-9 Apartments 07 .10-19 Apartments 08 .20-49 Apartments 09 .50 or more apartments 10 .Boat, RV, van, etc.
NWESR	Number of workers	0 .N/A (less than 16 years old) 1 .Civilian employed, at work 2 .Civilian employed, with a job but not at work 3 .Unemployed 4 .Armed forces, at work 5 .Armed forces, with a job but not at work 6 .Not in labor force
VEH	Number of vehicles	0 .No vehicles 1 .1 vehicle 2 .2 vehicles 3 .3 vehicles 4 .4 vehicles 5 .5 vehicles 6 .6 or more vehicles
hh_id	Household ID	1 to max ID
HHINC2009	Household income in dollars ($2009)	-999 to max income
AZONE	Alpha zone	1 to max zone number

filename: spg2_synthetic_persons.csv

Field	Description	Values
per_num	Person ID
AGEP	Age	00 .Under 1 year 01..99 .1 to 99 years (Top-coded***)
SEX	Gender	1 .Male 2 .Female
Hh_id	Household ID
ESR	Work status	0 .N/A (less than 16 years old) 1 .Civilian employed, at work 2 .Civilian employed, with a job but not at work 3 .Unemployed 4 .Armed forces, at work 5 .Armed forces, with a job but not at work 6 .Not in labor force
SCH	School enrollment	b .N/A (less than 3 years old) 1 .No, has not attended in the last 3 months 2 .Yes, public school or public college 3 .Yes, private school or college or home school
INDP	Census Industry code	Industry recode for 2013 and later based on 2012 IND codes, see https://www2.census.gov/programs-surveys/acs/tech_docs/pums/data_dict/PUMSDataDict13.txt
OCCP	Census Occupation code	Occupation recode for 2012 and later based on 2010 OCC codes, see https://www2.census.gov/programs-surveys/acs/tech_docs/pums/data_dict/PUMSDataDict13.txt
Occupation	PECAS Occupation ID	Occupation categories consistent with AA
occupationLabel	PECAS Occupation label	Labels for occupation categories
split_industry_id	PECAS Split industry ID	Split industry codes consistent with AA
split_industry	PECAS Split industry label	Labels for split industry codes

filename: SynPopH.csv

Field	Description	Values
HHID	Household ID
Persons	Number of persons in household	0 to max persons
UNITS1	Number of units in structure	01 .Mobile home or trailer 02 .One-family house detached 03 .One-family house attached 04 .2 Apartments 05 .3-4 Apartments 06 .5-9 Apartments 07 .10-19 Apartments 08 .20-49 Apartments 09 .50 or more apartments 10 .Boat, RV, van, etc.
AUTOS	Number of autos owned	0 .No vehicles 1 .1 vehicle 2 .2 vehicles 3 .3 vehicles 4 .4 vehicles 5 .5 vehicles 6 .6 or more vehicles
RHHINC	Household income in dollars ($2009)	-999 to max income
AZONE	Alpha zone	1 to max zone number
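
The field tables above suggest how the PopulationSim household output relates to SynPopH.csv. The sketch below renames and reorders the fields accordingly. The mapping is inferred from the matching descriptions in the two tables and is an assumption, not taken from spg2PostProcess itself; the sample record is made up for illustration.

```python
import csv
import io

# Assumed mapping (PopulationSim field -> SynPopH field), inferred from the
# field descriptions in the tables above. NWESR has no SynPopH counterpart.
HH_FIELD_MAP = {
    "hh_id": "HHID",
    "NP": "Persons",
    "BLD": "UNITS1",
    "VEH": "AUTOS",
    "HHINC2009": "RHHINC",
    "AZONE": "AZONE",
}


def to_synpop_h(rows):
    """Keep only the mapped fields and rename them to the SynPopH names."""
    return [{new: row[old] for old, new in HH_FIELD_MAP.items()} for row in rows]


# Made-up household record in the PopulationSim output layout.
sample = io.StringIO(
    "NP,BLD,NWESR,VEH,hh_id,HHINC2009,AZONE\n"
    "2,02,1,1,17,54000,301\n"
)
converted = to_synpop_h(csv.DictReader(sample))

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(HH_FIELD_MAP.values()), lineterminator="\n")
writer.writeheader()
writer.writerows(converted)
print(out.getvalue())
```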

filename: SynPopP.csv

Field	Description	Values
HH_ID	Household ID	Household ID
PERS_ID	Person ID	Person ID
AGE	Age	00 .Under 1 year 01..99 .1 to 99 years (Top-coded***)
SEX	Gender	1 .Male 2 .Female
RLABOR	Worker status	0 .N/A (less than 16 years old) 1 .Civilian employed, at work 2 .Civilian employed, with a job but not at work 3 .Unemployed 4 .Armed forces, at work 5 .Armed forces, with a job but not at work 6 .Not in labor force
SCHOOL	Student status	0 .N/A (less than 3 years old) 1 .No, has not attended in the last 3 months 2 .Yes, public school or public college 3 .Yes, private school or college or home school
INDUSTRY	Industry code	Industry recode for 2013 and later based on 2012 IND codes, see https://www2.census.gov/programs-surveys/acs/tech_docs/pums/data_dict/PUMSDataDict13.txt
OCCUP	Occupation	Occupation recode for 2012 and later based on 2010 OCC codes, see https://www2.census.gov/programs-surveys/acs/tech_docs/pums/data_dict/PUMSDataDict13.txt
SW_UNSPLIT_IND	Unsplit Industry ID	Split industry codes consistent with AA
SW_OCCUP	Split Occupation ID	Occupation index from ACS
SW_SPLIT_IND	Split Industry ID	Split industry codes consistent with AA

Note: SW_UNSPLIT_IND and SW_SPLIT_IND were exactly the same field in the older SPG implementation so they were kept exactly the same in the PopulationSim implementation.

Data Preparation And PopulationSim Controls

This section describes the data preparation performed for implementing PopulationSim in SWIM and the controls used for SPG1 and SPG2.

Seed Data

PopulationSim uses seed data consisting of households and persons to generate the synthetic population for a given region. We use ACS PUMS 2013-2017 data as the seed household and seed persons data. Data was downloaded for states of Oregon, Washington and California, concatenated together and filtered for the PUMAs that are within the SWIM region. The filtered data contains 31 Oregon PUMAs, 9 Washington PUMAs and 1 California PUMA. The filtered household and person data were processed further to create the missing control variables. A ‘Category’ field was created in the household seed file which contains the category of the household by size and income. In the person seed file, each worker was assigned a split_industry_id based on their Industry and Occupation IDs. The creation of these variables is described in detail in following paragraphs.
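
The concatenate-and-filter step can be sketched as follows. This is a minimal illustration, not the createSeed code: it assumes the PUMS files carry the ST (state FIPS) and PUMA columns, and it keys the filter on the (ST, PUMA) pair because PUMA codes repeat across states. The PUMA codes and records shown are made up.

```python
import csv
import io


def filter_to_region(state_files, region_pumas):
    """Concatenate PUMS rows from several state files, keeping only rows
    whose (ST, PUMA) pair is in region_pumas."""
    kept = []
    for handle in state_files:
        for row in csv.DictReader(handle):
            if (row["ST"], row["PUMA"]) in region_pumas:
                kept.append(row)
    return kept


# Made-up stand-ins for two state household files.
oregon = io.StringIO("SERIALNO,ST,PUMA,NP\nA1,41,00100,2\nA2,41,00200,3\n")
washington = io.StringIO("SERIALNO,ST,PUMA,NP\nB1,53,00100,1\nB2,53,00300,4\n")

# Hypothetical set of PUMAs inside the model region.
region_pumas = {("41", "00100"), ("53", "00300")}

seed_households = filter_to_region([oregon, washington], region_pumas)
print([row["SERIALNO"] for row in seed_households])
```

In this example, record B1 is dropped even though its PUMA code matches an Oregon PUMA, which is why the state code is part of the key.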

Household ‘Category’ Variable

SPG2 uses households by household size and income as a control variable at the alpha zone level, hence the seed household data must contain this category field. Household size is divided into 2 categories (1to2 and 3plus) and household income into 9 categories with varying income ranges. These two variables are then combined to create 18 categories of households by size and income. Note that the HINCP field was first multiplied by adjustment factors to convert all records (2013-2017) to 2017 dollars, and then multiplied by another factor to convert 2017 dollars to 2009 dollars.

Table 1 HOUSEHOLD CATEGORY VARIABLE DEFINITIONS

VARIABLE	HOUSEHOLD SIZE	HOUSEHOLD INCOME ($)
HH0to8k1to2	1 to 2	0 to 8000
HH0to8k3plus	3 or more	0 to 8000
HH8to15k1to2	1 to 2	8000 to 15000
HH8to15k3plus	3 or more	8000 to 15000
HH15to23k1to2	1 to 2	15000 to 23000
HH15to23k3plus	3 or more	15000 to 23000
HH23to32k1to2	1 to 2	23000 to 32000
HH23to32k3plus	3 or more	23000 to 32000
HH32to46k1to2	1 to 2	32000 to 46000
HH32to46k3plus	3 or more	32000 to 46000
HH46to61k1to2	1 to 2	46000 to 61000
HH46to61k3plus	3 or more	46000 to 61000
HH61to76k1to2	1 to 2	61000 to 76000
HH61to76k3plus	3 or more	61000 to 76000
HH76to106k1to2	1 to 2	76000 to 106000
HH76to106k3plus	3 or more	76000 to 106000
HH106kUp1to2	1 to 2	More than 106000
HH106kUp3plus	3 or more	More than 106000
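
The category assignment in Table 1 can be sketched as a simple lookup. The sketch makes several assumptions that the table does not settle: each income range includes its lower bound and excludes its upper bound, negative incomes fall into the lowest bin, and the two dollar-conversion factors are passed in by the caller (the values used below are placeholders, not the factors used in SWIM).

```python
# Upper bounds (2009 dollars) and labels taken from Table 1.
INCOME_BINS = [
    (8000, "0to8k"),
    (15000, "8to15k"),
    (23000, "15to23k"),
    (32000, "23to32k"),
    (46000, "32to46k"),
    (61000, "46to61k"),
    (76000, "61to76k"),
    (106000, "76to106k"),
]


def to_2009_dollars(hincp, to_2017_factor, factor_2017_to_2009):
    """Apply the two-step conversion: survey-year dollars -> 2017 -> 2009."""
    return hincp * to_2017_factor * factor_2017_to_2009


def household_category(persons, income_2009):
    """Return the Table 1 category name for a household."""
    size = "1to2" if persons <= 2 else "3plus"
    for upper, label in INCOME_BINS:
        if income_2009 < upper:  # assumed: upper bound is exclusive
            return f"HH{label}{size}"
    return f"HH106kUp{size}"


# Placeholder factors for illustration only.
income = to_2009_dollars(50000, to_2017_factor=1.0, factor_2017_to_2009=0.9)
print(household_category(4, income))
```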

‘split industry ID’ variable

In SWIM, the number of households generated in the region is driven by the demand for labor in any given simulation year. The model attempts to generate the right kinds of workers by controlling the synthetic population on a combination of workers' industry and occupation. Furthermore, SWIM attempts to control the number and location of labor by also considering the type of land use that a worker might work in. For example, workers in the construction industry could be tradespeople, designers, or persons engaged in marketing, accounting, or several other occupations. Each has a different likelihood of working on a construction site, working in an office, etc. To address this issue, the New Economic Development (NED) module uses five definitions of construction (CNST), differentiated as main, nres (non-residential), offc (office), res (residential), and othr (other). The office definition is further restricted to office land-use types, for example CNST_main_xxx versus CNST_offc_off.

SPG1 and SPG2 use ‘Workers by Split Industry’ as a region level control, which requires the person seed data to contain the split_industry_id field. Since the same industry and occupation id can be split into multiple split_industries, this step draws a random number for a worker, and assigns the split industry id according to an input distribution of split industries. The table below provides an example of how the assignment is done. The pums_to_split_industry.csv file provides proportions of different jobs within an industry and is denoted by the ‘proportion’ field in the table. The random number drawn for each worker is denoted by the ‘draw’ field. Cumulative proportion is calculated to create ranges of proportions needed for assignment and is denoted by ‘cumprop’ field. This field is shifted down one row to create the range of proportions and is denoted by the ‘prev_cumprop’ field. The worker is assigned to the split industry whose range contains the random draw and is identified in the ‘select’ field in the table.

Table 2 Split Industry Assignment To Workers

INDP	OCCP	split_ind_id	split_industry	proportion	draw	cumprop	prev_cumprop	select
770	6600	8	CNST_main_xxx	0.1176	0.0773	0.1176	0.0000	1
770	6600	9	CNST_nres_xxx	0.2941	0.0773	0.4118	0.1176	0
770	6600	10	CNST_othr_xxx	0.4705	0.0773	0.8824	0.4118	0
770	6600	11	CNST_res_xxx	0.1176	0.0773	1.0000	0.8824	0
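
The assignment in Table 2 can be reproduced with a few lines of code. This is a sketch of the technique described above (cumulative proportions plus one uniform draw), not the module's own implementation. The function name is hypothetical, the interval convention (lower bound included, upper bound excluded) is an assumption, and the proportions are normalized because the rounded values in the table sum to slightly less than 1.

```python
import random
from itertools import accumulate


def assign_split_industry(options, draw):
    """options: list of (split_industry, proportion) for one INDP/OCCP pair.
    draw: uniform random number in [0, 1)."""
    total = sum(prop for _, prop in options)
    cumprops = accumulate(prop / total for _, prop in options)
    prev_cumprop = 0.0
    for (name, _), cumprop in zip(options, cumprops):
        if prev_cumprop <= draw < cumprop:
            return name
        prev_cumprop = cumprop
    return options[-1][0]  # guards against round-off at the top of the range


# Proportions for INDP 770 / OCCP 6600, copied from Table 2.
options = [
    ("CNST_main_xxx", 0.1176),
    ("CNST_nres_xxx", 0.2941),
    ("CNST_othr_xxx", 0.4705),
    ("CNST_res_xxx", 0.1176),
]

selected = assign_split_industry(options, draw=0.0773)
print(selected)

# In practice each worker would get their own draw, for example:
rng = random.Random(0)
another = assign_split_industry(options, rng.random())
```

With the draw of 0.0773 from Table 2, the function selects CNST_main_xxx, matching the 'select' column.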

The modified seed data contains these new fields, ‘Category’ in the household seed and ‘split_industry_id’ in the person seed.

Once the seed data is created, it is fixed for all model runs. The code checks whether the seed household data file exists. If it does not, the code automatically creates the seed data from the input PUMS data, which takes about 5-6 minutes. If it does exist, the code reuses the existing seed data for the current run.

Controls

SPG1 Controls

SPG1 is run at the regional level, meaning synthetic population is generated for only 1 geography, i.e. the region. This raises issues in running PopulationSim which requires at least one sub-seed geography, i.e. a geography smaller than the seed geography (PUMA). This issue was averted by creating a set of fake sub-seed and seed geography ids to allow PopulationSim to run. Instead of using the PUMA field in the seed file to identify seed geography, a new field named ‘SEED’ was created and the value was set to 1 for all records in the seed data. This helps PopulationSim understand that there is only one geography in the entire region. In the geo_cross_walk.csv file, a fake cross walk was created between the subseed, seed and region level geographies and all of them were set to 1.
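
The workaround can be pictured with a short sketch that adds the constant SEED field and writes a one-row crosswalk. The SEED field name comes from the text above; using SUBSEED and REGION as the other crosswalk column names is an assumption based on the geography labels in the control tables below, and the helper names are hypothetical.

```python
import csv
import io


def add_seed_field(seed_rows):
    """Give every seed record the same seed geography id (1)."""
    return [dict(row, SEED=1) for row in seed_rows]


def fake_crosswalk():
    """Build a one-row crosswalk in which every geography has id 1."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["SUBSEED", "SEED", "REGION"])  # assumed column names
    writer.writerow([1, 1, 1])
    return out.getvalue()


seed = add_seed_field([{"PUMA": "00100"}, {"PUMA": "00200"}])
crosswalk_text = fake_crosswalk()
print(crosswalk_text)
```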

Controls for SPG1 were created at the sub-seed level, although in practice these are region level controls. This was done because PopulationSim requires at least one sub-seed level control to run. A region level control is also required, for which the total population of the region was used. The controls for SPG1 are listed below, each with a description of how it is calculated and a table of its control entries.

Total Households

The total number of households is obtained by dividing the total NED employment by average number of workers per household. The latter is calculated using the workers by household marginals distribution.

Target	Geography	Household or Person Control	Importance	Control Field
num_hh	SUBSEED	households	1000000000	HHS
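
As a worked illustration of the calculation, the sketch below derives the average workers per household from a marginal distribution and divides total employment by it. All numbers are made up, and the function names are hypothetical; the 5.4 key follows the convention described for the workers per household marginals file.

```python
def average_workers_per_household(households_by_workers):
    """households_by_workers maps workers per household (0, 1, ..., 5.4)
    to a household count or share."""
    total_households = sum(households_by_workers.values())
    total_workers = sum(w * n for w, n in households_by_workers.items())
    return total_workers / total_households


def total_households(total_employment, avg_workers):
    """Households implied by employment and the average workers per household."""
    return total_employment / avg_workers


# Made-up marginals: households by number of workers.
marginals = {0: 300, 1: 400, 2: 250, 3: 30, 4: 15, 5.4: 5}

avg_workers = average_workers_per_household(marginals)
num_hh = total_households(2_154_000, avg_workers)
print(round(avg_workers, 3), round(num_hh))
```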

Total Population

Total population is calculated by multiplying the total households calculated above by the average household size, which is obtained from the households by household size distribution created from census data.

Target	Geography	Household or Person Control	Importance	Control Field
TOTAL_POP	REGION	persons	100000	POPULATION

Households by household size

Households by household size obtained from census data is scaled up/down to match the total households calculated using NED employment to create household size controls.

Target	Geography	Household or Person Control	Importance	Control Field
hh_size_1	SUBSEED	households	5000	HH_SIZE1
hh_size_2	SUBSEED	households	5000	HH_SIZE2
hh_size_3	SUBSEED	households	5000	HH_SIZE3
hh_size_4	SUBSEED	households	5000	HH_SIZE4M
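
Scaling a base distribution to a target total is a proportional adjustment, sketched below with made-up household counts. The function name is hypothetical, and whether the module rounds the scaled values is not stated in the text, so the sketch leaves them unrounded.

```python
def scale_to_total(distribution, target_total):
    """Scale each entry so that the entries sum to target_total while
    keeping their relative shares."""
    base_total = sum(distribution.values())
    return {key: value * target_total / base_total for key, value in distribution.items()}


# Made-up census-based household size distribution.
census_hh_by_size = {"HH_SIZE1": 280, "HH_SIZE2": 340, "HH_SIZE3": 160, "HH_SIZE4M": 220}

controls = scale_to_total(census_hh_by_size, target_total=2_000_000)
print(controls)
```

The same proportional adjustment fits the descriptions of the households by number of workers and population by age group controls.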

Households by number of workers

Households by number of workers is obtained from the workers per household marginals file and scaled up/down to match the total households.

Target	Geography	Household or Person Control	Importance	Control Field
hh_wrks_0	SUBSEED	households	5000	HH_WRKR0
hh_wrks_1	SUBSEED	households	5000	HH_WRKR1
hh_wrks_2	SUBSEED	households	5000	HH_WRKR2
hh_wrks_3	SUBSEED	households	5000	HH_WRKR3
hh_wrks_4	SUBSEED	households	5000	HH_WRKR4
hh_wrks_5m	SUBSEED	households	5000	HH_WRKR5M

Population by age group

Population by age group is obtained from the population forecast file and scaled up/down to match the total population calculated above.

Target	Geography	Household or Person Control	Importance	Control Field
person_age0to4	SUBSEED	persons	10000	P_AGE0
person_age5to9	SUBSEED	persons	10000	P_AGE5
person_age10to14	SUBSEED	persons	10000	P_AGE10
person_age15to19	SUBSEED	persons	10000	P_AGE15
person_age20to24	SUBSEED	persons	10000	P_AGE20
person_age25to29	SUBSEED	persons	10000	P_AGE25
person_age30to34	SUBSEED	persons	10000	P_AGE30
person_age35to39	SUBSEED	persons	10000	P_AGE35
person_age40to44	SUBSEED	persons	10000	P_AGE40
person_age45to49	SUBSEED	persons	10000	P_AGE45
person_age50to54	SUBSEED	persons	10000	P_AGE50
person_age55to59	SUBSEED	persons	10000	P_AGE55
person_age60to64	SUBSEED	persons	10000	P_AGE60
person_age65to69	SUBSEED	persons	10000	P_AGE65
person_age70to74	SUBSEED	persons	10000	P_AGE70
person_age75to79	SUBSEED	persons	10000	P_AGE75
person_age80to84	SUBSEED	persons	10000	P_AGE80
person_age85m	SUBSEED	persons	10000	P_AGE85

Workers by split industry

'Workers by split industry' is derived from 'employment by split industry' in the activity_forecast.csv file. 'Employment by split industry' in the activity_forecast.csv file is multiplied by the respective jobs_to_workers_factor (see jobs to workers file, above) to obtain the number of workers by split industry.
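
The conversion is an element-wise multiplication, sketched below. The employment values and factors are made up, and keying the factors directly by split industry is an assumption (the factor file is described as being organized by SPG sector, so a lookup step may be needed in practice).

```python
def workers_by_split_industry(employment, jobs_to_workers):
    """Multiply each split industry's jobs by its jobs-to-workers factor."""
    return {
        industry: jobs * jobs_to_workers[industry]
        for industry, jobs in employment.items()
    }


# Made-up employment (jobs) and factors for two split industries.
employment = {"CNST_main_xxx": 1000, "RET_stor_ret": 5000}
factors = {"CNST_main_xxx": 0.95, "RET_stor_ret": 0.90}

workers = workers_by_split_industry(employment, factors)
print(workers)
```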

Target	Geography	Household or Person Control	Importance	Control Field
FIRE_fnin_off	SUBSEED	persons	100000	FIRE_fnin_off
INFO_info_off_li	SUBSEED	persons	100000	INFO_info_off_li
ENGY_elec_hi	SUBSEED	persons	100000	ENGY_elec_hi
GOV_admn_gov	SUBSEED	persons	100000	GOV_admn_gov
HOSP_acc_acc	SUBSEED	persons	100000	HOSP_acc_acc
HLTH_hosp_hosp	SUBSEED	persons	100000	HLTH_hosp_hosp
RES_agmin_ag	SUBSEED	persons	100000	RES_agmin_ag
ENGY_ptrl_hi	SUBSEED	persons	100000	ENGY_ptrl_hi
RET_stor_off	SUBSEED	persons	100000	RET_stor_off
SERV_stor_ret	SUBSEED	persons	100000	SERV_stor_ret
RES_forst_log	SUBSEED	persons	100000	RES_forst_log
RET_nstor_off	SUBSEED	persons	100000	RET_nstor_off
ENGY_offc_off	SUBSEED	persons	100000	ENGY_offc_off
RET_stor_ret	SUBSEED	persons	100000	RET_stor_ret
MFG_hvtw_li	SUBSEED	persons	100000	MFG_hvtw_li
GOV_offc_off	SUBSEED	persons	100000	GOV_offc_off
RET_auto_ret	SUBSEED	persons	100000	RET_auto_ret
MFG_htec_hi	SUBSEED	persons	100000	MFG_htec_hi
SERV_home_xxx	SUBSEED	persons	100000	SERV_home_xxx
MFG_hvtw_hi	SUBSEED	persons	100000	MFG_hvtw_hi
HLTH_othr_off_li	SUBSEED	persons	100000	HLTH_othr_off_li
ENGY_ngas_hi	SUBSEED	persons	100000	ENGY_ngas_hi
ENT_ent_ret	SUBSEED	persons	100000	ENT_ent_ret
MFG_lvtw_hi	SUBSEED	persons	100000	MFG_lvtw_hi
MFG_htec_li	SUBSEED	persons	100000	MFG_htec_li
UTL_othr_off	SUBSEED	persons	100000	UTL_othr_off
HIED_hied_off_inst	SUBSEED	persons	100000	HIED_hied_off_inst
CNST_offc_off	SUBSEED	persons	100000	CNST_offc_off
FIRE_real_off	SUBSEED	persons	100000	FIRE_real_off
HLTH_care_inst	SUBSEED	persons	100000	HLTH_care_inst
CNST_othr_xxx	SUBSEED	persons	100000	CNST_othr_xxx
SERV_nonp_off_inst	SUBSEED	persons	100000	SERV_nonp_off_inst
UTL_othr_off_li	SUBSEED	persons	100000	UTL_othr_off_li
TRNS_trns_ware	SUBSEED	persons	100000	TRNS_trns_ware
SERV_bus_off	SUBSEED	persons	100000	SERV_bus_off
MFG_offc_off	SUBSEED	persons	100000	MFG_offc_off
SERV_tech_off	SUBSEED	persons	100000	SERV_tech_off
CNST_res_xxx	SUBSEED	persons	100000	CNST_res_xxx
K12_k12_k12	SUBSEED	persons	100000	K12_k12_k12
WHSL_offc_off	SUBSEED	persons	100000	WHSL_offc_off
CNST_main_xxx	SUBSEED	persons	100000	CNST_main_xxx
K12_k12_off	SUBSEED	persons	100000	K12_k12_off
RES_offc_off	SUBSEED	persons	100000	RES_offc_off
TRNS_trns_off	SUBSEED	persons	100000	TRNS_trns_off
HOSP_eat_ret_acc	SUBSEED	persons	100000	HOSP_eat_ret_acc
MFG_food_li	SUBSEED	persons	100000	MFG_food_li
INFO_info_off	SUBSEED	persons	100000	INFO_info_off
MFG_wdppr_hi	SUBSEED	persons	100000	MFG_wdppr_hi
SERV_site_li	SUBSEED	persons	100000	SERV_site_li
WHSL_whsl_ware	SUBSEED	persons	100000	WHSL_whsl_ware
MFG_food_hi	SUBSEED	persons	100000	MFG_food_hi
CNST_nres_xxx	SUBSEED	persons	100000	CNST_nres_xxx

SPG2 Controls

SPG2 is run with controls at both the alpha zone level and the region level. The region level controls are the same as in SPG1, and two new controls are added at the alpha zone level: households by size and income categories, and workers by occupation. Households by size and income categories are obtained from an AA output, the ActivityLocations2.csv file. Workers by occupation at the alpha zone level are created using the total labor dollar production by alpha zone and household category, from the laborDollarProduction.csv file (also from AA). First, the data in the file are aggregated across all alpha zones to get region level labor dollars by occupation and household category. Second, region level workers by occupation and household category are derived from SPG1 output. Third, region level labor dollars are divided by region level workers to get labor dollars per worker by occupation and household category. Fourth, alpha zone level labor dollars are divided by this rate to obtain alpha zone level workers by occupation and household category. Finally, the resulting worker counts are aggregated across all household categories to obtain alpha zone workers by occupation. The SPG2 controls are listed below, each with a description of how it is calculated and a table of its control entries.

Households by Size and Income (Alpha)

Households by size and income category are retrieved from the ActivityLocations2.csv file, which is an output of the Activity Allocation (AA) module. Since AA uses SPG1 outputs, taking these controls from the AA output ensures that controls are consistent across SPG1 and SPG2.

Target	Geography	Seed Table	Importance	Control Field
HH0to8k1to2	AZONE	households	5000	HH0to8k1to2
HH0to8k3plus	AZONE	households	5000	HH0to8k3plus
HH106kUp1to2	AZONE	households	5000	HH106kUp1to2
HH106kUp3plus	AZONE	households	5000	HH106kUp3plus
HH15to23k1to2	AZONE	households	5000	HH15to23k1to2
HH15to23k3plus	AZONE	households	5000	HH15to23k3plus
HH23to32k1to2	AZONE	households	5000	HH23to32k1to2
HH23to32k3plus	AZONE	households	5000	HH23to32k3plus
HH32to46k1to2	AZONE	households	5000	HH32to46k1to2
HH32to46k3plus	AZONE	households	5000	HH32to46k3plus
HH46to61k1to2	AZONE	households	5000	HH46to61k1to2
HH46to61k3plus	AZONE	households	5000	HH46to61k3plus
HH61to76k1to2	AZONE	households	5000	HH61to76k1to2
HH61to76k3plus	AZONE	households	5000	HH61to76k3plus
HH76to106k1to2	AZONE	households	5000	HH76to106k1to2
HH76to106k3plus	AZONE	households	5000	HH76to106k3plus
HH8to15k1to2	AZONE	households	5000	HH8to15k1to2
HH8to15k3plus	AZONE	households	5000	HH8to15k3plus
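
The control names in the table above follow a regular pattern: an income bin followed by a household size bin. As a minimal sketch, the rows could be generated as shown below. The function name and the bin lists are illustrative assumptions reconstructed from the table; they are not taken from PopulationSimSPG.py, and any additional columns the real controls file may need are not shown here.

```python
# Income and size bins as they appear in the control names of the table above.
INCOME_BINS = ["0to8k", "8to15k", "15to23k", "23to32k", "32to46k",
               "46to61k", "61to76k", "76to106k", "106kUp"]
SIZE_BINS = ["1to2", "3plus"]


def household_control_rows(geography="AZONE", importance=5000):
    """Return (target, geography, seed table, importance, control field) rows.

    Hypothetical helper for illustration only.
    """
    rows = []
    for income in INCOME_BINS:
        for size in SIZE_BINS:
            name = f"HH{income}{size}"
            rows.append((name, geography, "households", importance, name))
    return rows


rows = household_control_rows()
for row in sorted(rows):
    # Tab-separated, matching the layout of the table above.
    print("\t".join(str(value) for value in row))
```

Nine income bins crossed with two size bins give the 18 household controls listed in the table.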

Workers by Occupation (Alpha)

The total labor dollars by zone output from AA are read in (spg.labor.dollars.by.zone = /scenario/outputs/tYEAR/laborDollarProduction.csv). These are by occupation and have been mapped to PUMS workers by occupation.

a. These are summed across all alpha zones to get region level labor dollars by occupation and household category, which are then divided by region level workers by occupation and household category (SPG1 output) to calculate labor dollars per worker.

b. Alpha zone workers by occupation and household category are calculated by dividing the alpha zone level labor dollars by occupation and household category by the dollars per worker calculated in step a.

c. The household category is collapsed, and workers by occupation and alpha zone are retained as a control.
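
Steps a to c can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code in PopulationSimSPG.py: the function name, the tuple layout of the labor dollar records, and the handling of categories with zero regional workers (they are skipped) are all hypothetical, since the column layout of laborDollarProduction.csv is not described here.

```python
from collections import defaultdict


def alpha_zone_workers(labor_dollars, regional_workers):
    """Estimate workers by (alpha zone, occupation).

    labor_dollars: list of (azone, occupation, hh_category, dollars) records
        (assumed layout, standing in for laborDollarProduction.csv).
    regional_workers: dict {(occupation, hh_category): workers}
        (assumed layout, standing in for the SPG1 output).
    """
    # Step a: sum dollars across zones, then divide by regional workers.
    regional_dollars = defaultdict(float)
    for _azone, occupation, hh_category, dollars in labor_dollars:
        regional_dollars[(occupation, hh_category)] += dollars

    dollars_per_worker = {}
    for key, dollars in regional_dollars.items():
        workers = regional_workers.get(key, 0)
        if workers > 0 and dollars > 0:  # assumption: skip empty categories
            dollars_per_worker[key] = dollars / workers

    # Steps b and c: convert zone dollars to workers, then collapse the
    # household category by summing into (azone, occupation).
    zone_workers = defaultdict(float)
    for azone, occupation, hh_category, dollars in labor_dollars:
        rate = dollars_per_worker.get((occupation, hh_category))
        if rate is not None:
            zone_workers[(azone, occupation)] += dollars / rate
    return dict(zone_workers)


# Made-up numbers for two zones, one occupation, two household categories.
labor = [
    (1, "B3_Health", "HH0to8k1to2", 300.0),
    (2, "B3_Health", "HH0to8k1to2", 100.0),
    (1, "B3_Health", "HH106kUp1to2", 600.0),
    (2, "B3_Health", "HH106kUp1to2", 200.0),
]
spg1_workers = {("B3_Health", "HH0to8k1to2"): 8,
                ("B3_Health", "HH106kUp1to2"): 4}

result = alpha_zone_workers(labor, spg1_workers)
print(result)
```

In this example the zone totals (9 and 3 workers) add up to the 12 regional workers from SPG1, which is the consistency property the calculation is designed to preserve.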

Target	Geography	Seed Table	Importance	Control Field
A1_Mgmt_Bus	AZONE	persons	10000	A1-Mgmt Bus
B1_Prof_Specialty	AZONE	persons	10000	B1-Prof Specialty
B2_Education	AZONE	persons	10000	B2-Education
B3_Health	AZONE	persons	10000	B3-Health
B4_Technical_Unskilled	AZONE	persons	10000	B4-Technical Unskilled
C1_Sales_Clerical_Professionals	AZONE	persons	10000	C1-Sales Clerical Professionals
C2_Sales_Service	AZONE	persons	10000	C2-Sales Service
C3_Clerical	AZONE	persons	10000	C3-Clerical
C4_Sales_Clerical_Unskilled	AZONE	persons	10000	C4-Sales Clerical Unskilled
D1_Production_Specialists	AZONE	persons	10000	D1-Production Specialists
D2_MaintConstRepair_Specialists	AZONE	persons	10000	D2-MaintConstRepair Specialists
D3_ProtectTrans_Specialists	AZONE	persons	10000	D3-ProtectTrans Specialists
D4_Blue_Collar_Unskilled	AZONE	persons	10000	D4-Blue Collar Unskilled

How to Run Manually:

The primary file for running PopulationSim for SWIM is PopulationSimSPG.py. This python script contains multiple methods that run sequentially to implement PopulationSim. The arguments required to run this script are the properties files for the two POPSIMSPG modules (popsimspg1.properties or popsimspg2.properties) and the run mode for the respective modules (runSPG1 or runSPG2). A sample command to run the module independent of the SWIM model is as follows –

(path)/python PopulationSimSPG.py (path)/popsimspg1.properties runSPG1

or

(path)/python PopulationSimSPG.py (path)/popsimspg2.properties runSPG2

To manually run PopulationSim for SWIM do the following:

Edit the tsteps csv file so that it includes POPSIMSPG1 or POPSIMSPG2; this is what causes a SWIM properties file to be created for that module
Run build_run.bat to build the SWIM POPSIMSPG1 or POPSIMSPG2 properties file
Run the command above with the full paths for your installation, for example:

E:/tlumip/_test/root/model/lib/swimpy/python.exe E:/tlumip/_test/root/scenario/model/code/populationsimspg.py E:/tlumip/_test/root/scenario/outputs/t20/popsimspg2.properties runSPG2
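
When the module is run repeatedly, it can help to assemble the command in a small script. The sketch below only builds and prints the argument list, using the example paths from this page; the helper name is hypothetical and is not part of SWIM.

```python
VALID_RUN_MODES = ("runSPG1", "runSPG2")


def build_popsim_command(python_exe, script, properties_file, run_mode):
    """Return the argument list for one PopulationSimSPG.py run.

    Hypothetical helper; the argument order follows the sample command above.
    """
    if run_mode not in VALID_RUN_MODES:
        raise ValueError(f"run mode must be one of {VALID_RUN_MODES}")
    return [python_exe, script, properties_file, run_mode]


cmd = build_popsim_command(
    "E:/tlumip/_test/root/model/lib/swimpy/python.exe",
    "E:/tlumip/_test/root/scenario/model/code/populationsimspg.py",
    "E:/tlumip/_test/root/scenario/outputs/t20/popsimspg2.properties",
    "runSPG2",
)
print(" ".join(cmd))
# To launch the run, pass the list to subprocess.run(cmd, check=True).
```

Passing the arguments as a list avoids quoting problems if any of the paths contain spaces.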
