Data Updates

Drew Bollinger edited this page Oct 31, 2017 · 7 revisions

Requirements

  • Python (2.x.x)
  • Pip (bundled with Python 2.7.9 and later)
  • csvfix
  • Add the Python and csvfix installation directories to your path (mac, windows)
  • Two python scripts from Development Seed: csv-manipulation.py & es_populator.py
  • Two utility .csv files: aggregation_commodity.csv & aggregation_region.csv
  • One small text file: requirements.txt
  • IMPACT Output data prepared in the format described below
  • All of the commands below should be run from the raw folder of this repository

Data Processing

Run python csv-manipulation.py [filename] on the command line, where [filename] is a CSV export containing multiple scenarios in the following format:

impactparameter        scenario   commodity  region  year  Val
PopXAgg -- Population  SSP2_GFDL  cbeef      VEN     2050  46.2749

Data should be aggregated so as not to include the production type variable.
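
Before running the script, it can help to confirm that the export has the expected columns. The following is a minimal sketch, not part of the Development Seed scripts: it assumes the file is comma-delimited with a header row, and the helper name missing_columns is made up for illustration.

```python
import csv

# Column names taken from the sample header shown above.
EXPECTED_COLUMNS = ["impactparameter", "scenario", "commodity", "region", "year", "Val"]


def missing_columns(lines):
    """Return the expected column names that are absent from the header row.

    `lines` is any iterable of text lines, e.g. an open file object.
    Assumption: the export is comma-delimited.
    """
    header = next(csv.reader(lines), [])
    present = {name.strip() for name in header}
    return [name for name in EXPECTED_COLUMNS if name not in present]


# In-memory stand-in for a real export; with a file on disk, pass open(path).
sample = [
    "impactparameter,scenario,commodity,region,year,Val\n",
    "PopXAgg -- Population,SSP2_GFDL,cbeef,VEN,2050,46.2749\n",
]
print(missing_columns(sample))
```

An empty list means every expected column was found. A non-empty list names the columns to add before processing.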

The console will show the following (or similar based on the file) while running:

separating target file: 4DevSeed.csv
creating 5 files
creating file: SSP2-GFDL.csv
creating file: SSP2-HGEM.csv
creating file: SSP2-IPSL.csv
creating file: SSP2-MIROC.csv
creating file: SSP2-NoCC.csv
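
The console output above suggests that the script writes one file per distinct scenario. As a rough illustration of that grouping step (a sketch only, not the actual logic of csv-manipulation.py), the rows can be bucketed by their scenario column. The function name group_by_scenario and the second sample row are invented for this example, and a comma-delimited input is assumed.

```python
import csv
from collections import defaultdict


def group_by_scenario(lines):
    """Group data rows (as dicts) by the value in their 'scenario' column."""
    groups = defaultdict(list)
    for row in csv.DictReader(lines):
        groups[row["scenario"]].append(row)
    return dict(groups)


# Illustrative input: the second data row is made up to show a second scenario.
lines = [
    "impactparameter,scenario,commodity,region,year,Val\n",
    "PopXAgg -- Population,SSP2_GFDL,cbeef,VEN,2050,46.2749\n",
    "PopXAgg -- Population,SSP2_NoCC,cbeef,VEN,2050,46.2749\n",
]
for scenario, rows in sorted(group_by_scenario(lines).items()):
    print("%s: %d row(s)" % (scenario, len(rows)))
```

Counting the groups this way gives a quick estimate of how many scenario files to expect.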

The processing slightly transforms the data to fit a specific schema:

  • Only the first word of the impactparameter is used
  • In all other fields, any space ( ) or dash (-) is replaced with an underscore (_)
  • All text is converted to lowercase
  • Commodity and region group/aggregate names are added according to aggregation_commodity.csv and aggregation_region.csv
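
The first three rules can be sketched in a few lines of Python. This is an approximation written for illustration, not the code in csv-manipulation.py: the function name normalize_row is hypothetical, and it assumes that only the text fields (scenario, commodity, region) are rewritten while year and Val are left untouched. The aggregate-name lookup is not covered.

```python
import re


def normalize_row(row):
    """Apply the text rules listed above to one row (a dict of strings)."""
    out = dict(row)
    # Rule 1: keep only the first word of impactparameter (lowercased).
    out["impactparameter"] = row["impactparameter"].split()[0].lower()
    # Rules 2 and 3: replace spaces and dashes with underscores, then lowercase.
    # Assumption: year and Val are numeric and are passed through unchanged.
    for field in ("scenario", "commodity", "region"):
        out[field] = re.sub(r"[ -]", "_", row[field]).lower()
    return out


row = {
    "impactparameter": "PopXAgg -- Population",
    "scenario": "SSP2_GFDL",
    "commodity": "cbeef",
    "region": "VEN",
    "year": "2050",
    "Val": "46.2749",
}
print(normalize_row(row))
```

For the sample row, impactparameter becomes popxagg and the region becomes ven.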

Upload to Elasticsearch

The created scenario csvs will automatically be moved to the scenarios/ folder.
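
Since each scenario can take several minutes to upload, it may be worth checking which files are actually in scenarios/ first. The snippet below is a small convenience sketch, not part of the provided scripts. The helper name scenario_files is made up, and it assumes it is run from the same folder as the other commands.

```python
import glob
import os


def scenario_files(folder="scenarios"):
    """Return the sorted names of the .csv files found in the given folder."""
    paths = glob.glob(os.path.join(folder, "*.csv"))
    return sorted(os.path.basename(path) for path in paths)


# An empty list means no scenario files were found in the folder.
print(scenario_files())
```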

Run the next two commands on the command line, where [username] and [password] are the credentials provided for editing the Elasticsearch cluster.

pip install -r requirements.txt
python es_populator.py [username] [password]

Expect the process to take ~10 minutes per scenario. Once the script is complete, all of the scenarios will be uploaded to the Heroku instance at https://ad21a5a8cb0789e9b73c2142d3c83e43.us-east-1.aws.found.io:9243

Deleting from Elasticsearch

Scenarios can be easily deleted from the cluster using delete.py:

python delete.py --scenario [scenario_name] [username] [password]

or to delete all scenarios:

python delete.py --delete-all [username] [password]