
Getting Data

Two targets for adding data to the app are included in manage.py:

test_client

The test client can be run with python manage.py test_client and currently only works for complaints. Its main goal is to simulate incoming data that could break the UI or cleaning processes in interesting ways. It requires a running local server in a separate process.

It first deletes all local data (so be careful not to run it anywhere important; an issue has been opened to make this safer), runs all migrations, and creates an admin user and a department. It then creates an extractor and adds 100 procedurally generated complaints through the actual data/complaints endpoint. This should give a pretty good idea of what a generic, if sparse, dataset looks like.

Also included is a series of mutations that can be performed on the data to generate interesting edge cases. To use one, uncomment the line creating the mutator you want and add the instance to the empty list in test_client.run(department, []). Mutations can be combined by adding more instances to the list, as sketched below. Some are tunable with a constructor argument, but that proved less helpful than first thought, and all currently default to the highest level of mutation.
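For example, enabling a mutator might look like the following. This is a minimal sketch: the class names SometimesUppercaseMutator and NullFieldsMutator are hypothetical stand-ins for whatever mutators test_client actually defines.

```python
# Sketch only: the mutator class names below are hypothetical stand-ins
# for the mutators actually defined in test_client.

# Uncomment the mutator you want and pass the instance in the list:
mutator = SometimesUppercaseMutator()   # some take a tunable constructor arg
test_client.run(department, [mutator])

# Mutations combine by adding more instances to the list:
test_client.run(department, [SometimesUppercaseMutator(), NullFieldsMutator()])
```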

load_test_data

python manage.py load_test_data looks for CSVs at the following hard-coded paths (relative to the project directory):

  • data/testdata/complaints/complaints.csv
  • data/testdata/uof/uof.csv
  • data/testdata/ois/ois.csv
  • data/testdata/denominators/denominators.csv
  • data/testdata/demographics/demographics.csv

For each of these, it uses the hardcoded column names (e.g. CITCHARGE_TYPE, here: https://github.com/codeforamerica/comport/blob/master/manage.py#L120) to create models and load them directly into the db. NOTE: This means that data loaded in this manner bypasses the cleaning steps included in the actual data flow through the data/ endpoints.
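For reference, the direct-load pattern looks roughly like the sketch below. This is not comport's actual code: the CITCHARGE_TYPE column name comes from manage.py, but the Complaint model, its citizen_charge_type field, and the db handle are assumptions made for illustration.

```python
import csv

# Sketch of the direct-load pattern, not comport's actual code. Rows are
# built from hardcoded column names and written straight to the db,
# bypassing the cleaning applied by the data/ endpoints.
with open("data/testdata/complaints/complaints.csv") as f:
    for row in csv.DictReader(f):
        db.session.add(Complaint(                       # model name assumed
            citizen_charge_type=row["CITCHARGE_TYPE"],  # hardcoded column
        ))
db.session.commit()
```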