OSM POI data to GeoJSON to tagged ML training data

This repository takes OSM extracts, converts them to GeoJSON, and parses that data into tagged training data for supervised machine learning of address / POI parsing. Any OSM extract will work with the repo, but extracts larger than the ones used in step 1 may cause memory issues in one or more of the parsing scripts. The repository is part of an effort to build an open-source, end-to-end encrypted mapping app.

Install

Install Osmosis: brew install osmosis

Install OSMConvert: brew install osmconvert

Install OSMtoGeoJSON: npm install -g osmtogeojson
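
Before running the pipeline, it can help to confirm that the three tools are actually on your PATH. The following is a minimal sketch, not part of this repository; the helper name `missing_tools` is hypothetical, and the executable names are assumed to match the package names installed above.

```python
import shutil

# Executable names, assumed to match the packages installed above.
REQUIRED_TOOLS = ("osmosis", "osmconvert", "osmtogeojson")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools are installed.")
```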

Parsing points-of-interest from OSM extracts

The pipeline follows the general blueprint from this Medium article, with additional parsing to turn the GeoJSON into ML-usable training data stored as pickle files.

Steps

1. Download OSM extract

Download a pbf extract of OSM data, e.g. this extract of Quebec from Geofabrik, which we are using for Montreal.

To fetch the latest OSM extract for one of the beta testing regions, run one of the following commands:

California

wget https://download.geofabrik.de/north-america/us/california-latest.osm.pbf -P osm_extracts

Georgia

wget https://download.geofabrik.de/north-america/us/georgia-latest.osm.pbf -P osm_extracts

New York

wget https://download.geofabrik.de/north-america/us/new-york-latest.osm.pbf -O osm_extracts/new_york.osm.pbf

Quebec

wget https://download.geofabrik.de/north-america/canada/quebec-latest.osm.pbf -P osm_extracts
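
The download commands above all follow the same Geofabrik URL pattern, so the same step can be sketched in Python. This is an illustration under stated assumptions, not a script that ships with this repository: the region keys and the names `extract_url` and `download_extract` are hypothetical, and the downloaded file keeps its upstream name.

```python
from pathlib import Path
from urllib.request import urlretrieve

GEOFABRIK_BASE = "https://download.geofabrik.de/north-america"

# Hypothetical region keys mapped to the URL paths used in the commands above.
REGION_PATHS = {
    "california": "us/california",
    "georgia": "us/georgia",
    "new_york": "us/new-york",
    "quebec": "canada/quebec",
}


def extract_url(region):
    """Build the URL of the latest pbf extract for a known region."""
    return f"{GEOFABRIK_BASE}/{REGION_PATHS[region]}-latest.osm.pbf"


def download_extract(region, out_dir="osm_extracts"):
    """Download the extract into out_dir and return the local path."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    url = extract_url(region)
    target = out_path / url.rsplit("/", 1)[-1]
    urlretrieve(url, str(target))
    return target
```

Calling `download_extract("quebec")` would save `quebec-latest.osm.pbf` under `osm_extracts`. These extracts are large, so expect the download to take a while.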

2a. Run OSM to (Geo)JSON parsing pipeline

python osm_to_json.py parseosm --region {REGION} --osm {BOOLEAN}

That is all: this single command runs the whole pipeline, so you are done.

2b. Select points of interest

Alternatively, you can run steps 2b to 5 one at a time:

bash sh/osm_pbf_to_nodes_osm.sh -r $REGION

| Input | Output |
| --- | --- |
| `*.osm.pbf` | `*.nodes.osm` |

3. Drop ways, keep nodes

bash sh/nodes_osm_to_poi_osm.sh -r $REGION

| Input | Output |
| --- | --- |
| `*.nodes.osm` | `*.poi.osm` |

4. Convert to (Geo)JSON

bash sh/poi_osm_to_poi_geojson.sh -r $REGION

| Input | Output |
| --- | --- |
| `*.poi.osm` | `*.poi.geojson` |
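
Steps 2b to 4 each call one shell script with the same `-r` argument, so they can be chained from Python. The sketch below only illustrates that idea and is not necessarily how `osm_to_json.py` does it; `build_commands` and `run_steps` are hypothetical names, and the script paths are copied from the steps above.

```python
import subprocess

# Shell scripts from steps 2b to 4, in the order they must run.
STEP_SCRIPTS = (
    "sh/osm_pbf_to_nodes_osm.sh",
    "sh/nodes_osm_to_poi_osm.sh",
    "sh/poi_osm_to_poi_geojson.sh",
)


def build_commands(region):
    """Return one argument list per step for the given region."""
    return [["bash", script, "-r", region] for script in STEP_SCRIPTS]


def run_steps(region):
    """Run the steps in order; check=True stops at the first failure."""
    for command in build_commands(region):
        subprocess.run(command, check=True)
```

Running the steps in sequence matters because each script reads the file written by the previous one.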

5. Clean (Geo)JSON and extract names, labels, and coordinates

python osm_to_json.py parseosm --region {REGION} --osm False

| Input | Output |
| --- | --- |
| `*.poi.geojson` | `*.osm.text.tags.coords.pkl` |
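
To give a rough idea of what this last step involves, here is a minimal sketch that pulls a name, the remaining tags, and the coordinates out of each Point feature and pickles the result. It is an assumption-laden illustration, not the code in `osm_to_json.py`: the function names and the `(name, labels, (lon, lat))` record layout are hypothetical, and the real layout of the `.pkl` file may differ. The sketch accepts tags either nested under a `tags` property or flattened into `properties`.

```python
import pickle


def extract_pois(feature_collection):
    """Return (name, labels, (lon, lat)) for every named Point feature."""
    pois = []
    for feature in feature_collection.get("features", []):
        geometry = feature.get("geometry") or {}
        if geometry.get("type") != "Point":
            continue
        props = feature.get("properties") or {}
        # Tags may sit under "tags" or be flattened into the properties.
        nested = props.get("tags")
        tags = nested if isinstance(nested, dict) else props
        name = tags.get("name")
        if not name:
            continue  # unnamed features carry no text to learn from
        # GeoJSON positions are ordered longitude, latitude.
        lon, lat = geometry["coordinates"][:2]
        labels = {key: value for key, value in tags.items() if key != "name"}
        pois.append((name, labels, (lon, lat)))
    return pois


def save_pois(pois, path):
    """Write the extracted records to a pickle file."""
    with open(path, "wb") as handle:
        pickle.dump(pois, handle)
```

The saved file can be read back with `pickle.load` for inspection before it is used for training.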