Brief overview of steps 1 and 2 in meting_mate/ingest
Call _1_crawl_drive.py.
This script loops through all app users and uses the Google Docs API to locate the Google Docs documents each user owns or has had shared with them. Each new or modified document it finds is checked against the database: if the database timestamp differs from the Google Docs timestamp, or the document is missing from the database altogether, the document is upserted.
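The upsert rule described above can be sketched as follows. This is a minimal illustration, not the actual code in _1_crawl_drive.py: a plain dict stands in for the database, and the names needs_upsert, sync_document and the modified_time field are hypothetical.

```python
from datetime import datetime, timezone


def needs_upsert(stored, drive_modified):
    """True when the doc is missing from the db or its timestamp differs."""
    return stored is None or stored["modified_time"] != drive_modified


def sync_document(db, doc_id, drive_modified, metadata):
    """Upsert one document into `db` (a dict standing in for the real database).

    Returns True if the record was written, False if it was already current.
    """
    stored = db.get(doc_id)
    if needs_upsert(stored, drive_modified):
        # Hypothetical record layout: metadata plus the Google-side timestamp.
        db[doc_id] = {**metadata, "modified_time": drive_modified}
        return True
    return False


if __name__ == "__main__":
    db = {}
    first_seen = datetime(2024, 1, 1, tzinfo=timezone.utc)
    print(sync_document(db, "doc-1", first_seen, {"title": "Notes"}))
    print(sync_document(db, "doc-1", first_seen, {"title": "Notes"}))
```

Comparing for inequality rather than "newer than" mirrors the wording above, where any deviation between the two timestamps triggers an upsert.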
Call _2_get_contents.py.
This script finds all documents in the "docs" collection that have no content yet, then downloads each document's contents in multiple formats: the native JSON returned by the Google Docs API and an MS Word .docx export. The .docx file is then converted to HTML and Markdown using Mammoth.
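The selection and conversion steps above might look roughly like this. It is a sketch under assumptions, not the contents of _2_get_contents.py: the "docs" collection is modelled as a list of dicts, the content field name and both function names are hypothetical, and the download step is left out. The Mammoth calls use the Python mammoth package (pip install mammoth), imported lazily so the filter can be tried without it.

```python
def docs_missing_content(docs):
    """Return the documents that have no stored content yet.

    `docs` is a list of dicts standing in for the "docs" collection;
    the "content" field name is an assumption.
    """
    return [doc for doc in docs if not doc.get("content")]


def convert_docx(docx_path):
    """Convert a downloaded .docx export to HTML and Markdown with Mammoth."""
    import mammoth  # third-party dependency, only needed for conversion

    # Open the file once per conversion so each call reads from the start.
    with open(docx_path, "rb") as docx_file:
        html = mammoth.convert_to_html(docx_file).value
    with open(docx_path, "rb") as docx_file:
        markdown = mammoth.convert_to_markdown(docx_file).value
    return {"html": html, "markdown": markdown}


if __name__ == "__main__":
    docs = [
        {"id": "doc-1", "content": {"html": "<p>Hi</p>"}},
        {"id": "doc-2"},
        {"id": "doc-3", "content": None},
    ]
    for doc in docs_missing_content(docs):
        print("needs download:", doc["id"])
```

Each Mammoth result also carries a messages list with conversion warnings, which may be worth logging alongside the converted output.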