
CKAN commands

Aaron D Borden edited this page Nov 30, 2019 · 18 revisions

This page documents common CKAN commands used for both inventory.data.gov and catalog.data.gov.

System administrator accounts

To create a system administrator account for CKAN, the user must already exist in the database. A user record is created the first time the user logs into CKAN (through MAX.gov), and users must request access by following the Account Procedures before they can log in. Run this command from one of the harvesters, e.g. catalog-harvester2p.

$ sudo ckan sysadmin add <email-address>

To remove sysadmin status:

$ sudo ckan sysadmin remove <email-address>

Catalog commands

Other commands

Note: these commands have not been tested.

ckan --plugin=ckanext-geodatagov geodatagov harvest-job-cleanup

Harvest jobs can get stuck in the Running state and stay that way indefinitely. This command resets them and fixes any harvest object issues they cause.

ckan --plugin=ckanext-qa qa update_sel

Starts QA analysis on all datasets whose last-modified timestamp is greater than or equal to the timestamp stored in /var/log/qa-metadata-modified.log.

ckan --plugin=ckanext-qa qa collect-ids && ckan --plugin=ckanext-qa qa update

Unlike qa update_sel, this pair of commands collects the IDs of all datasets and then runs qa update on every one of them, not just recently modified ones. It takes a very long time to finish.
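Because the full run is slow, it can help to wrap the two steps so that each one logs when it starts and finishes. The sketch below assumes a POSIX shell; `CKAN_CMD` and `run_full_qa` are hypothetical names introduced only for this example, not part of CKAN or ckanext-qa.

```shell
#!/bin/sh
# Sketch only: run the full QA pass (collect-ids, then update) with timestamps.
# CKAN_CMD and run_full_qa are hypothetical names, not part of CKAN.
set -eu

# Defaults to the `ckan` command used throughout this page.
CKAN_CMD="${CKAN_CMD:-ckan}"

run_full_qa() {
  echo "qa collect-ids started: $(date)"
  # Stop early if collecting IDs fails, like the && in the original command.
  "$CKAN_CMD" --plugin=ckanext-qa qa collect-ids || return 1
  echo "qa update started: $(date)"
  "$CKAN_CMD" --plugin=ckanext-qa qa update || return 1
  echo "qa update finished: $(date)"
}
```

To keep the job alive after an SSH session closes, one option is to source the file and run the function under nohup, e.g. `nohup sh -c '. ./full-qa.sh && run_full_qa' > full-qa.log 2>&1 &` (both file names are placeholders).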

ckan --plugin=ckanext-geodatagov geodatagov clean-deleted

CKAN keeps deleted packages in the database. This command purges them so they are really gone.

ckan tracking update

This needs to be run periodically to process the raw tracking data and generate the summarized page-view data that CKAN and Solr use.
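One common way to run this periodically is from cron. The sketch below is an illustration, not the data.gov setup: the hourly schedule, the script and lock file paths, and the names `CKAN_CMD`, `TRACKING_LOCK`, and `run_tracking_update` are all assumptions. It uses `flock` from util-linux so that a slow run is not overlapped by the next scheduled one.

```shell
#!/bin/sh
# Sketch only: a cron-friendly wrapper around `ckan tracking update`.
# Example crontab entry (schedule and script path are assumptions):
#   0 * * * * . /usr/local/bin/ckan-tracking-update.sh && run_tracking_update
set -eu

CKAN_CMD="${CKAN_CMD:-ckan}"
# Hypothetical lock file; any path writable by the cron user works.
TRACKING_LOCK="${TRACKING_LOCK:-/tmp/ckan-tracking-update.lock}"

run_tracking_update() {
  (
    # Non-blocking exclusive lock on fd 9: if a previous run still holds it,
    # skip this run instead of starting a second, overlapping one.
    if ! flock -n 9; then
      echo "tracking update already running; skipping" >&2
      exit 0
    fi
    "$CKAN_CMD" tracking update
  ) 9>"$TRACKING_LOCK"
}
```

The lock is released automatically when the subshell exits and fd 9 is closed, so no cleanup step is needed.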

ckan --plugin=ckanext-report report generate

This generates the /report/broken-links page, which shows broken-link statistics for dataset resources by organization.

ckan --plugin=ckanext-geodatagov geodatagov db_solr_sync

Over time, Solr can get out of sync with the database due to various glitches. This command brings them back in sync.

ckan --plugin=ckanext-spatial ckan-pycsw set_keywords -p /etc/ckan/pycsw-collection.cfg

This takes the top 20 tags from CKAN and writes them into /etc/ckan/pycsw-collection.cfg as CSW service metadata keywords.

ckan --plugin=ckanext-spatial ckan-pycsw set_keywords -p /etc/ckan/pycsw-all.cfg

This takes the top 20 tags from CKAN and writes them into /etc/ckan/pycsw-all.cfg as CSW service metadata keywords.
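Since set_keywords is run once per pycsw configuration file, the two commands can be wrapped in a loop. This is a minimal sketch; `CKAN_CMD` and `refresh_pycsw_keywords` are hypothetical names, and the two config paths are the ones listed on this page.

```shell
#!/bin/sh
# Sketch only: refresh CSW keywords in both pycsw configs listed on this page.
# CKAN_CMD and refresh_pycsw_keywords are hypothetical names.
set -eu

CKAN_CMD="${CKAN_CMD:-ckan}"

refresh_pycsw_keywords() {
  for cfg in /etc/ckan/pycsw-collection.cfg /etc/ckan/pycsw-all.cfg; do
    echo "Updating keywords in $cfg" >&2
    # Stop at the first config that fails rather than silently continuing.
    "$CKAN_CMD" --plugin=ckanext-spatial ckan-pycsw set_keywords -p "$cfg" || return 1
  done
}
```

Usage: `. ./refresh-pycsw-keywords.sh && refresh_pycsw_keywords` (the file name is a placeholder).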

ckan --plugin=ckanext-spatial ckan-pycsw load -p /etc/ckan/pycsw-all.cfg

Uses the CKAN API to load CKAN datasets into the pycsw database.

/usr/lib/ckan/bin/python /usr/lib/ckan/bin/pycsw-db-admin.py vacuumdb /etc/ckan/pycsw-all.cfg

Runs the vacuumdb task on the pycsw database.

/usr/lib/ckan/bin/python /usr/lib/ckan/bin/pycsw-db-admin.py reindex_fts /etc/ckan/pycsw-all.cfg

Rebuilds the GIN index on the pycsw records table to speed up full-text search.
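The load, vacuumdb, and reindex_fts commands can be chained into a single maintenance run. The order used here (load first, then vacuum, then rebuild the full-text index) is an assumption about what makes sense, not a documented data.gov procedure, and `CKAN_CMD`, `PYCSW_PYTHON`, `PYCSW_ADMIN`, `PYCSW_CFG`, and `pycsw_maintenance` are hypothetical names. The defaults match the paths given on this page.

```shell
#!/bin/sh
# Sketch only: reload pycsw from CKAN, then vacuum and reindex its database.
# All variable and function names here are hypothetical.
set -eu

CKAN_CMD="${CKAN_CMD:-ckan}"
PYCSW_PYTHON="${PYCSW_PYTHON:-/usr/lib/ckan/bin/python}"
PYCSW_ADMIN="${PYCSW_ADMIN:-/usr/lib/ckan/bin/pycsw-db-admin.py}"
PYCSW_CFG="${PYCSW_CFG:-/etc/ckan/pycsw-all.cfg}"

pycsw_maintenance() {
  # 1. Load CKAN datasets into the pycsw database through the CKAN API.
  "$CKAN_CMD" --plugin=ckanext-spatial ckan-pycsw load -p "$PYCSW_CFG" || return 1
  # 2. Vacuum the pycsw database.
  "$PYCSW_PYTHON" "$PYCSW_ADMIN" vacuumdb "$PYCSW_CFG" || return 1
  # 3. Rebuild the GIN full-text index on the records table.
  "$PYCSW_PYTHON" "$PYCSW_ADMIN" reindex_fts "$PYCSW_CFG" || return 1
}
```

Each step returns early on failure, so a failed load does not trigger a vacuum and reindex of stale data.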

ckan --plugin=ckanext-geodatagov geodatagov combine-feeds

This gathers 20 pages of the CKAN feed at /feeds/dataset.atom and combines them into /usasearch-custom-feed.xml for USAsearch. USAsearch uses a Bing index as its backend, which does not understand pagination in Atom feeds, so the pages have to be combined into a single feed.

ckan --plugin=ckanext-geodatagov geodatagov export-csv

This collects all datasets tagged with Topic and Topic Categories and writes them to /csv/topic_datasets.csv.
