automate updating of "NNNN - State & Local" collections? #2

Open
rahulbot opened this issue Nov 6, 2024 · 0 comments

rahulbot commented Nov 6, 2024

When we seeded the global country collections years ago, we created a "NNNN - State & Local" collection for each country by combining all of that country's province/state collections. The combined collection isn't dynamic, so changes to the state collections aren't propagated into it, and it drifts further out of date over time.

A quick fix would be a script that automatically updates the "State & Local" collection for each country by taking the union of all that country's state-level collections. Running it once a week would improve the existing situation significantly. Perhaps this should live in web-search eventually? Moved from mediacloud/web-search#600. Adding here to track as a data science task related to directory maintenance.

A sketch of it (relying on our naming conventions) could look like this:

for each country:
    find the "- National" collection
    find the "- State & Local" collection
    find all the other state collections
    list all the sources in the state collections
        (bonus: email any sources in more than one state collection to someone)
    add all those sources to the "- State & Local"
        (bonus: if any were added new, post a slack msg with the country and list of domains added)
    update the note on the collection to indicate when it was last updated and that it was automated
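
To make the sketch above a bit more concrete, here is a rough Python version of the core logic for a single country. It is only an illustration under some assumptions: the function names (`split_collections`, `plan_update`), the dict-of-sets input shape, and the exact naming pattern ("<country> - National", "<country> - State & Local", everything else treated as a state collection) are all hypothetical, and it works on in-memory data rather than calling the directory API. Fetching collections, writing the result back, and sending the email and Slack messages are left out.

```python
from collections import defaultdict
from datetime import date


def split_collections(country, collections):
    """Split one country's collections into (combined name, state collections).

    `collections` maps collection name -> set of source domains.
    ASSUMPTION: names follow "<country> - National" and
    "<country> - State & Local"; anything else is a state collection.
    """
    combined_name = f"{country} - State & Local"
    national_name = f"{country} - National"
    if combined_name not in collections:
        raise KeyError(f"no combined collection named {combined_name!r}")
    states = {
        name: sources
        for name, sources in collections.items()
        if name not in (combined_name, national_name)
    }
    return combined_name, states


def plan_update(country, collections, today=None):
    """Work out what the combined collection should contain after an update."""
    combined_name, states = split_collections(country, collections)

    # Record which state collections each source appears in.
    seen_in = defaultdict(list)
    for name in sorted(states):
        for source in states[name]:
            seen_in[source].append(name)

    union = set(seen_in)
    existing = collections[combined_name]
    stamp = (today or date.today()).isoformat()
    return {
        "collection": combined_name,
        # Sources are only added, never removed, as in the sketch above.
        "sources": existing | union,
        # Newly added domains (candidates for the Slack message).
        "added": sorted(union - existing),
        # Sources in more than one state collection (candidates for the email).
        "duplicates": {s: n for s, n in seen_in.items() if len(n) > 1},
        "note": f"Automatically updated from state collections on {stamp}.",
    }
```

Returning a plan instead of writing changes directly would make it easy to do a dry run per country and review the "added" and "duplicates" lists before anything is modified.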