Our sources have a `first_story` field. We should fill it in regularly by searching for the oldest (reasonable) publication date on each source. It would be smart to re-run this every 6 months or so, because we sometimes ingest historical data. This is easy to implement if we can run a query for each source covering dates from 2000 until now, sorted by pub_date ASC. Perhaps skip sources that have fewer than 100 articles, since that indicates we don't get regular data from them? It's a simple data science script, so I'm logging it here, but it is probably best implemented as a Django management command like update-stories-per-week.
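A rough sketch of the selection logic described above, in plain Python so it can be tried without the Django project. Everything named here (`find_first_story`, `EARLIEST_REASONABLE`, `MIN_ARTICLES`) is hypothetical and not taken from the codebase. It works on an in-memory list of publication dates, whereas a real command would presumably run the per-source query sorted by pub_date ASC and take the first hit.

```python
from datetime import date
from typing import Iterable, Optional

# Assumed thresholds, taken from the issue text: ignore dates before 2000,
# and skip sources with fewer than 100 articles.
EARLIEST_REASONABLE = date(2000, 1, 1)
MIN_ARTICLES = 100


def find_first_story(
    pub_dates: Iterable[date],
    today: Optional[date] = None,
    earliest: date = EARLIEST_REASONABLE,
    min_articles: int = MIN_ARTICLES,
) -> Optional[date]:
    """Return the oldest reasonable publication date, or None to skip the source."""
    today = today or date.today()
    dates = list(pub_dates)

    # Too few articles suggests we don't get regular data from this source.
    if len(dates) < min_articles:
        return None

    # Keep only dates inside the "2000 until now" window, so obviously bad
    # values (e.g. 1970-01-01 or dates in the future) are ignored.
    in_window = [d for d in dates if earliest <= d <= today]

    # Equivalent to sorting by pub_date ascending and taking the first row.
    return min(in_window) if in_window else None
```

Inside a management command, the returned value would be written to the source's `first_story` field only when it is not `None`, so sources that are skipped keep whatever value they already had.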