A recent thought: if the stories table had a "last_seen" column (updated each time the URL is found in a feed), we could use it to avoid aging out entries from unchanging feeds. This would require comparing story.last_seen to the last time the feed returned new or different content (http_last_modified?).
This would increase the database write load, but it would prevent the duplicates that are generated every time a URL from a static feed expires from the stories table and is then found again.
Since stories can come from multiple feeds, last_seen should probably go into a table that relates a stories table entry to a feeds table entry.
After attempting to insert a story into the stories table, create or update the corresponding feed_stories entry with the current time (the same timestamp used when creating the story).
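A minimal sketch of that insert-then-upsert step, assuming SQLite through Python's sqlite3 module. The column names (url, created, last_seen) and the helpers make_db/record_story are hypothetical, not the project's actual schema. It relies on SQLite's upsert syntax (ON CONFLICT ... DO UPDATE, available since SQLite 3.24).

```python
import sqlite3
import time
from typing import Optional


def make_db() -> sqlite3.Connection:
    """Hypothetical minimal schema; real table/column names may differ."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE feeds (id INTEGER PRIMARY KEY, url TEXT UNIQUE NOT NULL);
        CREATE TABLE stories (
            id INTEGER PRIMARY KEY,
            url TEXT UNIQUE NOT NULL,
            created REAL NOT NULL
        );
        -- One row per (feed, story) pair, so a story found in several feeds
        -- gets a separate last_seen for each of them.
        CREATE TABLE feed_stories (
            feed_id INTEGER NOT NULL REFERENCES feeds(id),
            story_id INTEGER NOT NULL REFERENCES stories(id),
            last_seen REAL NOT NULL,
            PRIMARY KEY (feed_id, story_id)
        );
    """)
    return conn


def record_story(conn: sqlite3.Connection, feed_id: int, url: str,
                 now: Optional[float] = None) -> int:
    """Insert the story if it is new, then create or refresh its feed_stories row."""
    now = time.time() if now is None else now
    # A URL that is already stored is left untouched (its 'created' time is kept).
    conn.execute("INSERT OR IGNORE INTO stories (url, created) VALUES (?, ?)", (url, now))
    (story_id,) = conn.execute("SELECT id FROM stories WHERE url = ?", (url,)).fetchone()
    # Upsert the link using the same timestamp that was used for the story.
    conn.execute(
        """INSERT INTO feed_stories (feed_id, story_id, last_seen) VALUES (?, ?, ?)
           ON CONFLICT (feed_id, story_id) DO UPDATE SET last_seen = excluded.last_seen""",
        (feed_id, story_id, now),
    )
    return story_id
```

Calling record_story twice for the same feed and URL keeps a single story row with its original created time, while last_seen moves forward to the second call's timestamp.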
If the feed document hash has not changed, update the last_seen time for all stories seen from that feed.
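The unchanged-document shortcut might look roughly like this, again assuming SQLite. The doc_hash column on feeds, the choice of SHA-256, and the helper name touch_if_unchanged are all hypothetical.

```python
import hashlib
import sqlite3


def make_db() -> sqlite3.Connection:
    """Hypothetical schema: feeds stores the hash of the last fetched document."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE feeds (id INTEGER PRIMARY KEY, doc_hash TEXT);
        CREATE TABLE feed_stories (
            feed_id INTEGER NOT NULL,
            story_id INTEGER NOT NULL,
            last_seen REAL NOT NULL,
            PRIMARY KEY (feed_id, story_id)
        );
    """)
    return conn


def touch_if_unchanged(conn: sqlite3.Connection, feed_id: int,
                       document: bytes, now: float) -> bool:
    """Return True (after refreshing last_seen) if the document is unchanged."""
    digest = hashlib.sha256(document).hexdigest()
    row = conn.execute("SELECT doc_hash FROM feeds WHERE id = ?", (feed_id,)).fetchone()
    if row is not None and row[0] == digest:
        # Same document as last time: every story previously seen from this
        # feed is still present, so refresh them all without parsing.
        conn.execute("UPDATE feed_stories SET last_seen = ? WHERE feed_id = ?",
                     (now, feed_id))
        return True
    # Changed (or first) document: remember the new hash; the caller parses it.
    conn.execute("UPDATE feeds SET doc_hash = ? WHERE id = ?", (digest, feed_id))
    return False
```

When it returns False, the caller would parse the document and record each story individually.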
After the above processing, delete any feed_stories entries that weren't updated and, if possible, have the delete cascade to the stories table when this was the last reference to the story.
On second thought, the cascading delete could affect the end-user experience: if the feed document is empty, we would not have any "recent" stories to show.
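The expiry step could be sketched as follows, assuming SQLite. A foreign key's ON DELETE CASCADE only flows from parent to child (stories to feed_stories), so this sketch uses a trigger to remove a story once its last feed_stories row is gone. The entries_in_document guard is one hypothetical way to handle the empty-document concern: skip expiry entirely when the fetch produced no entries. Table, trigger, and helper names are assumptions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE stories (id INTEGER PRIMARY KEY, url TEXT UNIQUE NOT NULL);
CREATE TABLE feed_stories (
    feed_id INTEGER NOT NULL,
    story_id INTEGER NOT NULL REFERENCES stories(id),
    last_seen REAL NOT NULL,
    PRIMARY KEY (feed_id, story_id)
);
-- Remove a story once no feed references it any more.
CREATE TRIGGER drop_orphan_story AFTER DELETE ON feed_stories
WHEN NOT EXISTS (SELECT 1 FROM feed_stories WHERE story_id = OLD.story_id)
BEGIN
    DELETE FROM stories WHERE id = OLD.story_id;
END;
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def expire_unseen(conn: sqlite3.Connection, feed_id: int,
                  fetch_time: float, entries_in_document: int) -> None:
    """Drop this feed's links that were not refreshed by the fetch at fetch_time."""
    if entries_in_document == 0:
        # Empty document: keep the existing links instead of wiping the feed.
        return
    conn.execute(
        "DELETE FROM feed_stories WHERE feed_id = ? AND last_seen < ?",
        (feed_id, fetch_time),
    )
```

A story that is still linked from another feed survives; it is only deleted when the last feed stops listing it.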
Currently, old stories are pruned by date, so entries from slow or static feeds time out and then keep being "discovered" as new articles.
The `fetch_events` table is pruned to a fixed number of entries; doing the same for the `stories` table might avoid the rediscovery problem.
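Count-based pruning could be sketched like this, assuming SQLite and a hypothetical created column on stories. It keeps the newest `keep` rows regardless of age, so a story from a quiet feed is not expired just because time has passed; a very busy feed could still push it out, which a per-feed cap would avoid.

```python
import sqlite3


def prune_stories(conn: sqlite3.Connection, keep: int) -> int:
    """Delete all but the `keep` most recently created stories; return rows deleted."""
    cur = conn.execute(
        """DELETE FROM stories WHERE id NOT IN (
               SELECT id FROM stories ORDER BY created DESC, id DESC LIMIT ?)""",
        (keep,),
    )
    return cur.rowcount
```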