Prune old stories by count instead of age #25

Open
philbudne opened this issue Oct 15, 2023 · 3 comments
Labels
enhancement New feature or request prio-medium

Comments

@philbudne
Contributor

Currently, old stories are pruned by date, so entries from slow or static feeds age out of the stories table and are then rediscovered as "new" articles the next time the feed is fetched.

The fetch_events table is pruned to a fixed number of entries; doing the same for the stories table might avoid the rediscovery problem.
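
A minimal sketch of what count-based pruning could look like, using an in-memory SQLite database so it runs standalone. The schema here (`feed_id`, `fetched_at`), the per-feed scope, and the `prune_stories_by_count` helper are assumptions for illustration, not the project's actual tables or code:

```python
import sqlite3


def prune_stories_by_count(conn, feed_id, keep):
    """Delete all but the `keep` most recently fetched stories for one feed.

    Hypothetical helper: table and column names are illustrative only.
    Returns the number of rows deleted.
    """
    cur = conn.execute(
        """
        DELETE FROM stories
         WHERE feed_id = ?
           AND id NOT IN (
               SELECT id FROM stories
                WHERE feed_id = ?
                ORDER BY fetched_at DESC, id DESC
                LIMIT ?)
        """,
        (feed_id, feed_id, keep),
    )
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stories ("
    " id INTEGER PRIMARY KEY, feed_id INTEGER, url TEXT, fetched_at INTEGER)"
)
conn.executemany(
    "INSERT INTO stories (feed_id, url, fetched_at) VALUES (?, ?, ?)",
    [(1, f"https://example.com/{i}", i) for i in range(10)],
)

deleted = prune_stories_by_count(conn, feed_id=1, keep=3)
remaining = [r[0] for r in conn.execute("SELECT url FROM stories ORDER BY fetched_at")]
```

With a count limit, a feed that never changes keeps its rows indefinitely, since nothing newer arrives to push them out.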

@philbudne philbudne added the enhancement New feature or request label Oct 15, 2023
@philbudne
Contributor Author

A recent thought: if the stories table had a "last_seen" column (updated each time the URL is found in a feed), we could use it to prevent aging out entries from unchanging feeds. This would need to compare story.last_seen to the last time new/different content was returned by the feed (http_last_modified?).

This would increase database write load, but would prevent duplicates generated every time a URL from a static feed is expired from the stories table.
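
One possible reading of this, sketched with SQLite: a story expires only if it is older than the cutoff and was last seen before the feed last returned new content (meaning it was absent from the newest document). The `last_new_content` column and both helper functions are hypothetical names, and the exact comparison is an assumption about the intent:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE feeds (id INTEGER PRIMARY KEY, last_new_content INTEGER);
    CREATE TABLE stories (
        id INTEGER PRIMARY KEY, feed_id INTEGER,
        url TEXT UNIQUE, last_seen INTEGER);
    """
)


def saw_url(conn, feed_id, url, now):
    """Record that `url` appeared in a feed. Returns True if the story is new."""
    cur = conn.execute("UPDATE stories SET last_seen = ? WHERE url = ?", (now, url))
    if cur.rowcount:
        return False  # already known: only last_seen was refreshed
    conn.execute(
        "INSERT INTO stories (feed_id, url, last_seen) VALUES (?, ?, ?)",
        (feed_id, url, now),
    )
    return True


def expire_stories(conn, now, max_age):
    """Delete stories that are old AND were not in the feed's latest new document."""
    cur = conn.execute(
        """
        DELETE FROM stories
         WHERE last_seen < ?
           AND last_seen < (SELECT last_new_content FROM feeds
                             WHERE feeds.id = stories.feed_id)
        """,
        (now - max_age,),
    )
    return cur.rowcount


conn.execute("INSERT INTO feeds (id, last_new_content) VALUES (1, 0)")
saw_url(conn, 1, "https://example.com/b", now=50)
# At t=100 the feed returns a different document: "b" is gone, "a" is present.
conn.execute("UPDATE feeds SET last_new_content = 100 WHERE id = 1")
saw_url(conn, 1, "https://example.com/a", now=100)
# Much later the feed is still unchanged, so "a" must not age out.
expired = expire_stories(conn, now=1000, max_age=200)
urls = [r[0] for r in conn.execute("SELECT url FROM stories")]
```

The extra write per sighting is the `UPDATE` in `saw_url`, which is where the added database load would come from.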

@philbudne
Contributor Author

Latest thinking:

Since stories can come from multiple feeds, last_seen should probably go into a table that relates a stories table entry to a feeds table entry.

After attempting to insert a story into the stories table, an attempt should be made to create or update a feed_stories table entry with the current time (same as used in creating the story).

If the feed document hash has not changed, update the last_seen time for all stories seen from that feed.

After the above processing, delete any feed_stories entries that weren't updated and, if possible, have the delete cascade to the stories table if this was the last reference to the story.
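
The steps above could be sketched roughly as follows, again with SQLite so the example is self-contained. The schema, the `process_feed` helper, and the `doc_changed` flag (standing in for the feed document hash comparison) are assumptions. The "cascade" is emulated with an explicit second `DELETE`, since a foreign-key `ON DELETE CASCADE` runs from parent to child rather than from feed_stories up to stories. The upsert syntax needs SQLite 3.24 or later:

```python
import sqlite3

SCHEMA = """
CREATE TABLE stories (id INTEGER PRIMARY KEY, url TEXT UNIQUE NOT NULL);
CREATE TABLE feed_stories (
    feed_id   INTEGER NOT NULL,
    story_id  INTEGER NOT NULL REFERENCES stories(id),
    last_seen INTEGER NOT NULL,
    PRIMARY KEY (feed_id, story_id));
"""


def process_feed(conn, feed_id, urls, now, doc_changed=True):
    """Refresh feed_stories for one feed; return the number of stories deleted."""
    if doc_changed:
        for url in urls:
            # Attempt the story insert; an existing URL is left alone.
            conn.execute("INSERT OR IGNORE INTO stories (url) VALUES (?)", (url,))
            (story_id,) = conn.execute(
                "SELECT id FROM stories WHERE url = ?", (url,)
            ).fetchone()
            # Create or update the feed/story link with the current time.
            conn.execute(
                """
                INSERT INTO feed_stories (feed_id, story_id, last_seen)
                VALUES (?, ?, ?)
                ON CONFLICT (feed_id, story_id)
                DO UPDATE SET last_seen = excluded.last_seen
                """,
                (feed_id, story_id, now),
            )
    else:
        # Unchanged document: every story already linked to the feed is still in it.
        conn.execute(
            "UPDATE feed_stories SET last_seen = ? WHERE feed_id = ?", (now, feed_id)
        )
    # Drop links that were not refreshed during this pass.
    conn.execute(
        "DELETE FROM feed_stories WHERE feed_id = ? AND last_seen < ?", (feed_id, now)
    )
    # Emulated cascade: remove stories that no feed references any more.
    cur = conn.execute(
        "DELETE FROM stories WHERE id NOT IN (SELECT story_id FROM feed_stories)"
    )
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
a, b, c = (f"https://example.com/{x}" for x in "abc")
process_feed(conn, 1, [a, b], now=1)
process_feed(conn, 2, [b], now=1)
removed = process_feed(conn, 1, [b, c], now=2)  # "a" has left feed 1
unchanged = process_feed(conn, 2, [], now=3, doc_changed=False)
urls = sorted(r[0] for r in conn.execute("SELECT url FROM stories"))
```

In this sketch story "b" survives because feed 2 still references it, while "a" is removed once its only link goes stale.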

@philbudne
Contributor Author

On second thought, the cascading delete could affect the end-user experience: if the feed document is empty, we would not have any "recent" stories to show.


2 participants