Investigate rss-fetcher returning non-news URLs #44

Open
philbudne opened this issue Aug 16, 2024 · 2 comments
Comments

@philbudne
Contributor

rss-fetcher output includes URLs that story-indexer regards as "non-news", both simple domain names (archive.org) and subdomains (xyz.iheart.com):

2024-08-16 18:17:28,180 c9a6a33e93c1 rss-puller INFO: non-news: http://archive.org/details/dlibra.bibliotekaelblaska.pl.92649-2.30732645
2024-08-16 18:17:26,732 c9a6a33e93c1 rss-puller INFO: non-news: https://kentuckynewsnetwork.iheart.com/content/2024-08-16-18-year-old-teen-cowboy-ace-patton-ashford-killed-in-freak-accident/
2024-08-16 18:17:24,066 c9a6a33e93c1 rss-puller INFO: non-news: https://knrs.iheart.com/content/2024-08-16-new-poll-shows-where-harris-trump-stand-in-crucial-swing-state/
2024-08-16 18:17:23,563 c9a6a33e93c1 rss-puller INFO: non-news: https://buckeyecountry105.iheart.com/content/2024-08-16-new-poll-shows-where-harris-trump-stand-in-crucial-swing-state/
2024-08-16 18:17:19,856 c9a6a33e93c1 rss-puller INFO: non-news: https://wgy.iheart.com/content/2024-08-16-boebert-bikini-photo-supporting-colleague-reveals-massive-secret-tattoo/
@philbudne
Contributor Author

story-indexer has a non_news_fqdn function for this.
mediacloud/metadata-lib#91 is a request to move that function to mc_metadata.
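
For readers unfamiliar with this kind of check, here is a minimal sketch of what a non-news FQDN test could look like. It is not the actual story-indexer `non_news_fqdn` code: the blocklist contents, the `is_non_news_fqdn` and `is_non_news_url` names, and the suffix-matching rule are all assumptions made for illustration, seeded with the two domains from the log above.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist, for illustration only (domains taken from the log above).
NON_NEWS_DOMAINS = {"archive.org", "iheart.com"}


def is_non_news_fqdn(fqdn: str) -> bool:
    """Return True if fqdn is a listed domain or a subdomain of one."""
    fqdn = fqdn.lower().rstrip(".")
    # Match the domain itself, or any name ending in ".<domain>", so that
    # "knrs.iheart.com" matches but "notiheart.com" does not.
    return any(fqdn == d or fqdn.endswith("." + d) for d in NON_NEWS_DOMAINS)


def is_non_news_url(url: str) -> bool:
    """Extract the host from a URL and apply the FQDN check."""
    host = urlsplit(url).hostname or ""
    return is_non_news_fqdn(host)
```

Applying a check like this in rss-fetcher before emitting URLs would cover both cases reported here, the bare domain (archive.org) and the subdomains (xyz.iheart.com).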

@philbudne
Contributor Author

Code is in mediacloud/metadata-lib#93
