bioRxiv PDFs not being shown #426
Comments
Looking at prereview/src/backend/utils/resolve.ts Lines 203 to 256 in c12bc98
I've tried this preprint locally, and the bioRxiv page fails due to hitting a Cloudflare CAPTCHA. The Crossref API resolves, but it doesn't seem to contain information about the PDF. |
I can replicate seeing the CAPTCHA with |
The Crossref API entry is https://api.crossref.org/works/10.1101/2021.11.10.468081: it doesn't contain information about the PDF/HTML views, nor means to access them. Likewise, the bioRxiv API doesn't provide the details (https://api.biorxiv.org/details/biorxiv/10.1101/2021.11.10.468081). It does, however, return a link to the JATS XML version, which in turn has a broken link to the PDF. |
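As a rough illustration of the check described above, here is a minimal TypeScript sketch that looks for a PDF entry in a Crossref work's `link` array. The `link` array and its `URL` and `content-type` fields are part of Crossref's works schema. The helper name `pdfLinkFromCrossref` and both sample records are assumptions made up for this example, not PREreview code or real API output.

```typescript
// Subset of a Crossref `works` record; only the fields used here are typed.
interface CrossrefLink {
  URL: string;
  'content-type'?: string;
}

interface CrossrefWork {
  DOI: string;
  link?: CrossrefLink[];
}

// Hypothetical helper: return the first link declared as a PDF, if any.
function pdfLinkFromCrossref(work: CrossrefWork): string | undefined {
  return work.link?.find(l => l['content-type'] === 'application/pdf')?.URL;
}

// Illustrative record with no usable link, mirroring the situation described
// in the thread (the real record's contents are not reproduced here).
const withoutPdf: CrossrefWork = { DOI: '10.1101/2021.11.10.468081' };

// Made-up record showing the shape when a publisher does deposit a PDF link.
const withPdf: CrossrefWork = {
  DOI: '10.5555/example',
  link: [
    { URL: 'https://example.org/article.xml', 'content-type': 'application/xml' },
    { URL: 'https://example.org/article.pdf', 'content-type': 'application/pdf' },
  ],
};

const missing = pdfLinkFromCrossref(withoutPdf); // undefined: fall back to another source
const found = pdfLinkFromCrossref(withPdf);      // the PDF URL from the made-up record
```

When the helper returns `undefined`, the resolver would need another source for the PDF, which is what the rest of the thread explores.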
I've just tried the Google Scholar integration and it can find links to PDFs... but there's presumably a lag with indexing, so this particular preprint isn't available yet. |
I'm also not sure if PREreview refreshes its local data. (e.g. What happens with a bioRxiv preprint that changes its name in a later version?) |
Looks like Sciety had the same bioRxiv problem (sciety/sciety#1200). |
The EuropePMC API doesn't have the information available (https://www.ebi.ac.uk/europepmc/webservices/rest/search?query=DOI:10.1101/2021.11.10.468081&resultType=core&format=json), and even says the preprint is not open access. The EuropePMC page (https://europepmc.org/article/PPR/PPR419421) does have the PDF link, but says it came from Unpaywall. |
The Unpaywall API (https://api.unpaywall.org/v2/10.1101/[email protected]) does have the PDF link. Looking at recent bioRxiv articles (i.e. ones first published yesterday or today), there is a bit of a lag between an article being published and it appearing on Unpaywall. (I think there's a delay between it being published and being available on the Crossref API, and then in turn on the Unpaywall API.) |
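A minimal sketch of how the Unpaywall lookup could work, assuming the v2 endpoint takes the DOI in the path plus an `email` query parameter, and that the response carries `best_oa_location` and `oa_locations` entries with a `url_for_pdf` field. The helper names, the placeholder email address, and the sample record are made up for illustration. No network call is made here.

```typescript
// Subset of an Unpaywall v2 record; only the fields used here are typed.
interface UnpaywallLocation {
  url_for_pdf: string | null;
}

interface UnpaywallRecord {
  doi: string;
  best_oa_location: UnpaywallLocation | null;
  oa_locations: UnpaywallLocation[];
}

// Hypothetical helper: build the request URL for a DOI.
// The email address identifies the caller; the one used below is a placeholder.
function unpaywallUrl(doi: string, email: string): string {
  return `https://api.unpaywall.org/v2/${doi}?email=${encodeURIComponent(email)}`;
}

// Hypothetical helper: prefer the best OA location, then any other location.
function pdfLinkFromUnpaywall(record: UnpaywallRecord): string | undefined {
  const candidates = [record.best_oa_location, ...record.oa_locations];
  for (const location of candidates) {
    if (location?.url_for_pdf) {
      return location.url_for_pdf;
    }
  }
  return undefined;
}

// Made-up record for illustration only; not real API output.
const sampleRecord: UnpaywallRecord = {
  doi: '10.5555/example',
  best_oa_location: { url_for_pdf: null },
  oa_locations: [{ url_for_pdf: null }, { url_for_pdf: 'https://example.org/full.pdf' }],
};

const requestUrl = unpaywallUrl('10.1101/2021.11.10.468081', 'someone@example.org');
const pdfUrl = pdfLinkFromUnpaywall(sampleRecord);
```

A not-found response for a DOI that has not been indexed yet would need to be treated as "no PDF yet" rather than as a hard failure, given the lag discussed in this thread.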
The sample article is now appearing on Google Scholar, but only as the Europe PMC entry, which doesn't provide a PDF link. |
Options I can see: One or more of:
Or:
|
I'd say 4 is the quickest option. Do you know what the delay is? And would this option allow (re)fetching the PDF later, once it's available, if the preprint was first imported when it wasn't? |
Hard to know, and it probably varies per article. Right now (10:40 UTC, 17 November 2021), for the most recent article I can find available, everything happened on 16 November (all times UTC):
So in that case, a matter of hours. But for the next article published:
and is not yet available on the Unpaywall API (https://api.unpaywall.org/v2/10.1101/[email protected] returning a not found error). From Sciety's experience, we know in some cases data doesn't appear in the Crossref API for quite a while (sciety/sciety#1199, sciety/sciety#664). With this very limited data, I'd say between hours and days. (So quicker than Google Scholar, which IIRC can take weeks.) The only guaranteed way to be able to get the information is from the bioRxiv page itself, which currently is not allowed to be read by machines.
I need to dig more into the code, but I suspect there isn't any re-fetching of information. If that is the case, it's possibly more valuable to add this first while asking bioRxiv for assistance. |
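To make the re-fetching idea concrete, here is a minimal sketch of a policy that decides when a stored preprint without a PDF link should be resolved again. Everything in it is hypothetical: the `StoredPreprint` shape, the `shouldRefetch` name, and the six-hour default interval are assumptions chosen for illustration (the interval loosely reflects the "between hours and days" estimate above), not how PREreview stores or refreshes data.

```typescript
// Hypothetical shape of a locally stored preprint record.
interface StoredPreprint {
  doi: string;
  pdfUrl?: string;      // absent when no PDF link was found at import time
  lastResolvedAt: Date; // when metadata was last fetched
}

const HOUR_MS = 60 * 60 * 1000;

// Hypothetical policy: retry only records that lack a PDF link, and only
// once at least `minIntervalMs` has passed since the last attempt.
function shouldRefetch(
  preprint: StoredPreprint,
  now: Date,
  minIntervalMs: number = 6 * HOUR_MS,
): boolean {
  if (preprint.pdfUrl) {
    return false;
  }
  return now.getTime() - preprint.lastResolvedAt.getTime() >= minIntervalMs;
}

// Illustrative record: imported without a PDF link.
const stored: StoredPreprint = {
  doi: '10.1101/2021.11.10.468081',
  lastResolvedAt: new Date('2021-11-16T10:00:00Z'),
};

const tooSoon = shouldRefetch(stored, new Date('2021-11-16T12:00:00Z'));  // only 2 hours later
const dueAgain = shouldRefetch(stored, new Date('2021-11-17T10:40:00Z')); // over 24 hours later
```

Passing `now` in as a parameter keeps the policy a pure function, so it can be checked without touching a clock or a database.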
Looks like requesting the bioRxiv page is working again, so PDFs are appearing. https://prereview.org/preprints/doi-10.1101-2021.11.10.468081 is still showing just the abstract, so there is work to do in re-fetching information. |
Yay! Thanks @thewilkybarkid! |
bioRxiv preprints (such as https://prereview.org/preprints/doi-10.1101-2021.11.10.468081) are showing the abstract text rather than the PDF, even though the license allows its use.