Update log handling to ensure metrics are calculated correctly across versions #4904
Comments
@ctgraham if you have any thoughts on this, that'd be much appreciated! |
This may be a bit broad for this ticket, but I think it is strongly relevant.... James and I chatted with Paul Needham of IRUS-UK last month about some of the considerations in log processing. This includes:
There might be opportunities to collaborate with IRUS-UK to architect a common library which can process logs into statistics (or perhaps just perform the bot-exclusions). It also made me wonder why there wouldn't be a shared library for representing statistics on top of COUNTER-SUSHI R5 (or why this wouldn't be outsourced to an Electronic Resource Management system), but that didn't seem to get much uptake in our conversation. I can definitely check in on what the interpretation of "double clicks" of versioned items is from a COUNTER perspective. |
To my surprise, I think we are ok-ish on the versioning front without any changes. However, there is one possible issue I uncovered while looking into it. From my investigations, here's how the URLs are parsed:
This works for PDF/HTML galleys because even though they load at Our versioned URLs look like However, this means that when someone directly visits the page of a PDF or HTML galley, without going through the article landing page, they record two entries in the logs:
This is not a new issue with versioning, but I thought I would double-check with you @asmecher and @ctgraham to see if this is intended behaviour. It seems to me like this should count as a single view of the file, not the submission. To test this I constructed the following fake log to parse, which includes only two URL hits. One to the URL to view a PDF (
(To use this for your own testing you would need to update the submission, galley and file ids to ones that exist in your system.) |
I think this is probably tied up with COUNTER, which I'm not very familiar with -- it has specifications for things like debouncing. And I'm not sure whether COUNTER business rules are applied when metrics are recorded, or when they're processed. @ctgraham, are you familiar with this aspect? If not, maybe I could follow up with Bozana. |
I don't think COUNTER cares about abstract views, nor has it in my memory. I reviewed COUNTER R5, R4, and R3 and each focuses on fulltext downloads/views. Back in R3 there was a distinction between fulltext HTML, fulltext PDF, and fulltext "other", but that distinction goes away in later releases. If a user views fulltext via HTML and then fulltext via PDF, or downloads the fulltext via the same medium twice, this only counts as one view. If I were more helpful I would check to make sure the COUNTER reports are using only ASSOC_TYPE_SUBMISSION_FILE for calculations (I think this is the case), check for legacy reports in OJS which describe abstract views separately from fulltext views (hopefully not), and would verify that the inline display of HTML fulltext is registered correctly (I think I recall fancy jiggering in a plugin for this). But first I want to get that Crossref testing out the door, and even that simple task is eluding me right now. |
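To make the "counts as one view" rule above concrete, here is a minimal sketch of counting each fulltext item at most once per session, regardless of format. It is not how OJS or any COUNTER report is actually implemented: the `session_id` value, the tuple layout, and the function name are assumptions made purely for illustration.

```python
from collections import Counter


def unique_item_requests(requests):
    """Count each item at most once per session, ignoring the format.

    `requests` is an iterable of (session_id, item_id, fmt) tuples.
    This layout is hypothetical and not taken from OJS.
    """
    # Collapse repeats: HTML then PDF of the same item in one session,
    # or two downloads of the same file, reduce to a single pair.
    unique_pairs = {(session_id, item_id) for session_id, item_id, _fmt in requests}
    # Tally how many distinct sessions requested each item.
    return Counter(item_id for _session_id, item_id in unique_pairs)


# Illustrative data only: session "s1" views item 10 as HTML, then twice as PDF.
example = [
    ("s1", 10, "html"),
    ("s1", 10, "pdf"),
    ("s1", 10, "pdf"),
    ("s2", 10, "pdf"),
    ("s2", 11, "html"),
]
counts = unique_item_requests(example)
```

With this data, item 10 is counted once for each of the two sessions and item 11 once. How a "session" should be derived from access logs is left open here.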
Thanks @ctgraham. I think this is not an urgent question to address for 3.2, because it is not something newly introduced. I'll defer it from 3.2 for now but would like to keep this conversation open. I'm not sure if I understand you correctly, but perhaps there is some divergence here between how OJS keeps statistics and how you describe COUNTER expectations. First, OJS does make a distinction between abstract and file views. It tracks both separately in the metrics type by the Second, from my brief test, it appears that OJS is treating a single visit to the view PDF page as a single visit to the article abstract page. If COUNTER ignores abstract views entirely, usage may be under-reported. (This was an isolated test. More investigation would be necessary to see what happens in different scenarios and how the rows in a metrics table get compiled into COUNTER reports.) |
Maybe I can take a further look at the problems described here, i.e. what needs to be solved for the next release... OJS:
|
Hi @NateWr and @ctgraham, there are some things to correct here: the problem that Nate found (No. 1 and 3 above) should be fixed. The changes in the log processing (No. 5b and 5c above) -- above all the immediate access on HTML and then PDF, or different article versions -- are related to the new COUNTER Release 5. So we will need to implement them when we implement support for R5. |
Hi @NateWr and @ctgraham, reading the document https://www.projectcounter.org/wp-content/uploads/2020/08/Module_2_Journal_Usage_20200811.pdf: it seems that unique item and title (from 5b and 5c above) were first introduced with COUNTER Release 5. Release 4, which we currently support, still counts HTML and PDF separately, I think. Also, R5 considers abstract views as Note about the unique title from 5c above: |
Hmmm... Release 5 seems to have been out there for almost 3 years now, so I suppose we should support it very soon... |
COUNTER R4 is still fairly widely used around Libraries, though it ought to be phased out in favor of R5; Pitt ULS uses COUNTER in communicating usage statistics to Plum Analytics. @shanu17 from Pitt ULS is working on SUSHI/COUNTER R5 for PKP. I owe him better definitions of each report so that he can map the report requirements against our Statistics Service / MetricsDAO. There remain some gaps in our internal statistics harvesting for non-OA usages, e.g. mapping access against institutional subscription and counting access denied requests. |
The coming PRs consider the following issues:
counts only the last, file download.
NOTE about the double click processing for versions:
Thus, I am not sure if this applies to the same URLs or content objects. (R5 is more precise, I believe, and mentions only URLs, see https://www.projectcounter.org/code-of-practice-five-sections/7-processing-rules-underlying-counter-reporting-data/#doubleclick. -- Uniqueness is handled separately.) Currently, that means for versioning:
are counted only as 1. If the file changes in a new version, e.g. the following log file entries
are counted = 2. This seems to be OK -- if the file does not change and the new version contains the same file (with the same file ID) it is considered for double click.
they are counted = 1. Is this all OK for now? (For R5 we would then change the way double click processing works, to consider only the same URLs) |
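The difference between filtering double clicks by file ID (the current behaviour described above) and by URL (the possible R5 behaviour) can be sketched as follows. This is only an illustration, not the actual OJS log processor: the `Hit` fields, the `filter_double_clicks` function, the example URLs, and the 30-second default window are all assumptions, and the applicable COUNTER Code of Practice should be checked for the real time window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    user: str         # hypothetical user key, e.g. IP address plus user agent
    file_id: int      # ID of the file that was served
    url: str          # requested URL (placeholder values below, not real OJS URLs)
    timestamp: float  # seconds since some epoch


def filter_double_clicks(hits, key_fn, window=30.0):
    """Collapse repeated hits by the same user on the same key within `window`.

    `key_fn` decides what "the same resource" means (file ID or URL).
    When two hits collapse, the later one is kept. The 30 s default is
    an assumption for this sketch.
    """
    kept = []
    last_index = {}  # (user, key) -> position in `kept` of the latest hit
    for hit in sorted(hits, key=lambda h: h.timestamp):
        k = (hit.user, key_fn(hit))
        idx = last_index.get(k)
        if idx is not None and hit.timestamp - kept[idx].timestamp <= window:
            kept[idx] = hit  # double click: the later hit replaces the earlier one
        else:
            last_index[k] = len(kept)
            kept.append(hit)
    return kept


# Two versions serving the same file (same file ID) under different URLs.
same_file = [
    Hit("u1", 5, "/placeholder/version-1/file-5", 0.0),
    Hit("u1", 5, "/placeholder/version-2/file-5", 10.0),
]
by_file_id = filter_double_clicks(same_file, lambda h: h.file_id)
by_url = filter_double_clicks(same_file, lambda h: h.url)
```

Keyed by file ID, the two hits collapse into one. Keyed by URL, they stay separate and count as two. That is the behavioural change the question above is asking about.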
Thanks a lot @NateWr! I would then merge the main branch changes as soon as all tests have run successfully, ok? I also did the PRs for stable-3_3_0 (see above). Those contain only the fixes for problems No. 1 and No. 2 from here #4904 (comment). (No. 3 requires a DB change, so it is not going into the stable branch.) Also, as we agreed, I did not implement PKPSubmissionDAO::exists(), so as not to change it in the stable branch, but I added PKPPublicationDAO::exists(), because this check is newly added. |
pkp/pkp-lib#4904 fix usage stats
Looks good, go ahead and merge to stable. |
pkp/pkp-lib#4904 fix usage stats
Everything merged from this issue, thus closing... |
The code which reads access logs and stores metrics data needs to be updated to ensure stats are calculated correctly across different versions.
The main URLs for a submission should stay the same. However, new URLs will be introduced for each version and its galleys (see #4870). Visits to these URLs should go toward a submission's total.
Also, COUNTER may have some rules regarding counting duplicate visits within a time period to the same resource. We need to figure out what these rules specify and how to correctly count visits to two versions of the same item in a short period.