COUNTER Release 5 #6781
Comments
Hi @NateWr, @ctgraham, and @shanu17, I've opened this issue so that we can track everything that has to be done for COUNTER R5 support. I have started with a few things I have identified above, but the list is still to be filled in. It would also be great to know what exactly Pitt ULS is working on, so that we can work on other things and coordinate.
We will probably want to continue to track total views between different kinds of full text, because journals will want to know that. So we'll just need to make sure that we're counting appropriately for R5 while not losing some specificity we already have. |
👍 |
Hi all, above all @asmecher and @NateWr, but maybe @ctgraham (above all regarding COUNTER R5 rules) as well :-)
On the path from log file processing to the data in the DB tables: at which point do you think I should check whether the object with the given ID exists? For the current log files this is surely not necessary, but it matters if someone wants to reprocess old files. I was thinking of doing it at the moment we load the data from the temp tables into the actual ones. Thanks a lot!
@ctgraham, earlier we logged the administrative and/or user name, but I do not think this was taken into account for the usage stats numbers and COUNTER. As far as I can see we do not need them now, and we do not need to differentiate, consider, or remove such access, correct?
And maybe one more question @ctgraham: I think we do not need the unique_title metric type, correct? -- We would/could consider books as submissions in OJS?
Thanks @bozana, I've left some comments on the commit.
I think this is fine for now. Ideally, we would migrate this to use the new FileService and Jobs Queue to handle the staging and processing of files. We would probably benefit from breaking this down into several smaller jobs, but that can be done another time.
Is that really how the COUNTER spec works!? 😮 So if I view something at 7:59 and 8:01 these are considered unique, but not if I view it at 8:01 and 8:03? |
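To make the hour-slice question concrete, here is a minimal sketch in Python (used only for illustration; `hour_slice` and `same_hour_slice` are made-up names, not part of OJS). It assumes the fallback described in the issue, where a user-session is approximated by the calendar hour of the request when nothing better is available.

```python
from datetime import datetime


def hour_slice(ts):
    """Return the (date, hour) bucket a request falls into."""
    return (ts.date().isoformat(), ts.hour)


def same_hour_slice(a, b):
    """True if both requests fall into the same calendar-hour bucket."""
    return hour_slice(a) == hour_slice(b)


# Hypothetical request times on an arbitrary day.
t0759 = datetime(2021, 3, 1, 7, 59)
t0801 = datetime(2021, 3, 1, 8, 1)
t0803 = datetime(2021, 3, 1, 8, 3)

print(same_hour_slice(t0759, t0801))  # False: 07:xx and 08:xx are different slices
print(same_hour_slice(t0801, t0803))  # True: both fall into the 08:xx slice
```

Under that assumption the two views at 7:59 and 8:01 would be counted as two unique requests, while the views at 8:01 and 8:03 would collapse into one.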
Thanks a lot @NateWr! |
If we had a way to exclude administrative usage counts from COUNTER statistics, we would be responsible for doing so. If we could exclude access generated via the Issue Preview, this would be appropriate (but might not be readily done). In general, the mere fact that a user was logged in should not be a consideration for COUNTER.
The unique_title metric shouldn't be relevant for OJS. Perhaps for OMP, if individual chapters can be presented?
Yes, the fuzzy definition of "per hour" is only relevant if the user session itself cannot be identified. |
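A rough sketch of what "use the session if it can be identified, otherwise fall back to the hour" could look like when counting unique items. This is Python for illustration only; the `Hit` structure, its field names, and both functions are hypothetical and do not reflect the actual OJS log format.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class Hit(NamedTuple):
    ip: str
    user_agent: str
    session_id: Optional[str]  # None when no session could be identified
    item_id: int
    ts: datetime


def user_session_key(hit):
    """Prefer the real session ID; otherwise fall back to IP + UA + date + hour."""
    if hit.session_id:
        return ("session", hit.session_id)
    return ("fallback", hit.ip, hit.user_agent, hit.ts.date().isoformat(), hit.ts.hour)


def count_unique_items(hits):
    """Count each item at most once per user-session."""
    seen = set()
    counts = {}
    for hit in hits:
        key = (user_session_key(hit), hit.item_id)
        if key in seen:
            continue
        seen.add(key)
        counts[hit.item_id] = counts.get(hit.item_id, 0) + 1
    return counts


hits = [
    # No session ID: the hour boundary splits these into two user-sessions.
    Hit("10.0.0.1", "UA", None, 1, datetime(2021, 3, 1, 7, 59)),
    Hit("10.0.0.1", "UA", None, 1, datetime(2021, 3, 1, 8, 1)),
    # Known session ID: the hour boundary does not matter.
    Hit("10.0.0.2", "UA", "s1", 2, datetime(2021, 3, 1, 7, 59)),
    Hit("10.0.0.2", "UA", "s1", 2, datetime(2021, 3, 1, 8, 1)),
]
print(count_unique_items(hits))  # {1: 2, 2: 1}
```

Whether the session ID is stable enough to be used this way is exactly the open question in this thread, so treat the first branch as an assumption.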
I haven't thought much about OMP yet, but: A book/submission can just contain the files or it can contain chapters. So it could be a problem if we would have different Items (per COUNTER definition) within one press -- in the first case the book/submission and in the second chapters -- correct? So my first thought was to simplify all this and say the book/submission is the item, and the chapters would be seen as just files... 🤔 |
Here as well I tended to (over)simplify and have only considered the hour slices 😅 So I should consider/log the user session, if there is one?
Regarding the administrative access:
Agreed, and agreed. |
@asmecher, do I understand correctly that our user sessions, depending on the setting in the config file, either 'never' expire (30 days, and the session ID is not changed) or expire with the browser session, i.e. when the browser is closed?
...so by my read, sessions unused for 1 hour become eligible for garbage collection, which is stochastic. These policies haven't been changed for a long time, and I suspect there are some best practices we could adopt. So I'm open to change on this. |
That sounds good to me -- I am just trying to figure out if we can rely on our session ID for usage stats...
Hmmm... It seems that using those two settings is not reliable and we should implement the session timeout ourselves, see https://stackoverflow.com/questions/520237/how-do-i-expire-a-php-session-after-30-minutes.
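The general idea of handling the timeout ourselves is to store a last-activity timestamp in the session and check it on every request. A minimal sketch of that logic, written in Python only to show the control flow (the real code would be PHP; `touch_session`, the `last_activity` key, and the 30-minute value are assumptions for illustration):

```python
import time

SESSION_TIMEOUT = 30 * 60  # seconds of inactivity; the actual value is still under discussion


def touch_session(session, now=None, timeout=SESSION_TIMEOUT):
    """Return the session data to use for this request, reset if it has expired."""
    now = time.time() if now is None else now
    last = session.get("last_activity")
    if last is not None and now - last > timeout:
        # Expired: start from scratch. A real application would also
        # issue a new session ID at this point.
        session = {}
    session["last_activity"] = now
    return session


s = touch_session({"user": "reader"}, now=0)
s = touch_session(s, now=1000)        # 1000 s of inactivity: still valid
print("user" in s)                    # True
s = touch_session(s, now=1000 + 1801) # more than 30 minutes of inactivity
print("user" in s)                    # False
```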
But, even then, if we implement to expire the user session after 30 minutes or 1 hour of inactivity: |
Thanks to a suggestion from @NateWr, I moved the double-click and unique-item processing (i.e. removal) into the database, doing it with SQL: all log entries are inserted into the temporary tables, and the removal of double and unique clicks is then done there, see:
@bozana, I think it should be possible to formulate a query that works for both MySQL and PostgreSQL using DELETE FROM xxx WHERE yyy IN (subquery) -- but you'd need to test it against both to be sure, as I remember seeing complaints about self-joins but don't recall the conditions. I have some PostgreSQL test datasets from various versions, and could either send you those, or test potential queries, whatever's most helpful. |
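A small sketch of the `DELETE FROM xxx WHERE yyy IN (subquery)` shape for double-click removal. It runs against an in-memory SQLite database purely so that it is self-contained, which means it says nothing about MySQL or PostgreSQL behaviour. The table `usage_tmp` and its columns are hypothetical, and keeping the later of two clicks is an assumption about how section 7.2 should be read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE usage_tmp ("
    "id INTEGER PRIMARY KEY, ip TEXT, user_agent TEXT, url TEXT, ts INTEGER)"
)
conn.executemany(
    "INSERT INTO usage_tmp VALUES (?, ?, ?, ?, ?)",
    [
        (1, "10.0.0.1", "UA", "/article/view/1", 0),
        (2, "10.0.0.1", "UA", "/article/view/1", 10),  # 10 s after row 1: double click
        (3, "10.0.0.1", "UA", "/article/view/1", 45),  # 35 s after row 2: separate click
        (4, "10.0.0.2", "UA", "/article/view/1", 12),  # different user
    ],
)

# Delete every row that is followed, within 30 seconds, by another request
# from the same IP + user agent for the same URL (the later request is kept).
conn.execute(
    """
    DELETE FROM usage_tmp
    WHERE id IN (
        SELECT a.id
        FROM usage_tmp a
        JOIN usage_tmp b
          ON a.ip = b.ip
         AND a.user_agent = b.user_agent
         AND a.url = b.url
         AND b.ts > a.ts
         AND b.ts - a.ts <= 30
    )
    """
)
remaining = [row[0] for row in conn.execute("SELECT id FROM usage_tmp ORDER BY id")]
print(remaining)  # [2, 3, 4]
```

MySQL is known to reject a subquery that selects from the table being deleted from unless the subquery is wrapped in a derived table, so the statement would likely need that extra wrapping there and should be tested on both databases. Requests with identical timestamps are not treated as double clicks in this sketch.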
@ctgraham, just to be sure: do you think we should consider the user session when possible, or use only the 24 hourly slices?
If we decide to use the session ID when possible, shall it expire after 1 hour of inactivity, or 1/2 hour? |
@asmecher, would this code be OK for the session expiration: bozana@e16bbfa, as said above? |
pkp/pkp-lib#6781 Opt-out for public SUSHI API
@ctgraham, the major functionality is in the main branch. It would be great and I would be very happy if you would like and have some time to take a look/test... but... no pressure, of course... :-) |
PR that considers the first date published of a context when calculating the SUSHI start date: |
#6781 consider first date published of a context for the S…
pkp/pkp-lib#6781 submodule update ##bozana/6781##
Implement the COUNTER Release 5 for OJS/OMP/OPS usage statistics.
Here we can collect everything we decide is necessary. We can have a discussion below and every time we decide something we can summarize it here.
It seems Release 5 is out, with lots of changes.
Here a guide for journals: https://www.projectcounter.org/wp-content/uploads/2020/08/Module_2_Journal_Usage_20200811.pdf.
Processing rules for COUNTER 5 reports, see https://www.projectcounter.org/code-of-practice-five-sections/7-processing-rules-underlying-counter-reporting-data/:
a) Double click filtering (see section 7.2):
This is implemented here: https://github.com/pkp/usageStats/blob/master/UsageStatsLoader.inc.php#L194-L224. Until now we differentiated between access to HTML, PDF and other files. This no longer seems to be needed -- we can change it to use 30 seconds for any link, i.e. any file.
We should also change our implementation so that only identical URLs are compared (and not assocType + assocID as until now). Uniqueness is treated differently:
b) Unique Items (see section 7.3):
In our case an Item is an article. The matching report is AR1. The rule is: "If multiple transactions qualifying for the Metric_Type in question represent the same item and occur in the same user-sessions, only one unique activity MUST be counted for that item." The user-session seems to be defined per hour, as far as I understand it.
The question of whether article versions belong to the same Item is still open. Given the way we represent them internally, I would say they do belong to the same Item.
c) Unique Titles (see section 7.4):
In the case of a journal, Title = a journal and the report = Title Master Report. Similar to the rule for unique items above, the rule here is: "If multiple transactions qualifying for the Metric_Type in question represent the same title and occur in the same user-session only one unique activity MUST be counted for that title." Here too the user-session seems to be defined per hour. I.e. if a user accesses one article and then another in the same session, it would only count once.
This rule, i.e. report, seems not to be used for single journals -- it was introduced mostly for books. Do we need it (e.g. for libraries and multi-journal installations)?
d) Internet Robots and Crawlers (see section 7.8):
Same as for Release 4.
COUNTER maintains the current list of internet robots and crawlers at https://github.com/atmire/COUNTER-Robots.
We use it as a module in lib/pkp/lib/counterBots, assign the file to the variable COUNTER_USER_AGENTS_FILE (https://github.com/pkp/pkp-lib/blob/master/classes/core/Core.inc.php#L23) and implement the function isUserAgentBot in https://github.com/pkp/pkp-lib/blob/master/classes/core/Core.inc.php#L100. The function is then used when the log files are processed (https://github.com/pkp/usageStats/blob/master/UsageStatsLoader.inc.php#L170).
We should define a strategy for when we fetch the most recent version of the list.
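For reference, the bot check described in d) boils down to matching the user agent against a list of regular expressions. A minimal sketch, in Python for illustration only: the three patterns below are made-up stand-ins and not the contents of the COUNTER-Robots list, and `compile_patterns` / `is_user_agent_bot` are hypothetical names, not the PHP `isUserAgentBot` implementation.

```python
import re

# Hypothetical stand-ins for lines of a robots pattern file.
SAMPLE_PATTERNS = ["bot", "spider", "^curl"]


def compile_patterns(lines):
    """Compile one case-insensitive regex per non-empty line."""
    return [re.compile(line.strip(), re.IGNORECASE) for line in lines if line.strip()]


def is_user_agent_bot(user_agent, patterns):
    """True if any pattern matches somewhere in the user agent string."""
    return any(p.search(user_agent) for p in patterns)


patterns = compile_patterns(SAMPLE_PATTERNS)
print(is_user_agent_bot("Googlebot/2.1 (+http://www.google.com/bot.html)", patterns))  # True
print(is_user_agent_bot("Mozilla/5.0 (X11; Linux x86_64) Firefox/90.0", patterns))     # False
```

Compiling the patterns once per log-processing run, rather than once per log line, is probably worthwhile given the size of the list.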
Because R5 now supports/counts abstract views (in the total views count), shall we consider the galley view pages too?
SUSHI support is mandatory for compliance with COUNTER Release 5 (see https://www.projectcounter.org/wp-content/uploads/2019/05/Release_5_TechNotes_PDFX_20190509-Revised.pdf).
Which reports would we need/like to support/provide: AR1, Journal Master Report, X?