COUNTER Release 5 #6781

bozana · 2021-02-22T15:57:33Z

Implement the COUNTER Release 5 for OJS/OMP/OPS usage statistics.
Here we can collect everything we decide is necessary. We can have a discussion below and every time we decide something we can summarize it here.

Release 5 is out and brings a lot of changes.
Here is a guide for journals: https://www.projectcounter.org/wp-content/uploads/2020/08/Module_2_Journal_Usage_20200811.pdf.

Processing rules for COUNTER 5 reports, see https://www.projectcounter.org/code-of-practice-five-sections/7-processing-rules-underlying-counter-reporting-data/:
a) Double-click filtering (see section 7.2):
This is implemented here: https://github.com/pkp/usageStats/blob/master/UsageStatsLoader.inc.php#L194-L224. Until now we distinguished between access to HTML, PDF and other files. This no longer seems necessary -- we can apply the 30-second window to any link, i.e. file.
We should also change our implementation so that only identical URLs are considered (and not assocType + assocID, as until now). Uniqueness is treated differently:
b) Unique Items (see section 7.3):
In our case an Item is an article. The matching report is AR1. The rule is: "If multiple transactions qualifying for the Metric_Type in question represent the same item and occur in the same user-sessions, only one unique activity MUST be counted for that item." The user session seems to be defined as one hour, as far as I understand it.
Whether article versions belong to the same Item is still an open question. Given the way we represent them internally, I would say they do.
c) Unique Titles (see section 7.4):
For a journal, Title = a journal and the report = Title Master Report. Similar to the unique item rule above, the rule here is: "If multiple transactions qualifying for the Metric_Type in question represent the same title and occur in the same user-session only one unique activity MUST be counted for that title." Again, the user session seems to be defined as one hour. I.e. if a user accesses one article and then another in the same session, it would count only once.
This rule, i.e. report, seems not to be used for single journals -- it was introduced mostly for books. Do we need it (e.g. for libraries and multi-journal installations)?
d) Internet Robots and Crawlers (see section 7.8):
Same as for Release 4.
COUNTER maintains the current list of internet robots and crawlers at https://github.com/atmire/COUNTER-Robots.
We use it as a module in lib/pkp/lib/counterBots, assign the file to the variable COUNTER_USER_AGENTS_FILE (https://github.com/pkp/pkp-lib/blob/master/classes/core/Core.inc.php#L23) and implement the function isUserAgentBot in https://github.com/pkp/pkp-lib/blob/master/classes/core/Core.inc.php#L100. The function is then used when the log files are processed (https://github.com/pkp/usageStats/blob/master/UsageStatsLoader.inc.php#L170).
We should define a strategy for when we fetch the most recent version of the list.
Because R5 now counts abstract views (in the total views count), shall we consider the galley view pages too?
SUSHI support is mandatory for compliance with COUNTER Release 5 (see https://www.projectcounter.org/wp-content/uploads/2019/05/Release_5_TechNotes_PDFX_20190509-Revised.pdf).
Which reports would we need or like to provide: AR1, Journal Master Report, X?
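For readers unfamiliar with the robots check in d), here is a minimal sketch of how a pattern-list test on the user agent can work. It is not the isUserAgentBot implementation; the sample patterns and the function names are made up for illustration, and it assumes the list is read as one regular expression per line.

```python
import re
from typing import Iterable, List, Pattern

# Hypothetical sample patterns for illustration only; the real list lives in
# the COUNTER-Robots repository and is much longer.
SAMPLE_PATTERNS = ["bot", "crawl", "spider", r"^curl"]


def compile_patterns(lines: Iterable[str]) -> List[Pattern[str]]:
    """Compile one case-insensitive regex per non-empty, non-comment line."""
    compiled = []
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            compiled.append(re.compile(line, re.IGNORECASE))
    return compiled


def is_user_agent_bot(user_agent: str, patterns: List[Pattern[str]]) -> bool:
    """Return True if any pattern matches somewhere in the user agent string."""
    return any(pattern.search(user_agent) for pattern in patterns)


patterns = compile_patterns(SAMPLE_PATTERNS)
print(is_user_agent_bot("Googlebot/2.1 (+http://www.google.com/bot.html)", patterns))
print(is_user_agent_bot("Mozilla/5.0 (X11; Linux x86_64) Firefox/89.0", patterns))
```

Log lines whose user agent matches would simply be skipped before any counting happens.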

bozana · 2021-02-22T16:13:03Z

Hi @NateWr, @ctgraham, and @shanu17, I've opened this issue so we can collect everything that has to be done for COUNTER R5 support. I have started with a few things identified above, but the list is still to be filled in. It would also be great to know what exactly Pitt ULS is working on, so that we can work on other things and arrange the work accordingly.
Closely related to these changes for R5 would be some improvements discussed here: #6782.

NateWr · 2021-02-22T16:52:53Z

> Until now the differentiation was between the access of HTML, PDF and other. This seems not to be needed any more -- We can change it to consider 30 seconds for any link i.e. file.

We will probably want to continue to track total views between different kinds of full text, because journals will want to know that. So we'll just need to make sure that we're counting appropriately for R5 while not losing some specificity we already have.

bozana · 2021-02-22T17:31:12Z

> > Until now the differentiation was between the access of HTML, PDF and other. This seems not to be needed any more -- We can change it to consider 30 seconds for any link i.e. file.

> We will probably want to continue to track total views between different kinds of full text, because journals will want to know that. So we'll just need to make sure that we're counting appropriately for R5 while not losing some specificity we already have.

👍
(The above is about double-click processing, which was different in R4 and now it is the same -- 30 seconds -- for any files)
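A rough sketch of the uniform 30-second rule, for illustration only. It assumes a user is identified by IP plus user agent, that "the same link" means the same URL, and that the later request of a pair is the one kept (one reading of section 7.2). The `Hit` fields and function name are hypothetical; the file type stays on each hit so per-type totals remain available.

```python
from typing import Dict, List, NamedTuple, Tuple

DOUBLE_CLICK_WINDOW = 30  # seconds, the same for every file type


class Hit(NamedTuple):
    timestamp: int   # Unix time in seconds
    ip: str
    user_agent: str
    url: str
    file_type: str   # e.g. "html", "pdf", "other"; kept for reporting only


def remove_double_clicks(hits: List[Hit]) -> List[Hit]:
    """Collapse requests by the same user to the same URL within 30 seconds.

    Each hit is compared with the previous retained hit for the same
    (ip, user_agent, url); if it falls inside the window, it replaces it.
    """
    kept: List[Hit] = []
    last_index: Dict[Tuple[str, str, str], int] = {}
    for hit in sorted(hits, key=lambda h: h.timestamp):
        key = (hit.ip, hit.user_agent, hit.url)
        idx = last_index.get(key)
        if idx is not None and hit.timestamp - kept[idx].timestamp <= DOUBLE_CLICK_WINDOW:
            kept[idx] = hit  # drop the earlier request, keep the later one
        else:
            kept.append(hit)
            last_index[key] = len(kept) - 1
    return kept
```

Because the file type is not part of the comparison key, HTML and PDF requests are filtered by the same rule but can still be totalled separately afterwards.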

bozana · 2021-07-12T17:52:40Z

Hi all, especially @asmecher and @NateWr, and maybe @ctgraham as well (above all regarding the COUNTER R5 rules) :-)
I implemented the major part of the new UsageStatsLoader (the function processFile()), which takes COUNTER R5 into account. Could you take a look at it and let me know if you have better ideas or suggestions?
Here is a short summary:

The old logic is kept:
-- extends FileLoader (which is still only used by/for usage stats)
-- moves log files through the directories: usageEventLogs -> stage -> processing -> archive or reject
-- reads line by line
-- uses temporary tables to store the log entries that count (after double-click and unique item removal). The temp DB tables are a good structure for moving the summarized data to the actual tables. Can you think of some other structure (e.g. just PHP arrays) that is clean and performs better?
COUNTER R5 (see https://www.projectcounter.org/code-of-practice-five-sections/7-processing-rules-underlying-counter-reporting-data/):
-- user identification is done by IP and userAgent
-- double clicks: when the same user clicks the same URL within 30 seconds
-- unique item: the day is sliced into 24 pieces --> when the same user views/uses the same submission (either abstract or files) within the same hour slice
Here is the new UsageStatsLoader: https://github.com/bozana/pkp-lib/blob/6782/classes/task/UsageStatsLoader.inc.php
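The unique item rule summarized above could be sketched roughly like this. It is not the code from the linked branch; the tuple layout and function names are assumptions, and timestamps are bucketed in UTC for simplicity.

```python
from datetime import datetime, timezone
from typing import Dict, Iterable, Tuple

# (unix_timestamp, ip, user_agent, submission_id); this layout is hypothetical
LogEntry = Tuple[int, str, str, int]


def hour_slice(timestamp: int) -> Tuple[str, int]:
    """Return the (date, hour) slice a timestamp falls into, in UTC."""
    dt = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return dt.strftime("%Y-%m-%d"), dt.hour


def count_unique_items(entries: Iterable[LogEntry]) -> Dict[int, int]:
    """Count at most one unique view per user, submission and hour slice."""
    seen = set()
    counts: Dict[int, int] = {}
    for timestamp, ip, user_agent, submission_id in entries:
        key = (ip, user_agent, submission_id) + hour_slice(timestamp)
        if key not in seen:
            seen.add(key)
            counts[submission_id] = counts.get(submission_id, 0) + 1
    return counts
```

Abstract and file views of the same submission share one key here, so together they contribute a single unique item count per user and hour slice.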

On the way from log file processing to the data in the DB tables: at which point do you think I should check whether the object with the given ID exists? For current log files this is surely not necessary, but it matters if someone wants to reprocess old files. I was thinking of the moment we load the data from the temp tables into the actual ones.

Thanks a lot!

bozana · 2021-07-12T17:54:50Z

@ctgraham, earlier we logged the administrative and/or user name, but I do not think this was considered in any way for the usage stats numbers and COUNTER. As far as I can see we do not need them now, and we do not need to differentiate, consider or remove such access, correct?

bozana · 2021-07-12T17:58:34Z

And maybe one more question, @ctgraham: I think we do not need the unique_title metric type, correct? -- We would/could consider books as submissions in OJS?

NateWr · 2021-07-13T16:49:41Z

Thanks @bozana, I've left some comments on the commit.

> extends FileLoader (that is still only used by/for usage stats)

I think this is fine for now. Ideally, we would migrate this to use the new FileService and Jobs Queue to handle the staging and processing of files. We would probably benefit from breaking this down into several smaller jobs, but that can be done another time.

> unique item: the day is sliced in 24 pieces

Is that really how the COUNTER spec works!? 😮 So if I view something at 7:59 and 8:01 these are considered unique, but not if I view it at 8:01 and 8:03?
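To make the 7:59 / 8:01 case concrete, here is a tiny check under the assumption that slices follow the clock hour (the helper name is hypothetical):

```python
from datetime import datetime


def same_hour_slice(a: datetime, b: datetime) -> bool:
    """True when both timestamps fall on the same date and clock hour."""
    return (a.date(), a.hour) == (b.date(), b.hour)


day = dict(year=2021, month=7, day=13)
# 7:59 and 8:01 land in different slices, so both views count as unique.
print(same_hour_slice(datetime(hour=7, minute=59, **day),
                      datetime(hour=8, minute=1, **day)))   # False
# 8:01 and 8:03 share a slice, so only one unique view is counted.
print(same_hour_slice(datetime(hour=8, minute=1, **day),
                      datetime(hour=8, minute=3, **day)))   # True
```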

bozana · 2021-07-14T08:45:01Z

Thanks a lot @NateWr!
Yes, we would need to adapt scheduling from Laravel, and also the jobs queue, but I agree to do that later, when everything else is done...
Yes, what you say about uniqueness is true, and maybe @ctgraham can confirm?
Actually, uniqueness is based on a user session, but if none exists (e.g. if the user is not logged in), then it works that way, see https://www.projectcounter.org/code-of-practice-five-sections/7-processing-rules-underlying-counter-reporting-data/.

ctgraham · 2021-07-15T18:25:31Z

> earlier we had administrative and/or user name logged, but I do not think this was considered in a way for the usage stats numbers and COUNTER.

If we had a way to exclude administrative usage counts from COUNTER statistics, we would be responsible for doing so. If we could exclude access generated via the Issue Preview, this would be appropriate (but might not be readily done). In general, the mere fact that a user was logged in should not be a consideration for COUNTER.

> I think we do not need unique_title metric type, correct? -- We would/could consider books as submissions in OJS?

The unique_title metric shouldn't be relevant for OJS. Perhaps for OMP, if individual chapters can be presented?

> Actually the uniqueness is connected with/based on one user session.

Yes, the fuzzy definition of "per hour" is only relevant if the user session itself cannot be identified.
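One way to read this: the uniqueness key uses the session when it is known and falls back to the hour slice otherwise. A minimal sketch under that assumption (the function name, parameters and key layout are hypothetical):

```python
from datetime import datetime, timezone
from typing import Optional, Tuple


def uniqueness_key(item_id: int, timestamp: int, ip: str, user_agent: str,
                   session_id: Optional[str] = None) -> Tuple:
    """Build the key under which a view counts as one unique item activity."""
    if session_id:
        # Session known: one unique activity per session and item.
        return ("session", session_id, item_id)
    # Session unknown: approximate it by IP + user agent + date + hour.
    dt = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return ("slice", ip, user_agent, dt.strftime("%Y-%m-%d"), dt.hour, item_id)
```

With a session ID, two views of the same item hours apart share one key; without it, they fall into different hour slices and are counted separately.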

bozana · 2021-07-16T09:25:51Z

I haven't thought much about OMP yet, but: A book/submission can just contain the files or it can contain chapters. So it could be a problem if we would have different Items (per COUNTER definition) within one press -- in the first case the book/submission and in the second chapters -- correct? So my first thought was to simplify all this and say the book/submission is the item, and the chapters would be seen as just files... 🤔

bozana · 2021-07-16T09:30:44Z

> Yes, the fuzzy definition of "per hour" is only relevant if the user session itself cannot be identified.

Here as well I tended to (over)simplify and considered only the hour slices 😅 So I should consider/log the user session, if there is one?
I will see how long our sessions last...
Somehow I do not like this COUNTER 'rule' either -- different systems can have sessions of different lengths... :-P

bozana · 2021-07-16T09:44:05Z

Regarding the administrative access:

* we can check if the user is an admin or editor or similar, but this does not necessarily mean the access is administrative, i.e. maybe we should not do that, right?
* the issue preview uses the same function 'view', but we could fire the usage event only when the object (issue or submission) is published, ok?

ctgraham · 2021-07-16T13:02:43Z

> Regarding the administrative access:
>
> * we can check if the user is admin or editor or so, but this does not necessarily mean the access is administrative i.e. we maybe should not do that, right?
> * the issue preview uses the same function 'view' but we could fire the usage event only when the object (issue or submission) is published, ok?

Agreed, and agreed.
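The agreed second point could look roughly like this. It is only a sketch: `ViewedObject`, `UsageLog` and `view` are hypothetical stand-ins, not the actual handler code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ViewedObject:
    object_id: int
    is_published: bool


@dataclass
class UsageLog:
    events: List[int] = field(default_factory=list)


def view(obj: ViewedObject, log: UsageLog) -> str:
    """Render the object; record a usage event only for published objects."""
    if obj.is_published:
        log.events.append(obj.object_id)
    # Previews of unpublished issues or submissions render but are not logged.
    return f"rendered object {obj.object_id}"
```

The user's role is deliberately not checked, in line with the first point.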

bozana · 2021-07-16T13:39:38Z

@asmecher, do I understand correctly that our user sessions, depending on the setting in the config file, either 'never' expire (30 days, and the session ID is not changed) or expire with the browser session, i.e. when the browser is closed?

asmecher · 2021-07-16T15:56:55Z

SessionManager.inc.php contains:

ini_set('session.cookie_lifetime', 0);
...
ini_set('session.gc_maxlifetime', 60 * 60);

...so by my read, sessions unused for 1 hour become eligible for garbage collection, which is stochastic.

These policies haven't been changed for a long time, and I suspect there are some best practices we could adopt. So I'm open to change on this.

bozana · 2021-07-19T09:44:39Z

That sounds good to me -- I am just trying to figure out if we can rely on our session ID for usage stats...
For some reason I am always logged in (with the same session ID), even after 1 hour of not using the site (with the setting 0 in the config and 'remember me' deactivated)... Do others experience the same?

bozana · 2021-07-19T10:41:54Z

Hmmm... It seems that relying on those two settings is not reliable and we should implement the session timeout ourselves, see https://stackoverflow.com/questions/520237/how-do-i-expire-a-php-session-after-30-minutes.
So maybe we could add that check (whether the last usage was long ago) here, before these lines: https://github.com/pkp/pkp-lib/blob/main/classes/session/SessionManager.inc.php#L109-L110. Maybe somewhere else too?
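The idea of an explicit inactivity timeout, independent of garbage collection, can be sketched like this. It is a language-neutral illustration in Python, not SessionManager code; the `last_activity` key, the 30-minute value and the function name are assumptions.

```python
from typing import Dict

SESSION_TIMEOUT = 30 * 60  # seconds of allowed inactivity (assumed value)


def touch_session(session: Dict, now: float,
                  timeout: int = SESSION_TIMEOUT) -> bool:
    """Expire the session if it was idle too long, otherwise record activity.

    Returns False when the session expired; the caller would then start a
    fresh session with a new ID.
    """
    last_activity = session.get("last_activity")
    if last_activity is not None and now - last_activity > timeout:
        session.clear()  # drop all session data
        return False
    session["last_activity"] = now
    return True
```

The check runs on every request, so expiry no longer depends on when the garbage collector happens to run.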

bozana · 2021-07-19T10:45:43Z

But even then, if we expire the user session after 30 minutes or 1 hour of inactivity:
For COUNTER usage stats:
If a logged-in user uses the journal site for the whole day, it would mean only 1 unique submission access, unlike the counts for users who are not logged in, where we use the 24 day slices.
Somehow I tend towards always using those 24 slices for usage stats...
@ctgraham and @NateWr, what do you think?

bozana · 2021-07-19T12:58:20Z

Thanks to a suggestion from @NateWr I moved the double-click and unique item processing, i.e. removal, to the database, doing it with SQL -- all log entries are inserted into the temporary tables and the removal of double clicks and of repeated item views is done there, see
https://github.com/bozana/pkp-lib/blob/6782/classes/statistics/UsageStatsTotalTemporaryRecordDAO.inc.php#L95
and
https://github.com/bozana/pkp-lib/blob/6782/classes/statistics/UsageStatsUniqueTemporaryRecordDAO.inc.php#L94
Now the processing in the UsageStatsLoader is slim, see https://github.com/bozana/pkp-lib/blob/6782/classes/task/UsageStatsLoader.inc.php#L87.
Maybe @asmecher and @jonasraoni could have a look at those SQL statements too?

asmecher · 2021-07-19T20:27:25Z

@bozana, I think it should be possible to formulate a query that works for both MySQL and PostgreSQL using DELETE FROM xxx WHERE yyy IN (subquery) -- but you'd need to test it against both to be sure, as I remember seeing complaints about self-joins but don't recall the conditions. I have some PostgreSQL test datasets from various versions, and could either send you those, or test potential queries, whatever's most helpful.
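As an illustration of the DELETE ... WHERE ... IN (subquery) shape, here is a sketch run against an in-memory SQLite database. The table and column names are hypothetical, and SQLite is used only because it runs without setup; it does not settle the MySQL/PostgreSQL question. MySQL in particular is known to reject a subquery that reads directly from the table being deleted from unless it is wrapped in a derived table, so the real query would still need testing on both.

```python
import sqlite3
from typing import List

# Delete every row that is followed, within 30 seconds, by another request
# from the same user (ip + user agent) for the same URL.
DELETE_DOUBLE_CLICKS = """
DELETE FROM usage_temp
WHERE id IN (
    SELECT a.id
    FROM usage_temp a
    JOIN usage_temp b
      ON a.ip = b.ip
     AND a.user_agent = b.user_agent
     AND a.url = b.url
     AND b.ts > a.ts
     AND b.ts - a.ts <= 30
)
"""


def build_demo_db() -> sqlite3.Connection:
    """Create a hypothetical temporary usage table with a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE usage_temp ("
        "id INTEGER PRIMARY KEY, ip TEXT, user_agent TEXT, url TEXT, ts INTEGER)"
    )
    rows = [
        (1, "1.1.1.1", "UA", "/article/view/1/2", 0),
        (2, "1.1.1.1", "UA", "/article/view/1/2", 10),
        (3, "1.1.1.1", "UA", "/article/view/1/2", 50),
        (4, "1.1.1.1", "UA", "/article/view/1/3", 5),
    ]
    conn.executemany(
        "INSERT INTO usage_temp (id, ip, user_agent, url, ts) VALUES (?, ?, ?, ?, ?)",
        rows,
    )
    return conn


def delete_double_clicks(conn: sqlite3.Connection) -> List[int]:
    """Run the removal and return the ids of the rows that remain."""
    conn.execute(DELETE_DOUBLE_CLICKS)
    return [row[0] for row in conn.execute("SELECT id FROM usage_temp ORDER BY id")]
```

Rows with identical timestamps are not treated as double clicks by this query, which a production version would probably have to handle.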

bozana · 2021-07-20T09:46:56Z

@ctgraham, just to be sure: do you think we should consider user session when possible or use only 24 slices?

bozana · 2021-07-20T12:32:26Z

If we decide to use the session ID when possible, shall it expire after 1 hour of inactivity, or 1/2 hour?

bozana · 2021-07-20T12:34:15Z

@asmecher, would this code be OK for the session expiration: bozana@e16bbfa, as said above?


bozana · 2022-09-19T11:18:53Z

@ctgraham, the major functionality is in the main branch. It would be great and I would be very happy if you would like and have some time to take a look/test... but... no pressure, of course... :-)


bozana · 2022-11-01T12:08:53Z

PRs that consider the first date published of a context when calculating the SUSHI start date:
pkp-lib: #8390
ojs: pkp/ojs#3605 (only submodule update)
omp: pkp/omp#1240 (only submodule update)
ops: pkp/ops#386 (only submodule update)


bozana added the Enhancement:1:Minor A new feature or improvement that can be implemented in less than 3 days. label Feb 22, 2021

bozana added this to the OJS/OMP/OPS 3.4 milestone Feb 22, 2021

bozana mentioned this issue Feb 23, 2021

Update log handling to ensure metrics are calculated correctly across versions #4904

Closed

bozana self-assigned this May 5, 2021

bozana mentioned this issue May 18, 2021

IP location and institution service #6895

Closed

2 tasks

bozana mentioned this issue Jun 10, 2021

Improve usage statistics handling in the background/code #6782

Closed

35 tasks


bozana mentioned this issue Sep 12, 2022

Tab Separated Values reporting for COUNTER R5 #8248

Closed


bozana closed this as completed Sep 12, 2022

Repository owner moved this from Under Development to Done in Statistics Sep 12, 2022


asmecher added this to PKP Public Roadmap Jul 13, 2023
