Document new usage stats implementation #8014

Closed
1 of 8 tasks
bozana opened this issue Jun 21, 2022 · 13 comments

@bozana
Collaborator

bozana commented Jun 21, 2022

Document the new usage stats implementation:

@bozana bozana self-assigned this Jun 21, 2022
@bozana bozana added the Housekeeping:2:Urgent Any dependency management or refactor that must be done soon to fix or avoid a significant problem. label Jun 21, 2022
@bozana bozana added this to the 3.4 milestone Jun 21, 2022
@bozana bozana moved this to Backlog in Statistics Jun 21, 2022
@NateWr NateWr moved this from Backlog to Todo in Statistics Jun 23, 2022
@bozana
Collaborator Author

bozana commented Nov 7, 2022

PRs for the Swagger documentation:
pkp/ojs#3610
pkp/omp#1243

@bozana
Collaborator Author

bozana commented Nov 8, 2022

@NateWr, could you please take a look at the PRs?
For OMP and OPS we do not have API documentation; there is only the OMP swagger-source.json file, right? Shall I create the swagger file for OPS?

@NateWr
Contributor

NateWr commented Nov 16, 2022

I didn't even remember that we had a swagger-source.json file for OMP. It must be really out of date now because I've never kept it updated. I don't think we should keep duplicate copies of this documentation, so I'd recommend we remove it from OMP and don't add it to OPS.

bozana added a commit to bozana/ojs that referenced this issue Nov 17, 2022
bozana added a commit to bozana/ojs that referenced this issue Nov 17, 2022
bozana added a commit to bozana/omp that referenced this issue Nov 17, 2022
@bozana
Collaborator Author

bozana commented Nov 17, 2022

@NateWr, thanks a lot for the review!!! I have addressed all your comments and also removed the file from OMP. Would you like to double-check it? Otherwise, I will merge once the tests have passed... Thanks a lot!!!

@NateWr
Contributor

NateWr commented Nov 17, 2022

I'm happy for you to merge when you're ready. 👍

bozana added a commit to pkp/ojs that referenced this issue Nov 18, 2022
withanage pushed a commit to withanage/ojs that referenced this issue Dec 14, 2022
withanage pushed a commit to withanage/ojs that referenced this issue Dec 14, 2022
@bozana
Collaborator Author

bozana commented Feb 14, 2023

@NateWr, this is the issue that contains the links to the documentation I've done so far -- for you if you start working on it and I am not there...

@NateWr
Contributor

NateWr commented Mar 1, 2023

@bozana I've put some time into the documentation for stats. I looked through all of your documentation (it's great!) and incorporated much of it into drafts of three documents:

| Draft | Description |
| --- | --- |
| Admin Guide > Statistics | Everything a system administrator needs to know to configure stats and reprocess log files. |
| Documentation > Statistics | Everything a contributor or plugin developer needs to know about how stats are logged, processed, and compiled into metrics. |
| Release Notebook 3.4 | A section in the release notebook about what to know for upgrades. |

I still need to update the API documentation and include the database descriptions in the DB schema. But the bulk of what you've put together should be in these three drafts. Let me know if I have missed anything important.

I had a couple of questions:

  1. Where are institutional stats stored? And how are they accessed? I don't seem to have a metrics table for them, and I can't find an API endpoint to download stats.
  2. What exactly does an upgrader need to do before they run an upgrade to 3.4? At one point, it says that stats for the current day should be processed before the upgrade. But I believe the processor only runs up to (but not including) the current day. Can we include step-by-step instructions for someone upgrading? (I'd like to roll this into the how to upgrade guide, as well as the release notebook.)
  3. In the dev documentation, there is a brief section on the stats service classes. I'm not very familiar with the new methods. Can you share some simple examples like in the existing code that will illustrate how to use the different methods (like getTotal, getTimeline, etc)?
  4. Finally, not related to documentation, but should the IP geographic DB be downloaded if geographic stats are turned off?

Thanks again, all of this documentation was really helpful!

@bozana
Collaborator Author

bozana commented Mar 8, 2023

Hi @NateWr, thanks a lot!!!

Institutional stats are stored in metrics_counter_submission_institution_daily and metrics_counter_submission_institution_monthly. Currently they are used only in COUNTER reports provided via the COUNTER SUSHI API. Later we will implement the possibility to download those reports, but we do not currently plan any other (e.g. internal) reporting for institutions.

Before running the upgrade:

  • ensure all (except the current) log files are successfully processed
  • if needed download reports that are removed in 3.4: PKP Usage statistics report, View Report, Custom Report Generator, and R3 reports in COUNTER plugin

After the upgrade:

  • if Geo data was collected earlier, fix the old region codes, i.e. run the following two commands/migration scripts from your OJS, OMP or OPS folder:
    php lib/pkp/migration.php "PKP\migration\upgrade\v3_4_0\I6782_RegionMappings" up
    php lib/pkp/migration.php "PKP\migration\upgrade\v3_4_0\I6782_FixRegionCodes" up
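
The two commands above could be wrapped in a small helper that runs them in order inside one installation folder and stops if the first one fails. This is only a sketch: the function name run_region_fixes, the DRY_RUN switch, and the folder path are hypothetical and not part of OJS, OMP or OPS; only the two migration class names and the migration.php call come from the comment above.

```shell
# Sketch: run the two region-code migrations in order for one installation.
# run_region_fixes and DRY_RUN are hypothetical names used for illustration.
run_region_fixes() {
  app_dir="$1"
  for migration in \
    'PKP\migration\upgrade\v3_4_0\I6782_RegionMappings' \
    'PKP\migration\upgrade\v3_4_0\I6782_FixRegionCodes'
  do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      # Only print what would be executed.
      printf 'cd %s && php lib/pkp/migration.php "%s" up\n' "$app_dir" "$migration"
    else
      # Stop at the first failure so FixRegionCodes never runs without the mappings.
      (cd "$app_dir" && php lib/pkp/migration.php "$migration" up) || return 1
    fi
  done
}
```

Calling it with DRY_RUN=1 and a folder path prints the two commands without executing them, which may be useful for checking the order before touching the database.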

See this PR: pkp/pkp-docs#1029

Hmmm... Currently the IP geographic DB is apparently always downloaded, even if geographic stats are turned off...

There is also a mistake in the COUNTER R5 SUSHI chapter of the admin guide:
Under Statistics > Reports > COUNTER Report, the user can access only R4.1 reports.
R5 reports are currently available only via the SUSHI API, in JSON format.

@NateWr
Contributor

NateWr commented Mar 8, 2023

Thanks @bozana!

ensure all (except the current) log files are successfully processed

Can you explain what this means? What exact CLI commands or point-and-click steps should someone take before they upgrade? Is the "current log file" today's log file? What happens to that file that doesn't happen to other files?

@bozana
Collaborator Author

bozana commented Mar 12, 2023

Normally, it means that the scheduled task UsageStatsLoader has already run that day, so that the log file from the previous day has been processed. The task can also be triggered manually by:

  • removing its row from the DB table scheduled_tasks: DELETE FROM scheduled_tasks WHERE class_name = 'plugins.generic.usageStats.UsageStatsLoader'
  • accessing the journal home page (which will then run that scheduled task)

If there were (older) processing errors, i.e. if the folders usageStats/processing, usageStats/stage or usageStats/reject contain log files, the problems should be resolved first, e.g. by:

  • checking the scheduledTaskLogs/Usagestatisticsfileloadertask... log to see what the problem was
  • fixing the problem
  • copying the files to the stage folder
  • running the scheduled task again.
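
Whether any log files are left in those three folders could be checked with something like the sketch below. It assumes the usageStats folder sits inside the installation's files directory and is passed in by the caller; the function names are hypothetical and nothing here is part of the application itself.

```shell
# Sketch: list log files left over in the usageStats folders named above.
# Assumption: the caller passes the path of the usageStats folder.
leftover_usage_logs() {
  usage_dir="$1"
  for sub in processing stage reject; do
    if [ -d "$usage_dir/$sub" ]; then
      find "$usage_dir/$sub" -type f
    fi
  done
}

# Succeeds (exit status 0) only when none of the three folders holds a file.
usage_logs_clean() {
  [ -z "$(leftover_usage_logs "$1")" ]
}
```

If usage_logs_clean fails for a given usageStats path, leftover_usage_logs with the same path prints the files that still need attention.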

Yes, by 'current log file' I mean today's log file.

The upgrade takes care of today's log file: the entries in that log file are converted to the new format, so that usage can continue to be logged in that file.

@NateWr
Contributor

NateWr commented Mar 13, 2023

Ok, then a couple questions. First, let's assume that the journal is running correctly -- meaning that logs are being processed daily without any errors. In that case:

  1. Is this a step that should be done with every upgrade? If so, should we add it to the How to Upgrade document?
  2. If this is not something that should be done every time, what would be the extra cost in upgrade time if we converted today's and yesterday's log files during upgrade? If we already have the code to convert one log file, it'd be great if we just converted both so that upgraders didn't have to do anything. Here's what I'm thinking:
  • a) Add a pre-flight check that will fail if it finds log files for days other than today or yesterday. The pre-flight check can say "hey it looks like your stats aren't being processed daily, you need to handle this before you upgrade."
  • b) During upgrade it converts today's and yesterday's logs to the new format.

This way, 95% of system administrators have to do nothing. And it's only when there's a pre-existing problem of some kind that they need to change things up.
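
To make idea (a) concrete, here is a rough sketch of such a check. It is not the upgrade code: it assumes log files are named usage_events_YYYYMMDD.log and that the caller supplies today's and yesterday's dates as YYYYMMDD strings. Both function names are hypothetical.

```shell
# Sketch of pre-flight check (a): report log files dated other than today or yesterday.
# Assumptions: files are named usage_events_YYYYMMDD.log, and the two allowed
# dates are passed in by the caller as YYYYMMDD strings.
stale_usage_logs() {
  log_dir="$1"; today="$2"; yesterday="$3"
  for f in "$log_dir"/usage_events_*.log; do
    [ -e "$f" ] || continue          # the glob matched nothing
    day="${f##*usage_events_}"       # strip everything up to the date
    day="${day%.log}"                # strip the extension
    if [ "$day" != "$today" ] && [ "$day" != "$yesterday" ]; then
      printf '%s\n' "$f"
    fi
  done
}

# Fails with a message when any stale log file is found.
preflight_usage_logs() {
  stale="$(stale_usage_logs "$1" "$2" "$3")"
  if [ -n "$stale" ]; then
    printf 'Usage stats logs are not being processed daily:\n%s\n' "$stale" >&2
    return 1
  fi
}
```

Passing the dates in explicitly keeps the sketch independent of platform-specific date arithmetic; a real check would compute them itself.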

@bozana
Collaborator Author

bozana commented Mar 13, 2023

Hi @NateWr, yes that sounds perfect :-) 👍
This is not something that should be done every time.

@NateWr
Contributor

NateWr commented Mar 29, 2023

@bozana with #8844 completed, I think this is done. I'll close this for now, but let me know if I've missed anything.

@NateWr NateWr closed this as completed Mar 29, 2023
@github-project-automation github-project-automation bot moved this from Todo to Done in Statistics Mar 29, 2023