Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor sync status logging #3529

Merged
merged 17 commits into from
Jan 22, 2025
Merged

Refactor sync status logging #3529

merged 17 commits into from
Jan 22, 2025

Conversation

eemeli
Copy link
Member

@eemeli eemeli commented Jan 16, 2025

See also #2023

As is, the logging of sync status is indexed primarily by a sync "run" that touches all projects, secondarily by project, and tertiarily by repository. The status for each project is also gathered implicitly into three states: in progress, skipped, and synced. This no longer matches how sync is actually run, and is hard to handle in practice.

Here, sync status logging is collapsed into a single table of events, which is written to only from sync_project_task(), with the following states:

  • IN_PROGRESS
  • DONE - sync complete, with some changes to DB or repo
  • NO_CHANGES - sync complete, with no changed to DB or repo
  • NO_COMMIT - sync complete, but commit=False was set while a repo write should've occurred
  • PREV_BUSY - skipped due to previous sync still running
  • FAIL - an exception was raised
  • INCOMPLETE - sync never completed; set by a subsequent successful run

For FAIL, an error message is retained in the error field.

The logs are made available via the following views:

  • /sync - latest sync status and start time for each project
  • /sync/projects/<project> - sync events for each project
  • /sync/errors - all anomalous sync events with a status other than DONE or NO_CHANGES

Old sync events are ported to the new format, setting the status as NO_CHANGES for all that were "skipped" and INCOMPLETE for ones with no end time set.


Some questions for potential futher changes:

  • Are there additional sync states that should be explicitly identified?
  • Is there additional information that should be retained in the sync log?
  • Are there other views into the data that should be presented?
  • The per-project and anomaly listings show the error message, if one is available. Should access to that be limited to some subset of users?

Edit: Updated to match the PR as of #3529 (comment).

@eemeli eemeli requested review from mathjazz and flodolo January 16, 2025 14:18
@mathjazz
Copy link
Collaborator

Deployed to stage:
https://mozilla-pontoon-staging.herokuapp.com/sync/

Migration 0004 is slow (took about 10 minutes).

no_commit - commit=False was set, preventing a repo write

Is this one always no_commit, even if running, fails, done with changes or not? If yes, I'd probably track this information separate from the sync job state.

/sync/errors - all anomalous sync events with a status other than done or no_changes

Even in_progress?

Are there additional sync states that should be explicitly identified?

Not state per se, but since we treat --no-commit as state here, we should also track --force. Not necessarily as state, though.

Is there additional information that should be retained in the sync log?

These would be great (taken from #2023), can be done in a follow-up:

  • What entities/locales were synced
  • For each entity, what action did the handle_entity end up taking?
  • What resources were added or removed?

Are there other views into the data that should be presented?

Need to have a closer look...

The per-project and anomaly listings show the error message, if one is available. Should access to that be limited to some subset of users?

Probably. To admins. Maybe all /sync pages?

@mathjazz
Copy link
Collaborator

/sync/errors - all anomalous sync events with a status other than done or no_changes

Could this be /sync/log/errors?

Old sync events are ported to the new format, setting the status as fail for all that were "skipped".

Why not no_changes?

@mathjazz
Copy link
Collaborator

mathjazz commented Jan 16, 2025

Early observations:

  1. Thanks for working on this. This PR makes things much simpler, but still adds features. Nice work!
  2. The project sync view (e.g.) would be nice to be linked to from the project dashboard. Could be a new tab?
  3. If you add Last duration to https://mozilla-pontoon-staging.herokuapp.com/sync/log/, I don't think I'll miss https://pontoon.mozilla.org/sync/log/ anymore.

@eemeli
Copy link
Member Author

eemeli commented Jan 16, 2025

Migration 0004 is slow (took about 10 minutes).

It was maybe a minute for me locally, and I didn't spend much time optimizing it -- the data needs to be read from the project & repo sync logs & be rewritten in the new format.

Easiest way to speed it up would be to e.g. only port the last week or so of events?

no_commit - commit=False was set, preventing a repo write

Is this one always no_commit, even if running, fails, done with changes or not? If yes, I'd probably track this information separate from the sync job state.

It should only be written if the status would be done otherwise. Probably ought to limit it further to only getting picked if there would be any repo changes.

/sync/errors - all anomalous sync events with a status other than done or no_changes

Even in_progress?

Yes. That's anomalous if e.g. there's a crash mid-sync and we don't get to write a fail.

Are there additional sync states that should be explicitly identified?

Not state per se, but since we treat --no-commit as state here, we should also track --force. Not necessarily as state, though.

Yeah, that and the sync source (automated vs. manual CLI vs. admin UI) should be tracked.

Is there additional information that should be retained in the sync log?

These would be great (taken from #2023), can be done in a follow-up:

  • What entities/locales were synced
  • For each entity, what action did the handle_entity end up taking?
  • What resources were added or removed?

Yeah, that should be a follow-up.

The per-project and anomaly listings show the error message, if one is available. Should access to that be limited to some subset of users?

Probably. To admins. Maybe all /sync pages?

The overview should be pretty safe to keep public, yes? It doesn't include error logs.

/sync/errors - all anomalous sync events with a status other than done or no_changes

Could this be /sync/log/errors?

That would conflict with a project using errors as its slug, though.

Old sync events are ported to the new format, setting the status as fail for all that were "skipped".

Why not no_changes?

Because we can't tell the difference, both are logged the same way. But I should probably set it to some other value instead.

The project sync view (e.g.) would be nice to be linked to from the project dashboard. Could be a new tab?

Potentially, yes.

If you add Last duration to https://mozilla-pontoon-staging.herokuapp.com/sync/log/, I don't think I'll miss https://pontoon.mozilla.org/sync/log/ anymore.

I shall take that under advisement.

@eemeli
Copy link
Member Author

eemeli commented Jan 16, 2025

Made the following changes:

  • Limited no_commit to only when repo changes would otherwise have been made.
  • Require can_manage_project permission for project & error logs
  • Migrate previous "skipped" syncs to status=99, which gets rendered as ??? (99)
  • Added a "Duration" column to the top-level listing
  • By default, hide "no changes" syncs in top-level listing, with toggle for showing them

@mathjazz
Copy link
Collaborator

Limited no_commit to only when repo changes would otherwise have been made.

I wouldn't mix sync type (regular, no commit, no pull, force) with status.

Migrate previous "skipped" syncs to status=99, which gets rendered as ??? (99)

I don't think it's obvious what this means. I suggested no_changes, because it would be accurate for a vast majority of cases.

By default, hide "no changes" syncs in top-level listing, with toggle for showing them

I'm not sure I like that to be the default, because I regularly check https://pontoon.mozilla.org/sync/log/ before deployment to see if all sync jobs are done. 😊 But maybe we can have an overall "sync in-progress" indicator in the header?

Should we also remove the DB projects from the list?

@eemeli
Copy link
Member Author

eemeli commented Jan 16, 2025

Limited no_commit to only when repo changes would otherwise have been made.

I wouldn't mix sync type (regular, no commit, no pull, force) with status.

I agree about the others, but no_commit affects the status depending on whether there are changes to write in the repo.

Migrate previous "skipped" syncs to status=99, which gets rendered as ??? (99)

I don't think it's obvious what this means. I suggested no_changes, because it would be accurate for a vast majority of cases.

Eh, this is only a transition issue, and given the log cycling, only valid for a few weeks after this is merged. After that, you don't need to remember what status=99 means because it'll be gone.

By default, hide "no changes" syncs in top-level listing, with toggle for showing them

I'm not sure I like that to be the default, because I regularly check https://pontoon.mozilla.org/sync/log/ before deployment to see if all sync jobs are done. 😊 But maybe we can have an overall "sync in-progress" indicator in the header?

In-progress syncs show up in the listing as "in progress", so that functionality is there already.

Should we also remove the DB projects from the list?

I'd prefer to keep them in, as it effectively communicates the truth about them not getting sync'd.

@mathjazz
Copy link
Collaborator

I agree about the others, but no_commit affects the status depending on whether there are changes to write in the repo.

Fair.

Migrate previous "skipped" syncs to status=99, which gets rendered as ??? (99)

I don't think it's obvious what this means. I suggested no_changes, because it would be accurate for a vast majority of cases.

Eh, this is only a transition issue, and given the log cycling, only valid for a few weeks after this is merged. After that, you don't need to remember what status=99 means because it'll be gone.

That's a reason not to special case it. :) 3rd party users will practically need to look into the code to understand it.

In-progress syncs show up in the listing as "in progress", so that functionality is there already.

True.

Should we also remove the DB projects from the list?

I'd prefer to keep them in, as it effectively communicates the truth about them not getting sync'd.

Makes sense. I would make some CSS changes before merging if you don't mind, to bring the styling closer to the lists / tables on dashboard, e.g. https://pontoon.mozilla.org/projects/.

@mathjazz
Copy link
Collaborator

@eemeli
Copy link
Member Author

eemeli commented Jan 17, 2025

Eh, this is only a transition issue, and given the log cycling, only valid for a few weeks after this is merged. After that, you don't need to remember what status=99 means because it'll be gone.

That's a reason not to special case it. :) 3rd party users will practically need to look into the code to understand it.

Fine, will mark them as NO_CHANGES. It'll hide some failures, but they were already hidden.

[...] I would make some CSS changes before merging if you don't mind, to bring the styling closer to the lists / tables on dashboard, e.g. https://pontoon.mozilla.org/projects/.

Very happy for you to do so; please commit directly on the branch.

  • /sync/log: Change to /sync/?
  • /sync/log/<project>: Change to /sync/projects/<project> to be consistent with dashboards, admin, translate?

Ok. Will include a /sync/log -> /sync/ redirect.

It has .order_by("-start_time"), but I don't understand why that doesn't seem to work?

@eemeli
Copy link
Member Author

eemeli commented Jan 17, 2025

Updated as above. Also added INCOMPLETE as a new status, applied to any IN_PROGRESS syncs for the current project when a sync successfully completes, an during the migration.

I'll also update the PR description to match.

It has .order_by("-start_time"), but I don't understand why that doesn't seem to work?

Gah, I'd used m rather than i for the date() formatting string in the templates, so while they were correctly ordered, the "minutes" were rendering months instead.

@codecov-commenter
Copy link

codecov-commenter commented Jan 17, 2025

Codecov Report

Attention: Patch coverage is 60.19900% with 80 lines in your changes missing coverage. Please review.

Project coverage is 78.89%. Comparing base (89f390a) to head (c7a88c6).
Report is 3 commits behind head on main.

Additional details and impacted files

@mathjazz
Copy link
Collaborator

Deployed to stage.

@flodolo Could you please take it for a spin?

@flodolo
Copy link
Collaborator

flodolo commented Jan 17, 2025

/sync/project/XXX is accessible from the main sync page. Should /sync/errors be accessible as well, maybe as a link in the main page?

One nitpick is that I'm having a hard time reading the table, because of how distant the columns are (e.g. short project name is very far from the status). Any thought against having a row border for readability?

Screenshot 2025-01-17 alle 14 00 53

mypy.ini Outdated Show resolved Hide resolved
@mathjazz
Copy link
Collaborator

/sync/project/XXX is accessible from the main sync page. Should /sync/errors be accessible as well, maybe as a link in the main page?

One nitpick is that I'm having a hard time reading the table, because of how distant the columns are (e.g. short project name is very far from the status). Any thought against having a row border for readability?

I can take a stab at both of these as part of the CSS changes I'll push to the branch.

@mathjazz mathjazz requested a review from flodolo January 21, 2025 20:30
@mathjazz mathjazz merged commit 729cc71 into mozilla:main Jan 22, 2025
7 checks passed
@eemeli eemeli deleted the refactor-sync-log branch January 22, 2025 14:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants