Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updater pipeline and initial work on the breadth-first update #374

Merged
merged 30 commits into from
Dec 13, 2023

Conversation

renatav
Copy link
Collaborator

@renatav renatav commented Dec 6, 2023

Description (e.g. "Related to ...", etc.)

Refactored the updater by implementing a pipeline. The biggest advantage of this approach is that this pipeline is a class, meaning that it maintains a state attribute and that it is not necessary to pass a large number of variables to various functions. Also split the process into a bigger number of separate steps. The idea is to greatly improve the readability of the code. Also improved error messages and logging.

Partly implemented breadth-first update. The authentication repository will be validated first (and validation will fail if it is only valid up to a certain commit), but target repositories will then be validated in a breath-first manner. So, iterate over authentication repository's commit and for each commit, validate each of the target repositories. If an error is detected, partially update all repositories (so up to the last valid authentication commit and target commits specified in that commit of the authentication repository)

Closes #366
Closes #364
Closes #152
See #281

Code review checklist (for code reviewer to complete)

  • Pull request represents a single change (i.e. not fixing disparate/unrelated things in a single PR)
  • Title summarizes what is changing
  • Commit messages are meaningful (see this for details)
  • Tests have been included and/or updated, as appropriate
  • Docstrings have been included and/or updated, as appropriate
  • Changelog has been updated, as needed (see CHANGELOG.md)

- At the start of the update process, an error message used to be shown
saying that the default branch could not be determined. There is not need
to attempt to determine the default branch if the repository is not yet cloned.
- During the update process, the updater clones the remote reposiory to
a temp folder and this temp folder's name used to be used when logging.
E.g. 327fshkewrew/law would get set as the repository's name. In order
to be able to reference a repository by a different name when logging,
added a new attribute called alias. So now we can reference this temp
repository by a more descriptive name.
…ture

1. Introduced a `Pipeline` framework to structure the update process as a series of distinct stages.
2. Implemented the following pipeline stages:
   - `clone_remote_and_run_tuf_updater`
   - `validate_out_of_band_and_update_type`
   - `clone_or_fetch_users_auth_repo`
   - `load_target_repositories`
   - `get_targets_data_from_auth_repo`
@renatav renatav changed the title Renatav/bf update Updater pipeline and initial work on the breadth-first update Dec 7, 2023
@renatav renatav self-assigned this Dec 7, 2023
@renatav renatav marked this pull request as ready for review December 7, 2023 03:02
@renatav renatav requested a review from n-dusan December 7, 2023 03:02
Copy link
Contributor

@n-dusan n-dusan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the pipeline steps and the codebase looks a lot more readable! I've left some comments that I've found when testing.

CHANGELOG.md Outdated
@@ -1045,7 +1035,6 @@ and this project adheres to [Semantic Versioning][semver].
[0.13.0]: https://github.com/openlawlibrary/taf/compare/v0.12.0...v0.13.0
[0.12.0]: https://github.com/openlawlibrary/taf/compare/v0.11.2...v0.12.0
[0.11.1]: https://github.com/openlawlibrary/taf/compare/v0.11.1...v0.11.2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
[0.11.1]: https://github.com/openlawlibrary/taf/compare/v0.11.1...v0.11.2
[0.11.1]: https://github.com/openlawlibrary/taf/compare/v0.11.0...v0.11.1

@log_on_start(
INFO, "Cloning target repositories which are not on disk...", logger=taf_logger
)
@log_on_start(INFO, "Finished cloning target repositories", logger=taf_logger)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be?

Suggested change
@log_on_start(INFO, "Finished cloning target repositories", logger=taf_logger)
@log_on_end(INFO, "Finished cloning target repositories", logger=taf_logger)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

Comment on lines 343 to 347
if self.only_validate:
taf_logger.warning(
"Target repositories must already exist when only validating repositories"
)
continue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran taf repo validate --path . on a repository which doesn't have a target repository. I got this error:

Traceback (most recent call last):
  File "d:\oll\taf\taf\updater\updater.py", line 581, in validate_repository
    _update_named_repository(
  File "d:\oll\taf\taf\updater\updater.py", line 366, in _update_named_repository
    ) = _update_current_repository(
  File "d:\oll\taf\taf\updater\updater.py", line 532, in _update_current_repository
    updater_pipeline.run()
  File "d:\oll\taf\taf\updater\updater_pipeline.py", line 107, in run
    self.handle_error(e)
  File "d:\oll\taf\taf\updater\updater_pipeline.py", line 118, in handle_error
    raise e
  File "d:\oll\taf\taf\updater\updater_pipeline.py", line 101, in run
    update_status = step()
  File "C:\Users\nikol\Envs\oll\lib\site-packages\logdecorator\decorator.py", line 19, in wrapper
    return self.execute(fn, *args, **kwargs)
  File "C:\Users\nikol\Envs\oll\lib\site-packages\logdecorator\decorator.py", line 89, in execute
    return super().execute(fn, *args, **kwargs)
  File "C:\Users\nikol\Envs\oll\lib\site-packages\logdecorator\decorator.py", line 13, in execute
    return fn(*args, **kwargs)
  File "d:\oll\taf\taf\updater\updater_pipeline.py", line 486, in get_target_repositories_commits
    repository.fetch(branch=branch)
  File "d:\oll\taf\taf\git.py", line 930, in fetch
    self._git("fetch {} {}", remote, branch, log_error=True)
  File "d:\oll\taf\taf\git.py", line 226, in _git
    raise error
taf.exceptions.GitError: Repo tmchippewa/law-static-assets: D:\OLL\oll-test-repos\tmchippewa\law-static-assets does not exist or is not a git repository

So it seems to be failing at repository.fetch(branch=branch) on L486 (different pipeline step).
I think we could either turn this warning into an error and handle the error in this pipeline step, or remove if self.only_validate altogether since error is raised in another pipeline step.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, when validating, we should not proceed if the repository does not exit.

Comment on lines +493 to +495
fetched_commits = repository.all_commits_on_branch(
branch=f"origin/{branch}"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I managed to trigger an error but didn't get a stacktrace when the error occured, so I put breakpoint to figure out where the issue popped up. Looks like it's in all_commits_on_branch. Current updater output:

An error occurred while running step get_target_repositories_commits: 'NoneType' object has no attribute 'target'


Update of tmchippewa/law failed due to error: 'NoneType' object has no attribute 'target'

Could we add error handling in this pipeline step?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you remember which repository you were updating when this happened? This is old code that was not updated as a part of this rework

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sent via slack

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, the problem was caused by the incorrect fetching logic

repository.name
].items():
last_validated_commit = validated_commits[-1]
# TODO what to do if an error occurred while validating that branch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we create issues for leftover TODOs?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about this TODO and I think that the check if the update was fully successful is sufficient. If this part of the code is reached, there were no error and the branch was validated successfully

Comment on lines +789 to +794
for (
branch,
validated_commits,
) in self.state.validated_commits_per_target_repos_branches[
repository.name
].items():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly validation is done BFS, but merging is DFS. Is it possible that during DFS merge a target repository merge fails (because of some conflict)? If so, will the next time we run updater, the target repository state affect target validation? Since one target repo would have multiple merged commits but other targets wouldn't.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not modify the merge logic. If, for any reason, the state of one or more target repositories is not synced with the authentication repository's state, the validation should start from scratch:

def determine_start_commits(self):

@log_on_start(
INFO, "Validating out of band commit and update type", logger=taf_logger
)
def _validate_out_of_band_and_update_type(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran taf repo update --url https://github.com/openlawlibrary/open-law to test out clone for cloning dependencies as well. It mostly worked, but got the following errors at the end. Here's the interesting output:

Validating out of band commit and update type


Update of test/test failed. One or more referenced authentication repositories could not be validated:
 'NoneType' object is not subscriptable
'NoneType' object is not subscriptable
'NoneType' object is not subscriptable
'NoneType' object is not subscriptable
'NoneType' object is not subscriptable
'NoneType' object is not subscriptable
'NoneType' object is not subscriptable


Called LifecycleStage.REPO handler. Event: failed


Called LifecycleStage.UPDATE handler. Event: failed


Update of test/test failed due to error: Update of test/test failed. One or more referenced authentication repositories could not be validated:
 'NoneType' object is not subscriptable
'NoneType' object is not subscriptable
'NoneType' object is not subscriptable
'NoneType' object is not subscriptable
'NoneType' object is not subscriptable
'NoneType' object is not subscriptable
'NoneType' object is not subscriptable

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@renatav renatav requested a review from n-dusan December 13, 2023 19:30
@renatav renatav merged commit b3b6b4d into master Dec 13, 2023
25 checks passed
@renatav renatav deleted the renatav/bf-update branch December 13, 2023 19:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Impelemet updater pipeline Improve error logging Improve updater error messages
2 participants