Updater pipeline and initial work on the breadth-first update #374

renatav · 2023-12-06T16:23:03Z

Description (e.g. "Related to ...", etc.)

Refactored the updater by implementing a pipeline. The biggest advantage of this approach is that the pipeline is a class: it maintains a state attribute, so it is no longer necessary to pass a large number of variables to various functions. Also split the process into a larger number of separate steps. The idea is to greatly improve the readability of the code. Also improved error messages and logging.
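A minimal sketch of the pipeline idea, for illustration only. All names here (`Pipeline`, `PipelineState`, `Status`, `ExampleUpdatePipeline` and its two steps) are hypothetical and are not taken from `taf/updater/updater_pipeline.py`; the point is only that steps share data through `self.state` instead of passing many arguments around.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    SUCCESS = 1
    FAILED = 2


@dataclass
class PipelineState:
    # Data shared between steps (hypothetical fields, for illustration).
    cloned: bool = False
    log: list = field(default_factory=list)


class Pipeline:
    def __init__(self, steps):
        self.steps = steps
        self.state = PipelineState()

    def run(self):
        # Run the steps in order and stop at the first failure.
        for step in self.steps:
            if step() is Status.FAILED:
                return Status.FAILED
        return Status.SUCCESS


class ExampleUpdatePipeline(Pipeline):
    def __init__(self):
        super().__init__(steps=[self.clone, self.validate])

    def clone(self):
        self.state.cloned = True
        self.state.log.append("clone")
        return Status.SUCCESS

    def validate(self):
        # Reads what the previous step stored on the shared state.
        self.state.log.append("validate")
        return Status.SUCCESS if self.state.cloned else Status.FAILED
```

Calling `ExampleUpdatePipeline().run()` executes `clone` and then `validate`, and neither step takes any parameters besides `self`.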

Partly implemented breadth-first update. The authentication repository will be validated first (and validation will fail if it is only valid up to a certain commit), but target repositories will then be validated in a breadth-first manner: iterate over the authentication repository's commits and, for each commit, validate each of the target repositories. If an error is detected, partially update all repositories (that is, up to the last valid authentication commit and the target commits specified in that commit of the authentication repository).
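The breadth-first order described above can be sketched roughly as follows. This is a simplified illustration under assumptions, not the updater's code: `validate_breadth_first`, its parameters, and the shape of `targets_at` are all hypothetical.

```python
def validate_breadth_first(auth_commits, targets_at, is_valid):
    """Validate all targets for one auth commit before moving to the next.

    auth_commits: ordered list of authentication repository commits
    targets_at:   {auth_commit: {target_repo_name: target_commit}}
    is_valid:     callable(repo_name, target_commit) -> bool
    Returns (last_valid_auth_commit, {repo_name: target_commit}, error_or_None).
    """
    last_valid_auth = None
    validated = {}
    for auth_commit in auth_commits:
        for repo, target_commit in targets_at[auth_commit].items():
            if not is_valid(repo, target_commit):
                error = f"{repo}: commit {target_commit} is invalid at {auth_commit}"
                # Partial result: everything up to the last fully valid auth commit.
                return last_valid_auth, validated, error
        # Record progress only once every target passed for this auth commit.
        validated.update(targets_at[auth_commit])
        last_valid_auth = auth_commit
    return last_valid_auth, validated, None
```

With three auth commits where one target fails at the second, the function returns the first auth commit and the target commits listed in it, which is the state a partial update would move the repositories to.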

Closes #366
Closes #364
Closes #152
See #281

Code review checklist (for code reviewer to complete)

Pull request represents a single change (i.e. not fixing disparate/unrelated things in a single PR)
Title summarizes what is changing
Commit messages are meaningful (see this for details)
Tests have been included and/or updated, as appropriate
Docstrings have been included and/or updated, as appropriate
Changelog has been updated, as needed (see CHANGELOG.md)

- At the start of the update process, an error message used to be shown saying that the default branch could not be determined. There is no need to attempt to determine the default branch if the repository is not yet cloned.
- During the update process, the updater clones the remote repository to a temp folder, and this temp folder's name used to be used when logging. E.g. 327fshkewrew/law would get set as the repository's name. In order to be able to reference a repository by a different name when logging, added a new attribute called alias. So now we can reference this temp repository by a more descriptive name.

…ture
1. Introduced a `Pipeline` framework to structure the update process as a series of distinct stages.
2. Implemented the following pipeline stages:
   - `clone_remote_and_run_tuf_updater`
   - `validate_out_of_band_and_update_type`
   - `clone_or_fetch_users_auth_repo`
   - `load_target_repositories`
   - `get_targets_data_from_auth_repo`

n-dusan

I really like the pipeline steps and the codebase looks a lot more readable! I've left some comments that I've found when testing.

n-dusan · 2023-12-07T10:30:37Z

CHANGELOG.md

@@ -1045,7 +1035,6 @@ and this project adheres to [Semantic Versioning][semver].
 [0.13.0]: https://github.com/openlawlibrary/taf/compare/v0.12.0...v0.13.0
 [0.12.0]: https://github.com/openlawlibrary/taf/compare/v0.11.2...v0.12.0
 [0.11.1]: https://github.com/openlawlibrary/taf/compare/v0.11.1...v0.11.2


Suggested change

[0.11.1]: https://github.com/openlawlibrary/taf/compare/v0.11.1...v0.11.2

[0.11.1]: https://github.com/openlawlibrary/taf/compare/v0.11.0...v0.11.1

n-dusan · 2023-12-07T10:59:11Z

taf/updater/updater_pipeline.py

+    @log_on_start(
+        INFO, "Cloning target repositories which are not on disk...", logger=taf_logger
+    )
+    @log_on_start(INFO, "Finished cloning target repositories", logger=taf_logger)


Should this be?

Suggested change

@log_on_start(INFO, "Finished cloning target repositories", logger=taf_logger)

@log_on_end(INFO, "Finished cloning target repositories", logger=taf_logger)
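To illustrate why the second decorator should fire on completion, here is a standard-library-only sketch. `log_before` and `log_after` are hypothetical stand-ins that only mimic the start/end timing; they are not the `logdecorator` implementation, and they write to a plain list instead of a logger.

```python
import functools


def log_before(message, sink):
    # Stand-in for a "log on start" decorator: records before the call.
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            sink.append(message)
            return fn(*args, **kwargs)
        return wrapper
    return decorator


def log_after(message, sink):
    # Stand-in for a "log on end" decorator: records after the call returns.
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            result = fn(*args, **kwargs)
            sink.append(message)
            return result
        return wrapper
    return decorator


events = []


@log_before("Cloning target repositories which are not on disk...", events)
@log_after("Finished cloning target repositories", events)
def clone_targets():
    events.append("<cloning happens here>")


clone_targets()
```

With two "start" decorators, both messages would be emitted before any cloning takes place, so "Finished cloning target repositories" would be logged too early.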

n-dusan · 2023-12-07T11:07:23Z

taf/updater/updater_pipeline.py

+                    if self.only_validate:
+                        taf_logger.warning(
+                            "Target repositories must already exist when only validating repositories"
+                        )
+                        continue


I ran taf repo validate --path . on a repository which doesn't have a target repository. I got this error:

Traceback (most recent call last):
  File "d:\oll\taf\taf\updater\updater.py", line 581, in validate_repository
    _update_named_repository(
  File "d:\oll\taf\taf\updater\updater.py", line 366, in _update_named_repository
    ) = _update_current_repository(
  File "d:\oll\taf\taf\updater\updater.py", line 532, in _update_current_repository
    updater_pipeline.run()
  File "d:\oll\taf\taf\updater\updater_pipeline.py", line 107, in run
    self.handle_error(e)
  File "d:\oll\taf\taf\updater\updater_pipeline.py", line 118, in handle_error
    raise e
  File "d:\oll\taf\taf\updater\updater_pipeline.py", line 101, in run
    update_status = step()
  File "C:\Users\nikol\Envs\oll\lib\site-packages\logdecorator\decorator.py", line 19, in wrapper
    return self.execute(fn, *args, **kwargs)
  File "C:\Users\nikol\Envs\oll\lib\site-packages\logdecorator\decorator.py", line 89, in execute
    return super().execute(fn, *args, **kwargs)
  File "C:\Users\nikol\Envs\oll\lib\site-packages\logdecorator\decorator.py", line 13, in execute
    return fn(*args, **kwargs)
  File "d:\oll\taf\taf\updater\updater_pipeline.py", line 486, in get_target_repositories_commits
    repository.fetch(branch=branch)
  File "d:\oll\taf\taf\git.py", line 930, in fetch
    self._git("fetch {} {}", remote, branch, log_error=True)
  File "d:\oll\taf\taf\git.py", line 226, in _git
    raise error
taf.exceptions.GitError: Repo tmchippewa/law-static-assets: D:\OLL\oll-test-repos\tmchippewa\law-static-assets does not exist or is not a git repository

So it seems to be failing at repository.fetch(branch=branch) on L486 (different pipeline step).
I think we could either turn this warning into an error and handle the error in this pipeline step, or remove if self.only_validate altogether, since the error is raised in another pipeline step.

Yes, when validating, we should not proceed if the repository does not exist.
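One way the first option could look, as a rough sketch only. `MissingRepositoryError` and `repos_to_clone` are hypothetical names, and checking for a `.git` directory is a simplification (it ignores bare repositories and worktrees); the actual fix in the PR may differ.

```python
from pathlib import Path
import tempfile


class MissingRepositoryError(Exception):
    """Hypothetical error for target repositories that are absent on disk."""


def repos_to_clone(repo_paths, only_validate):
    """Return names of repositories missing on disk.

    repo_paths: {repository_name: path}
    Raises if anything is missing while only validating, since validation
    must not clone and cannot continue without the repository.
    """
    missing = [
        name for name, path in repo_paths.items()
        if not (Path(path) / ".git").is_dir()
    ]
    if missing and only_validate:
        raise MissingRepositoryError(
            "Target repositories must already exist when only validating "
            f"repositories; missing: {', '.join(missing)}"
        )
    return missing
```

Raising here stops the run in the step that detects the problem, rather than letting a later `fetch` fail with a less specific `GitError`.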

n-dusan · 2023-12-07T11:37:34Z

taf/updater/updater_pipeline.py

+                        fetched_commits = repository.all_commits_on_branch(
+                            branch=f"origin/{branch}"
+                        )


I managed to trigger an error but didn't get a stack trace when the error occurred, so I put a breakpoint to figure out where the issue popped up. Looks like it's in all_commits_on_branch. Current updater output:

An error occurred while running step get_target_repositories_commits: 'NoneType' object has no attribute 'target'
Update of tmchippewa/law failed due to error: 'NoneType' object has no attribute 'target'

Could we add error handling in this pipeline step?

Do you remember which repository you were updating when this happened? This is old code that was not updated as a part of this rework

Sent via slack

Fixed, the problem was caused by the incorrect fetching logic
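For the missing stack trace mentioned in this thread, a step runner could log the traceback alongside the short message. This is a sketch under assumptions: `run_step` and the logger name are hypothetical, and logging the traceback at DEBUG level is just one possible choice.

```python
import logging
import traceback

logger = logging.getLogger("pipeline_sketch")


def run_step(step):
    """Run one pipeline step; on failure log a summary and the traceback."""
    try:
        return step()
    except Exception as e:
        # Short, user-facing message naming the failing step.
        logger.error(
            "An error occurred while running step %s: %s", step.__name__, e
        )
        # Full traceback for debugging, kept out of the default output.
        logger.debug("%s", traceback.format_exc())
        raise
```

The exception is re-raised so that the caller can still decide how to report the failed update.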

n-dusan · 2023-12-07T17:49:57Z

taf/updater/updater_pipeline.py

+                    repository.name
+                ].items():
+                    last_validated_commit = validated_commits[-1]
+                    # TODO what to do if an error occurred while validating that branch


Could we create issues for leftover TODOs?

I thought about this TODO and I think that checking whether the update was fully successful is sufficient. If this part of the code is reached, there were no errors and the branch was validated successfully.

n-dusan · 2023-12-07T18:20:54Z

taf/updater/updater_pipeline.py

+                    for (
+                        branch,
+                        validated_commits,
+                    ) in self.state.validated_commits_per_target_repos_branches[
+                        repository.name
+                    ].items():


If I understand correctly, validation is done breadth-first (BFS), but merging is depth-first (DFS). Is it possible that a target repository merge fails during the DFS merge (because of some conflict)? If so, will the target repository's state affect target validation the next time we run the updater? One target repo would have multiple merged commits, but the other targets wouldn't.

I did not modify the merge logic. If, for any reason, the state of one or more target repositories is not synced with the authentication repository's state, the validation should start from scratch:

taf/taf/updater/updater_pipeline.py

Line 366 in cd24900

def determine_start_commits(self):
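The rule stated in this reply (start from scratch when target repositories are out of sync with the authentication repository) could be sketched like this. `pick_start_commit` and its parameters are hypothetical and deliberately simplified; this is not the body of `determine_start_commits`.

```python
def pick_start_commit(last_validated_auth_commit, expected_targets, actual_heads):
    """Return the auth commit to resume from, or None to start from scratch.

    expected_targets: {repo_name: commit} listed in the last validated auth commit
    actual_heads:     {repo_name: commit} currently checked out on disk
    """
    if last_validated_auth_commit is None:
        return None
    in_sync = all(
        actual_heads.get(repo) == commit
        for repo, commit in expected_targets.items()
    )
    # Any mismatch (for example a partially merged target) forces a full re-validation.
    return last_validated_auth_commit if in_sync else None
```

Under this rule, a failed merge in one target repository cannot silently skew the next run: the mismatch is detected and validation restarts from the beginning.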

n-dusan · 2023-12-07T18:30:36Z

taf/updater/updater_pipeline.py

+    @log_on_start(
+        INFO, "Validating out of band commit and update type", logger=taf_logger
+    )
+    def _validate_out_of_band_and_update_type(self):


I ran taf repo update --url https://github.com/openlawlibrary/open-law to test cloning of dependencies as well. It mostly worked, but I got the following errors at the end. Here's the interesting output:

Validating out of band commit and update type
Update of test/test failed. One or more referenced authentication repositories could not be validated:
'NoneType' object is not subscriptable
'NoneType' object is not subscriptable
'NoneType' object is not subscriptable
'NoneType' object is not subscriptable
'NoneType' object is not subscriptable
'NoneType' object is not subscriptable
'NoneType' object is not subscriptable
Called LifecycleStage.REPO handler. Event: failed
Called LifecycleStage.UPDATE handler. Event: failed
Update of test/test failed due to error: Update of test/test failed. One or more referenced authentication repositories could not be validated:
'NoneType' object is not subscriptable
'NoneType' object is not subscriptable
'NoneType' object is not subscriptable
'NoneType' object is not subscriptable
'NoneType' object is not subscriptable
'NoneType' object is not subscriptable
'NoneType' object is not subscriptable

renatav added 23 commits October 13, 2023 01:55

Merge branch 'master' into renatav/better-errors-and-logging

3e32ed6

fix: if last successful commit is invalid, report the issue and raise…

5db90c1

… appropriate error

feat: initial work on bf update. Listing targets data per auth commits

6b6894f

refact: move target repos initial state validation and commits fetchi…

24924e1

…ng to pipeline

feat: initial breadth-first targets validation

c08ea48

refact: additional work on moving code to the updater pipeline

cc4c6a5

fix: do not expect target commit to be changed in every auth commit. …

ed887d3

…Improved logging

refact: integrate updater pipeline into updater, finish new implement…

beb5a0a

…atino of setting targets_data

Merge branch 'master' into renatav/bf-update

3f87fe9

fix: updater pipeline - iterate through target commits properly

c3fa1eb

feat: updater - work on adding support for partial update

320728e

feat: updater - raise an error in case of additional unauthenticated …

3930ef2

…commits and pipeline fixes

feat, refact: updater pipeline - start update from the beginning if i…

5607011

…nitial state is out-of-sync

test: updater - define patterns for checking error messages, make mor…

e8312b0

…e generic

fix: updater pipeline - raise error if update type is not correct

f1f324c

chore: formatting

a9ed435

chore: fix flake and mypy errors

ba364e7

fix: fix updater when target files do not exist but repositroies.json…

1cbf90b

… does

fix: fix failing test, revert determine branch updates

eaefc70

feat: updater - set last validated commit after partial update

9c6f978

chore: remove unused import and invalid test

944859d

renatav changed the title ~~Renatav/bf update~~ Updater pipeline and initial work on the breadth-first update Dec 7, 2023

chore: update changelog

588ec75

renatav self-assigned this Dec 7, 2023

renatav marked this pull request as ready for review December 7, 2023 03:02

renatav requested a review from n-dusan December 7, 2023 03:02

n-dusan requested changes Dec 7, 2023

fix: out of band validation minor fix, minor error handling fixes

bfee646

fix: fix target commits fetching

8918757

renatav force-pushed the renatav/bf-update branch from 1d244b9 to 8918757 Compare December 12, 2023 17:38

renatav added 4 commits December 12, 2023 19:47

test: fix failing create repository test

cd24900

feat: define which updater steps should be run when updating and only…

7c4c39b

… validating

chore: minor changelog fix and remove a TODO

ef0966a

chore: remvoe outdated todo

5974c0c

renatav force-pushed the renatav/bf-update branch from f9ca062 to 5974c0c Compare December 13, 2023 19:27

renatav requested a review from n-dusan December 13, 2023 19:30

n-dusan approved these changes Dec 13, 2023

renatav merged commit b3b6b4d into master Dec 13, 2023
25 checks passed

renatav deleted the renatav/bf-update branch December 13, 2023 19:56

This was referenced Jan 11, 2024

Commits mismatch incorrect error logged #238

Closed

Breadth first update #281

Closed
