MAYBE fix flaky TestPolicySetsCreate/with_vcs_policy_updated
#811
Description
This flake has been extremely rowdy lately. The flake signature is:
It turns out PoliciesPath is kind of a red herring — the backend erases that
value if the VCSRepo is removed, as documented in the API reference. The real
action is in VCSRepo, which comes back as a nil pointer after that update. And,
if you modify the test to do an additional Read on the policy set to compare, it
agrees with what the Update returned — the repo got nilled out for some reason.
I went looking in the backend itself, and it turns out that OAuthClients have
some async cleanup behavior on deletion. And the prior subtest that sets up the
policy set used by the flaky test cleans up its OAuthClient when it's done,
leaving the next subtest to create a new one. My working theory is that there's
a race in the backend: if the Update call with the new VCSRepo values manages to
slip in before the async cleanup for the old OAuthClient is done, the VCSRepo
gets nilled out instead of the backend creating a new VCSRepo model.
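To make the theorized ordering concrete, here is a toy model of it. Everything in it (`FakeBackend`, its methods, the simplified `VCSRepo` and `PolicySet` structs) is a hypothetical stand-in written for illustration; it is not the real backend or the go-tfe API, and it only encodes the guess described above.

```go
package main

import "fmt"

// VCSRepo and PolicySet are simplified stand-ins, not the real go-tfe types.
type VCSRepo struct {
	OAuthClientID string
}

type PolicySet struct {
	VCSRepo *VCSRepo
}

// FakeBackend is a hypothetical model of the suspected behavior: deleting an
// OAuth client only marks cleanup as pending, and the cleanup finishes later.
type FakeBackend struct {
	cleanupPending bool
	policySet      PolicySet
}

// DeleteOAuthClient starts the simulated async cleanup.
func (b *FakeBackend) DeleteOAuthClient() { b.cleanupPending = true }

// FinishCleanup simulates the async cleanup job completing.
func (b *FakeBackend) FinishCleanup() { b.cleanupPending = false }

// UpdatePolicySet models the suspected race: an update that lands while
// cleanup is still pending ends up with a nil VCSRepo.
func (b *FakeBackend) UpdatePolicySet(repo *VCSRepo) PolicySet {
	if b.cleanupPending {
		b.policySet.VCSRepo = nil
	} else {
		b.policySet.VCSRepo = repo
	}
	return b.policySet
}

func main() {
	newRepo := &VCSRepo{OAuthClientID: "oc-new"}

	// Update arrives before the old client's cleanup has finished.
	racy := &FakeBackend{}
	racy.DeleteOAuthClient()
	fmt.Println("repo nil when update wins the race:", racy.UpdatePolicySet(newRepo).VCSRepo == nil)

	// Cleanup finishes first, then the update arrives.
	calm := &FakeBackend{}
	calm.DeleteOAuthClient()
	calm.FinishCleanup()
	fmt.Println("repo nil when cleanup finishes first:", calm.UpdatePolicySet(newRepo).VCSRepo == nil)
}
```

In this model the only thing that decides the outcome is whether the update arrives while cleanup is pending, which would match a flake that shows up only some of the time.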
IF this guess is right, then we can avoid the flake by not trying to eagerly
clean up the first OAuthClient before the rest of the subtests run. We're
cleaning up the whole org at the end of the outer test block anyway, so there's
no real benefit to being in a hurry, and an org can have multiple OAuthClient
instances at once.
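The shape of the change can be sketched like this. `Org`, `NewOrg`, `CreateOAuthClient`, `Count`, and `Destroy` are hypothetical names made up for the sketch rather than the real test helpers; the point is only that the per-client cleanup func is dropped on purpose and the org-level teardown handles everything at the end.

```go
package main

import "fmt"

// Org is a hypothetical stand-in for the test organization.
type Org struct {
	oauthClients map[string]bool
}

func NewOrg() *Org { return &Org{oauthClients: map[string]bool{}} }

// CreateOAuthClient registers a client and returns a cleanup func, mirroring
// the usual "resource plus cleanup" helper shape (assumed here).
func (o *Org) CreateOAuthClient(id string) (cleanup func()) {
	o.oauthClients[id] = true
	return func() { delete(o.oauthClients, id) }
}

// Count reports how many OAuth clients the org currently holds.
func (o *Org) Count() int { return len(o.oauthClients) }

// Destroy removes the org and everything inside it.
func (o *Org) Destroy() { o.oauthClients = map[string]bool{} }

func main() {
	org := NewOrg()
	// Teardown for the whole outer test block.
	defer org.Destroy()

	// First subtest: intentionally do not call the returned cleanup func, so
	// no OAuth client deletion is in flight when the next subtest runs.
	_ = org.CreateOAuthClient("oc-first")

	// Second subtest: creates its own client; both coexist in the org.
	_ = org.CreateOAuthClient("oc-second")

	fmt.Println("clients before teardown:", org.Count())
}
```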
Testing plan
External links
Jira is TF-9569.
Output from tests
see CI