An authservice node will issue a peek request to other nodes if it can't find a record. This ensures we cover the case of a user inserting an auth record in one region and retrieving it in another (e.g. insert on ap1, retrieve on us1, as the service is global), or the case where the insert request goes to node (a) behind a load balancer/proxy but the retrieval request goes to node (b).
We do a "broadcasted get" sending a peek request to all nodes and the first one to return a record wins, with the rest being canceled.
However, we noticed a few things:
Logs show the peek context cancellation error for requests we canceled intentionally.
The as_badgerauth_peer_down metric seems to be reported when peeks occur. Connections between nodes appear to be fine in production now that we solved https://github.com/storj/infra/issues/3086. Artur thinks it's probably "Context cancellation closes the connection" (drpc#37), i.e. the way we cancel peek requests is causing redials to other nodes in the DRPC connection pool.
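For reference, here is a rough Go sketch of the suspected interaction. The `Pool`, `Conn`, and `markPeerDown` names are hypothetical stand-ins, not the actual drpc or authservice API: when a losing peek's context is cancelled, the connection it was using gets closed and dropped from the pool, so the next peek to that peer has to redial and can briefly count the peer as down.

```go
package authpeek

import "context"

// Conn stands in for a pooled drpc connection.
type Conn interface {
	Invoke(ctx context.Context, rpc string, in, out interface{}) error
	Close() error
}

// Pool stands in for the per-peer connection pool.
type Pool struct {
	conns map[string]Conn
	dial  func(ctx context.Context, addr string) (Conn, error)
}

// markPeerDown stands in for whatever increments as_badgerauth_peer_down.
func markPeerDown(addr string) {}

func (p *Pool) peek(ctx context.Context, addr string, key []byte) ([]byte, error) {
	conn, ok := p.conns[addr]
	if !ok {
		// No usable connection: redial. This is the path where a healthy peer
		// can still end up being counted as down.
		var err error
		conn, err = p.dial(ctx, addr)
		if err != nil {
			markPeerDown(addr)
			return nil, err
		}
		p.conns[addr] = conn
	}

	var out []byte
	err := conn.Invoke(ctx, "Peek", key, &out)
	if ctx.Err() != nil {
		// This peek lost the race and was cancelled on purpose, but the
		// cancellation also invalidates the underlying connection, so it gets
		// closed and dropped instead of being reused, forcing a redial on the
		// next peek to this peer.
		conn.Close()
		delete(p.conns, addr)
		return nil, ctx.Err()
	}
	return out, err
}
```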
Acceptance criteria:
Logs do not show intentional peek cancellation errors.
The as_badgerauth_peer_down metric is not reported when we cancel peek requests.
Figuring out a better strategy in DRPC for connection reuse in the presence of cancellation seems to be more and more important, and this seems like a great example problem to work on because we control all the components for debugging and inspection and it happens frequently. I think DRPC is currently a little too eager to close connections and could maybe delay the decision to close until right before the connection is about to be used.
Happy to help out with this problem if it can be solved by improving DRPC cancellation 😄
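A minimal sketch of that idea, reusing the illustrative `Conn` interface from the sketch above (again, not the actual drpc pool): a cancelled peek only flags the connection as suspect, and the close/redial decision happens lazily on the next use.

```go
package authpeek

import "context"

// lazyConn wraps a pooled connection together with a flag that a cancelled
// RPC may have broken it.
type lazyConn struct {
	conn    Conn
	suspect bool
}

type lazyPool struct {
	conns map[string]*lazyConn
	dial  func(ctx context.Context, addr string) (Conn, error)
}

// markSuspect is what a cancelled peek would call instead of closing the
// connection outright.
func (p *lazyPool) markSuspect(addr string) {
	if lc, ok := p.conns[addr]; ok {
		lc.suspect = true
	}
}

// get defers the close/redial decision until the connection is about to be
// used again, when it can be cheaply checked (a hypothetical Ping RPC stands
// in for that check).
func (p *lazyPool) get(ctx context.Context, addr string) (Conn, error) {
	if lc, ok := p.conns[addr]; ok {
		if !lc.suspect {
			return lc.conn, nil
		}
		if err := lc.conn.Invoke(ctx, "Ping", nil, nil); err == nil {
			lc.suspect = false // still healthy: keep reusing it
			return lc.conn, nil
		}
		lc.conn.Close()
		delete(p.conns, addr)
	}
	conn, err := p.dial(ctx, addr)
	if err != nil {
		return nil, err
	}
	p.conns[addr] = &lazyConn{conn: conn}
	return conn, nil
}
```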