Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple resource leaks fixed in proxy and dealer #2098

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

DZabavchik
Copy link
Contributor

@DZabavchik DZabavchik commented Sep 6, 2023

The following resource leaks were identified and fixed:

  1. Proxy doesn't handle client (frontend) disconnects while establishing a backend session (UDS transport) -> UDS handle is leaked, Router retains session indefinitely
  2. Proxy doesn't handle client disconnects in the middle of authentication (HELLO/CHALLENGE, AUTHENTICATE/WELCOME) -> does not close backend session -> UDS handle is leaked, Router retains session from proxy (since there is no ping on UDS, it will never discover client is gone)
  3. Invocations in router were leaked if caller or callee disconnected while invocation(s) were in flight -> referenced caller/callee session objects leaked. Call timeout timers are not cancelled. Slow memory leak in dealer.py #2096
  4. Small (startup/static) leak of authenticator sessions in proxy. If there is a flood of connections on startup, each one see absence of authenticator session in _service_sessions[realm][authrole] and try to create one. Last one to be created will be stored overwriting any previous ones. RouterSessions on the router side will never be closed.

Leaks were discovered during stress/load testing with continuous stream of simulated faulty clients (abruptly closing connections at different stages, with invocations in flight, load-balancer failures resulting in mass disconnects).
No leaks were detected in router or proxy processes with fixes.

image

@mrrishimeena
Copy link

When can we expect this merge?

@oberstet
Copy link
Contributor

When can we expect this merge?

not before all of the CI runs green

@DZabavchik
Copy link
Contributor Author

Added few commits to fix most issues with tests. (unit tests and functional tests).
Sphinx fails. Runners are stuck in queue.

@oberstet
Copy link
Contributor

oberstet commented May 8, 2024

Thanks for working on this, and for fixing the tests specifically! rgd the test runners here on GH: yeah, the self-hosted runner machine is used differently now, and unfortunately I hadn't time yet to clean it up and move it to GH hosted runners again yet ..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants