Thread leak in netavark-dhcp-proxy #811
Comments
How many macvlan containers are we talking about? Do you know how long your DHCP lease time is?
16 containers at the moment; the lease time is 10 minutes.
OK, I think that explains why it leaks so fast. I think we spawn a new thread for each lease, but the code somehow does not clean up the old one, so the old thread leaks.
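The suspected pattern (one thread per lease, with the previous thread never stopped) can be avoided by keeping a handle to each lease's worker and stopping it before starting a replacement. The following is only a minimal sketch of that idea, not netavark's actual code: `LeaseWorker`, `LeaseWorkers`, the MAC-address key, and the `renew_every` parameter are all hypothetical names chosen for illustration.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, RecvTimeoutError, Sender};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical per-lease worker: a stop channel plus the thread handle.
struct LeaseWorker {
    stop: Sender<()>,
    handle: JoinHandle<()>,
}

/// Hypothetical registry of workers, keyed by container MAC address.
#[derive(Default)]
struct LeaseWorkers {
    workers: HashMap<String, LeaseWorker>,
}

impl LeaseWorkers {
    /// Start a renewal thread for `mac`. Any previous thread for the same
    /// key is stopped and joined first, so at most one thread exists per lease.
    fn start(&mut self, mac: &str, renew_every: Duration) {
        self.stop(mac);
        let (stop, stop_rx) = mpsc::channel();
        let handle = thread::spawn(move || loop {
            match stop_rx.recv_timeout(renew_every) {
                // Timer elapsed: a real worker would renew the lease here.
                Err(RecvTimeoutError::Timeout) => {}
                // Stop requested, or the registry dropped the sender: exit.
                Ok(()) | Err(RecvTimeoutError::Disconnected) => break,
            }
        });
        self.workers
            .insert(mac.to_string(), LeaseWorker { stop, handle });
    }

    /// Stop and join the worker for `mac`, if one exists.
    fn stop(&mut self, mac: &str) {
        if let Some(worker) = self.workers.remove(mac) {
            let _ = worker.stop.send(());
            let _ = worker.handle.join();
        }
    }

    /// Number of live workers tracked by the registry.
    fn len(&self) -> usize {
        self.workers.len()
    }
}
```

Calling `start` repeatedly for the same key leaves exactly one worker alive. A leaking variant would be the same code without the `self.stop(mac)` call, where each renewal adds a thread that never exits.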
Any news?
No, I haven't found the time to reproduce this issue.
I can take a look at this issue. Can someone point me in the right direction to reproduce this?
Use macvlan and a DHCP server with as short a lease as reasonable, e.g. one minute, then observe the number of threads.
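For watching the thread count while reproducing, one option on Linux is to count the entries under `/proc/<pid>/task`, which holds one directory per thread. A minimal sketch, assuming a Linux `/proc` filesystem; the helper name `count_threads` is made up for this example:

```rust
use std::fs;
use std::io;

/// Count the threads of process `pid` by listing /proc/<pid>/task.
/// Linux only: each thread of the process appears as one directory entry.
fn count_threads(pid: u32) -> io::Result<usize> {
    let mut count = 0;
    for entry in fs::read_dir(format!("/proc/{pid}/task"))? {
        entry?; // propagate I/O errors instead of silently skipping entries
        count += 1;
    }
    Ok(count)
}
```

Calling `count_threads` with the PID of the netavark-dhcp-proxy process once per lease interval should show whether the number grows with each renewal.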
Yes, checking.
I am now able to replicate. I started 10 containers on a network where the lease is only 60 seconds. In my case, the nv dhcp-proxy PID is
Ah, just noticed this issue. Could this be related? My DHCP lease time is 30 mins. Thanks!
I definitely have this thread leak: there were 13708 threads for ~15 containers after 3 days of running, and I was also seeing #618 as a symptom (I assume of thread starvation). I have the underlying pattern (IPv6 multicast on an IPv4 network). I have updated past the fix for that specific symptom and am watching how many threads it creates long-term.
My thread leak seems "better, but not totally fixed". I have 1497 threads after 6 days (post #1022), versus 13708 after 3 days. Importantly, the dhcp-proxy is not spinning the CPU right now, and my core symptom (restarting containers sometimes caused DHCP task aborts) is gone.
Using SuSE MicroOS with a bunch of macvlan-using containers, I see netavark-dhcp-proxy hanging every few days. From journalctl:
Even with RUST_BACKTRACE=1 set, it doesn't give a backtrace. Last time this happened, ps reported over 4000 threads for the PID.