EPICS base 7.0.3.1, deadlock #163
Comments
The main thread is waiting for the first of four caProvider worker threads to exit. Can you say if this thread (in pvAccessCPP/src/ca/caProvider.cpp Lines 80 to 83 in 0897836
Also, your emails mentioned that you had made a workaround. The details of what you had to change to do this might also give some hints.
Also, since trace for the main thread has been cropped. Is the |
Thanks for the details. This seems likely to be a straightforward issue with CAProvider: stopping the singleton worker hangs because the worker has already been stopped. @mrkraimer @anjohnson Over to you guys.
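The suspected failure mode can be illustrated with a minimal, self-contained sketch. It is not the pvAccessCPP code: `SharedWorker` and `stopFromTwoOwners` are hypothetical names, and standard C++11 threads stand in for the EPICS thread and event classes. The idea shown is an idempotent `stop()`, so that a second owner trying to stop an already stopped worker returns immediately instead of waiting for an exit signal that will never arrive.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a worker thread shared by several providers.
class SharedWorker {
public:
    SharedWorker() : thread_([this] { run(); }) {}
    ~SharedWorker() { stop(); }

    // Only the first call signals and joins the thread; later calls are no-ops.
    // Returns true if this call performed the stop.
    bool stop() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (stopping_) return false;  // already stopped: do not wait again
            stopping_ = true;
        }
        wake_.notify_all();
        thread_.join();
        return true;
    }

private:
    // Worker body: sleeps until asked to stop.
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        wake_.wait(lock, [this] { return stopping_; });
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    bool stopping_ = false;
    std::thread thread_;  // declared last so the members above exist before it starts
};

// Two owners (for example, two provider destructors) stop the same worker.
inline void stopFromTwoOwners() {
    SharedWorker worker;
    bool first = worker.stop();   // stops and joins the thread
    bool second = worker.stop();  // returns at once instead of hanging
    assert(first);
    assert(!second);
}
```

In this sketch a concurrent second caller returns without waiting for the join to finish; whether that is acceptable for the CA provider is a separate design question.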
@mdavidsaver Is it sensible to instantiate multiple Channel Providers of the same type? I thought these were intended to be one-off objects, and it looks like the CA provider was coded with that assumption but without trying to enforce it. If an application creates multiple CA providers it will use multiple CA client contexts and multiple TCP sockets when talking to the same IOC, which could be a major resource drain on the IOCs it connects to. Is that true for the pva provider too? Currently the {{get,put}Done,monitorEvent,channelConnect}Thread classes all have a singleton that owns the underlying thread, but each CAChannelProvider object has its own |
In general, yes. I can't say whether it is called for in this particular application. For example, gateways create a different PVA client for each network interface.
We're several steps down a road of fixes which introduce further bugs. IMO it is well past time for a redesign. Why are there four workers doing (almost) the same task? Why 4x the code? Do these worker(s) need to be singletons? Also, this issue highlights a gap in testing. Clearly only one instance is being created. At minimum |
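One possible shape for such a redesign, sketched under assumptions rather than taken from pvAccessCPP: a single generic callback queue replaces the four near-identical workers, and each provider owns its own queue instead of sharing a singleton. `NotifyQueue`, `Provider` and `runTwoProviders` are hypothetical names, and standard C++11 threads are used in place of the EPICS thread classes.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical generic worker: runs any posted callback on one thread.
class NotifyQueue {
public:
    NotifyQueue() : thread_([this] { run(); }) {}

    // Drains the remaining tasks, then joins the thread exactly once.
    ~NotifyQueue() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        wake_.notify_all();
        thread_.join();
    }

    void post(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push_back(std::move(task));
        }
        wake_.notify_one();
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            wake_.wait(lock, [this] { return done_ || !tasks_.empty(); });
            if (tasks_.empty()) return;  // done_ is set and nothing is left
            std::function<void()> task = std::move(tasks_.front());
            tasks_.pop_front();
            lock.unlock();  // run the callback without holding the lock
            task();
            lock.lock();
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::deque<std::function<void()>> tasks_;
    bool done_ = false;
    std::thread thread_;  // declared last so the members above exist before it starts
};

// Hypothetical provider: owns its queue, so destroying one provider
// cannot stop a worker that another provider still relies on.
struct Provider {
    std::shared_ptr<NotifyQueue> queue = std::make_shared<NotifyQueue>();
};

// Creates and destroys two providers; returns how many callbacks ran.
inline int runTwoProviders() {
    std::atomic<int> handled{0};
    {
        Provider a, b;
        a.queue->post([&handled] { ++handled; });  // stands in for a get-done callback
        a.queue->post([&handled] { ++handled; });  // stands in for a put-done callback
        b.queue->post([&handled] { ++handled; });  // stands in for a monitor event
        b.queue->post([&handled] { ++handled; });  // stands in for a channel-connect callback
    }  // both queues drain and join here
    return handled.load();
}
```

A test that constructs and destroys more than one provider, as `runTwoProviders` does here, is the kind of multi-instance check whose absence this issue points to.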
Calling the destructor on more than one instance of a pvac::ClientProvider which has been constructed with a provider name of "ca" (channel access) causes (what appears to be) a deadlock. Using "pva" (pvAccess) does not appear to trigger this deadlock. I have attached a screenshot of where the deadlock appears to be located (from pausing the application when running it under a debugger). The code that triggers the deadlock can be found here: https://github.com/ess-dmsc/forward-epics-to-kafka/blob/a75fab2a7343906c147722825a258332fc2126e7/src/EpicsClient/EpicsClientMonitorImpl.h
In this piece of code, the version of EPICS base used is 7.0.3.1. I believe I had the same issue when testing with earlier versions of EPICS7 a few months back as well.
I have also attached screenshots of stack traces of other threads executing EPICS code.