Segmentation fault in test suite #339
Can you give us some system and OS info to see if we can get someone to reproduce it?
I'm not quite sure what you would like, but here:
CPU:
I was able to reproduce a seg fault in the tests on an Ubuntu 22.04 LTS machine, with kernel version 5.15.0-58-generic. My conda env had a few minor differences from yours, I assume owing to the different kernel version. I am now comparing my Mac env to my Linux env and yours. It's also weird that this doesn't happen on the CI tests, which run on Ubuntu and macOS. I'll look for differences between that env and ours as well.
I'm probably saying something obvious, but Enterprise proper, which is all Python, can hardly ever segfault. So this would be one of the C-level Python libraries. It seems the problem does not involve libstempo either (the tempo2 memory management can be scary). But it seems the problem is rather generalized. Numpy?
I think it is a As Anne saw, I get somewhat random seg fault locations. The details of the error aren't always clear either. My most recent one did have some details:
tests/test_pta.py::TestPTASignals::test_parameterized_orf Fatal Python error: Fatal Python error: Segmentation fault
Thread 0x00007f5991fd0740 (most recent call first):
File "/home/ptb/enterprise/enterprise/pulsar.py", line 661 in Pulsar
Segmentation fault
That line in t2pulsar = t2.tempopulsar(relparfile, reltimfile, maxobs=maxobs, ephem=ephem, clk=clk)
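For readers unfamiliar with this kind of output: the "Fatal Python error: Segmentation fault" header followed by "(most recent call first)" is the format written by Python's standard `faulthandler` module. The sketch below is a minimal illustration, not taken from Enterprise or libstempo, of why such a trace stops at the last Python frame. The child program and the name `load_native` are hypothetical, and the crash is deliberately provoked through `ctypes`.

```python
import subprocess
import sys

# Hypothetical child program: enable faulthandler, then crash in native code.
CHILD_SOURCE = """
import ctypes
import faulthandler

faulthandler.enable()  # dump Python tracebacks on SIGSEGV and similar signals

def load_native():
    # Reading a C string from address 0 crashes inside C code, not in Python.
    ctypes.string_at(0)

load_native()
"""


def crash_report():
    """Run the crashing child and return (return code, stderr text)."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stderr
```

The report names the Python function that was executing (`load_native` here), but says nothing about what happened inside the native call, which is consistent with a trace that ends at the line calling `t2.tempopulsar(...)`.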
I found some old discussion in the
If Any of the other compiled libraries could be the culprit, but it seems to me that If the problem occurs only on recent versions of Ubuntu, it's possible some of the address space randomization changes are triggering a previously latent bug?
The CI tests run on Ubuntu 22.04.1, and those have been passing. The big difference there is the install chain, which gets
I think the issue is the
I will open an issue there.
I just followed the procedure you describe above (get suitesparse and python from conda, compile tempo2 by hand, and install everything else from pip) and I still get the intermittent segfaults.
I also used a modified version of https://github.com/ipta/pulsar-env/blob/main/anaconda-env.yml to build a conda environment where the test suite has access to tempo2, and, with all packages straight out of conda, still got segfaults.
Can you see if merely loading the par/tim from the tempo2 executable will segfault? The next thing to check would be which call in the tempopulsar constructor triggers the segfault. I need to get access to Ubuntu to try this...
The version of Unfortunately, the segfault is not at all immediate; in fact you can run any particular file of tests without (so far) triggering a segfault; it's only when you run the whole test suite that the segfault (usually) occurs (in different places). So I don't think we can expect simply running tempo2 to trigger the problem? Nevertheless I tried running tempo2 on
Can you suggest how best to figure out which call triggers the segfault? It occurs at different places on each run, and I am not sure how to inspect a core dump from a
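Because the crash is intermittent and moves around, one way to gather evidence is to run the same command several times in a child process and record how each run ended. The following is a rough sketch under that assumption; `describe_exit` and `repeat_command` are hypothetical helper names, and the approach relies on POSIX behaviour, where `subprocess` reports death by signal as a negative return code.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Translate a subprocess return code into a readable outcome."""
    if returncode < 0:
        try:
            return "killed by " + signal.Signals(-returncode).name
        except ValueError:
            # Signal number without a symbolic name on this platform.
            return "killed by signal %d" % -returncode
    return "exit code %d" % returncode


def repeat_command(cmd, runs=5, tail=2000):
    """Run cmd several times; return a list of (outcome, end of stderr)."""
    results = []
    for _ in range(runs):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results.append((describe_exit(proc.returncode), proc.stderr[-tail:]))
    return results
```

For example, `repeat_command([sys.executable, "-X", "faulthandler", "-m", "pytest", "tests/"])` would run the whole suite five times with the fault handler enabled, and the saved stderr tails show where each crashing run stopped. This only locates the last Python frame; finding the failing call inside a compiled library would still need a native debugger.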
#340 allows one to circumvent this: if you unset TEMPO2, the test suite and Enterprise generally now fall back to PINT and almost everything works (the T2 timing model and inflate/deflate are the current exceptions).
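If it is inconvenient to unset the variable in the shell, a command can be launched with a copy of the environment that omits it. This is only a sketch: `env_without` and `run_without_variable` are hypothetical helpers, and the assumption that an absent TEMPO2 variable is what triggers the PINT fallback comes from the comment above rather than from inspecting the code.

```python
import os
import subprocess
import sys


def env_without(name, base=None):
    """Return a copy of the environment with one variable removed."""
    env = dict(os.environ if base is None else base)
    env.pop(name, None)  # no error if the variable was not set
    return env


def run_without_variable(cmd, name):
    """Run cmd in a child process whose environment lacks the variable."""
    return subprocess.run(
        cmd,
        env=env_without(name),
        capture_output=True,
        text=True,
    )
```

A call such as `run_without_variable([sys.executable, "-m", "pytest", "tests/"], "TEMPO2")` leaves the parent shell's environment untouched, so the tempo2 and PINT code paths can be compared from the same session.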
I have a fresh install of the development version, and when I run the test suite I get a segmentation fault. The fault occurs at a different place in the test suite each time, once reaching 92% before dying, so it's possible the test suite might actually pass eventually without the problem being fixed.
I installed the development version following these instructions:
Here is a partial list of the tests where the segfault has occurred: