CUDA out of memory. #3
No, there is no problem with the data. The app is designed to run on entire tractograms. The problem is that the current implementation requires ~10GB of GPU memory, but from your logs it seems only 8.68GB were allocated for PyTorch. I just tried to rerun the app on my private test repository, and I got out of memory as well, but now with this message: I noticed that in the #PBS lines I merged from your pull request there is one saying vram 8GB. Does this refer to the main RAM or to the GPU RAM?
#PBS vmem only applies to the main memory. It sounds like there is some invisible memory stuck on the GPU? I saw the same error message when I ran it through PSC Bridges, though, so I am a bit skeptical of the invisible-memory theory.
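For context on the vmem question, here is a hypothetical PBS job header showing how host-memory and GPU requests are usually kept separate. The exact resource names (`vmem`, `ngpus`, etc.) vary by site and scheduler, so treat this as an illustrative sketch, not the directives used by this App:

```shell
#!/bin/bash
# Hypothetical PBS header; resource names are site-specific.
#PBS -l nodes=1:ppn=4   # host CPUs
#PBS -l vmem=16gb       # host (main) RAM -- does NOT limit or grant GPU memory
#PBS -l ngpus=1         # request one GPU; its memory is fixed by the card model
```

The point is that a `vmem`/`vram`-style line in the header constrains the host side only; the usable GPU memory depends on which card the scheduler assigns and what else is running on it.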
@pietroastolfi I couldn't find any process that might be holding on to the extra GPU memory. It looks like your code itself is allocating it. The question is: why did it work a few days ago, and why doesn't it work now? I see that the container you are using Also, I'd like to propose doing the following:
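One way to check the "invisible memory" theory from inside the process is to ask PyTorch directly how much it has allocated versus reserved (cached). This is a minimal sketch using the real `torch.cuda` API; the printed numbers will depend on your GPU, and the helper function is just for converting to the GiB units used in CUDA OOM messages:

```python
def bytes_to_gib(n):
    """Convert a byte count to GiB, the unit used in CUDA OOM messages."""
    return n / (1024 ** 3)

try:
    import torch
    if torch.cuda.is_available():
        dev = torch.device("cuda:0")
        # Memory actually held by live tensors:
        print(f"allocated: {bytes_to_gib(torch.cuda.memory_allocated(dev)):.2f} GiB")
        # Memory reserved by PyTorch's caching allocator (may exceed allocated):
        print(f"reserved:  {bytes_to_gib(torch.cuda.memory_reserved(dev)):.2f} GiB")
        # Release cached blocks so other processes (or a rerun) can use them:
        torch.cuda.empty_cache()
except ImportError:
    pass  # torch not installed; nothing to inspect
```

A large gap between "reserved" and "allocated" would point at the caching allocator rather than another process; `nvidia-smi` on the node shows the per-process view from outside.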
I've tested this App with a small test .trk file (generated by TractSeg as .tck, then converted to .trk). It has ~22k fibers.
When I run the App on gpu1, it fails with the following error message.
I believe the "6.37 GiB already allocated" is from this App itself, as I don't see any other process running on gpu1 at the moment. Is there something wrong with my data? Maybe it doesn't work with full brain tractography?
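A back-of-envelope estimate suggests the raw streamline data itself is far too small to explain the usage. This sketch assumes a hypothetical ~100 points per streamline in float32 (the real figure varies by tracking step size):

```python
# Rough size of a tractogram's raw coordinate data.
# points_per_fiber=100 is an assumed, illustrative value.
def tractogram_bytes(n_fibers, points_per_fiber=100, coords=3, bytes_per_float=4):
    return n_fibers * points_per_fiber * coords * bytes_per_float

gib = tractogram_bytes(22_000) / (1024 ** 3)
# ~0.02 GiB -- orders of magnitude below the "6.37 GiB already allocated"
# in the error, so the footprint likely comes from intermediate GPU tensors
# (e.g. pairwise computations), not from the input data itself.
```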
Here is the provenance graph for my input trk data.