RocM container fails on certain AMD systems. #497
Comments
This is gonna be a constant issue. A --cpu flag seems fine to me, but of course it won't fix the GPU not working. We are not gonna work on every single GPU in the world, but if someone opens a PR to support this one, great! I deliberately removed support for a lot of older GPUs in the AMD Containerfile to save about 20G in container image size, but if people want to enable extra ones one by one, no big deal. The problem is that if you enable every little one, you get a huge image. Also, some GPUs will just prove to be headaches and a lot of effort.
The two ways I could see this fixed was a If we go with Writing the above kind of made me think that adding --cpu was going to make it more complicated than having --gpu checked with the container.
@ericcurtin @smooge what should we do with this one?
I think a --cpu flag is fine. Once we get one more change into llama.cpp, I would like to rebuild all the containers and test AMD support again.
While trying to debug https://bugzilla.redhat.com/show_bug.cgi?id=2329826, I found that the ROCm containers would not work with at least 2 AMD chipsets:
This fails on
I could not figure out a way to force it to use just the CPU, so possibly a --cpu flag which tells it not to try speeding things up?
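For anyone picking this up, here is a rough sketch of what a --cpu override could look like. It is not the project's actual code: the image names, `build_parser`, and `select_image` are hypothetical, and GPU detection is reduced to a boolean that the caller passes in. The only point is that the flag short-circuits accelerator selection before a ROCm image is ever chosen.

```python
import argparse

# Hypothetical image names, used only for illustration.
CPU_IMAGE = "example.invalid/demo:cpu"
ROCM_IMAGE = "example.invalid/demo:rocm"


def build_parser():
    """Build a parser with a --cpu switch that disables GPU acceleration."""
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument(
        "--cpu",
        action="store_true",
        help="run on the CPU only and skip GPU container selection",
    )
    return parser


def select_image(force_cpu, amd_gpu_detected):
    """Pick a container image; --cpu always wins over GPU detection."""
    if force_cpu or not amd_gpu_detected:
        return CPU_IMAGE
    return ROCM_IMAGE


if __name__ == "__main__":
    args = build_parser().parse_args()
    # Assumption: a real tool would probe the host here instead of hardcoding True.
    print(select_image(args.cpu, amd_gpu_detected=True))
```

With something like this, a user on an unsupported AMD chipset could pass --cpu and still get a working (slower) run instead of a failing ROCm container.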