Test AMD Radeon Pro W7700 & RX 7700 XT GPUs #680
lspci output:
And dmesg pcie output:
|
Using the same kernel I was testing with the RX 6700 XT (using @Coreforge's patch, but from a week or so ago), I am getting:
|
Same thing as the 7700xt then. It'll also need the fixes transferred to gfx11 and the other blocks this card has that are a different version from the 6000 series. That should be fairly simple to do though. |
All right, I've manually downloaded the firmware files (same as @martinx72 in this comment):
After a reboot:
Let the memset detections commence! |
@Coreforge - what's the simplest way to get a debug loop going for these faults? I don't see the file that's hit in the kernel panic, it would help a lot to get a debugger going or something, but last time I ran into issues trying to get it set up. |
The easiest way I've found is to look up the relevant functions using something like elixir or cscope. Usually you'll want to look up the function in the link register. From there, it's either fairly obvious, or you'll have to look at other code to figure out which calls are causing issues (or I guess placing Since the code for all of these cards is quite similar, it's probably also enough to just transfer the changes from gfx10 to gfx11. I can do that tomorrow if I remember. The general process should also work the same for other cards/drivers, but only if they cause faults (which it didn't look like nouveau does). I haven't gotten |
To debug the kernel, I'll usually use |
@DanaGoyette - Oh cool! TIL, going to give that a go. But probably tomorrow now since it's the end of the day and I'm just seeing your message lol. @Coreforge thanks for the notes. If you get to it tomorrow great, otherwise there's a chance I can get to it later or next week (I have a video going up in the morning tomorrow—unrelated to this, and some other errands to run!). |
I've put the patch up in a gist. |
@Coreforge - Patch applied cleanly, recompiled and tested. Got some different faults (a bunch); one below then the rest in this gist: https://gist.github.com/geerlingguy/05c34678d2802af271635da3b794a8b3
|
Looks like you're still missing the sdma firmware. |
with my RX7700,
download the missing firmware via
here is what it hits with that latest patch
What i noticed are
and
And the full dmesg log is attached here: |
Looks like the MES wasn't being used on the 6700xt and 6600xt. I updated the gist, so it should get further now. Those messages about not being able to get a large BAR are normal, the pi5 doesn't provide enough PCIe address space for one port to get a large BAR on 12GB cards (8GB cards seem to be able to get one). I don't know yet if that is due to hardware limitations or if that's something that could be changed, I haven't been successful at getting it to work yet at least. |
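As a rough illustration of the address-space arithmetic behind the large-BAR remark: PCI BAR sizes are powers of two, so a BAR that exposes all of a card's VRAM has to be rounded up to the next power of two. The helper below is a hypothetical sketch (the name `bar_size_for` is not from any driver) and it says nothing about where the Pi 5's actual limit sits, which the comment above leaves open.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: smallest power-of-two size, in bytes, that can
 * cover vram_bytes. PCI BAR sizes are always powers of two.
 */
static uint64_t bar_size_for(uint64_t vram_bytes)
{
    uint64_t size = 1;

    /* Guard against zero and against overflow of the shift below. */
    assert(vram_bytes > 0 && vram_bytes <= (UINT64_C(1) << 62));

    while (size < vram_bytes)
        size <<= 1;
    return size;
}
```

Under this arithmetic an 8 GB card asks for an 8 GB window, while a 12 GB card asks for a 16 GB one, which would be consistent with the observation that 8 GB cards seem able to get a large BAR and 12 GB cards do not.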
Okay, grabbed the latest version of your gist, and installed the additional firmware bit:
After a recompile and reboot... it gets further and just one more memset it's triggering here:
|
I made one modification and recompiled...

diff --git a/drivers/gpu/drm/amd/amdgpu/psp_v13_0.c b/drivers/gpu/drm/amd/amdgpu/psp_v13_0.c
index fe1995ed13be..ddf7c4b2b9e2 100644
--- a/drivers/gpu/drm/amd/amdgpu/psp_v13_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/psp_v13_0.c
@@ -215,7 +215,7 @@ static int psp_v13_0_bootloader_load_component(struct psp_context *psp,
 	if (ret)
 		return ret;
-	memset(psp->fw_pri_buf, 0, PSP_1_MEG);
+	memset_io(psp->fw_pri_buf, 0, PSP_1_MEG);
 	/* Copy PSP KDB binary to memory */
 	memcpy(psp->fw_pri_buf, bin_desc->start_addr, bin_desc->size_bytes);

However, when I rebooted, I think the Pi froze (it does this sometimes, it just hangs when I issue the reboot command), and I'm remote, so I won't be able to debug further until Monday. Maybe that fixed it, maybe not, ha! Edit: caught another memset:
Will debug later! |
There's a memcpy right after that memset that will also cause issues, and there's another set of those in the same file again. I updated the gist to also fix those. I've also had it hang on reboot when it couldn't initialize the GPU properly, though that hasn't been a big deal for me. |
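For readers wondering what the `memset` to `memset_io` swap changes, here is a simplified userspace sketch of the idea, not the kernel's implementation. The assumption, which the thread does not spell out, is that the optimized `memset`/`memcpy` routines use access patterns that device memory mappings on this platform reject, while the `_io` variants fall back to plain, fixed-width stores. The function names `io_fill` and `io_copy_to` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Fill n bytes at dst with the value c, one byte store at a time.
 * The volatile qualifier stops the compiler from merging the stores
 * or turning the loop back into a call to memset().
 */
static void io_fill(volatile void *dst, int c, size_t n)
{
    volatile uint8_t *p = dst;

    assert(dst != NULL || n == 0);
    while (n--)
        *p++ = (uint8_t)c;
}

/*
 * Copy n bytes from ordinary memory at src to dst, again using only
 * single-byte stores on the destination side.
 */
static void io_copy_to(volatile void *dst, const void *src, size_t n)
{
    volatile uint8_t *d = dst;
    const uint8_t *s = src;

    assert((dst != NULL && src != NULL) || n == 0);
    while (n--)
        *d++ = *s++;
}
```

In the kernel the corresponding calls would be `memset_io()` and `memcpy_toio()`, which is why the `memcpy` following the patched `memset` needs the same treatment.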
@Coreforge - Thanks; just applied and rebooted, same hang going on, but I forgot to mention, at some point (who knows when, I closed out of my VPN session last night after 20 minutes or so), it did finally reboot. So I'll do the same and check in again in a few hours. 🤞 |
Woohoo!
No clue what's on the display though, I didn't set up a remote camera to check lol. I'll see how it's working in the morning. @Coreforge thanks for all your help! |
Looks good so far. |
I wonder if @6by9 or @pelwell might have some advice about increasing the BAR space on Pi 5? I haven't attempted tweaking it at all since the days of the CM4 (see my guide for CM4 BAR space). Right now these higher-VRAM cards are all setting resizable bar to 256 MB (example below is 6700 XT):
|
Indeed, it's working!
OBS 30 (installed via Pi-Apps) didn't seem to know what to do with the GPU, there were no warnings in I installed Blender using |
I'm not sure if the vaapi drivers are installed by default, so you might have to install those (although I expect the same issues I have on the 6700xt to be present on these cards too, I need to continue debugging those). The 6700xt was a lot more CPU heavy than the 460 in some loads for me, which I think might be due to not having a large BAR. |
Blender currently isn't in the debian testing repository, so I'd need to build it from source to test it, which might be a bit annoying with some libraries currently apparently not being available. |
@0cc4m - It looks like those Vulkan instructions are built off Windows x86 — do you have specific instructions for Linux? It looks like I could download the Vulkan SDK separately for Linux, but how can I build with the SDK dependencies on Linux (w64devkit.exe doesn't run there). Oh... read through the rest, and it looks like:
However, I get the error:
So trying some more...
Also looking at the Vulkan SDK download site linked from those docs, the 1.3.296.0 (latest) version only has files under |
Trying again with the Docker method...
However, this also fails:
I'm guessing that package is only built for x86 (see: https://packages.lunarg.com/vulkan/1.3.296/dists/jammy/main/). |
@geerlingguy You don't need the Vulkan SDK, the |
@0cc4m Ah, indeed! Just did a
Then the Release build:
|
Great! The initial run will probably be stuck for a while on the shader compile step due to the low CPU power, but (assuming it works just as it does on AMD64) the shaders should be cached afterwards. You can download a GGUF model from Huggingface to try it, for example Llama 8B Instruct Q4_K_S. |
Testing it on llama3.1:8b as you linked above:
|
Trying with a smaller model (
|
Looks like the memory allocation issue may be a problem on some other cards / models too: ggerganov/llama.cpp#5441 But would be nice to run a larger model on this card with its 16 GB of RAM... |
@geerlingguy it would be great to have radeontop output while running even a small model. Eventually, tweaking the shared memory size between host and GPU and checking vulkaninfo would show exactly what's going on |
Also testing
Note: All the above tests are on my RX 6700 XT, I forgot I have that plugged in right now lol. |
Can you upload the output of vulkaninfo on the Pi? |
@0cc4m Here it is: |
Looks good. There have been cases of allocations failing even though according to the driver they should be fine. The vulkaninfo output shows a max allocation size of 4294967292, but apparently that doesn't work. I built a workaround for that, can you try setting the environment variable |
@0cc4m I tried:
Then at 1 GB:
Yay, 1 GB seems to have worked! |
What's the GPU memory usage when running such experiments? Does it allocate 1GB on GPU, or more, or less? |
@KhazAkar - 4 GB while loading, 5 GB while running: (Again, note this is the RX 6700 XT) |
That's a bug, very odd that I didn't see that before..
It doesn't mean that it only allocates 1GB, it means it allocates 1GB chunks. The maximum size buffer I can allocate on most devices with Vulkan is 4GB, but somehow that's not working here. This is an ARM64 OS, right? |
@0cc4m - Yes, Pi OS Bookworm (Based on Debian 12), arm64:
|
Alright, then it's some driver thing again. 2GB should work if we just reduce it by one, try |
@0cc4m - Thanks for the help here. I'm doing a little benchmarking over here: geerlingguy/ollama-benchmark#1 Now I'm tempted to buy a couple more AMD GPUs to see how things go with lower priced cards... Edit: Also, 2GB does indeed work reducing it by one bit :) |
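To make the chunking discussion above concrete, the sketch below works out how many allocations a given amount of data needs under different per-allocation limits. It is a simplified model under stated assumptions: the helper name `chunks_needed` is hypothetical, the 5 GB figure is the running usage reported earlier in the thread, and how llama.cpp actually partitions its buffers is not shown here.

```c
#include <assert.h>
#include <stdint.h>

#define GIB (UINT64_C(1) << 30)

/*
 * Hypothetical helper: number of allocations needed to hold
 * total_bytes when no single allocation may exceed max_chunk_bytes
 * (ceiling division).
 */
static uint64_t chunks_needed(uint64_t total_bytes, uint64_t max_chunk_bytes)
{
    assert(max_chunk_bytes > 0);
    return total_bytes / max_chunk_bytes
         + (total_bytes % max_chunk_bytes != 0);
}
```

With a 1 GiB limit, 5 GiB of data needs `chunks_needed(5 * GIB, GIB)` = 5 allocations. With the "2 GB reduced by one" limit of `2 * GIB - 1` = 2147483647 bytes it needs 3, and with the 4294967292-byte maximum that vulkaninfo reports it would need only 2, if allocations of that size actually worked on this setup.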
(I moved a conversation about kernel debugging over to #684) |
Tested with RPI5 8GB. OEM Radeon RX 6800 XT. On the stock 64 bit image of rpiOS based on debian 12. I installed KDE and messed up the auto login for LightDM. This had a side effect of kicking me back into the default desktop environment and turning on GPU acceleration and it's smooth as butter! Running this with 3 4k monitors. I will boot into KDE and test games now. |
I thought you'd enjoy the funny image of a mac pro with a pi bolted into it running 3 4k monitors. I had done some more testing. Raspberry pi os keeps trying to open up random things like the accessibility menu while I'm typing. Sometimes the keyboard lags and I have no idea why. I couldn't get sddm or kde to launch. It was saying that the $DISPLAY was not defined or it could not connect to the X server. I couldn't get box64 or box86 (whichever is the pi-apps steam installer) to launch steam. For some reason firefox had like, "a wind up" to get gpu acceleration going. I think this is due to the PCIE 3x 1x bridge lol. But other than that, I used it, it was nice. I couldn't really do any development because gfortran is stuck on version 12 on debian 12. Was a fun project though! |
@Coreforge have you rebased your patch on the 6.12 LTS kernel recently? I've been checking updates and it seems like there was also a set of patches to improve video encode/decode on older AMD cards, would be interesting to test with that kernel. Not sure if/when Raspberry pi would move to 6.12 in Pi OS though. |
@geerlingguy i have a 6.12 branch w/ @Coreforge 's "core" fixes + some formatting changes per the github checks, https://github.com/nicholasaiello/linux/tree/rpi-6.12.y-coreforge-amdgpu while i've used this kernel, i haven't been able to directly test a GPU w/ it b/c i'm trying to connect to a MINISFORUM DEG1 using OcuLink without any success, not in 6.6.y or 6.12.y, or w/ different nvme adapters. that's a separate issue... TLDR; try w/ that branch if you'd like. FWIW: i've used the 6.6.y patches w/ my RX 7900 XT via a uPCIty hat without issue. i'll likely kick the can on the oculink dream and go back to the uPCIty hat for testing patches w/ 6.12.y and 6.13.y kernels. |
@nicholasaiello ah I do remember you mentioning that. I will check it out later, just rebased on the same 6.6.y branch yesterday and only had to make one change for a conflicting commit on some memsets. |
I have an AMD Radeon Pro W7700 that I'm planning on testing in an Ampere workstation.
This card runs great on the Pi 5, using @Coreforge's branch we started working on in #222
Two massive features of this card:
It remains to be seen how many bits of the driver we can get running on this card.
Current steps to get this card working with Pi OS Bookworm
Last updated: 2024-11-11
6.6.y kernel tree with Coreforge's GPU-enablement patch (or just check out Coreforge's branch directly).
Run make menuconfig and select the options:
1. Kernel Features > Page Size > 4 KB (for Box86 compatibility)
2. Kernel Features > Kernel support for 32-bit EL0 > Fix up misaligned multi-word loads and stores in user space
3. Kernel Features > Fix up misaligned loads and stores from userspace for 64bit code
4. Device Drivers > Graphics support > AMD GPU (optionally SI/CIK support too)
5. Device Drivers > Graphics support > Direct Rendering Manager (XFree86 4.1.0 and higher DRI support) > Force Architecture can write-combine memory
AMD GPU Firmware for Bookworm
Because Pi OS 12 is based on Debian 12 Bookworm, and its firmware-amd-graphics package doesn't include the firmware for the latest-generation AMD cards, you will have to install that package and download supplemental firmware files from the linux-firmware repo:

Confirm everything is working by plugging a monitor into the graphics card; then confirm the card's GPU is in use by running glxinfo -B (part of the mesa-utils package), for example:

(Prepend DISPLAY=:0 if running commands over SSH.)

Hardware video transcoding support

If you would like to enable hardware transcoding, you need to install the Mesa VAAPI drivers:

Then you should be able to see the VAAPI info, and apps like OBS (sudo apt install obs-studio) should be able to use the hardware transcoding instead of x264 on the CPU. Confirm it's working with vainfo: