GPU Stats #262
We may be able to pull data for Nvidia and AMD GPUs from nvidia-smi and rocm-smi. It would probably only work with the binary version of the agent, though. |
Experimental support for Nvidia and AMD GPUs will be added in 0.7.4. This works for the binary agent only and requires nvidia-smi or rocm-smi to be available on the system.

To enable it, set the environment variable GPU to "true". If you used the install script, you can do this by adding the variable to the beszel-agent service, then running:

sudo systemctl daemon-reload
sudo systemctl restart beszel-agent

Any feedback is appreciated. If it works then I'll enable it by default in the next minor release. I don't have a device using an Intel GPU, unfortunately, so I won't be able to add that.

Tip: If rocm-smi is installed under /opt/rocm/bin and not on your PATH, symlink it:

sudo ln -s /opt/rocm/bin/rocm-smi /usr/local/bin/rocm-smi |
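For reference, a minimal sketch of how that could look with a systemd drop-in, assuming the service is named beszel-agent as in the commands above and that GPU=true is the variable to set:

```sh
# Open (or create) a drop-in override for the agent service
sudo systemctl edit beszel-agent

# In the editor, add the following, then save and close:
#   [Service]
#   Environment="GPU=true"

# Apply the change
sudo systemctl daemon-reload
sudo systemctl restart beszel-agent
```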
Argh, sad you can't get Intel stats since most are no doubt using iGPUs 🙁 Amazing work for the Nvidia and AMD guys though. |
If it's useful, here is a JSON output from intel_gpu_top:
https://manpages.debian.org/testing/intel-gpu-tools/intel_gpu_top.1.en.html |
Maybe |
How do I do this if I'm running beszel-agent as a Docker service through docker-compose.yaml?

WARN GPU err="no GPU found - install nvidia-smi or rocm-smi"

# cat docker-compose.yaml
services:
beszel-agent:
image: "henrygd/beszel-agent"
container_name: "beszel-agent"
restart: unless-stopped
network_mode: host
volumes:
- /var/run/docker.sock:/var/run/docker.sock:ro
environment:
PORT: 45876
KEY: "ssh-ed25519 AAAAC3NzaC1lZ......adrfOpvRdFLD6p"
GPU: "true" # nvidia-smi
Sun Nov 10 13:59:38 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.125.06 Driver Version: 525.125.06 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:09:00.0 Off | N/A |
| 35% 38C P8 14W / 215W | 1121MiB / 8192MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce ... Off | 00000000:0A:00.0 Off | N/A |
| 33% 35C P8 2W / 215W | 3MiB / 8192MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1472954 C python 1118MiB |
+-----------------------------------------------------------------------------+ |
You can't at the moment, see above. Maybe it will be possible in the future. The image needs to include nvidia-smi or rocm-smi; then it should be possible to mount the GPU like this:

services:
beszel-agent:
image: "henrygd/beszel-agent"
container_name: "beszel-agent"
restart: unless-stopped
network_mode: host
volumes:
- /var/run/docker.sock:/var/run/docker.sock:ro
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: all
capabilities:
- gpu
environment:
PORT: 45876
KEY: "ssh-ed25519 AAAAC3NzaC1lZ......adrfOpvRdFLD6p"
GPU: "true" NVIDIA container toolkit should be installed. |
I don't plan to include nvidia-smi or rocm-smi in the default image. But I can add example Dockerfiles and compose configs. Maybe even build them automatically as separate images or tags. |
As separate images would be the best way 👍🏻 |
@Morethanevil This didn't work for me. I have container toolkit installed, we are using GPU in our other applications deployed by docker-compose and the
So it can't be done unless it's built as a separate image?
As I said before: it is not possible at the moment with Docker, since nvidia-smi is not included in the image. You need to wait for a separate image. I just provided an example of how to pass the GPU through in the compose.yaml (docker-compose.yml). |
I see, sorry for the misunderstanding. Haven't got a coffee yet. (cheap excuse 😅) |
Interesting. Didn't know this existed. Most guides point you towards intel_gpu_top. I'm happy to test, but I appreciate it's tricky to code if you don't have the hardware. |
I think I was wrong about that. Does anyone know if it works with newer iGPUs and Arc cards? If someone with Intel can look into this further and compare the output, that would help. We need JSON or CSV output and ideally all the same info as Nvidia / AMD -- GPU name, utilization, VRAM usage, power draw, and temperature. Maybe next week I'll have some time to try doing it blind with sample output. |
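For anyone with Intel hardware who wants to help compare, the intel_gpu_top manpage linked earlier documents a JSON mode; a possible invocation (flags per the manpage, not verified here) would be:

```sh
# -J: emit JSON-formatted samples, -s: sample period in milliseconds
sudo intel_gpu_top -J -s 4000
```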
I'm running an Intel Arc A750 and I can read out the following information as JSON:
|
#262 (comment) - my output here from an Intel ARC A310 |
I have been trying to run
which is totally expected. |
@Jamy-L Thanks, let's strike |
@henrygd I stumbled on your project yesterday and need that feature for multiple machines. So here I made a quick working image for Nvidia GPUs:
Here is a quick example from my older server: The image has a size of 289MB, so yeah, not the lightweight 8MB anymore :') If that approach is okay for you, I could look up ways for AMD and Intel GPUs. |
Thank you for providing the Dockerfile for that image @Hilbsam! It's now working on my TrueNAS Scale install :) |
Thanks @Hilbsam, I will try this out and add it to the docs when I have time. AMD has documentation for Running ROCm Docker containers, but all the official images appear to be gigantic. So I might try starting with a smaller base image and manually installing dependencies. If you have any ideas for Intel that would be great, as I'm not sure what to do with that. |
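Purely as a starting point, a slimmer AMD image might look something like the sketch below. The repository URL, release suite, and the assumption that the rocm-smi-lib package ships the rocm-smi CLI are all unverified and would need checking against AMD's current install docs:

```dockerfile
FROM henrygd/beszel-agent:latest AS beszel

FROM ubuntu:22.04
COPY --from=beszel /agent /agent

# Add AMD's ROCm apt repository and install only the SMI tooling.
# Package name and repo path are assumptions, not verified.
RUN apt-get update && \
    apt-get install -y --no-install-recommends ca-certificates wget gnupg && \
    wget -qO- https://repo.radeon.com/rocm/rocm.gpg.key | gpg --dearmor > /etc/apt/trusted.gpg.d/rocm.gpg && \
    echo 'deb [arch=amd64] https://repo.radeon.com/rocm/apt/6.0 jammy main' > /etc/apt/sources.list.d/rocm.list && \
    apt-get update && \
    apt-get install -y --no-install-recommends rocm-smi-lib && \
    ln -s /opt/rocm*/bin/rocm-smi /usr/local/bin/rocm-smi && \
    rm -rf /var/lib/apt/lists/*

ENTRYPOINT ["/agent"]
```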
This is great! Thanks for sharing @Hilbsam! I was able to use this configuration too. If it helps with documentation here's my final compose setup:
With the Dockerfile from @Hilbsam above placed in the |
I'm having a problem with this solution.
Seems like issue #369 |
@SonGokussj4 Try putting quotes around the key value like this:

environment:
  KEY: "..."

If that doesn't work I will help you in #369. Just paste the compose content over there, thanks. |
Oh my. I'm so ashamed... I was copying the docker-compose file at work, didn't pay much attention, and forgot this section:

volumes:
- /var/run/docker.sock:/var/run/docker.sock:ro
# monitor other disks / partitions by mounting a folder in /extra-filesystems
# - /mnt/disk/.beszel:/extra-filesystems/sda1:ro
environment:
PORT: 45876
KEY: REDACTED

When I added this, with the appropriate key (without quotes), it all works and I see GPUs! Nice! Thank you, and sorry for the last post. To have a complete post for anyone else, this is my setup. I've updated the Dockerfile:

FROM henrygd/beszel-agent:latest as beszel
# Stage 2: Final image with NVIDIA Toolkit
FROM nvidia/cuda:12.0.0-base-ubuntu20.04
# Copy the agent binary from the Beszel base image
COPY --from=beszel /agent /agent
# Install NVIDIA Toolkit
RUN apt-get update && \
apt-get install -y --no-install-recommends nvidia-container-toolkit && \
rm -rf /var/lib/apt/lists/*
# Set the entrypoint
ENTRYPOINT ["/agent"] @zachatrocity's $ cat docker-compose.yaml
services:
beszel-agent:
build:
context: .
dockerfile: Dockerfile
container_name: beszel-agent
restart: unless-stopped
network_mode: host
runtime: nvidia
volumes:
- /var/run/docker.sock:/var/run/docker.sock:ro
environment:
PORT: 45876
KEY: ssh-ed25519 AAAAC3...aJ34gQE+
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: all # <---- We have 1 or 2 or 3 GPUs in our servers
capabilities:
- gpu |
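With the Dockerfile and docker-compose.yaml above sitting in the same directory, building and starting the agent should just be:

```sh
docker compose up -d --build
```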
But surprisingly, with the exact same Dockerfile and docker-compose.yaml I can list them on both servers with nvidia-smi. But the
Edit: I have 2 servers where it works and I see GPUs in Beszel, and 2 where it doesn't. Still haven't found out why. |
@SonGokussj4 Here's the command we use. See if it gives you anything strange on those machines:

nvidia-smi -l 4 --query-gpu=index,name,temperature.gpu,memory.used,memory.total,utilization.gpu,power.draw --format=csv,noheader,nounits

Also check the agent logs if you haven't already to see if it's printing errors. |
Working system

$ nvidia-smi -l 4 --query-gpu=index,name,temperature.gpu,memory.used,memory.total,utilization.gpu,power.draw --format=csv,noheader,nounits
0, NVIDIA GeForce RTX 2070 SUPER, 37, 573, 8192, 0, 14.73
1, NVIDIA GeForce RTX 2070 SUPER, 34, 6383, 8192, 0, 3.51
$ docker compose logs -f
beszel-agent | 2025/01/06 22:05:18 INFO Detected root device name=sdb2
beszel-agent | 2025/01/06 22:05:18 INFO Detected network interface name=eno1 sent=91388882432 recv=1469910483390
beszel-agent | 2025/01/06 22:05:18 INFO Starting SSH server address=:45876

Second working system

$ nvidia-smi -l 4 --query-gpu=index,name,temperature.gpu,memory.used,memory.total,utilization.gpu,power.draw --format=csv,noheader,nounits
0, NVIDIA GeForce RTX 2070 SUPER, 37, 573, 8192, 0, 14.70
1, NVIDIA GeForce RTX 2070 SUPER, 34, 6383, 8192, 0, 3.55
$ docker compose logs -f
beszel-agent | 2025/01/06 23:19:11 INFO Detected root device name=nvme0n1p1
beszel-agent | 2025/01/06 23:19:11 INFO Detected network interface name=enp5s0 sent=989189675184 recv=20938608783622
beszel-agent | 2025/01/06 23:19:11 INFO Docker 24.0.6 is outdated. Upgrade if possible. See https://github.com/henrygd/beszel/issues/58
beszel-agent | 2025/01/06 23:19:11 INFO Starting SSH server address=:45876
Problematic system

$ nvidia-smi -l 4 --query-gpu=index,name,temperature.gpu,memory.used,memory.total,utilization.gpu,power.draw --format=csv,noheader,nounits
0, NVIDIA GeForce GTX 1070 Ti, 34, 2, 8192, 0, 12.27
1, NVIDIA GeForce GTX 1070, 37, 2, 8192, 0, 11.31
$ docker compose logs -f
beszel-agent | 2025/01/06 23:06:13 INFO Detected root device name=sda2
beszel-agent | 2025/01/06 23:06:13 INFO Detected network interface name=eno1 sent=4934451931 recv=34122333223
beszel-agent | 2025/01/06 23:06:13 INFO Docker 24.0.7 is outdated. Upgrade if possible. See https://github.com/henrygd/beszel/issues/58
beszel-agent | 2025/01/06 23:06:13 INFO Starting SSH server address=:45876 |
Thanks for the output, I'll investigate tomorrow. It continues running and printing data every four seconds on the problematic system, right? |
Yes.

$ nvidia-smi -l 4 --query-gpu=index,name,temperature.gpu,memory.used,memory.total,utilization.gpu,power.draw --format=csv,noheader,nounits
0, NVIDIA GeForce GTX 1070 Ti, 34, 2, 8192, 0, 12.52
1, NVIDIA GeForce GTX 1070, 37, 2, 8192, 0, 11.21
0, NVIDIA GeForce GTX 1070 Ti, 34, 2, 8192, 0, 12.28
1, NVIDIA GeForce GTX 1070, 37, 2, 8192, 0, 11.41
0, NVIDIA GeForce GTX 1070 Ti, 34, 2, 8192, 0, 12.52
1, NVIDIA GeForce GTX 1070, 37, 2, 8192, 0, 11.31
0, NVIDIA GeForce GTX 1070 Ti, 34, 2, 8192, 0, 12.28
1, NVIDIA GeForce GTX 1070, 37, 2, 8192, 0, 11.31
0, NVIDIA GeForce GTX 1070 Ti, 34, 2, 8192, 0, 12.52
1, NVIDIA GeForce GTX 1070, 37, 2, 8192, 0, 12.20
0, NVIDIA GeForce GTX 1070 Ti, 34, 2, 8192, 0, 12.42
1, NVIDIA GeForce GTX 1070, 37, 2, 8192, 0, 13.98

I'm now trying to update the docker version. But it's a work computer so not sure how it will go :-)

Edit:

$ docker --version
Docker version 27.4.1, build b9d17ea
$ docker compose logs -f
beszel-agent | 2025/01/06 23:32:55 INFO Detected root device name=nvme0n1p2
beszel-agent | 2025/01/06 23:32:55 INFO Detected network interface name=eno1 sent=992129 recv=74292416
beszel-agent | 2025/01/06 23:32:55 INFO Starting SSH server address=:45876
$ nvidia-smi -l 4 --query-gpu=index,name,temperature.gpu,memory.used,memory.total,utilization.gpu,power.draw --format=csv,noheader,nounits
0, NVIDIA GeForce RTX 2060 SUPER, 32, 1, 8192, 0, 9.40
0, NVIDIA GeForce RTX 2060 SUPER, 32, 1, 8192, 0, 9.50
0, NVIDIA GeForce RTX 2060 SUPER, 32, 1, 8192, 0, 9.50

After the Docker update, still no GPU in the Beszel dashboard. |
@henrygd I took a look at xpu-smi. Do you want to support consumer, enterprise, or both card types? |
@Hilbsam That was an oversight on my part. We won't use Looks like |
If I add
to my docker file, the agent doesn't start. What am I missing?
|
I can open another issue if necessary, but I just installed this on an Ubuntu 24.04 host with an AMD GPU and noticed that the power draw seems to be stuck loading. The memory metrics are working. rocm-smi is installed and available, and does report AvgPwr. The video card is an AMD 7900 XTX. |
@VinterSolen Have you tried installing or updating the NVIDIA Container Toolkit? Maybe try restarting Docker if you just installed it.

@jcv- Please open a new issue so it's searchable if someone runs into the same thing. Include output for the command below. Thanks!

rocm-smi --showtemp --showuse --showpower --showmeminfo vram --json |
Just tried. After restarting Docker, it seems the agent starts, but no GPU-related info shows on the webpage. Log of beszel-agent:
|
@VinterSolen I'm just going to refer back to an old comment of mine. Please try to run the official Docker GPU container to check if Docker can pass through your GPU. #262 |
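For reference, the usual passthrough check is to run nvidia-smi inside a CUDA base image via Docker; using the same tag as the Dockerfile earlier in this thread, that would be something like:

```sh
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu20.04 nvidia-smi
```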
This is my output from that:
Which makes me think it should work. |
@VinterSolen Can you double check that you saved the Dockerfile and included the build instructions? It should be saved as Dockerfile and referenced like this:

services:
beszel-agent:
build:
context: .
dockerfile: Dockerfile

I just tested myself and using simply |
I have this in the same docker-compose.yml file as the rest of the beszel one.
Is this the wrong way? |
First things first, please do not share the KEY with anyone! 🥲

And yes, please update your docker-compose.yml. The current official image doesn't support the GPU, as mentioned in the docs. I shared a working Dockerfile for a Beszel agent. Also, I saw that in your file you have build and image in the same container... If you do

My recommendation is to update your docker-compose.yml file:
Also check if your Dockerfile is correct:
@henrygd I think I'll write some documentation and a little Q&A on your GPU monitoring when I get time, if that's alright. |
This made it work, I missed the build thing. Thanks. |
Just came across Beszel from the tailscale video. We are currently using |
@Hilbsam Absolutely! That would be a big help. I'll try to get Intel support in the next release or two. @arunoruto Interesting, thanks! I'll keep an eye on it. |
Watching this because I use Intel GPUs for hardware acceleration in Frigate, so it would be nice to get the usage. Thanks for this! |
Would be nice to be able to see GPU stats using intel_gpu_top or AMD equivalent.
Running the hub in Docker but the agent on containers. I'm yet to install the agent on the Proxmox host.