
GPU Stats #262

Open
eximo84 opened this issue Nov 5, 2024 · 46 comments
Labels: enhancement (New feature or request), in progress (We've started work on this)

Comments


eximo84 commented Nov 5, 2024

Would be nice to be able to see GPU stats using intel_gpu_top or AMD equivalent.

Running the hub in Docker and the agent on containers; I've yet to install the agent on the Proxmox host itself.

henrygd added the enhancement label Nov 5, 2024

henrygd commented Nov 6, 2024

We may be able to pull data for Nvidia and AMD GPUs from nvidia-smi and rocm-smi.

It would probably only work with the binary version of the agent, though.


henrygd commented Nov 9, 2024

Experimental support for Nvidia and AMD will be added in 0.7.4.

This works for the binary agent only and requires nvidia-smi (Nvidia) or rocm-smi (AMD) to be installed on the system.

To enable, set the environment variable GPU=true.

If you used the install script, you can do this by adding Environment="GPU=true" in the [Service] section of /etc/systemd/system/beszel-agent.service. Then reload / restart the service:

sudo systemctl daemon-reload
sudo systemctl restart beszel-agent
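
Equivalently, if you'd rather not edit the unit file directly, a systemd drop-in should also work (a minimal sketch: run sudo systemctl edit beszel-agent, add the snippet below, then reload/restart as above):

[Service]
Environment="GPU=true"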

Any feedback is appreciated. If it works then I'll enable it by default in the next minor release.

I don't have a device using an Intel GPU, unfortunately, so I won't be able to add that.

Tip

Installing rocm-smi-lib on Arch and Debian places the rocm-smi binary in /opt/rocm/bin. If this isn't in the PATH of the user running beszel-agent, you can fix it by symlinking to /usr/local/bin:

sudo ln -s /opt/rocm/bin/rocm-smi /usr/local/bin/rocm-smi
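
You can confirm the binary is now resolvable from a standard PATH with, for example:

command -v rocm-smi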

henrygd added the in progress label Nov 9, 2024
@Morethanevil

Tested it with my GTX 1660 Super, works fine on Fedora 41 with NVIDIA drivers


Had a short conversation with my LLM :)


eximo84 commented Nov 9, 2024

Argh, sad you can't get Intel stats, since most people are no doubt using iGPUs 🙁

Amazing work for the Nvidia and AMD guys though.


eximo84 commented Nov 9, 2024

If it's useful, here is JSON output from intel_gpu_top:


{
        "period": {
                "duration": 16.488048,
                "unit": "ms"
        },
        "frequency": {
                "requested": 2244.049750,
                "actual": 1273.649858,
                "unit": "MHz"
        },
        "interrupts": {
                "count": 2971.849670,
                "unit": "irq/s"
        },
        "rc6": {
                "value": 0.000000,
                "unit": "%"
        },
        "engines": {
                "Render/3D/0": {
                        "busy": 0.000000,
                        "sema": 0.000000,
                        "wait": 0.000000,
                        "unit": "%"
                },
                "Blitter/0": {
                        "busy": 0.000000,
                        "sema": 0.000000,
                        "wait": 0.000000,
                        "unit": "%"
                },
                "Video/0": {
                        "busy": 99.270775,
                        "sema": 0.000000,
                        "wait": 0.000000,
                        "unit": "%"
                },
                "Video/1": {
                        "busy": 51.775195,
                        "sema": 0.000000,
                        "wait": 0.000000,
                        "unit": "%"
                },
                "VideoEnhance/0": {
                        "busy": 0.000000,
                        "sema": 0.000000,
                        "wait": 0.000000,
                        "unit": "%"
                },
                "VideoEnhance/1": {
                        "busy": 0.000000,
                        "sema": 0.000000,
                        "wait": 0.000000,
                        "unit": "%"
                },
                "[unknown]/0": {
                        "busy": 12.718176,
                        "sema": 0.000000,
                        "wait": 0.000000,
                        "unit": "%"
                }
        },
        "clients": {
                "4294005725": {
                        "name": "ffmpeg",
                        "pid": "961571",
                        "engine-classes": {
                                "Render/3D": {
                                        "busy": "0.000000",
                                        "unit": "%"
                                },
                                "Blitter": {
                                        "busy": "0.000000",
                                        "unit": "%"
                                },
                                "Video": {
                                        "busy": "2.226429",
                                        "unit": "%"
                                },
                                "VideoEnhance": {
                                        "busy": "0.000000",
                                        "unit": "%"
                                },
                                "[unknown]": {
                                        "busy": "0.000000",
                                        "unit": "%"
                                }
                        }
                }
        }
},

https://manpages.debian.org/testing/intel-gpu-tools/intel_gpu_top.1.en.html
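
For reference, output like the above should come from the tool's JSON mode; going by the man page linked here, the invocation is roughly (exact flags are an assumption):

sudo intel_gpu_top -J -s 5000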


henrygd commented Nov 9, 2024

Maybe xpu-smi is a better option. intel_gpu_top isn't made by Intel and doesn't seem to have been updated in many years.

@SonGokussj4

How do I do this if I'm running beszel-agent as a Docker service through docker-compose.yaml?

 WARN GPU err="no GPU found - install nvidia-smi or rocm-smi"
# cat docker-compose.yaml
services:
 beszel-agent:
   image: "henrygd/beszel-agent"
   container_name: "beszel-agent"
   restart: unless-stopped
   network_mode: host
   volumes:
     - /var/run/docker.sock:/var/run/docker.sock:ro
   environment:
     PORT: 45876
     KEY: "ssh-ed25519 AAAAC3NzaC1lZ......adrfOpvRdFLD6p"
     GPU: "true"
# nvidia-smi
Sun Nov 10 13:59:38 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.125.06   Driver Version: 525.125.06   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:09:00.0 Off |                  N/A |
| 35%   38C    P8    14W / 215W |   1121MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:0A:00.0 Off |                  N/A |
| 33%   35C    P8     2W / 215W |      3MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A   1472954      C   python                           1118MiB |
+-----------------------------------------------------------------------------+

@Morethanevil

How do I do this if I'm running beszel-agent as a Docker service through docker-compose.yaml?


You can't at the moment, see above

Maybe it will be possible in the future. The image needs to include nvidia-smi or rocm-smi; then it should be possible to mount the GPU like this:

services:
 beszel-agent:
   image: "henrygd/beszel-agent"
   container_name: "beszel-agent"
   restart: unless-stopped
   network_mode: host
   volumes:
     - /var/run/docker.sock:/var/run/docker.sock:ro
   deploy:
     resources:
       reservations:
         devices:
           - driver: nvidia
             count: all
             capabilities:
               - gpu
   environment:
     PORT: 45876
     KEY: "ssh-ed25519 AAAAC3NzaC1lZ......adrfOpvRdFLD6p"
     GPU: "true"

The NVIDIA Container Toolkit should be installed.
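
A quick way to confirm the GPU passthrough itself works (independent of Beszel) is to run nvidia-smi in a stock CUDA container, roughly:

docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu20.04 nvidia-smi

If that prints the GPU table, the toolkit and runtime side is fine.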


henrygd commented Nov 10, 2024

I don't plan to include nvidia-smi or rocm-smi in the Docker image because it will increase the size many times over for a feature that not everyone will use.

But I can add example dockerfiles and compose configs. Maybe even build them automatically as separate images or tags.

@Morethanevil

I don't plan to include nvidia-smi or rocm-smi in the Docker image because it will increase the size many times over for a feature that not everyone will use.

But I can add example dockerfiles and compose configs. Maybe even build them automatically as separate images or tags.

Separate images would be the best way 👍🏻
latest-cuda, latest-rocm, etc. would be cool

@SonGokussj4

@Morethanevil This didn't work for me. I have the container toolkit installed, we are using GPUs in our other applications deployed with docker-compose, and the deploy section is used the same way.
Here, it still prints the warning.

services:
  beszel-agent:
    image: "henrygd/beszel-agent"
    container_name: "beszel-agent"
    restart: unless-stopped
    network_mode: host
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    environment:
      PORT: 45876
      KEY: "ssh-ed25519 AAAAC3Nz.....fOpvRdFLD6p"
      GPU: "true"
      # FILESYSTEM: /dev/sda1 # set to the correct filesystem for disk I/O stats
      #FILESYSTEM: data
    deploy:
     resources:
       reservations:
         devices:
           - driver: nvidia
             count: all
             capabilities:
               - gpu
               
# docker compose down; docker compose up -d; docker compose logs -f
[+] Running 1/0
 ✔ Container beszel-agent  Removed                                                                                 0.1s
[+] Running 1/1
 ✔ Container beszel-agent  Started                                                                                 0.0s
beszel-agent  | 2024/11/11 08:29:40 INFO Detected root device name=nvme0n1p1
beszel-agent  | 2024/11/11 08:29:40 INFO Detected network interface name=enp5s0 sent=446766673405 recv=7705963243348
beszel-agent  | 2024/11/11 08:29:40 WARN GPU err="no GPU found - install nvidia-smi or rocm-smi"
beszel-agent  | 2024/11/11 08:29:40 INFO Starting SSH server address=:45876

So it can't be done unless it's built as a separate image?

@Morethanevil

@Morethanevil This didn't work for me. I have the container toolkit installed, we are using GPUs in our other applications deployed with docker-compose, and the deploy section is used the same way. Here, it still prints the warning.

As I said before: it is not possible at the moment with Docker, since there is no nvidia-smi included. You need to wait for a separate image. I just provided an example of how to include the GPU in the compose.yaml (docker-compose.yml).

@SonGokussj4

I see, sorry for the misunderstanding. Haven't had a coffee yet. (cheap excuse 😅)


eximo84 commented Nov 11, 2024

Maybe xpu-smi is a better option. intel_gpu_top isn't made by Intel and doesn't seem to have been updated in many years.

Interesting. Didn't know this existed. Most guides point you towards intel_gpu_top.

I'm happy to test, but I appreciate it's tricky to code if you don't have the hardware.


henrygd commented Nov 12, 2024

I think I was wrong about intel_gpu_top not being updated actually.

Does anyone know if it works with newer iGPUs and Arc cards?

If someone with Intel can look into this further and compare intel_gpu_top / xpu-smi that would be very helpful.

We need JSON or CSV output and ideally all the same info as Nvidia / AMD -- GPU name, utilization, VRAM usage, power draw, and temperature.

Maybe next week I'll have some time to try doing it blind with sample output.
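
For reference, the Nvidia and AMD sides use queries along these lines, so that's the baseline to match:

nvidia-smi --query-gpu=index,name,temperature.gpu,memory.used,memory.total,utilization.gpu,power.draw --format=csv,noheader,nounits
rocm-smi --showtemp --showuse --showpower --showmeminfo vram --json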

@Obamium69

I'm running an Intel Arc A750 and I can read out the following information as JSON:

{
        "period": {
                "duration": 1000.351522,
                "unit": "ms"
        },
        "frequency": {
                "requested": 141.950101,
                "actual": 140.950453,
                "unit": "MHz"
        },
        "interrupts": {
                "count": 187.933937,
                "unit": "irq/s"
        },
        "rc6": {
                "value": 74.878730,
                "unit": "%"
        },
        "imc-bandwidth": {
                "reads": 2758.395964,
                "writes": 1349.216517,
                "unit": "MiB/s"
        },
        "engines": {
                "Render/3D/0": {
                        "busy": 0.000000,
                        "sema": 0.000000,
                        "wait": 0.000000,
                        "unit": "%"
                },
                "Blitter/0": {
                        "busy": 0.000000,
                        "sema": 0.000000,
                        "wait": 0.000000,
                        "unit": "%"
                },
                "Video/0": {
                        "busy": 5.275372,
                        "sema": 0.000000,
                        "wait": 0.000000,
                        "unit": "%"
                },
                "Video/1": {
                        "busy": 1.311127,
                        "sema": 0.000000,
                        "wait": 0.000000,
                        "unit": "%"
                },
                "VideoEnhance/0": {
                        "busy": 2.374794,
                        "sema": 0.000000,
                        "wait": 0.000000,
                        "unit": "%"
                },
                "VideoEnhance/1": {
                        "busy": 0.561587,
                        "sema": 0.000000,
                        "wait": 0.000000,
                        "unit": "%"
                },
                "[unknown]/0": {
                        "busy": 0.000000,
                        "sema": 0.000000,
                        "wait": 0.000000,
                        "unit": "%"
                }
        }
},

intel_gpu_top can't give you the exact name of the GPU. When running intel_gpu_top -L, the output looks like this:

card1                    8086:56a1                         pci:vendor=8086,device=56A1,card=0
└─renderD128
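
If a readable name is needed, one workaround is to resolve the PCI ID that intel_gpu_top prints, for example:

lspci -nn -d 8086:56a1

which should return the kernel's name for the device (something like a "DG2 [Arc A750]" entry).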


eximo84 commented Nov 15, 2024

If someone with Intel can look into this further and compare intel_gpu_top / xpu-smi that would be very helpful.

See #262 (comment) above for my output from an Intel Arc A310.


Jamy-L commented Dec 26, 2024

I have been trying to run xpu-smi to provide a sample, without success. Most links in the Intel install guides are dead and the install sounds like a massive headache. It should also be noted that few cards appear to be supported, because it targets data centers. intel_gpu_top sounds more reasonable and works with iGPUs as well. With my 12500 CPU, intel_gpu_top -L gives me:

card0                    Intel Alderlake_s (Gen12)         pci:vendor=8086,device=4690,card=0
└─renderD128

which is totally expected.


henrygd commented Dec 26, 2024

@Jamy-L Thanks, let's strike xpu-smi then. Unfortunately intel_gpu_top isn't a great option for this either. Too bad there's not a simple nvidia-smi / rocm-smi equivalent for Intel.


Hilbsam commented Dec 31, 2024

@henrygd I stumbled on your project yesterday and need this feature for multiple machines.

So here's a quick working image for NVIDIA GPUs:

FROM henrygd/beszel-agent:latest as beszel

# Stage 2: Final image with NVIDIA Toolkit
FROM nvidia/cuda:12.0.0-base-ubuntu20.04

# Copy the agent binary from the Beszel base image
COPY --from=beszel /agent /agent

# Install NVIDIA Toolkit
RUN apt-get update && \
    apt-get install -y --no-install-recommends nvidia-container-toolkit && \
    rm -rf /var/lib/apt/lists/*

# Set the entrypoint
ENTRYPOINT ["/agent"]
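
To try it, save the above as Dockerfile next to your compose file and build it, e.g.:

docker build -t beszel-agent-nvidia .

(the tag name is just an example; a build: section in compose works as well, as shown in the configs further down).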

Here's a quick example from my older server:

[screenshot]

The image has a size of 289MB, so yeah, not the lightweight 8MB anymore :')

If that approach is okay for you, I could look into ways for AMD and Intel GPUs.

@outpoints

Thank you for providing the Dockerfile for that image @Hilbsam! It's now working on my TrueNAS Scale install :)


henrygd commented Dec 31, 2024

Thanks @Hilbsam, I will try this out and add it to the docs when I have time.

AMD has documentation for Running ROCm Docker containers, but all the official images appear to be gigantic. So I might try starting with a smaller base image and manually installing dependencies.

If you have any ideas for Intel that would be great, as I'm not sure what to do with that.

@zachatrocity

This is great! Thanks for sharing @Hilbsam!

I was able to use this configuration too. If it helps with documentation, here's my final compose setup:

version: "3.8"
services:
  beszel:
    image: henrygd/beszel:latest
    container_name: beszel
    restart: unless-stopped
    extra_hosts:
      - host.docker.internal:host-gateway
    ports:
      - 8090:8090
    volumes:
      - ./beszel_data:/beszel_data
  beszel-agent:
    build:
      context: /opt/stacks/beszel
      dockerfile: Dockerfile
    container_name: beszel-agent
    restart: unless-stopped
    network_mode: host
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities:
                - gpu
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      # monitor other disks / partitions by mounting a folder in /extra-filesystems
      # - /mnt/disk/.beszel:/extra-filesystems/sda1:ro
    environment:
      PORT: 45876
      KEY: REDACTED
networks: {}

With the Dockerfile from @Hilbsam above placed in the /opt/stacks/beszel folder.


SonGokussj4 commented Jan 6, 2025

I'm having a problem with this solution.
I tried the Dockerfile + docker-compose, and the agent exits with:

$ docker compose up --remove-orphans
[+] Running 2/1
 ✔ Container beszel        Removed                                                                                 0.4s
 ✔ Container beszel-agent  Created                                                                                 0.0s
Attaching to beszel-agent
beszel-agent  | 2025/01/06 13:31:03 KEY environment variable is not set
beszel-agent exited with code 1

Seems like issue #369


henrygd commented Jan 6, 2025

@SonGokussj4 Try putting quotes around the key value like this:

environment:
  KEY: "..."

If that doesn't work I will help you in #369. Just paste the compose content over there, thanks.

@SonGokussj4

Oh my. I'm so ashamed... I was copying the docker-compose file at work, didn't pay much attention, and forgot this section.

    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      # monitor other disks / partitions by mounting a folder in /extra-filesystems
      # - /mnt/disk/.beszel:/extra-filesystems/sda1:ro
    environment:
      PORT: 45876
      KEY: REDACTED

When I added this, with the appropriate key (without quotes), it all works and I can see the GPUs! Nice! Thank you, and sorry for the last post.

To have a complete post for anyone else, this is my setup. I've updated the GPU count from 1 to all.
@Hilbsam's ~/beszel-agent/Dockerfile:

FROM henrygd/beszel-agent:latest as beszel

# Stage 2: Final image with NVIDIA Toolkit
FROM nvidia/cuda:12.0.0-base-ubuntu20.04

# Copy the agent binary from the Beszel base image
COPY --from=beszel /agent /agent

# Install NVIDIA Toolkit
RUN apt-get update && \
    apt-get install -y --no-install-recommends nvidia-container-toolkit && \
    rm -rf /var/lib/apt/lists/*

# Set the entrypoint
ENTRYPOINT ["/agent"]

@zachatrocity's ~/beszel-agent/docker-compose.yaml (only the agent part)

$ cat docker-compose.yaml
services:
  beszel-agent:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: beszel-agent
    restart: unless-stopped
    network_mode: host
    runtime: nvidia
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    environment:
      PORT: 45876
      KEY: ssh-ed25519 AAAAC3...aJ34gQE+
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all  # <---- We have 1 or 2 or 3 GPUs in our servers
              capabilities:
                - gpu


SonGokussj4 commented Jan 6, 2025

But surprisingly, with the exact same Dockerfile and docker-compose.yaml,
on one server I can see the GPUs, and on the second I can't.

I can list them on both servers with nvidia-smi

infra@ais60 ~/beszel-agent $ nvidia-smi -L
GPU 0: NVIDIA GeForce GTX 1070 Ti (UUID: GPU-632b2c97-8048-2882-6c10-1a941c9d2848)
GPU 1: NVIDIA GeForce GTX 1070 (UUID: GPU-e0aa625f-8a84-632a-0681-ca901d690116)

(honza@rd-dl1) - (~/beszel-agent) $ nvidia-smi -L
GPU 0: NVIDIA GeForce RTX 2080 Ti (UUID: GPU-8ec8eb4a-286e-26db-c3db-4356b7857f5a)
GPU 1: NVIDIA GeForce RTX 2080 Ti (UUID: GPU-ad809787-dd9e-3af5-f00b-16224ac4130f)

But ais60 is not showing GPUs in Beszel.

Edit: I have 2 servers where it works and I see GPUs in Beszel, and 2 where it doesn't. Still haven't found out why.


henrygd commented Jan 6, 2025

@SonGokussj4 Here's the command we use. See if it gives you anything strange on those machines.

nvidia-smi -l 4 --query-gpu=index,name,temperature.gpu,memory.used,memory.total,utilization.gpu,power.draw --format=csv,noheader,nounits

Also check the agent logs if you haven't already to see if it's printing errors.


SonGokussj4 commented Jan 6, 2025

Working system

$ nvidia-smi -l 4 --query-gpu=index,name,temperature.gpu,memory.used,memory.total,utilization.gpu,power.draw --format=csv,noheader,nounits
0, NVIDIA GeForce RTX 2070 SUPER, 37, 573, 8192, 0, 14.73
1, NVIDIA GeForce RTX 2070 SUPER, 34, 6383, 8192, 0, 3.51

$ docker compose logs -f
beszel-agent  | 2025/01/06 22:05:18 INFO Detected root device name=sdb2
beszel-agent  | 2025/01/06 22:05:18 INFO Detected network interface name=eno1 sent=91388882432 recv=1469910483390
beszel-agent  | 2025/01/06 22:05:18 INFO Starting SSH server address=:45876

Second working system

$ nvidia-smi -l 4 --query-gpu=index,name,temperature.gpu,memory.used,memory.total,utilization.gpu,power.draw --format=csv,noheader,nounits
0, NVIDIA GeForce RTX 2070 SUPER, 37, 573, 8192, 0, 14.70
1, NVIDIA GeForce RTX 2070 SUPER, 34, 6383, 8192, 0, 3.55

$ docker compose logs -f
beszel-agent  | 2025/01/06 23:19:11 INFO Detected root device name=nvme0n1p1
beszel-agent  | 2025/01/06 23:19:11 INFO Detected network interface name=enp5s0 sent=989189675184 recv=20938608783622
beszel-agent  | 2025/01/06 23:19:11 INFO Docker 24.0.6 is outdated. Upgrade if possible. See https://github.com/henrygd/beszel/issues/58
beszel-agent  | 2025/01/06 23:19:11 INFO Starting SSH server address=:45876

Problematic system

$ nvidia-smi -l 4 --query-gpu=index,name,temperature.gpu,memory.used,memory.total,utilization.gpu,power.draw --format=csv,noheader,nounits
0, NVIDIA GeForce GTX 1070 Ti, 34, 2, 8192, 0, 12.27
1, NVIDIA GeForce GTX 1070, 37, 2, 8192, 0, 11.31

$ docker compose logs -f
beszel-agent  | 2025/01/06 23:06:13 INFO Detected root device name=sda2
beszel-agent  | 2025/01/06 23:06:13 INFO Detected network interface name=eno1 sent=4934451931 recv=34122333223
beszel-agent  | 2025/01/06 23:06:13 INFO Docker 24.0.7 is outdated. Upgrade if possible. See https://github.com/henrygd/beszel/issues/58
beszel-agent  | 2025/01/06 23:06:13 INFO Starting SSH server address=:45876


henrygd commented Jan 6, 2025

Thanks for the output, I'll investigate tomorrow. It continues running and printing data every four seconds on the problematic system, right?


SonGokussj4 commented Jan 6, 2025

Yes.

$ nvidia-smi -l 4 --query-gpu=index,name,temperature.gpu,memory.used,memory.total,utilization.gpu,power.draw --format=csv,noheader,nounits
0, NVIDIA GeForce GTX 1070 Ti, 34, 2, 8192, 0, 12.52
1, NVIDIA GeForce GTX 1070, 37, 2, 8192, 0, 11.21
0, NVIDIA GeForce GTX 1070 Ti, 34, 2, 8192, 0, 12.28
1, NVIDIA GeForce GTX 1070, 37, 2, 8192, 0, 11.41
0, NVIDIA GeForce GTX 1070 Ti, 34, 2, 8192, 0, 12.52
1, NVIDIA GeForce GTX 1070, 37, 2, 8192, 0, 11.31
0, NVIDIA GeForce GTX 1070 Ti, 34, 2, 8192, 0, 12.28
1, NVIDIA GeForce GTX 1070, 37, 2, 8192, 0, 11.31
0, NVIDIA GeForce GTX 1070 Ti, 34, 2, 8192, 0, 12.52
1, NVIDIA GeForce GTX 1070, 37, 2, 8192, 0, 12.20
0, NVIDIA GeForce GTX 1070 Ti, 34, 2, 8192, 0, 12.42
1, NVIDIA GeForce GTX 1070, 37, 2, 8192, 0, 13.98

I'm now trying to update the Docker version. But it's a work computer, so not sure how it will go :-)

Edit:

$ docker --version
Docker version 27.4.1, build b9d17ea

$ docker compose logs -f
beszel-agent  | 2025/01/06 23:32:55 INFO Detected root device name=nvme0n1p2
beszel-agent  | 2025/01/06 23:32:55 INFO Detected network interface name=eno1 sent=992129 recv=74292416
beszel-agent  | 2025/01/06 23:32:55 INFO Starting SSH server address=:45876

$ nvidia-smi -l 4 --query-gpu=index,name,temperature.gpu,memory.used,memory.total,utilization.gpu,power.draw --format=csv,noheader,nounits
0, NVIDIA GeForce RTX 2060 SUPER, 32, 1, 8192, 0, 9.40
0, NVIDIA GeForce RTX 2060 SUPER, 32, 1, 8192, 0, 9.50
0, NVIDIA GeForce RTX 2060 SUPER, 32, 1, 8192, 0, 9.50

After the Docker update, still no GPU in the Beszel dashboard.
I've tried removing and re-adding the agent through the Beszel hub. Same, no GPU.


Hilbsam commented Jan 7, 2025

@henrygd I took a look at xpu-smi. Do you want to support consumer, enterprise, or both card types?
Because xpu-smi is only for enterprise cards.


henrygd commented Jan 7, 2025

@Hilbsam That was an oversight on my part. We won't use xpu-smi.

Looks like intel_gpu_top is the only viable option, so we'll pull frequency and add a chart for that. Neither sample output in this thread has power info, but it seems that some cards do provide power through intel_gpu_top, so we'll pull that as well if available.


VinterSolen commented Jan 15, 2025

If I add

deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all  # <---- We have 1 or 2 or 3 GPUs in our servers
              capabilities:
                - gpu

to my compose file, the agent doesn't start.

What am I missing?


> docker compose up -d
[+] Running 1/2
 ✔ Container beszel        Running                                                                                                                                                                  0.0s 
 ⠙ Container beszel-agent  Starting                                                                                                                                                                 0.1s 
Error response from daemon: could not select device driver "nvidia" with capabilities: [[gpu]]
> nvidia-smi -l 4 --query-gpu=index,name,temperature.gpu,memory.used,memory.total,utilization.gpu,power.draw --format=csv,noheader,nounits
0, NVIDIA GeForce RTX 4070, 48, 1, 12282, 0, 16.85
0, NVIDIA GeForce RTX 4070, 48, 1, 12282, 0, 16.91
0, NVIDIA GeForce RTX 4070, 48, 1, 12282, 0, 16.75
0, NVIDIA GeForce RTX 4070, 48, 1, 12282, 0, 16.89
0, NVIDIA GeForce RTX 4070, 48, 1, 12282, 0, 16.91
0, NVIDIA GeForce RTX 4070, 48, 1, 12282, 0, 16.95


jcv- commented Jan 16, 2025

I can open another issue if necessary, but I just installed this on an Ubuntu 24.04 host with an AMD GPU and noticed that the power draw seems to be stuck loading. The memory metrics are working.

rocm-smi is installed and available and does report AvgPwr.

The video card is an AMD 7900xtx


henrygd commented Jan 16, 2025

@VinterSolen Have you tried installing or updating NVIDIA Container Toolkit? Maybe try restarting Docker if you just installed.
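
(If the toolkit isn't installed yet, the usual steps on Debian/Ubuntu are roughly the following, assuming NVIDIA's apt repository is already configured per their install guide:)

sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker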

@jcv- Please open a new issue so it's searchable if someone runs into the same thing. Include output for the command below. Thanks!

rocm-smi --showtemp --showuse --showpower --showmeminfo vram --json

@VinterSolen

@VinterSolen Have you tried installing or updating NVIDIA Container Toolkit? Maybe try restarting Docker if you just installed.

Just tried. After restarting Docker, it seems the agent starts, but no GPU-related info shows on the webpage.

Log of beszel-agent:

2025/01/16 09:15:59 INFO Detected root device name=md0
2025/01/16 09:15:59 WARN Device not found in diskstats name=tank
2025/01/16 09:15:59 INFO Detected network interface name=enp5s0f0 sent=337192805171 recv=2424104878101
2025/01/16 09:15:59 INFO Detected network interface name=wt0 sent=75163590216 recv=1052411692
2025/01/16 09:15:59 INFO Starting SSH server address=:45876
2025/01/16 09:18:43 INFO Detected root device name=md0
2025/01/16 09:18:43 WARN Device not found in diskstats name=tank
2025/01/16 09:18:43 INFO Detected network interface name=enp5s0f0 sent=337292966383 recv=2424200729346
2025/01/16 09:18:43 INFO Detected network interface name=wt0 sent=75173918872 recv=1052966076
2025/01/16 09:18:43 INFO Starting SSH server address=:45876


Hilbsam commented Jan 16, 2025

@VinterSolen I'm just going to refer back to an old comment of mine. Please try running the official Docker GPU container to check whether Docker can pass through your GPU. #262

@VinterSolen

@VinterSolen I'm just going to refer back to an old comment of mine. Please try running the official Docker GPU container to check whether Docker can pass through your GPU. #262

This is my output from that:

Digest: sha256:59261e419d6d48a772aad5bb213f9f1588fcdb042b115ceb7166c89a51f03363
Status: Downloaded newer image for nvcr.io/nvidia/k8s/cuda-sample:nbody
Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
        -fullscreen       (run n-body simulation in fullscreen mode)
        -fp64             (use double precision floating point values for simulation)
        -hostmem          (stores simulation data in host memory)
        -benchmark        (run benchmark to measure performance) 
        -numbodies=<N>    (number of bodies (>= 1) to run in simulation) 
        -device=<d>       (where d=0,1,2.... for the CUDA device to use)
        -numdevices=<i>   (where i=(number of CUDA devices > 0) to use for simulation)
        -compare          (compares simulation results running once on the default GPU and once on the CPU)
        -cpu              (run n-body simulation on the CPU)
        -tipsy=<file.bin> (load a tipsy model file for simulation)

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
MapSMtoCores for SM 8.9 is undefined.  Default to use 128 Cores/SM
MapSMtoArchName for SM 8.9 is undefined.  Default to use Ampere
GPU Device 0: "Ampere" with compute capability 8.9

> Compute 8.9 CUDA device: [NVIDIA GeForce RTX 4070]
47104 bodies, total time for 10 iterations: 27.331 ms
= 811.834 billion interactions per second
= 16236.673 single-precision GFLOP/s at 20 flops per interaction
[stefan@Hades10Gbe] [~/beszel] ○              

Which makes me think it should work.


henrygd commented Jan 16, 2025

@VinterSolen Can you double-check that you saved the Dockerfile and included the build instructions? It should be saved as Dockerfile in the same directory as the docker compose file.

services:
  beszel-agent:
    build:
      context: .
      dockerfile: Dockerfile

I just tested this myself, and simply using build: . works for me. And I don't need runtime: nvidia specified.


VinterSolen commented Jan 18, 2025

@VinterSolen Can you double check that you saved the dockerfile and include the build instructions? It should be saved as Dockerfile in the same directory as the docker compose file.


I have this in the same docker-compose.yml file as the rest of the Beszel stack:


services:
  beszel:
    image: henrygd/beszel:latest
    container_name: beszel
    restart: unless-stopped
    extra_hosts:
      - host.docker.internal:host-gateway
    ports:
      - 8090:8090
    volumes:
      - ./beszel_data:/beszel_data



 beszel-agent:
    image: henrygd/beszel-agent:latest
    container_name: beszel-agent
    restart: unless-stopped
    network_mode: host
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /data/:/extra-filesystems/data:ro
    environment:
      PORT: 45876
      # Do not remove quotes around the key
      KEY: 'yyy'
    build:
      context: .
      dockerfile: Dockerfile
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all  # <---- We have 1 or 2 or 3 GPUs in our servers
              capabilities:
                - gpu


Is this the wrong way?


Hilbsam commented Jan 18, 2025

First things first, please do not share the KEY with anyone! 🥲

And yes, please update your docker-compose.yml. The current official image doesn't support the GPU, as mentioned in the docs; I shared a working Dockerfile for a beszel agent above. I also saw that in your file you have both build and image on the same container. If you run
docker compose up -d
it won't work, because it will pull henrygd/beszel-agent:latest. But docker compose up -d --build would work, since the Docker daemon then builds the image henrygd/beszel-agent:latest locally; once you've run it with --build, plain docker compose up -d will work from then on, as the image is already built locally. (Please don't do that, though, as it is not best practice.)
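
In short, the flow with the recommended files below would be roughly:

docker compose up -d --build   # first run: builds the local agent image from the Dockerfile
docker compose up -d           # later runs: reuses the already-built image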

My recommendation is to update your docker-compose.yml file:

services:
  beszel:
    image: henrygd/beszel:latest
    container_name: beszel
    restart: unless-stopped
    extra_hosts:
      - host.docker.internal:host-gateway
    ports:
      - 8090:8090
    volumes:
      - ./beszel_data:/beszel_data

  beszel-agent:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: beszel-agent
    restart: unless-stopped
    network_mode: host
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /data/:/extra-filesystems/data:ro
    environment:
      PORT: 45876
      # Do not remove quotes around the key
      KEY: 'YOUR_SECRET_KEY_GOES_HERE'

    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all  
              capabilities:
                - gpu

Also check if your Dockerfile is correct:

FROM henrygd/beszel-agent:latest as beszel

# Stage 2: Final image with NVIDIA Toolkit
FROM nvidia/cuda:12.0.0-base-ubuntu20.04

# Copy the agent binary from the Beszel base image
COPY --from=beszel /agent /agent

# Install NVIDIA Toolkit
RUN apt-get update && \
    apt-get install -y --no-install-recommends nvidia-container-toolkit && \
    rm -rf /var/lib/apt/lists/*

# Set the entrypoint
ENTRYPOINT ["/agent"]

@henrygd I think I'll write some documentation and a little Q&A on your GPU monitoring when I get time, if that's alright.

@VinterSolen

First things first, please do not share the KEY with anyone! 🥲

And yes, please update your docker-compose.yml.

This made it work; I missed the build part. Thanks.

@arunoruto

Just came across Beszel from the Tailscale video. We are currently using nvtop locally to inspect GPU usage while computing something. While the project started with Nvidia support, it gained support for other GPU devices over time (including Intel).
Currently it only offers a GUI displaying the information, but there is an open issue about enabling dumping the data into a file. So maybe keep an eye out for it :)


henrygd commented Jan 18, 2025

@Hilbsam Absolutely! That would be a big help. I'll try to get Intel support in the next release or two.

@arunoruto Interesting, thanks! I'll keep an eye on it.

@cammurray

Watching this because I use Intel GPUs for hardware acceleration in Frigate, so it would be nice to get the usage. Thanks for this!
