[BUG] Infinite loop if /sys/class/drm/cardX not present (new with 6.12 kernel and nvidia drm) #1002

Open
drankinatty opened this issue Dec 31, 2024 · 0 comments
Assignees
Labels
bug Something isn't working

Comments

drankinatty commented Dec 31, 2024

Describe the bug

The Linux 6.12 kernel removed support for Nvidia KMS DRM, relying instead on kernel DRM. For legacy Nvidia cards, /sys/class/drm/cardX/ is not created unless nvidia-drm.modeset=1 is provided as a kernel parameter. When btop starts, it goes into an infinite loop, incrementing the card index by 1 while looking for /sys/class/drm/card0/device/vendor, .../card1/device/vendor, and so on, which locks the terminal.

Example strace output:

newfstatat(AT_FDCWD, "/sys/class/drm/card0/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card1/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card2/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card3/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card4/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card5/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card6/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card7/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card8/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card9/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card10/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card11/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card12/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card13/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card14/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card15/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card16/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card17/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card18/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card19/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card20/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card21/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card22/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card23/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card24/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card25/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card26/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card27/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card28/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card29/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card30/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card31/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card32/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card33/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card34/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card35/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card36/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card37/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card38/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card39/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card40/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card41/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card42/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card43/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card44/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card45/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card46/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card47/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card48/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card49/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card50/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card51/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card52/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card53/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card54/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card55/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card56/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card57/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card58/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card59/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card60/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card61/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card62/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card63/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card64/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card65/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card66/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card67/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card68/device/vendor", 0x7ffc89540860, 0) = -1 ENOENT (No such file or directory)
...

To Reproduce

Attempt to start btop on any computer where /sys/class/drm/cardX isn't present (where X is 0, 1, 2, ...). The infinite loop occurs.
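
To check whether a machine meets the reproduction condition, a small standalone helper along these lines could list the cardN entries under /sys/class/drm. This is only a sketch and is not taken from btop's source: the function name list_drm_cards and its drm_root parameter are hypothetical, chosen for illustration. An empty result would indicate the situation described above.

```cpp
#include <algorithm>
#include <cctype>
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper (not part of btop): returns the names of entries in
// drm_root that look like "card<digits>", e.g. "card0". Connector entries
// such as "card0-HDMI-A-1" and render nodes such as "renderD128" are skipped.
std::vector<std::string> list_drm_cards(const fs::path& drm_root = "/sys/class/drm") {
    std::vector<std::string> cards;
    std::error_code ec;
    fs::directory_iterator it(drm_root, ec);
    if (ec) return cards;  // directory missing or unreadable: report no cards

    for (const fs::directory_iterator end; it != end; it.increment(ec)) {
        if (ec) break;  // stop on iteration errors instead of throwing
        const std::string name = it->path().filename().string();
        const bool is_card =
            name.size() > 4 && name.compare(0, 4, "card") == 0 &&
            std::all_of(name.begin() + 4, name.end(),
                        [](unsigned char c) { return std::isdigit(c) != 0; });
        if (is_card) cards.push_back(name);
    }
    std::sort(cards.begin(), cards.end());  // lexicographic order, for stable output
    return cards;
}
```

Calling list_drm_cards() with no argument inspects the real /sys/class/drm. Enumerating the directory rather than probing indices one by one also terminates naturally when no card exists.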

Expected behavior

btop should verify that at least one /sys/class/drm/cardX is present on the system, or handle the error without endlessly incrementing X only to fail again. As a rough hack, the open attempt could be wrapped in a loop with a sane upper limit. For example, a quick #define CARDMAX 100 followed by unsigned opencount = 0; while (opencount < CARDMAX) { /* attempt open and increment X */; opencount += 1; } would do. Just something to stop the infinite loop.
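
The bounded-loop idea above could look roughly like the following. This is a minimal sketch under stated assumptions, not btop's actual code: find_first_card and its drm_root parameter are hypothetical names, and the limit of 100 simply mirrors the CARDMAX value suggested above.

```cpp
#include <cstddef>
#include <filesystem>
#include <optional>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Upper bound on card indices to probe. The value 100 follows the
// suggestion in this report; it is an assumption, not a kernel limit.
constexpr std::size_t CARDMAX = 100;

// Hypothetical helper (not part of btop): returns the index of the first
// card whose device/vendor entry exists under drm_root, or std::nullopt
// if no such entry is found below CARDMAX.
std::optional<std::size_t> find_first_card(const fs::path& drm_root = "/sys/class/drm") {
    for (std::size_t i = 0; i < CARDMAX; ++i) {
        std::error_code ec;  // non-throwing overload: any error counts as "not found"
        const fs::path vendor =
            drm_root / ("card" + std::to_string(i)) / "device" / "vendor";
        if (fs::exists(vendor, ec)) return i;
    }
    return std::nullopt;  // give up after CARDMAX attempts instead of looping forever
}
```

A caller that receives std::nullopt could then skip GPU detection through sysfs for that vendor and continue starting up.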

I stumbled into this after patching the Nvidia 390 driver for the 6.12 kernel. The same will apply to the 470 driver (and likely the 535 driver but I can't test that). Prior to 6.12 the kernel included code to handle setup of cardX for the driver. From 6.12 on, unless the kernel parameter is provided, no cardX is created -- which will lead to this infinite loop in btop until the user identifies the issue.

Info (please complete the following information):

  • btop++ version: btop-1.4.0+git20241108.e17bc6b-1.1.x86_64
    • If using snap: no
  • Binary: openSUSE Tumbleweed
  • Architecture: x86_64
  • Platform: Linux
  • (Linux) Kernel: 6.12.6-1-default
  • Terminal used: xterm and konsole
  • Font used: default

Additional context

Contents of ~/.config/btop/btop.log

Not really relevant: the loop prevents any additional log entries. The log does not change even when --debug is given; it shows only the normal complaints:

2024/12/29 (01:38:56) | ===> btop++ v.1.4.0
2024/12/29 (01:38:56) | WARNING: NVML: Failed to get maximum GPU power draw, defaulting to 225W: Not Supported
2024/12/29 (01:38:56) | WARNING: NVML: Failed to get maximum GPU temperature, defaulting to 110°C: Not Supported
2024/12/29 (01:38:56) | WARNING: NVML: Failed to get GPU utilization: Not Supported
2024/12/29 (01:38:56) | WARNING: NVML: Failed to get PCIe RX throughput: Not Supported
2024/12/29 (01:38:56) | WARNING: NVML: Failed to get PCIe TX throughput: Not Supported
2024/12/29 (01:38:56) | WARNING: NVML: Failed to get GPU clock speed: Not Supported
2024/12/29 (01:38:56) | WARNING: NVML: Failed to get VRAM clock speed: Not Supported
2024/12/29 (01:38:56) | WARNING: NVML: Failed to get GPU power usage: Not Supported

2024/12/29 (01:39:30) | ===> btop++ v.1.4.0
2024/12/29 (01:39:30) | DEBUG: Running in DEBUG mode!
2024/12/29 (01:39:30) | INFO: Logger set to DEBUG
2024/12/29 (01:39:30) | DEBUG: Setting LC_ALL=en_US.UTF-8
2024/12/29 (01:39:30) | INFO: Running on /dev/pts/13
2024/12/29 (01:39:30) | WARNING: NVML: Failed to get maximum GPU power draw, defaulting to 225W: Not Supported
2024/12/29 (01:39:30) | WARNING: NVML: Failed to get maximum GPU temperature, defaulting to 110°C: Not Supported
2024/12/29 (01:39:30) | WARNING: NVML: Failed to get GPU utilization: Not Supported
2024/12/29 (01:39:30) | WARNING: NVML: Failed to get PCIe TX throughput: Not Supported
2024/12/29 (01:39:30) | WARNING: NVML: Failed to get PCIe RX throughput: Not Supported
2024/12/29 (01:39:30) | WARNING: NVML: Failed to get GPU clock speed: Not Supported
2024/12/29 (01:39:30) | WARNING: NVML: Failed to get VRAM clock speed: Not Supported
2024/12/29 (01:39:30) | WARNING: NVML: Failed to get GPU power usage: Not Supported

2024/12/31 (02:27:08) | ===> btop++ v.1.4.0
2024/12/31 (02:27:08) | WARNING: NVML: Failed to get maximum GPU power draw, defaulting to 225W: Not Supported
2024/12/31 (02:27:08) | WARNING: NVML: Failed to get maximum GPU temperature, defaulting to 110°C: Not Supported
2024/12/31 (02:27:08) | WARNING: NVML: Failed to get GPU utilization: Not Supported
2024/12/31 (02:27:08) | WARNING: NVML: Failed to get PCIe TX throughput: Not Supported
2024/12/31 (02:27:08) | WARNING: NVML: Failed to get PCIe RX throughput: Not Supported
2024/12/31 (02:27:08) | WARNING: NVML: Failed to get GPU clock speed: Not Supported
2024/12/31 (02:27:08) | WARNING: NVML: Failed to get VRAM clock speed: Not Supported
2024/12/31 (02:27:08) | WARNING: NVML: Failed to get GPU power usage: Not Supported

GDB Backtrace

strace output provided above.

@drankinatty drankinatty added the bug Something isn't working label Dec 31, 2024