[L4T 36.4.2] Jetson Orin Nano Super failed to start using jetson-containers run $(autotag nanoowl)? #772
Comments
Hi @EESN-W, check if you can run other GPU containers, or if the failure is specific to this one. If you keep getting that error elsewhere, it may be unrelated to the container itself, and your Docker daemon may need to be reinstalled. It seems to be missing some of the mounted driver files.
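For example, a quick sanity check could look something like this (a rough sketch; l4t-pytorch and the l4t-base tag below are just arbitrary examples of other GPU containers, not a required choice):

```bash
# Try another CUDA-enabled container first; if it fails with the same
# createContainer hook error, the problem is the runtime setup, not NanoOWL
jetson-containers run $(autotag l4t-pytorch)

# Or test the NVIDIA runtime directly against a base image you already have
# (the l4t-base tag is only an example -- substitute whatever is on your system)
sudo docker run --rm --runtime=nvidia nvcr.io/nvidia/l4t-base:r36.2.0 echo "container started OK"
```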
I'm seeing the same problem with nvidia-container-toolkit 1.17.3, AGX Orin 64GB with everything upgraded to the latest JetPack 6.1 or better:
Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running createContainer hook #1: exit status 1, stdout: , stderr: time="2025-01-12T09:39:37-08:00" level=info msg="Symlinking /var/lib/docker/overlay2/cab97a8f0f4d243c89ba84ac2925460b9207c2419dbf8faab2015a5ed32d2a08/merged/etc/vulkan/icd.d/nvidia_icd.json to /usr/lib/aarch64-linux-gnu/nvidia/nvidia_icd.json"
time="2025-01-12T09:39:37-08:00" level=error msg="failed to create link [/usr/lib/aarch64-linux-gnu/nvidia/nvidia_icd.json /etc/vulkan/icd.d/nvidia_icd.json]: failed to create symlink: failed to remove existing file: remove /var/lib/docker/overlay2/cab97a8f0f4d243c89ba84ac2925460b9207c2419dbf8faab2015a5ed32d2a08/merged/etc/vulkan/icd.d/nvidia_icd.json: device or resource busy": unknown
Error: failed to start containers: cosmos_container2
No problems at all if I downgrade to 1.16.2.
OK, I take it that this behavior is not specific to the NanoOWL container?
If you upgrade the Docker daemon version or the nvidia-container-runtime version, it often messes with the driver mounts (these are listed under /etc/nvidia-container-runtime/host-mounts-for-container.d).
For this reason, it is recommended to either pin or not upgrade those packages, or be prepared to reinstall/update the other necessary components.
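As an illustration, something like the following should pin those packages on an apt-based JetPack install (a minimal sketch; the exact package list on your system may differ):

```bash
# Hold the container runtime packages at their current versions so a routine
# 'apt upgrade' cannot silently replace them and break the driver mounts
sudo apt-mark hold nvidia-container-toolkit nvidia-container-toolkit-base \
                   libnvidia-container-tools libnvidia-container1

# When you do want to upgrade deliberately, release the hold first
sudo apt-mark unhold nvidia-container-toolkit nvidia-container-toolkit-base \
                     libnvidia-container-tools libnvidia-container1
```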
Hi Dusty, thanks again for your awesome work and efforts.
Not sure it's the place, but are there new Docker images in the pipe for the Nano Super? Specifically, I currently use dustynv/transformers:r35.3.1 and dustynv/jetson-inference:r35.3.1 in my Dockerfiles and would love to align to L4T 36.4.2 so that I could test the Nano Super.
BR,
Matan
Hi Matan, I would use dustynv/llama-factory:r36.4.0 as a general Transformers image, because it includes the common add-on libraries for HF like FlashAttention, bitsandbytes, AutoGPTQ, etc.
If you still need just a base transformers:r36.4.0, I can build it for you, but that is a simpler image that you should be able to build yourself (it will install PyTorch from a wheel).
jetson-inference is unfortunately not compatible with TRT10; support for many of the old Caffe/TensorFlow models it uses was removed. jetson-utils still works fine on JetPack 6.1. I would recommend migrating to Ultralytics YOLO for object detection (it is well optimized for TRT) or PyTorch/TIMM for general vision models and ViTs.
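For instance, a rough sketch of both suggestions (the image, model, and file names here are only examples, not a confirmed recipe):

```bash
# Run the suggested general-purpose Transformers image (pulls it on first use)
jetson-containers run dustynv/llama-factory:r36.4.0

# Inside a container that has Ultralytics installed, export a YOLO model to a
# TensorRT engine and run detection with it (yolov8n.pt and the source path are
# placeholders -- substitute your own model and test image)
yolo export model=yolov8n.pt format=engine device=0
yolo predict model=yolov8n.engine source=/data/test.jpg
```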
The problem appears to have something to do with symlink functionality that was added to nvidia-container-toolkit last October. See my comment in the nvidia-container-toolkit repo. It looks like the problem has been "fixed" a couple of times, but each fix only dealt with whatever case had triggered an issue report, without handling the more widespread problem. There is a simple workaround: just roll back all of the nvidia-container-toolkit packages to 1.16.2. I did that a while ago, but would love to be able to go back to doing normal updates on those packages.
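Something along these lines should perform that rollback on an apt-based install (a sketch only; the 1.16.2-1 version string and package list are assumptions, so check the madison output for the exact versions available on your system):

```bash
# See which versions the apt repository offers
apt-cache madison nvidia-container-toolkit

# Downgrade the toolkit and its companion packages, then hold them so they
# are not upgraded again behind your back
sudo apt-get install --allow-downgrades \
    nvidia-container-toolkit=1.16.2-1 \
    nvidia-container-toolkit-base=1.16.2-1 \
    libnvidia-container-tools=1.16.2-1 \
    libnvidia-container1=1.16.2-1
sudo apt-mark hold nvidia-container-toolkit nvidia-container-toolkit-base \
                   libnvidia-container-tools libnvidia-container1

# Restart the Docker daemon so the runtime change takes effect
sudo systemctl restart docker
```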
For reference, the error from the original issue report:
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running createContainer hook #1: exit status 1, stdout: , stderr: time="2025-01-09T16:43:05+08:00" level=info msg="Symlinking /var/lib/docker/overlay2/c45eac9aa3fcccb0c76842edb412a3dd8f0c7d4b8cad84554de0923ce8520118/merged/etc/vulkan/icd.d/nvidia_icd.json to /usr/lib/aarch64-linux-gnu/nvidia/nvidia_icd.json"
time="2025-01-09T16:43:05+08:00" level=error msg="failed to create link [/usr/lib/aarch64-linux-gnu/nvidia/nvidia_icd.json /etc/vulkan/icd.d/nvidia_icd.json]: failed to create symlink: failed to remove existing file: remove /var/lib/docker/overlay2/c45eac9aa3fcccb0c76842edb412a3dd8f0c7d4b8cad84554de0923ce8520118/merged/etc/vulkan/icd.d/nvidia_icd.json: device or resource busy": unknown.