
[L4T 36.4.2] Jetson Orin Nano Super fails to start using jetson-containers run $(autotag nanoowl)? #772

Open
EESN-W opened this issue Jan 9, 2025 · 6 comments



EESN-W commented Jan 9, 2025

docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running createContainer hook #1: exit status 1, stdout: , stderr: time="2025-01-09T16:43:05+08:00" level=info msg="Symlinking /var/lib/docker/overlay2/c45eac9aa3fcccb0c76842edb412a3dd8f0c7d4b8cad84554de0923ce8520118/merged/etc/vulkan/icd.d/nvidia_icd.json to /usr/lib/aarch64-linux-gnu/nvidia/nvidia_icd.json"
time="2025-01-09T16:43:05+08:00" level=error msg="failed to create link [/usr/lib/aarch64-linux-gnu/nvidia/nvidia_icd.json /etc/vulkan/icd.d/nvidia_icd.json]: failed to create symlink: failed to remove existing file: remove /var/lib/docker/overlay2/c45eac9aa3fcccb0c76842edb412a3dd8f0c7d4b8cad84554de0923ce8520118/merged/etc/vulkan/icd.d/nvidia_icd.json: device or resource busy": unknown.

dusty-nv (Owner) commented:

Hi @EESN-W, check if you can run other GPU containers, or is the failure specific to dustynv/nanoowl:r36.4.0?

If you keep getting that error elsewhere, it may be unrelated to the container itself, and your docker daemon may need to be reinstalled. It seems to be missing some of the mounted driver files.
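A quick sanity test (a sketch only; the image tag below is just an illustration, substitute any CUDA container you already have pulled):

    # run an unrelated GPU container to see whether the createContainer hook fails there too
    docker run --rm --runtime nvidia dustynv/l4t-pytorch:r36.2.0 \
        python3 -c "import torch; print(torch.cuda.is_available())"

If that fails with the same symlink error, the problem is in the runtime/toolkit setup rather than in the nanoowl image.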


rgobbel commented Jan 12, 2025

I'm seeing the same problem with nvidia-container-toolkit 1.17.3 on an AGX Orin 64GB, with everything upgraded to the latest JetPack 6.1 or later:

Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running createContainer hook #1: exit status 1, stdout: , stderr: time="2025-01-12T09:39:37-08:00" level=info msg="Symlinking /var/lib/docker/overlay2/cab97a8f0f4d243c89ba84ac2925460b9207c2419dbf8faab2015a5ed32d2a08/merged/etc/vulkan/icd.d/nvidia_icd.json to /usr/lib/aarch64-linux-gnu/nvidia/nvidia_icd.json"
time="2025-01-12T09:39:37-08:00" level=error msg="failed to create link [/usr/lib/aarch64-linux-gnu/nvidia/nvidia_icd.json /etc/vulkan/icd.d/nvidia_icd.json]: failed to create symlink: failed to remove existing file: remove /var/lib/docker/overlay2/cab97a8f0f4d243c89ba84ac2925460b9207c2419dbf8faab2015a5ed32d2a08/merged/etc/vulkan/icd.d/nvidia_icd.json: device or resource busy": unknown
Error: failed to start containers: cosmos_container2

No problems at all if I downgrade to 1.16.2.
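(For reference, a rough sketch of such a downgrade, assuming the 1.16.2 packages are still available from your apt sources; the exact -1 revision suffix is a guess, verify with apt-cache madison nvidia-container-toolkit:)

    sudo apt-get install --allow-downgrades \
        nvidia-container-toolkit=1.16.2-1 \
        nvidia-container-toolkit-base=1.16.2-1 \
        libnvidia-container-tools=1.16.2-1 \
        libnvidia-container1=1.16.2-1
    sudo systemctl restart docker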


dusty-nv (Owner) commented Jan 12, 2025 via email:

Ok, I take it that behavior is not specific to the NanoOWL container? If you upgrade the docker daemon version or nvidia-container-runtime version, it often messes with the driver mounts (these are listed under /etc/nvidia-container-runtime/host-mounts-for-container.d). For this reason, it is recommended either to pin or not upgrade those packages, or to be prepared to reinstall/update the other necessary components.
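A minimal sketch of that pinning with apt (the package names are NVIDIA's container toolkit debs; holding them keeps routine upgrades from bumping the version):

    sudo apt-mark hold \
        nvidia-container-toolkit \
        nvidia-container-toolkit-base \
        libnvidia-container-tools \
        libnvidia-container1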

matanj83 commented:

Hi Dusty, thanks again for your awesome work and efforts.

Not sure this is the right place to ask, but are there new Docker images in the pipeline for the Nano Super? Specifically, I currently use dustynv/transformers:r35.3.1 and dustynv/jetson-inference:r35.3.1 in my Dockerfiles, and I would love to align with L4T 36.4.2 so that I can test the Nano Super.

BR,
Matan


dusty-nv commented Jan 13, 2025 via email


rgobbel commented Jan 13, 2025


The problem appears to have something to do with symlink functionality that was added to nvidia-container-toolkit last October. See my comment in the nvidia-container-toolkit repo. It looks like the problem has been "fixed" a couple of times, but each fix only dealt with whatever case had triggered an issue report, without addressing the more widespread problem.

There is a simple workaround: just roll back all the nvidia-container-toolkit stuff to 1.16.2. I did that a while ago, but would love to be able to go back to doing normal updates on those packages.
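Once a toolkit release with a proper fix lands, lifting that pin would look something like this (assuming the packages were held with apt-mark as above):

    sudo apt-mark unhold \
        nvidia-container-toolkit nvidia-container-toolkit-base \
        libnvidia-container-tools libnvidia-container1
    sudo apt-get update && sudo apt-get upgrade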
