
🐛 [Bug] libtorchtrt.so: undefined symbol when importing torch_tensorrt in docker #3350

Closed
NetaPanda opened this issue Jan 10, 2025 · 2 comments
Labels
bug Something isn't working

Comments

@NetaPanda

Bug Description

I have tried installing the repo with docker by:

sudo DOCKER_BUILDKIT=1 docker build --build-arg TENSORRT_VERSION=10.7.0 -f docker/Dockerfile -t torch_tensorrt:latest .

On my first attempt, the docker build process showed a warning:
INFO: pip is looking at multiple versions of torch

then pip downloaded many torch versions without installing any of them (dependency backtracking), and the process got stuck indefinitely.

After some investigation I added RUN pip install --upgrade pip to the Dockerfile, right below the line

RUN curl -L https://github.com/a8m/envsubst/releases/download/v1.2.0/envsubst-`uname -s`-`uname -m` -o envsubst &&\
    chmod +x envsubst && mv envsubst /usr/local/bin

and right above the line

RUN pip install -r /opt/torch_tensorrt/py/requirements.txt
...

Now the build process finishes, but once I enter the container with

sudo docker run --rm --runtime=nvidia --gpus all -it --shm-size=8gb --env="DISPLAY" --volume="/tmp/.X11-unix:/tmp/.X11-unix:rw" --name=torch_tensorrt --ipc=host --net=host torch_tensorrt:latest

and try to import torch_tensorrt inside python, it gives the following error:

OSError: /root/.pyenv/versions/3.10.16/lib/python3.10/site-packages/torch_tensorrt/lib/libtorchtrt.so: undefined symbol: _ZN3c106detail23torchInternalAssertFailEPKcS2_jS2_RKSs
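The mangled name itself hints at the cause: in the Itanium C++ ABI, the abbreviation `Ss` stands for the pre-C++11 `std::string`, while a cxx11-ABI build mangles the string type inside the `__cxx11` inline namespace instead. A minimal, hypothetical helper to eyeball this (the function name is ours, not part of torch_tensorrt):

```python
def uses_pre_cxx11_string(mangled: str) -> bool:
    """Heuristic: does this Itanium-mangled name reference the old-ABI std::string?

    "Ss" is the standard abbreviation for the pre-C++11 std::string;
    a C++11-ABI build instead mangles the type under the "__cxx11"
    inline namespace, so the same function gets a different symbol.
    """
    return "Ss" in mangled and "__cxx11" not in mangled

# The symbol from the error above: the trailing "RKSs" is a
# `const std::string&` in the pre-C++11 ABI, so a libtorch built with
# the C++11 ABI will not export this exact symbol.
symbol = "_ZN3c106detail23torchInternalAssertFailEPKcS2_jS2_RKSs"
print(uses_pre_cxx11_string(symbol))  # True -> an ABI mismatch is plausible
```

Running the symbol through `c++filt` shows the same thing: the last parameter demangles to `std::string const&`, which points at a mismatch between the ABI the extension was compiled with and the ABI of the installed libtorch.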

I have also tried to build from source, which ended up with exactly the same error as in docker (undefined symbol).

I wonder whether there is an issue with my OS. I am using Ubuntu 22.04 with CUDA 12.6 installed.

To Reproduce

Steps to reproduce the behavior:

  1. Clone the repo: git clone https://github.com/pytorch/TensorRT.git
  2. Modify docker/Dockerfile by adding RUN pip install --upgrade pip as described above
  3. sudo DOCKER_BUILDKIT=1 docker build --build-arg TENSORRT_VERSION=10.7.0 -f docker/Dockerfile -t torch_tensorrt:latest .
  4. sudo docker run --rm --runtime=nvidia --gpus all -it --shm-size=8gb --env="DISPLAY" --volume="/tmp/.X11-unix:/tmp/.X11-unix:rw" --name=torch_tensorrt --ipc=host --net=host torch_tensorrt:latest
  5. python
  6. import torch_tensorrt

Expected behavior

The torch_tensorrt package should import successfully.

Environment

Build information about Torch-TensorRT can be found by turning on debug messages

  • Torch-TensorRT Version (e.g. 1.0.0): 2.6.0a0 (since I directly cloned the main branch)
  • PyTorch Version (e.g. 1.0): N/A (installed within docker)
  • CPU Architecture: X86-64 (Intel I9-13900K)
  • OS (e.g., Linux): Ubuntu 22.04 Desktop
  • How you installed PyTorch (conda, pip, libtorch, source): Managed by Dockerfile
  • Build command you used (if compiling from source): See the above Steps to reproduce
  • Are you using local sources or building from archives: N/A
  • Python version: 3.10
  • CUDA version: local 12.6, inside docker it seems to be 12.4
  • GPU models and configuration: RTX4090
  • Any other relevant information: N/A

Additional context

The INFO: pip is looking at multiple versions of torch issue did not occur during my very first few docker build attempts; I wonder whether this is a cache conflict or caused by something else.

@NetaPanda NetaPanda added the bug Something isn't working label Jan 10, 2025
@zewenli98
Collaborator

To my knowledge, you would get the undefined symbol error in two cases: 1) a mismatch between the torch version and the libtorch version, which you can find in MODULE.bazel; 2) not using --use-cxx11-abi. For CUDA 12.6, if you want to build torch-trt from source, you need to run something like python setup.py develop --use-cxx11-abi
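For case 1), a quick sanity check is to compare the installed torch version against the libtorch version pinned in MODULE.bazel. A minimal sketch, where both the helper and the comparison policy (matching major.minor only) are our assumptions rather than a torch-trt utility:

```python
def versions_match(installed: str, pinned: str) -> bool:
    """Compare the major.minor parts of two version strings,
    ignoring pre-release/local suffixes like 'a0' or '+cu124'."""
    def majmin(v: str) -> list[str]:
        # Strip local ("+cu124") and pre-release ("a0") suffixes,
        # then keep only the first two dotted components.
        return v.split("+")[0].split("a")[0].split(".")[:2]
    return majmin(installed) == majmin(pinned)

# e.g. the version reported by `pip show torch` vs the one in MODULE.bazel
print(versions_match("2.6.0a0", "2.6.0"))  # True
print(versions_match("2.5.1", "2.6.0"))    # False
```

For case 2), `torch.compiled_with_cxx11_abi()` reports which C++ ABI the installed torch build uses, which tells you whether --use-cxx11-abi is needed when compiling torch-trt against it.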

@NetaPanda
Author

Thanks for the tip! After 3-4 painful days of trying, I finally gave up and used the nvidia pytorch docker instead. Sorting out the libraries was a mess for me, especially with multiple torch packages installed on my system (both in conda and the local Python env). Indeed, I did not use --use-cxx11-abi; I'm not sure whether that was the real cause of the issue. Since I have found another solution, I shall close this issue.
