
[Bug]: Colab tutorial failing due to CUDA version conflict #1191

Closed
complexitysoftware opened this issue Mar 29, 2024 · 5 comments


Bug Description

Tutorial fails with
ImportError: libnvrtc.so.11.2: cannot open shared object file: No such file or directory

The error appears to be caused by a conflict with the CUDA version installed on Colab, which is 12.2.
Running find for libnvrtc gives:
/usr/local/cuda-12.2/targets/x86_64-linux/lib/stubs/libnvrtc.so
/usr/local/cuda-12.2/targets/x86_64-linux/lib/libnvrtc.so
/usr/local/cuda-12.2/targets/x86_64-linux/lib/libnvrtc.so.12.2.140
/usr/local/cuda-12.2/targets/x86_64-linux/lib/libnvrtc.so.12

To Reproduce

  1. Clicking the 'Try on Colab' button on the Flame GPU home page opens the Colab tutorial for potential users (https://colab.research.google.com/github/FLAMEGPU/FLAMEGPU2-tutorial-python/blob/google-colab/FLAME_GPU_2_python_tutorial.ipynb).

  2. Running the tutorial fails with
    ImportError: libnvrtc.so.11.2: cannot open shared object file: No such file or directory
    on the import pyflamegpu line

Expected Behaviour

Tutorial should run

OS

Ubuntu 22.04.3 LTS

CUDA Versions

CUDA 12.2

GPUs

T4

GPU Driver

535.104.05

Additional Information

No response

Robadob (Member) commented Mar 29, 2024

Hi,

Thanks for reporting this.

I'm able to reproduce it, and it's not as trivial to resolve as past errors, so it's a bit outside of my expertise. You'll need to wait for my colleague @ptheywood to get to it, hopefully next week (it's a four-day weekend for Easter here at the moment).


My notes:

Updated the wheelhouse link (needed the latest RC too):

https://whl.flamegpu.com/whl/cuda120/ pyflamegpu==2.0.0rc1

Package import then fails with

ImportError: libnvrtc-builtins.so.12.0: cannot open shared object file: No such file or directory

Checking libnvrtc-builtins

!find / -name 'libnvrtc-builtins*'

Reports

/usr/local/cuda-12.2/targets/x86_64-linux/lib/libnvrtc-builtins_static.a
/usr/local/cuda-12.2/targets/x86_64-linux/lib/libnvrtc-builtins.so
/usr/local/cuda-12.2/targets/x86_64-linux/lib/libnvrtc-builtins.so.12.2
/usr/local/cuda-12.2/targets/x86_64-linux/lib/libnvrtc-builtins.so.12.2.140

This would imply we either need to symlink libnvrtc-builtins.so.12.0 -> libnvrtc-builtins.so or need a CUDA 12.2 specific wheel.
(My understanding is that post CUDA 11.2, they standardised on lib versions being 11.2, so I would assume it's the same for CUDA 12.x, standardising on 12.0.)
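
For illustration, a heavily hedged sketch of what the symlink option could look like in a Colab cell, using the CUDA 12.2 paths found above; this is untested and assumes the 12.2 builtins library is compatible with what the wheel expects.

# Hypothetical workaround: expose the Colab-provided CUDA 12.2 builtins library
# under the soname the wheel is looking for, then refresh the linker cache.
!ln -sf /usr/local/cuda-12.2/targets/x86_64-linux/lib/libnvrtc-builtins.so.12.2 /usr/local/cuda-12.2/targets/x86_64-linux/lib/libnvrtc-builtins.so.12.0
!ldconfig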

ptheywood (Member) commented Apr 2, 2024

I've looked into this a bit but have no resolution yet; notes for future reference are below.


This does disagree with my understanding of the nvrtc shared library versioning scheme from CUDA 11.3+:

https://docs.nvidia.com/cuda/nvrtc/#versioning-scheme

https://developer.nvidia.com/blog/programming-efficiently-with-the-cuda-11-3-compiler-toolchain/

which suggests that just depending on libnvrtc.so.11.2 / libnvrtc.so.12.0 is all that should be required. It could be that, due to our combination of linking steps/flags, we are explicitly depending on the subpackage libnvrtc-builtins.so.X.Y.Z, so perhaps allowing that to be an implicit dependency would be one way to resolve this.
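
As a cross-check, inspecting the dynamic section of the compiled extension module would confirm whether libnvrtc-builtins really is listed as an explicit dependency; the install path and filename pattern below are assumed for Colab's Python 3.10 environment.

# List the DT_NEEDED entries of the compiled extension module to see whether
# libnvrtc-builtins.so.12.0 appears as an explicit dependency.
!readelf -d /usr/local/lib/python3.10/dist-packages/pyflamegpu/_pyflamegpu*.so | grep NEEDED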


We could switch to statically linking libnvrtc, which would increase the size of our wheels and force recompilation for any bugfixes etc., but would resolve the issue. However, this would only be viable for CUDA 11.5+ with CMake >= 3.26. libnvrtc_static.a is also ~70MB.

https://cmake.org/cmake/help/latest/module/FindCUDAToolkit.html#cuda-toolkit-nvrtc


Nvidia do distribute the nvrtc runtime libraries as a pip package per major CUDA version (nvidia-cuda-nvrtc-cu11, nvidia-cuda-nvrtc-cu12), which we could add a runtime dependency on, or optionally install the corresponding version, e.g.

    !{sys.executable} -m pip install --extra-index-url https://whl.flamegpu.com/whl/cuda120/ pyflamegpu==2.0.0rc1 nvidia-cuda-nvrtc-cu12==12.0.140  # type: ignore

However, on Colab this conflicts with the version of that package which pytorch depends on. The package does install and the appropriate .so files are created, but they are not visible to the linker even after installation, so the import still fails.

Looking in indexes: https://pypi.org/simple, https://whl.flamegpu.com/whl/cuda120/
Collecting pyflamegpu==2.0.0rc1
  Downloading https://github.com/FLAMEGPU/FLAMEGPU2/releases/download/v2.0.0-rc.1/pyflamegpu-2.0.0rc1%2Bcuda120-cp310-cp310-linux_x86_64.whl (172.5 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 172.5/172.5 MB 3.0 MB/s eta 0:00:00
Collecting nvidia-cuda-nvrtc-cu12==12.0.140
  Downloading nvidia_cuda_nvrtc_cu12-12.0.140-py3-none-manylinux1_x86_64.whl (23.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 23.4/23.4 MB 6.4 MB/s eta 0:00:00
Installing collected packages: pyflamegpu, nvidia-cuda-nvrtc-cu12
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torch 2.2.1+cu121 requires nvidia-cublas-cu12==12.1.3.1; platform_system == "Linux" and platform_machine == "x86_64", which is not installed.
torch 2.2.1+cu121 requires nvidia-cuda-cupti-cu12==12.1.105; platform_system == "Linux" and platform_machine == "x86_64", which is not installed.
torch 2.2.1+cu121 requires nvidia-cuda-runtime-cu12==12.1.105; platform_system == "Linux" and platform_machine == "x86_64", which is not installed.
torch 2.2.1+cu121 requires nvidia-cudnn-cu12==8.9.2.26; platform_system == "Linux" and platform_machine == "x86_64", which is not installed.
torch 2.2.1+cu121 requires nvidia-cufft-cu12==11.0.2.54; platform_system == "Linux" and platform_machine == "x86_64", which is not installed.
torch 2.2.1+cu121 requires nvidia-curand-cu12==10.3.2.106; platform_system == "Linux" and platform_machine == "x86_64", which is not installed.
torch 2.2.1+cu121 requires nvidia-cusolver-cu12==11.4.5.107; platform_system == "Linux" and platform_machine == "x86_64", which is not installed.
torch 2.2.1+cu121 requires nvidia-cusparse-cu12==12.1.0.106; platform_system == "Linux" and platform_machine == "x86_64", which is not installed.
torch 2.2.1+cu121 requires nvidia-nccl-cu12==2.19.3; platform_system == "Linux" and platform_machine == "x86_64", which is not installed.
torch 2.2.1+cu121 requires nvidia-nvtx-cu12==12.1.105; platform_system == "Linux" and platform_machine == "x86_64", which is not installed.
torch 2.2.1+cu121 requires nvidia-cuda-nvrtc-cu12==12.1.105; platform_system == "Linux" and platform_machine == "x86_64", but you have nvidia-cuda-nvrtc-cu12 12.0.140 which is incompatible.
Successfully installed nvidia-cuda-nvrtc-cu12-12.0.140 pyflamegpu-2.0.0rc1+cuda120
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-4-55aa0e8c823e> in <cell line: 7>()
      5 
      6 # Import pyflamegpu and some other libraries we will use in the tutorial
----> 7 import pyflamegpu
      8 import sys, random, math
      9 import matplotlib.pyplot as plt

1 frames
/usr/local/lib/python3.10/dist-packages/pyflamegpu/pyflamegpu.py in <module>
      8 # Import the low-level C/C++ module
      9 if __package__ or "." in __name__:
---> 10     from . import _pyflamegpu
     11 else:
     12     import _pyflamegpu

ImportError: libnvrtc-builtins.so.12.0: cannot open shared object file: No such file or directory

---------------------------------------------------------------------------
NOTE: If your import is failing due to a missing package, you can
manually install dependencies using either !pip or !apt.

To view examples of installing some common dependencies, click the
"Open Examples" button below.
---------------------------------------------------------------------------

Attempting to add that location via !export LD_LIBRARY_PATH and os.environ["LD_LIBRARY_PATH"] = ... both failed on Colab. As LD_DEBUG outputs to stderr, Colab's old ipykernel package prevents it from being useful there.

Attempting this locally after uninstalling CUDA 12.0 (so that the location expected by the binary's RPATH does not exist) allows the error to be reproduced and investigated via LD_DEBUG=libs.

In that case, installing nvidia-cuda-nvrtc-cu12==12.0.140 via python3 -m pip install and explicitly setting LD_LIBRARY_PATH does work:

LD_LIBRARY_PATH="/path/to/venv/lib/python3.10/site-packages/nvidia/cuda_nvrtc/lib:${LD_LIBRARY_PATH}" python3 -c "import pyflamegpu; print(pyflamegpu.__version__)"

2.0.0rc1+cuda120

So we could potentially do something in our __init__.py to ensure that path is on the library search path, but it would still mean that pyflamegpu built with CUDA 12.0 would be incompatible with, say, pytorch built with 12.1 in the same environment, which is not really surprising.
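
A minimal sketch of what such __init__.py logic could look like, loosely modelled on the pre-loading approach torch uses; the directory layout of the nvidia-cuda-nvrtc-cu12 wheel and the helper name are assumptions.

# Hypothetical pre-load helper for pyflamegpu/__init__.py: if the
# nvidia-cuda-nvrtc-cu12 wheel is installed, load its NVRTC libraries with
# RTLD_GLOBAL before the _pyflamegpu extension module is imported.
import ctypes
import glob
import os
import sys

def _preload_pip_nvrtc():
    for path in sys.path:
        lib_dir = os.path.join(path, "nvidia", "cuda_nvrtc", "lib")
        if not os.path.isdir(lib_dir):
            continue
        for lib in sorted(glob.glob(os.path.join(lib_dir, "libnvrtc*.so*"))):
            try:
                ctypes.CDLL(lib, mode=ctypes.RTLD_GLOBAL)
            except OSError:
                pass  # fall back to the system CUDA / RPATH behaviour
        return

_preload_pip_nvrtc()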

I would hope it's possible to find a way to make this work on Colab via LD_LIBRARY_PATH, but my initial attempts were unsuccessful. Potentially something to do with the environment at Python launch time vs what ld sees at runtime?

Torch adds the appropriate version of nvidia-cuda-nvrtc-cuXX to extra_install_requires in their setup.py for wheels they are going to upload to PyPI.

That way local wheels don't need it (avoiding pulling in ~80MB per local pyflamegpu build), but distributed wheels do depend upon it.

We could add a CMake option to enable this behaviour only in our release CI workflows (e.g. a less wordy version of -DFLAMEGPU_PYTHON_CUDA_EXTRA_INSTALL_REQUIRES=ON).
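
As a sketch of what that could look like in a templated setup.py (the @...@ placeholder and its substitution via CMake's configure_file are assumptions, reusing the option name above):

# Hypothetical fragment of a CMake-generated setup.py: only declare the NVRTC
# runtime wheel as a dependency for wheels that are built for distribution.
from setuptools import setup

install_requires = []
if "@FLAMEGPU_PYTHON_CUDA_EXTRA_INSTALL_REQUIRES@" == "ON":  # substituted by CMake
    install_requires.append("nvidia-cuda-nvrtc-cu12")  # pin to the wheel's CUDA major version

setup(
    name="pyflamegpu",
    install_requires=install_requires,
    # remaining arguments omitted
)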

However, this will still cause conflicts if a user wants pyflamegpu and other Python packages built with different CUDA versions in the same environment, though in that case the solution is a local build.

It's still not entirely clear to me how we then ensure that the correct library is found at pyflamegpu import time via python.


Alternatively we could distribute libnvrtc.so and libnvrtc-builtins.so with our wheels, as we do for some vis dependencies, but that will bloat our wheels by ~60MB each (plus more if we bundle our other CUDA deps), and unless we also start dlopening libcuda.so we won't achieve manylinux compliance either.


ptheywood (Member) commented Apr 2, 2024

The short-term fix is to replace the contents of the second cell with:

import importlib.util
if importlib.util.find_spec('pyflamegpu') is None:
    import sys
    !{sys.executable} -m pip install --extra-index-url https://whl.flamegpu.com/whl/cuda120/ pyflamegpu==2.0.0rc1 nvidia-cuda-nvrtc-cu12==12.0.140  # type: ignore

import ctypes
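# Pre-load the NVRTC builtins library installed by the nvidia-cuda-nvrtc-cu12
# wheel so the dynamic loader can resolve it when pyflamegpu is imported
# (the path below is specific to Colab's current Python 3.10 environment).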
ctypes.CDLL("/usr/local/lib/python3.10/dist-packages/nvidia/cuda_nvrtc/lib/libnvrtc-builtins.so.12.0")

# Import pyflamegpu and some other libraries we will use in the tutorial
import pyflamegpu
import sys, random, math
import matplotlib.pyplot as plt

This appears to work, but results in pip errors/warnings about package conflicts, and the hardcoded path to the shared library is not ideal.

Longer term we can probably fold some similar logic into __init__.py and add the extra_install_requires for distributed wheels.


However, as Colab now ships with CUDA 12.2, this encounters the very poor RTC compilation time from 12.2+ due to the use of jitify, so the first run of the run_simulation cell took 5 minutes, compared to 5s for re-runs (16 agent functions).

#1118

ptheywood (Member) commented

@complexitysoftware I've now pushed an update to the google-colab branch of our tutorial repository which should have addressed this on Colab in the short term. If you retry the online tutorial (https://flamegpu.com/try) it should now work (confirmed by @Robadob).
RTC performance with CUDA 12.2+ is also poor (#1118), so the first time the cell which actually runs the simulations is run it will take ~3-5 minutes, rather than a number of seconds. Re-running the cell will take ~5s if agent functions are not modified.
This will be fixed once #1150 is complete, but we do not have an ETA on that either.

We will need to make changes to FLAMEGPU/FLAMEGPU2 itself for a more robust / correct fix (and update the tutorial again), but can't commit to a timeline for that.

Thank you for making us aware of the issue, and I've opened #1193 to track us fixing this in a more robust way for future releases.

complexitysoftware (Author) commented

Yes, I have tried the amended Colab notebook and it does run correctly. The compile is slow, but it does work, and the warning sets users' expectations. Thanks for your work on this and for developing and maintaining FLAME GPU. I have worked in IT in many areas for many years, and emergent behaviour is still the most fascinating for me. Thanks again.
