Investigate improving the error message for PTX_COMPILE error #313

Open
ksimpson-work opened this issue Dec 18, 2024 · 2 comments · May be fixed by #394
Labels
cuda.core Everything related to the cuda.core module enhancement Any code-related improvements P1 Medium priority - Should do

Comments

@ksimpson-work
Contributor

If you don't pass an architecture argument to program options, compile will use the architecture of the current device, however it is possible that the arch of the current device is not able to handle the ptx compilation. On Program compile failure, we could provide tips on what to check, depending on the error code. Personally, I see this as an enhancement.
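
To make the "tips depending on the error code" idea concrete, here is a rough sketch. Everything in it is hypothetical: the `_HINTS` table, the `with_hint` helper, and the hint wording are illustrations, not cuda.core code. The two keys are meant to mirror the nvJitLink and driver error-code names, which is an assumption to verify against the headers.

```python
from typing import Dict

# Hypothetical table: error-code name -> things the user could check.
# The key strings are assumed to match the nvJitLink / driver enum names.
_HINTS: Dict[str, str] = {
    "NVJITLINK_ERROR_PTX_COMPILE": (
        "Check that the target arch is supported by the toolchain "
        "and that an arch was passed explicitly if the default does not fit."
    ),
    "CUDA_ERROR_UNSUPPORTED_PTX_VERSION": (
        "The driver may be older than the toolchain that generated the PTX."
    ),
}


def with_hint(error_name: str, message: str) -> str:
    """Append a hint to an error message when one is known for the code."""
    base = f"{error_name}: {message}"
    hint = _HINTS.get(error_name)
    # Unknown codes pass through unchanged so no information is lost.
    return base if hint is None else f"{base}\nHint: {hint}"
```

Unknown error codes are returned untouched, so a table like this could grow incrementally without affecting existing messages.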

@ksimpson-work ksimpson-work added cuda.core Everything related to the cuda.core module enhancement Any code-related improvements triage Needs the team's attention labels Dec 18, 2024
@leofang
Member

leofang commented Dec 30, 2024

If you don't pass an architecture argument to program options, compile will use the architecture of the current device,

For NVRTC, the default is compute_52 (it's documented here). Using the current device arch was my suggestion during a past offline chat 🙂 (and I still think we should do it)

however it is possible that the arch of the current device is not able to handle the ptx compilation.

This is a more complex topic. Consider the two steps:

  • Generating PTX:
    • This needs NVRTC to support the target device arch (either user-specified or current). In the past, this meant comparing the CTK/NVRTC version with the earliest CTK version that supports the device arch (all libraries need to maintain this mapping somewhere, e.g. cc90 -> CTK 11.8). However, recent NVRTC versions provide the nvrtcGetSupportedArchs API, which removes this need. We could call this API in Program.
  • Generating SASS:
    • All drivers can JIT compile PTX to SASS. However, this could fail when the driver version is older than the toolchain (NVRTC) version that was used to generate the PTX. We've implemented this check in the test suite, and we should move it into Program.
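
The two checks above could look roughly like the sketch below. It is pure Python over already-fetched values, so the function names are hypothetical and nothing here calls into CUDA. The assumptions are that the supported-arch list is a list of ints such as 52 or 90 (the shape nvrtcGetSupportedArchs is expected to return), that the driver version is the integer reported by cuDriverGetVersion (1000 * major + 10 * minor), and that the NVRTC version is a (major, minor) pair.

```python
from typing import Iterable, Tuple


def arch_supported(target_cc: int, supported_archs: Iterable[int]) -> bool:
    """Step 1 (PTX): is the target arch in NVRTC's supported list?"""
    return target_cc in set(supported_archs)


def decode_driver_version(version: int) -> Tuple[int, int]:
    """Split an integer like 12040 into (major, minor) == (12, 4)."""
    return version // 1000, (version % 1000) // 10


def driver_can_jit(driver_version: int, nvrtc_version: Tuple[int, int]) -> bool:
    """Step 2 (SASS): flag a driver that is older than the toolchain.

    This is a conservative approximation of the check described above,
    not the exact logic used in the test suite.
    """
    return decode_driver_version(driver_version) >= tuple(nvrtc_version)
```

For example, `driver_can_jit(12020, (12, 4))` is `False`, which is the case where a clearer error message would help most.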

@ksimpson-work
Contributor Author

I raised this issue before we added the _exception_manager ContextManager to the linker class, and the error message is now much better. I should have closed it then, but I agree with calling nvrtcGetSupportedArchs in Program and moving the check as well, and that would constitute a more robust fix.

Separately, I'm running into the default compute thing as I work on ProgramOptions, which uses the current device arch, while Linker currently requires it as an argument. I am going to make it also fall back to the current device and put that up for review.
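
A minimal sketch of the fallback described above, assuming the compute capability of the current device is available as a (major, minor) pair. `resolve_arch` is a hypothetical helper for illustration, not the actual ProgramOptions or Linker code.

```python
from typing import Optional, Tuple


def resolve_arch(user_arch: Optional[str], current_device_cc: Tuple[int, int]) -> str:
    """Return the arch to compile/link for.

    An explicit user-supplied arch always wins; otherwise fall back to
    the current device's compute capability, formatted as "sm_XY".
    """
    if user_arch is not None:
        return user_arch
    major, minor = current_device_cc
    return f"sm_{major}{minor}"
```

With this shape, both option classes could share one code path for the default instead of one requiring the argument and the other inferring it.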

@leofang leofang added P1 Medium priority - Should do and removed triage Needs the team's attention labels Jan 1, 2025
@leofang leofang added this to the cuda.core beta 3 milestone Jan 1, 2025
@ksimpson-work ksimpson-work linked a pull request Jan 14, 2025 that will close this issue

2 participants