[FEA]: Introduce Python module with CCCL headers #3201
base: main
python/cuda_cccl/setup.py (outdated)
project_path = os.path.abspath(os.path.dirname(__file__))
cccl_path = os.path.abspath(os.path.join(project_path, "..", ".."))
cccl_headers = [["cub", "cub"], ["libcudacxx", "include"], ["thrust", "thrust"]]
ver = "0.1.2.8.0"
I think we need to use the CCCL version here, not the CCCL Python modules' version. We should also not hard-code it, but instead read it from CMakeLists, which is the source of truth AFAIK, and setuptools might not be up to that job. @vyasr might have a simple example of how this can be done with scikit-build-core.
Ack. I added this as a bullet to the PR description.
Check out the dynamic metadata section, specifically the Regex tab.
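A minimal sketch of the regex approach in plain Python, assuming (not confirmed here) that the CCCL version appears in a CMake file as project(<name> VERSION X.Y.Z ...); the real source-of-truth file and its exact format may differ. The helper names are hypothetical. It also assembles the PY_MAJOR.PY_MINOR.CCCL_MAJOR.CCCL_MINOR.CCCL_PATCH scheme that ver = "0.1.2.8.0" follows.

```python
import re

# Hypothetical pattern: matches e.g. "project(CCCL VERSION 2.8.0 LANGUAGES CXX)".
# CMake command names are case-insensitive, hence re.IGNORECASE.
_PROJECT_VERSION_RE = re.compile(
    r"project\s*\(\s*[A-Za-z0-9_]+\s+VERSION\s+(\d+(?:\.\d+){0,3})",
    re.IGNORECASE,
)


def read_cmake_project_version(cmake_text: str) -> str:
    """Return the VERSION argument of the first project() call in cmake_text."""
    match = _PROJECT_VERSION_RE.search(cmake_text)
    if match is None:
        raise ValueError("no project(... VERSION x.y.z) call found")
    return match.group(1)


def make_package_version(py_major: int, py_minor: int, cccl_version: str) -> str:
    """Build PY_MAJOR.PY_MINOR.CCCL_MAJOR.CCCL_MINOR.CCCL_PATCH."""
    parts = cccl_version.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {cccl_version!r}")
    return ".".join([str(py_major), str(py_minor), *parts])


if __name__ == "__main__":
    sample = "cmake_minimum_required(VERSION 3.21)\nproject(CCCL VERSION 2.8.0 LANGUAGES CXX)\n"
    print(make_package_version(0, 1, read_cmake_project_version(sample)))
```

Run as a script, it prints 0.1.2.8.0. With scikit-build-core, a comparable regex would presumably move into the dynamic-metadata configuration in pyproject.toml instead of living in setup code.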
You would need to rewrite everything here to use CMake instead of setuptools. Depending on what this module is trying to do, that may or may not be beneficial. Do you need to run compilation of cuda_cccl/cooperative/parallel against CCCL headers? In that case it is almost certainly worthwhile; I wouldn't want to orchestrate that compilation using setuptools.
Do you need to run compilation of cuda_cccl/cooperative/parallel against CCCL headers?
- cuda_cccl would just be nvidia-cuda-cccl-cuXX containing the headers, but owned/maintained by the CCCL team for faster release cycles (think of it as cccl vs cuda-cccl on conda-forge).
- cuda_cooperative JIT compiles CCCL headers at run time, so for all purposes the headers can be thought of as shared libraries; no AOT compilation is needed.
- cuda_parallel is the most interesting case, because it does need to build the CCCL C shared library and include it in the wheel, but I dunno if building it requires NVCC + CCCL headers, or whether GCC/MSVC alone is enough.
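If the headers are consumed like shared libraries at JIT time, the runtime mainly has to hand their include directories to the compiler. A minimal sketch, assuming a CCCL source checkout with the layout implied by cccl_headers in setup.py (cub/cub, libcudacxx/include, thrust/thrust); cccl_include_options is a hypothetical helper, and the -I<dir> spelling assumes NVRTC's include-path option. It only builds the option list and does not invoke NVRTC.

```python
from pathlib import Path

# Assumed layout, mirroring cccl_headers in python/cuda_cccl/setup.py:
#   <cccl>/cub/cub/...            -> include root <cccl>/cub
#   <cccl>/libcudacxx/include/... -> include root <cccl>/libcudacxx/include
#   <cccl>/thrust/thrust/...      -> include root <cccl>/thrust
_INCLUDE_ROOTS = (("cub",), ("libcudacxx", "include"), ("thrust",))


def cccl_include_options(cccl_root: Path) -> list[str]:
    """Return hypothetical "-I<dir>" options for JIT-compiling against CCCL headers."""
    roots = [Path(cccl_root).resolve().joinpath(*parts) for parts in _INCLUDE_ROOTS]
    missing = [str(r) for r in roots if not r.is_dir()]
    if missing:
        # Fail early instead of letting the JIT compiler report a missing header later.
        raise FileNotFoundError(f"missing CCCL include directories: {missing}")
    return [f"-I{r}" for r in roots]
```

Failing early on a missing directory is a design choice: a wrong include path otherwise only surfaces as a confusing "file not found" from the JIT compiler at first use.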
but I dunno if building it requires NVCC + CCCL headers, or GCC/MSVC alone is enough
Based on adding -DCMAKE_VERBOSE_MAKEFILE=ON and looking at the output of pip install --verbose ./cuda_parallel[test], nvcc is required for compiling cccl/c/parallel/src/for.cu and reduce.cu:
cd /home/coder/cccl/python/cuda_parallel/build/temp.linux-x86_64-cpython-312/c/parallel && /usr/bin/sccache /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/usr/bin/g++ -DCCCL_C_EXPERIMENTAL=1 -DNVRTC_GET_TYPE_NAME=1 -D_CCCL_NO_SYSTEM_HEADER -Dcccl_c_parallel_EXPORTS --options-file CMakeFiles/cccl.c.parallel.dir/includes_CUDA.rsp -O3 -DNDEBUG -std=c++20 "--generate-code=arch=compute_52,code=[compute_52,sm_52]" -Xcompiler=-fPIC -Xcudafe=--display_error_number -Wno-deprecated-gpu-targets -Xcudafe=--promote_warnings -Wreorder -Xcompiler=-Werror -Xcompiler=-Wall -Xcompiler=-Wextra -Xcompiler=-Wreorder -Xcompiler=-Winit-self -Xcompiler=-Woverloaded-virtual -Xcompiler=-Wcast-qual -Xcompiler=-Wpointer-arith -Xcompiler=-Wvla -Xcompiler=-Wno-gnu-line-marker -Xcompiler=-Wno-gnu-zero-variadic-macro-arguments -Xcompiler=-Wno-unused-function -Xcompiler=-Wno-noexcept-type -MD -MT c/parallel/CMakeFiles/cccl.c.parallel.dir/src/for.cu.o -MF CMakeFiles/cccl.c.parallel.dir/src/for.cu.o.d -x cu -c /home/coder/cccl/c/parallel/src/for.cu -o CMakeFiles/cccl.c.parallel.dir/src/for.cu.o
cd /home/coder/cccl/python/cuda_parallel/build/temp.linux-x86_64-cpython-312/c/parallel && /usr/bin/sccache /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/usr/bin/g++ -DCCCL_C_EXPERIMENTAL=1 -DNVRTC_GET_TYPE_NAME=1 -D_CCCL_NO_SYSTEM_HEADER -Dcccl_c_parallel_EXPORTS --options-file CMakeFiles/cccl.c.parallel.dir/includes_CUDA.rsp -O3 -DNDEBUG -std=c++20 "--generate-code=arch=compute_52,code=[compute_52,sm_52]" -Xcompiler=-fPIC -Xcudafe=--display_error_number -Wno-deprecated-gpu-targets -Xcudafe=--promote_warnings -Wreorder -Xcompiler=-Werror -Xcompiler=-Wall -Xcompiler=-Wextra -Xcompiler=-Wreorder -Xcompiler=-Winit-self -Xcompiler=-Woverloaded-virtual -Xcompiler=-Wcast-qual -Xcompiler=-Wpointer-arith -Xcompiler=-Wvla -Xcompiler=-Wno-gnu-line-marker -Xcompiler=-Wno-gnu-zero-variadic-macro-arguments -Xcompiler=-Wno-unused-function -Xcompiler=-Wno-noexcept-type -MD -MT c/parallel/CMakeFiles/cccl.c.parallel.dir/src/reduce.cu.o -MF CMakeFiles/cccl.c.parallel.dir/src/reduce.cu.o.d -x cu -c /home/coder/cccl/c/parallel/src/reduce.cu -o CMakeFiles/cccl.c.parallel.dir/src/reduce.cu.o
I skimmed over the code and I am actually confused, because my impression is that the kernel compilation is still done at run time (JIT), and that the host logic can just be handled by a host compiler. @gevtushenko IIRC you built the prototype, any reason we have to use .cu files here and use NVCC to compile?
Commit 2913ae0 adopts the established _version.py handling.
tl;dr I would suggest that if you have to do any compilation whatsoever beyond pure Cython you switch away from setuptools, but if you don't have any compiled modules at build time then stick to setuptools or use another backend that isn't designed for compilation (hatchling would be a great choice).
@gevtushenko IIRC you built the prototype, any reason we have to use .cu files here and use NVCC to compile?
In the offline call Georgii reminded me that there are some CUB structs that we need to pre-compile to pass around. Since CUB headers are generally not host-compilable, NVCC has to be used, but we don't generate any GPU-specific code.
Q: In what way is it not working?
It is getting a non-existent path here:
At HEAD, cuda_parallel/cuda/_include exists in the source directory (it is
On August 30, 2014 @leofang wrote: Leo: Do you still recommend that we replace I'm asking because that'll take this PR in a very different direction (I think).
Logging an observation (JIC it's useful to reference this later): With CCCL HEAD (I have @ d6253b5)
TL;DR: @gevtushenko could this explain your "only works 50% of the time" experience?
Current working directory is
The output is:
Similarly for cuda_parallel:
Same output as above.
Now with this PR (@ daab580)
TL;DR: Same problem (this had me really confused TBH).
Output:
Small summary:
Commit ef9d5f4 makes the I wouldn't be surprised if this isn't the right way of doing it, but it does work in one pass.
… cuda._include to find the include path.
Commit bc116dc fixes the
… (they are equivalent to the default functions)
It turns out what I discovered the hard way was actually a known issue: Lines 23 to 27 in d6253b5
/ok to test
🟩 CI finished in 58m 34s: Pass: 100%/176 | Total: 1d 00h | Avg: 8m 22s | Max: 44m 12s | Hits: 99%/22510
| | Project |
|---|---|
| +/- | CCCL Infrastructure |
| | libcu++ |
| | CUB |
| | Thrust |
| | CUDA Experimental |
| +/- | python |
| | CCCL C Parallel Library |
| | Catch2Helper |

Modifications in project or dependencies?

| | Project |
|---|---|
| +/- | CCCL Infrastructure |
| +/- | libcu++ |
| +/- | CUB |
| +/- | Thrust |
| +/- | CUDA Experimental |
| +/- | python |
| +/- | CCCL C Parallel Library |
| +/- | Catch2Helper |

🏃 Runner counts (total jobs: 176)

| # | Runner |
|---|---|
| 125 | linux-amd64-cpu16 |
| 25 | linux-amd64-gpu-v100-latest-1 |
| 15 | windows-amd64-cpu16 |
| 10 | linux-arm64-cpu16 |
| 1 | linux-amd64-gpu-h100-latest-1-testing |
🟩 CI finished in 50m 37s: Pass: 100%/176 | Total: 23h 49m | Avg: 8m 07s | Max: 44m 41s | Hits: 99%/22530
…uild_result'
```
=========================================================================== warnings summary ===========================================================================
tests/test_reduce.py::test_reduce_non_contiguous
  /home/coder/cccl/python/devenv/lib/python3.12/site-packages/_pytest/unraisableexception.py:85: PytestUnraisableExceptionWarning: Exception ignored in: <function _Reduce.__del__ at 0x7bf123139080>
  Traceback (most recent call last):
    File "/home/coder/cccl/python/cuda_parallel/cuda/parallel/experimental/algorithms/reduce.py", line 132, in __del__
      bindings.cccl_device_reduce_cleanup(ctypes.byref(self.build_result))
                                                       ^^^^^^^^^^^^^^^^^
  AttributeError: '_Reduce' object has no attribute 'build_result'

    warnings.warn(pytest.PytestUnraisableExceptionWarning(msg))

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================================= 1 passed, 93 deselected, 1 warning in 0.44s ==============================================================
```
🟩 CI finished in 1h 18m: Pass: 100%/176 | Total: 23h 54m | Avg: 8m 09s | Max: 40m 55s | Hits: 98%/22564
python/cuda_cccl/setup.py (outdated)
CCCL_PATH = PROJECT_PATH.parents[1]


def copy_cccl_headers_to_cuda_cccl_include():
I don't think this is the best way to solve this problem. It probably works in many cases, but it also has a lot of sharp edges. The better solution for this kind of installation (the problem described in line 23 of d6253b5:
# Temporarily install the package twice to populate include directory as part of the first installation
Done: 71fd243
python/cuda_cooperative/setup.py (outdated)
install_requires=[
    f"cuda-cccl @ file://{CCCL_PYTHON_PATH}/cuda_cccl",
This isn't what you want. This is going to hardcode the path into the installation requirements in a way that means that you'll pretty much never be able to ship a wheel because the wheel will try and install cuda-cccl from a specific path. I would remove this and update whatever build scripts you run in CI to do this with something like a pip constraint file.
If you do that, I would also get rid of setup.py altogether because everything in this file becomes static and could be moved to pyproject.toml.
This isn't what you want.
Well, is that really true? ... @leofang I've been wondering all the while about the granularity of
- cuda_cccl
- cuda_parallel
- cuda_cooperative
all originating from the same git repo.
Cons: it causes significant extra work and, possibly worse in the long term, version mismatch issues. With cuda_parallel, we're shipping site-packages/cuda/parallel/experimental/cccl/libcccl.c.parallel.so. What if the CCCL header files (in cuda-cccl) do not match exactly because they are installed separately? Do we have to worry about ODR issues?
What are the Pros of distributing 3 pip packages originating from the same git repo?
I would remove this and update whatever build scripts you run in CI to do this with something like a pip constraint file.
That's almost great: commit 79057cf
But it only works with absolute pathnames. :-(
-cuda-cccl @ file:///home/coder/cccl/python/cuda_cccl
+cuda-cccl @ file://../cuda_cccl
ValueError: non-local file URIs are not supported on this platform: 'file://../cuda_cccl'
(When using a cccl Dev container.)
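Since the file:// form only works with absolute paths, one workaround is to generate the line at CI time from a relative path. A minimal sketch, assuming the "name @ file:///..." line format shown in the diff above; direct_reference and write_constraints are hypothetical helper names, and whether pip accepts such lines in a constraints file (pip install -c) may depend on the pip version.

```python
from pathlib import Path


def direct_reference(name: str, project_dir: Path) -> str:
    """Return 'name @ file:///abs/path' for a local project directory.

    Path.as_uri() only accepts absolute paths, so resolve() first; this also
    collapses relative segments such as '..'.
    """
    return f"{name} @ {Path(project_dir).resolve().as_uri()}"


def write_constraints(out_file: Path, projects: dict[str, Path]) -> str:
    """Write one direct reference per local project (hypothetical CI helper)."""
    text = "".join(direct_reference(n, p) + "\n" for n, p in projects.items())
    Path(out_file).write_text(text)
    return text
```

For example, write_constraints(Path("constraints.txt"), {"cuda-cccl": Path("../cuda_cccl")}) run from python/cuda_cooperative would emit an absolute file:/// URL for python/cuda_cccl, which avoids the "non-local file URIs" error while keeping the checked-in scripts free of machine-specific paths.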
I would also get rid of setup.py
I still have this in setup.py:
setup(
license_files=["../../LICENSE"],
)
That's the only case I found that apparently escapes the setuptools checks for sources pulled from a parent directory. ChatGPT claims those checks are intentional.
Well, is that really true?
IIUC there are two separate questions that you're really asking:
- Do we need to ensure that the versions of these three packages are "exactly" compatible when installed (with "exactly" potentially going beyond API compatibility to meaning all the shared files are in the same locations, ABI compatibility, etc)?
- Should the constraint in the package be specified this way?
(1) is a very good question. I don't know how tightly coupled these packages are, and you may indeed need to enforce some tight coupling. In RAPIDS I set up a versioning scheme for our nightlies where every single build of a package gets a unique alpha version so that we have such tight coupling. Maybe you need something similar.
(2) is not the way to handle (1), though. If you specify this kind of constraint in setup.py (or pyproject.toml) and then build a wheel, I assume the path will get built into the wheel. I wouldn't expect such a wheel to even be valid to upload (hopefully indexes would reject it), but if we assume for the moment that you could upload such a wheel, it would never be installable on the user's system because it has a hardcoded path embedded in a requirement that will presumably be unsatisfiable unless they manually download the dependency to that location. The only good way for a user to get around this will be to manually pip install cuda-cccl themselves (which I believe will satisfy this constraint, but can't guarantee).
What if the CCCL header files (in cuda-cccl) do not match exactly because they are installed separately? Do we have to worry about ODR issues?
This should not happen because our Python version scheme PY_MAJOR.PY_MINOR.CCCL_MAJOR.CCCL_MINOR.CCCL_PATCH could be used at runtime to enforce (or relax) a version lock (use importlib.metadata to get the package version).
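A minimal sketch of such a runtime check, assuming purely numeric five-component versions (no dev or rc suffixes) and the distribution names cuda-parallel and cuda-cccl; the helper names and the exact-match policy are illustrative only. Relaxing the lock would mean comparing fewer CCCL components.

```python
from importlib.metadata import PackageNotFoundError, version


def cccl_part(package_version: str) -> tuple[int, int, int]:
    """Extract (CCCL_MAJOR, CCCL_MINOR, CCCL_PATCH) from
    PY_MAJOR.PY_MINOR.CCCL_MAJOR.CCCL_MINOR.CCCL_PATCH."""
    parts = package_version.split(".")
    if len(parts) != 5 or not all(p.isdigit() for p in parts):
        raise ValueError(f"unexpected version format: {package_version!r}")
    major, minor, patch = (int(p) for p in parts[2:])
    return major, minor, patch


def check_cccl_lock(dependent: str = "cuda-parallel", headers: str = "cuda-cccl") -> None:
    """Raise if the two installed distributions target different CCCL versions."""
    try:
        dep_version, hdr_version = version(dependent), version(headers)
    except PackageNotFoundError as exc:
        raise RuntimeError(f"required distribution not installed: {exc}") from exc
    if cccl_part(dep_version) != cccl_part(hdr_version):
        raise RuntimeError(
            f"{dependent} {dep_version} and {headers} {hdr_version} "
            "target different CCCL versions"
        )
```

Note that the PY_MAJOR.PY_MINOR components are deliberately ignored, so the Python modules can release independently as long as they agree on the CCCL headers.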
Alternatively, cuda.coop/par could declare a dependency on cuda-cccl in pyproject.toml, so that the version constraint is enforced at the pip level (which is preferred). You'd say "but cuda-cccl is not pip installable yet!", and we could work around that both locally and in the CI by either
- doing pip install --no-deps cuda-parallel to only install cuda-parallel without also installing cuda-cccl, or
- manually installing from the bottom of the dependency graph (pip install cuda-cccl first, and then cuda.coop/par, splitting into 2 pip install steps)
  - this is also how we test cuda.core in NVIDIA/cuda-python CI (which tests against in-development cuda.bindings, exactly the same situation)
In any case, I don't think version mismatch is something we need to worry about before we're ready to push out packages.
What are the Pros of distributing 3 pip packages originating from the same git repo?
- They are co-developed
- They share the same set of headers
- They use the same CI infra (which is close to nothing currently, on the Python side, if I am being honest)
I'd be more than happy to chat about this, but I don't see why we need this discussion to address Vyas's original question in this thread. I would ask @vyasr:
I would remove this and update whatever build scripts you run in CI to do this with something like a pip constraint file.
What are we trying to achieve here?
What are we trying to achieve here?
My assumption is that we want two things:
- When you build and distribute a cuda_cccl package, you want to specify that it should only be installed with compatible versions of cuda_parallel. That probably means that the release versions should match. That should be achieved with just a version constraint and should not require pinning to an exact file.
- When you are running CI tests of a specific commit on the repo (for example), you want to verify that you are testing builds of cuda_cccl and cuda_parallel from that exact commit. To do that, you build both in the same PR and then install them in a way that ensures you get only those two. You either explicitly install exactly those files with --no-deps and then their dependencies manually, or you do a normal pip install command but use something like a constraint file to indicate that this particular cuda_parallel build only works with the cuda_cccl wheel that was just built and lives in whatever directory.
python/cuda_parallel/setup.py (outdated)
Some of the same comments from the previous two setup.py files apply here as well. More generally I would recommend rewriting this package to use scikit-build-core because you are using CMake. That may be out of scope for this PR though. If you wanted to stage the work into two separate PRs that would be reasonable. The main problem will be that the setuptools build won't be the most easily managed until then and will be hard to debug.
xref: #3201 (comment)
I would add some notes to the discussion Leo linked.
Technically, a custom copy routine is a bit nerve-racking, especially after we encountered the incident (CUDAINST-3178) in which the nvidia-cuda-cccl-cuXX wheels were completely unusable for many months.
This kind of problem is inevitable when using custom copy routines with setuptools as I mentioned above because it is fundamentally not what setup.py is designed for in the modern ecosystem. A lot of the problems come from bridging historical gaps. Whereas 15 years ago you could rely on setup.py being a Python script that was simply executed to install, now you have to think about the fact that all modern tooling involves going through a (possibly transient) wheel and you must inject commands at the right stage. If you use setuptools, that invariably means that you need to override their commands to get things right in all cases (wheels, sdists, from source, etc).
Culturally, our RAPIDS friends try to stay away from setuptools, and using scikit-build-core could make it easier to ask RAPIDS for help because they're familiar with it.
I would qualify this by saying that we stay away from setuptools for packages that are going to use CMake already. If you have a pure Python package (or Python + Cython) setuptools is perfectly fine. The problem comes from when you need to also invoke CMake. Every build system I've ever worked with that tried to do custom stitching together of setuptools and CMake had problems that were hard to solve and often even harder to track down. It's simply not worthwhile to open yourself up to bugs that you may not even know are there. FWIW, from what I've seen recently pyarrow has one of the better setups here.
Let's do minimal work to unblock ourselves so that we can focus on more important things.
I would agree with this too, with the caveat that you probably have more bugs that you're not aware of 🙂 if you can get things working "well enough" with setuptools then no need to switch right now. I just worry that you'll quickly accumulate various patches to keep that working and in six months you'll have an overly complex setup script to deal with as a result.
/ok to test
🟨 CI finished in 1h 10m: Pass: 98%/170 | Total: 1d 03h | Avg: 9m 48s | Max: 58m 53s | Hits: 527%/15310
🏃 Runner counts (total jobs: 170)

| # | Runner |
|---|---|
| 122 | linux-amd64-cpu16 |
| 25 | linux-amd64-gpu-v100-latest-1 |
| 12 | windows-amd64-cpu16 |
| 10 | linux-arm64-cpu16 |
| 1 | linux-amd64-gpu-h100-latest-1-testing |
btw I suspect the use of ctypes here contributes to the constant overhead that Ashwin observed...
def __del__(self):
    if self.build_result is None:
        return
btw I suppose your _mnff could be applied here too (not in this PR, ofc) 😉
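The test warning shown earlier (AttributeError: '_Reduce' object has no attribute 'build_result', raised from __del__) is presumably what happens when __init__ fails before build_result is assigned; an "is None" check cannot catch a missing attribute. A minimal sketch of the weakref.finalize pattern, where the finalizer is registered only once the resource exists. Reduce and _release are hypothetical stand-ins (the latter for bindings.cccl_device_reduce_cleanup), and this is not necessarily how _mnff is implemented.

```python
import weakref


def _release(handle, released):
    # Stand-in for the real cleanup call (e.g. a ctypes binding). It must not
    # reference the owning object, otherwise the finalizer would keep it alive.
    released.append(handle)


class Reduce:
    """Hypothetical owner of a native build result."""

    def __init__(self, handle, released, fail_before_build=False):
        if fail_before_build:
            # Simulates __init__ raising before the resource exists: no finalizer
            # is registered, so nothing tries to clean up a missing attribute.
            raise RuntimeError("build failed")
        self.build_result = handle
        self._finalizer = weakref.finalize(self, _release, handle, released)

    def close(self):
        # Idempotent: a weakref.finalize object runs its callback at most once.
        self._finalizer()
```

A smaller fix in the same spirit would be getattr(self, "build_result", None) inside __del__; weakref.finalize additionally removes __del__ altogether and makes explicit cleanup safe to call more than once.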
/ok to test |
…has been archived by the owner on Jul 1, 2024)
For completeness: the other repo took so long to install into the pre-commit cache that it led to timeouts in the CCCL CI.
🟨 CI finished in 2h 32m: Pass: 97%/148 | Total: 1d 03h | Avg: 11m 06s | Max: 59m 11s | Hits: 534%/25124
🏃 Runner counts (total jobs: 148)

| # | Runner |
|---|---|
| 98 | linux-amd64-cpu16 |
| 23 | linux-amd64-gpu-v100-latest-1 |
| 16 | windows-amd64-cpu16 |
| 10 | linux-arm64-cpu16 |
| 1 | linux-amd64-gpu-h100-latest-1-testing |
🟩 CI finished in 3h 52m: Pass: 100%/148 | Total: 1d 03h | Avg: 11m 14s | Max: 59m 11s | Hits: 534%/25124
Description
closes #2281
- copy_cccl_headers_to_cuda_cccl_include() before calling setup(), so that pip install works as expected in one pass. Resolves this.
- cuda.cccl.include_paths from cuda.cooperative.experimental._nvrtc and cuda.parallel.experimental._bindings.
- os.path -> pathlib modernization in all .py files touched by this PR.

Note for completeness:
I spent a significant amount of time trying to use hatchling as the build backend (instead of setuptools):
With that commit, pip install worked, but pip install --editable did not. The root cause is this file installed by cuda-python:
This file interferes with Python's Implicit Namespace Packages mechanism, which is what hatchling relies on in --editable mode.
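A self-contained illustration of the namespace-package mechanism involved, assuming (the file itself is not shown here) that the interfering file acts like an __init__.py that turns a shared top-level package into a regular package. The package and directory names are made up; each "dist" directory plays the role of a separate sys.path entry, which is how path-based editable installs typically expose a package.

```python
import importlib
import sys
import tempfile
from pathlib import Path

PKG = "demo_ns_pkg"  # hypothetical top-level package shared by two "distributions"


def import_both(shared_init: bool) -> dict[str, bool]:
    """Report which of PKG.alpha / PKG.beta import when each lives on its own sys.path entry."""
    with tempfile.TemporaryDirectory() as tmp:
        entries = []
        for dist, sub in (("dist_a", "alpha"), ("dist_b", "beta")):
            sub_dir = Path(tmp, dist, PKG, sub)
            sub_dir.mkdir(parents=True)
            (sub_dir / "__init__.py").write_text("")
            entries.append(str(Path(tmp, dist)))
        if shared_init:
            # A regular __init__.py in the first entry stops the namespace search there.
            Path(tmp, "dist_a", PKG, "__init__.py").write_text("")
        sys.path[:0] = entries
        importlib.invalidate_caches()
        try:
            result = {}
            for sub in ("alpha", "beta"):
                try:
                    importlib.import_module(f"{PKG}.{sub}")
                    result[sub] = True
                except ModuleNotFoundError:
                    result[sub] = False
            return result
        finally:
            # Undo the sys.path and sys.modules changes so repeated calls start clean.
            del sys.path[: len(entries)]
            for name in [m for m in sys.modules if m == PKG or m.startswith(PKG + ".")]:
                del sys.modules[name]


if __name__ == "__main__":
    print(import_both(shared_init=False))  # both subpackages found via the namespace package
    print(import_both(shared_init=True))   # beta is hidden behind dist_a's regular package
```

Without the shared __init__.py, Python merges both directories into one namespace package; with it, the first match wins and the second directory's subpackage becomes invisible. Whether hatchling's editable mode hits exactly this case depends on how it exposes the package, but it matches the failure described in the note.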