Add the options data class to program #237

Open
wants to merge 64 commits into
base: main
6d789cb
commit squash: add ProgramOptions to Program
ksimpson-work Nov 27, 2024
66ceb85
remove stream from commit
ksimpson-work Nov 27, 2024
37a945c
modify doc source
ksimpson-work Nov 27, 2024
3555e2e
modify doc source
ksimpson-work Nov 27, 2024
1fc1189
Merge branch 'main' into ksimpson/add_program_options
ksimpson-work Dec 18, 2024
a9ac448
integrate program options into the tests
ksimpson-work Dec 18, 2024
944bc1a
Merge branch 'main' into ksimpson/add_program_options
leofang Dec 18, 2024
d490af2
Merge branch 'main' into ksimpson/add_program_options
ksimpson-work Dec 19, 2024
0394877
Merge branch 'main' into ksimpson/add_program_options
ksimpson-work Dec 27, 2024
b6e3eac
Merge remote-tracking branch 'origin/main' into ksimpson/add_program_…
ksimpson-work Dec 30, 2024
6790c83
update the attribute names for consistency across linker and program
ksimpson-work Dec 30, 2024
a8f8e3a
Merge remote-tracking branch 'origin/main' into ksimpson/add_program_…
ksimpson-work Dec 30, 2024
401ab75
Merge remote-tracking branch 'origin/ksimpson/add_program_options' in…
ksimpson-work Dec 30, 2024
7d5b894
fix module test
ksimpson-work Dec 31, 2024
1789a84
update the tests
ksimpson-work Dec 31, 2024
2285fac
update the tests
ksimpson-work Jan 1, 2025
bb62048
move ProgramOptions ctor into pytest raises
ksimpson-work Jan 2, 2025
d43500f
Merge branch 'main' into ksimpson/add_program_options
ksimpson-work Jan 2, 2025
0f0ca9b
only import nvjitlink if its available
ksimpson-work Jan 2, 2025
47c416d
Merge remote-tracking branch 'origin/ksimpson/add_program_options' in…
ksimpson-work Jan 2, 2025
c41821f
Merge branch 'main' into ksimpson/add_program_options
ksimpson-work Jan 2, 2025
53b8198
Merge branch 'main' into ksimpson/add_program_options
ksimpson-work Jan 2, 2025
58f2b09
Update cuda_core/examples/saxpy.py
ksimpson-work Jan 3, 2025
7afe54e
Update cuda_core/cuda/core/experimental/_program.py
ksimpson-work Jan 3, 2025
653a3e1
Update cuda_core/cuda/core/experimental/_program.py
ksimpson-work Jan 3, 2025
5161a43
Update cuda_core/cuda/core/experimental/_program.py
ksimpson-work Jan 3, 2025
dfd894e
Merge branch 'main' into ksimpson/add_program_options
ksimpson-work Jan 3, 2025
f461eec
tweak doc source
ksimpson-work Jan 3, 2025
236db71
tweak docs
ksimpson-work Jan 3, 2025
2f74ca3
tweak fix
ksimpson-work Jan 3, 2025
710d8e7
Merge branch 'main' into ksimpson/add_program_options
leofang Jan 5, 2025
9c88ba7
Update cuda_core/cuda/core/experimental/_linker.py
ksimpson-work Jan 6, 2025
133f6aa
Update cuda_core/docs/source/release.rst
ksimpson-work Jan 6, 2025
cb06afc
Update cuda_core/docs/source/release/0.1.0-notes.rst
ksimpson-work Jan 6, 2025
3578f94
fix tests
ksimpson-work Jan 6, 2025
261588c
fix quotes
ksimpson-work Jan 6, 2025
2b9e94a
Merge remote-tracking branch 'origin/ksimpson/add_program_options' in…
ksimpson-work Jan 6, 2025
1abc9f6
remove print
ksimpson-work Jan 6, 2025
d20dcfa
Update cuda_core/cuda/core/experimental/_utils.py
ksimpson-work Jan 6, 2025
79fad7a
fix titles
ksimpson-work Jan 6, 2025
b87044b
Merge remote-tracking branch 'origin/ksimpson/add_program_options' in…
ksimpson-work Jan 6, 2025
6959689
Merge branch 'main' into ksimpson/add_program_options
ksimpson-work Jan 6, 2025
ec9fac1
remove some options
ksimpson-work Jan 6, 2025
f55dcdc
add TODO
ksimpson-work Jan 6, 2025
dfe194f
Merge remote-tracking branch 'origin/ksimpson/add_program_options' in…
ksimpson-work Jan 6, 2025
b41d119
remove option
ksimpson-work Jan 6, 2025
de588de
remove options, should pass.
ksimpson-work Jan 6, 2025
2fbca70
add issue tracking info
ksimpson-work Jan 6, 2025
b79dccc
fix include path argument
ksimpson-work Jan 6, 2025
d1e4e09
Merge branch 'main' into ksimpson/add_program_options
ksimpson-work Jan 8, 2025
bf32370
fix the rest format
ksimpson-work Jan 8, 2025
85c0e47
Merge remote-tracking branch 'origin/ksimpson/add_program_options' in…
ksimpson-work Jan 8, 2025
34c8780
handle nested tuples within lists and tuples, and fix the handling of…
ksimpson-work Jan 8, 2025
cc25960
change from sequence to list or tuple
ksimpson-work Jan 8, 2025
e4786b2
Update cuda_core/cuda/core/experimental/_program.py
ksimpson-work Jan 8, 2025
d913e0a
fix quotes
ksimpson-work Jan 8, 2025
06c68b4
fix quotes
ksimpson-work Jan 8, 2025
3a747f6
swap api order
ksimpson-work Jan 8, 2025
b1345f8
Merge branch 'main' into ksimpson/add_program_options
ksimpson-work Jan 9, 2025
5dc8c87
Merge branch 'main' into ksimpson/add_program_options
ksimpson-work Jan 11, 2025
446b5af
Merge branch 'main' into ksimpson/add_program_options
ksimpson-work Jan 13, 2025
9ca2f40
switch the order fo comparisons to use sequence instead of list / tuple
ksimpson-work Jan 13, 2025
a710b45
fix order and debug code
ksimpson-work Jan 13, 2025
3736541
Merge branch 'main' into ksimpson/add_program_options
ksimpson-work Jan 14, 2025
2 changes: 1 addition & 1 deletion cuda_core/cuda/core/experimental/__init__.py
@@ -6,8 +6,8 @@
from cuda.core.experimental._device import Device
from cuda.core.experimental._event import EventOptions
from cuda.core.experimental._launcher import LaunchConfig, launch
from cuda.core.experimental._program import Program, ProgramOptions
from cuda.core.experimental._linker import Linker, LinkerOptions
from cuda.core.experimental._program import Program
from cuda.core.experimental._stream import Stream, StreamOptions
from cuda.core.experimental._system import System

13 changes: 8 additions & 5 deletions cuda_core/cuda/core/experimental/_linker.py
@@ -10,6 +10,7 @@
from typing import List, Optional

from cuda import cuda
from cuda.core.experimental._device import Device
from cuda.core.experimental._module import ObjectCode
from cuda.core.experimental._utils import check_or_create_options, handle_return

@@ -92,10 +93,10 @@ class LinkerOptions:

Attributes
----------
arch : str
Pass the SM architecture value, such as ``-arch=sm_<CC>`` (for generating CUBIN) or
``compute_<CC>`` (for generating PTX).
This is a required option.
arch : str, optional
Pass the SM architecture value, such as ``sm_<CC>`` (for generating CUBIN) or
``compute_<CC>`` (for generating PTX). If not provided, the current device's architecture
will be used.
max_register_count : int, optional
Maximum register count.
Maps to: ``-maxrregcount=<N>``.
@@ -173,7 +174,7 @@ class LinkerOptions:
Default: False.
"""

arch: str
arch: Optional[str] = None
max_register_count: Optional[int] = None
time: Optional[bool] = None
verbose: Optional[bool] = None
@@ -205,6 +206,8 @@ def __post_init__(self):
    def _init_nvjitlink(self):
        if self.arch is not None:
            self.formatted_options.append(f"-arch={self.arch}")
        else:
            self.formatted_options.append("-arch=sm_" + "".join(f"{i}" for i in Device().compute_capability))
        if self.max_register_count is not None:
            self.formatted_options.append(f"-maxrregcount={self.max_register_count}")
        if self.time is not None:
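
The new fallback branch builds the `-arch` flag from the current device's compute capability. The following is a minimal, GPU-free sketch of just that string formatting. It assumes the compute capability is a `(major, minor)` tuple, as the `"".join(...)` in the diff implies; `format_arch_flag` is a hypothetical helper name used for illustration and is not part of `cuda.core`.

```python
from typing import Optional, Tuple


def format_arch_flag(arch: Optional[str], compute_capability: Tuple[int, int]) -> str:
    """Return the -arch flag, falling back to the device's compute capability."""
    if arch is not None:
        # An explicit value such as "sm_80" or "compute_80" is passed through unchanged.
        return f"-arch={arch}"
    # Same join as in the diff: (8, 0) -> "80", (12, 0) -> "120".
    return "-arch=sm_" + "".join(f"{i}" for i in compute_capability)


print(format_arch_flag(None, (8, 0)))          # -arch=sm_80
print(format_arch_flag("compute_90", (8, 0)))  # -arch=compute_90
```

In the PR the tuple comes from `Device().compute_capability`, so the default depends on which device is current when the options are formatted.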
381 changes: 370 additions & 11 deletions cuda_core/cuda/core/experimental/_program.py

Large diffs are not rendered by default.

22 changes: 22 additions & 0 deletions cuda_core/cuda/core/experimental/_utils.py
@@ -5,6 +5,7 @@
import functools
import importlib.metadata
from collections import namedtuple
from collections.abc import Sequence
from typing import Callable, Dict

from cuda import cuda, cudart, nvrtc
@@ -88,6 +89,13 @@ def check_or_create_options(cls, options, options_description, *, keep_none=Fals
    return options


def _handle_boolean_option(option: bool) -> str:
    """
    Convert a boolean option to a string representation.
    """
    return "true" if bool(option) else "false"


def precondition(checker: Callable[..., None], what: str = "") -> Callable:
    """
    A decorator that adds checks to ensure any preconditions are met.
@@ -137,6 +145,20 @@ def get_device_from_ctx(ctx_handle) -> int:
    return device_id


def is_sequence(obj):
    """
    Check if the given object is a sequence (list or tuple).
    """
    return isinstance(obj, Sequence)


def is_nested_sequence(obj):
    """
    Check if the given object is a nested sequence (list or tuple with at least one list or tuple element).
    """
    return is_sequence(obj) and any(is_sequence(elem) for elem in obj)


def get_binding_version():
    try:
        major_minor = importlib.metadata.version("cuda-bindings").split(".")[:2]
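
A note on the two sequence helpers: `collections.abc.Sequence` also matches `str`, `bytes`, and `range`, which is broader than the "list or tuple" wording in the docstrings. The sketch below reproduces the helpers as they appear in the diff and shows that behaviour. `is_non_string_sequence` is a hypothetical stricter variant added for comparison and is not part of the PR. How the call sites in `_program.py` handle strings is not visible here.

```python
from collections.abc import Sequence


def is_sequence(obj):
    """Same check as in the diff."""
    return isinstance(obj, Sequence)


def is_nested_sequence(obj):
    """Same check as in the diff."""
    return is_sequence(obj) and any(is_sequence(elem) for elem in obj)


def is_non_string_sequence(obj):
    """Hypothetical stricter check that excludes str and bytes."""
    return isinstance(obj, Sequence) and not isinstance(obj, (str, bytes))


print(is_sequence([1, 2]))               # True
print(is_sequence("abc"))                # True: str is a Sequence
print(is_nested_sequence([(1, 2), 3]))   # True
print(is_nested_sequence(["a", "b"]))    # True: each str element is itself a Sequence
print(is_non_string_sequence("abc"))     # False
print(is_non_string_sequence(("a", "b")))  # True
```

If a value such as an include path may be either a single string or a list of strings, checking for `str` before calling `is_sequence` avoids treating the string as a list of characters.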
1 change: 1 addition & 0 deletions cuda_core/docs/source/api.rst
@@ -35,6 +35,7 @@ CUDA compilation toolchain

:template: dataclass.rst

ProgramOptions
LinkerOptions


2 changes: 1 addition & 1 deletion cuda_core/docs/source/index.rst
@@ -8,7 +8,7 @@ and other functionalities.
:maxdepth: 2
:caption: Contents:

release.md
release.rst
install.md
interoperability.rst
api.rst
11 changes: 0 additions & 11 deletions cuda_core/docs/source/release.md

This file was deleted.

9 changes: 9 additions & 0 deletions cuda_core/docs/source/release.rst
@@ -0,0 +1,9 @@
Release Notes
=============

.. toctree::
:maxdepth: 3

release/0.2.0-notes
release/0.1.1-notes
release/0.1.0-notes
@@ -1,17 +1,21 @@
# `cuda.core` v0.1.0 Release notes
``cuda.core`` 0.1.0 Release Notes
=================================

Released on Nov 8, 2024

## Hightlights
Highlights
----------

- Initial beta release
- Supports all platforms on which CUDA is supported
- Supports all CUDA 11.x/12.x drivers
- Supports all CUDA 11.x/12.x Toolkits
- Pythonic CUDA runtime and other core functionalities

## Limitations
Limitations
-----------

- All APIs are currently *experimental* and subject to change without deprecation notice.
Please kindly share your feedbacks with us so that we can make `cuda.core` better!
Please kindly share your feedback with us so that we can make ``cuda.core`` better!
- Source code release only; ``pip``/``conda`` support is coming in a future release
- Windows TCC mode is [not yet supported](https://github.com/NVIDIA/cuda-python/issues/206)
- Windows TCC mode is `not yet supported <https://github.com/NVIDIA/cuda-python/issues/206>`_
43 changes: 0 additions & 43 deletions cuda_core/docs/source/release/0.1.1-notes.md

This file was deleted.

51 changes: 51 additions & 0 deletions cuda_core/docs/source/release/0.1.1-notes.rst
@@ -0,0 +1,51 @@
.. currentmodule:: cuda.core.experimental

``cuda.core`` 0.1.1 Release Notes
=================================

Released on Dec 20, 2024

Highlights
----------

- Add :obj:`~utils.StridedMemoryView` and :func:`~utils.args_viewable_as_strided_memory` that provide a concrete
implementation of DLPack & CUDA Array Interface support.
- Add :obj:`~Linker` that can link one or multiple :obj:`~_module.ObjectCode` instances generated by :obj:`~Program`. Under
the hood, it uses either the nvJitLink or driver (``cuLink*``) APIs depending on the CUDA version
detected in the current environment.
- Support ``pip install cuda-core``. Please see the Installation Guide for further details.

New features
------------

- Add a :obj:`cuda.core.experimental.system` module for querying system- or process-wide information.
- Add :obj:`~LaunchConfig.cluster` to support thread block clusters on Hopper GPUs.

Enhancements
------------

- The internal handle held by :obj:`~_module.ObjectCode` is now lazily initialized upon first touch.
- Support TCC devices with a default synchronous memory resource to avoid the use of memory pools.
- Ensure ``"ltoir"`` is a valid code type to :obj:`~_module.ObjectCode`.
- Document the ``__cuda_stream__`` protocol.
- Improve test coverage & documentation cross-references.
- Enforce code formatting.

Bug fixes
---------

- Eliminate potential class destruction issues.
- Fix a circular import when handling a foreign CUDA stream.

Limitations
-----------

- All APIs are currently *experimental* and subject to change without deprecation notice.
Please kindly share your feedback with us so that we can make ``cuda.core`` better!
- Using ``cuda.core`` with NVRTC or nvJitLink installed from PyPI via ``pip install`` is currently
not supported. This will be fixed in a future release.
- Some :class:`~LinkerOptions` are only available when using a modern version of CUDA. When using CUDA <12,
the backend is the cuLink API which supports only a subset of the options that nvjitlink does.
Further, some options aren't available on CUDA versions <12.6.
- To use ``cuda.core`` with Python 3.13, it currently requires building ``cuda-python`` from source
prior to ``pip install``. This extra step will be fixed soon.
21 changes: 21 additions & 0 deletions cuda_core/docs/source/release/0.2.0-notes.rst
@@ -0,0 +1,21 @@
.. currentmodule:: cuda.core.experimental

``cuda.core`` 0.2.0 Release Notes
=================================

Released on <TODO>, 2024

Highlights
----------

- Add :class:`~ProgramOptions` to facilitate the passing of runtime compile options to :obj:`~Program`.

Limitations
-----------

- <TODO>

Breaking Changes
----------------

- The :meth:`~Program.compile` method no longer accepts the ``options`` argument. Instead, you can optionally pass an instance of :class:`~ProgramOptions` to the constructor of :obj:`~Program`.
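
To illustrate the shape of this change without requiring a GPU, here is a minimal sketch of an options dataclass that renders the same raw flags the examples previously passed to `compile()` (`-arch=...`, `-std=...`, `-I...`). `ToyProgramOptions` and `as_flags` are hypothetical names for illustration only. The real `ProgramOptions` lives in `_program.py`, whose diff is not rendered on this page, and it may format options differently.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyProgramOptions:
    """Hypothetical stand-in for ProgramOptions with three fields used in the examples."""

    arch: Optional[str] = None
    std: Optional[str] = None
    include_path: Optional[str] = None

    def as_flags(self) -> List[str]:
        """Render only the options that were set, in command-line form."""
        flags = []
        if self.arch is not None:
            flags.append(f"-arch={self.arch}")
        if self.std is not None:
            flags.append(f"-std={self.std}")
        if self.include_path is not None:
            flags.append(f"-I{self.include_path}")
        return flags


opts = ToyProgramOptions(arch="sm_90", std="c++17", include_path="/usr/local/cuda/include")
print(opts.as_flags())
# ['-arch=sm_90', '-std=c++17', '-I/usr/local/cuda/include']
```

Named fields let typos surface as a `TypeError` at construction time, whereas a misspelled raw flag string would only fail once the compiler sees it.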
10 changes: 4 additions & 6 deletions cuda_core/examples/saxpy.py
@@ -6,7 +6,7 @@

import cupy as cp

from cuda.core.experimental import Device, LaunchConfig, Program, launch
from cuda.core.experimental import Device, LaunchConfig, Program, ProgramOptions, launch

# compute out = a * x + y
code = """
@@ -29,13 +29,11 @@
s = dev.create_stream()

# prepare program
prog = Program(code, code_type="c++")
arch = "".join(f"{i}" for i in dev.compute_capability)
program_options = ProgramOptions(std="c++11", arch=f"sm_{arch}")
prog = Program(code, code_type="c++", options=program_options)
mod = prog.compile(
"cubin",
options=(
"-std=c++11",
"-arch=sm_" + "".join(f"{i}" for i in dev.compute_capability),
),
logs=sys.stdout,
name_expressions=("saxpy<float>", "saxpy<double>"),
)
11 changes: 4 additions & 7 deletions cuda_core/examples/strided_memory_view.py
@@ -31,7 +31,7 @@
cp = None
import numpy as np

from cuda.core.experimental import Device, LaunchConfig, Program, launch
from cuda.core.experimental import Device, LaunchConfig, Program, ProgramOptions, launch
from cuda.core.experimental.utils import StridedMemoryView, args_viewable_as_strided_memory

# ################################################################################
@@ -88,16 +88,13 @@
}
}
""").substitute(func_sig=func_sig)
gpu_prog = Program(gpu_code, code_type="c++")

# To know the GPU's compute capability, we need to identify which GPU to use.
dev = Device(0)
dev.set_current()
arch = "".join(f"{i}" for i in dev.compute_capability)
mod = gpu_prog.compile(
target_type="cubin",
# TODO: update this after NVIDIA/cuda-python#237 is merged
options=(f"-arch=sm_{arch}", "-std=c++11"),
)
gpu_prog = Program(gpu_code, code_type="c++", options=ProgramOptions(arch=f"sm_{arch}", std="c++11"))
mod = gpu_prog.compile(target_type="cubin")
gpu_ker = mod.get_kernel(func_name)

# Now we are prepared to run the code from the user's perspective!
12 changes: 6 additions & 6 deletions cuda_core/examples/thread_block_cluster.py
@@ -5,7 +5,7 @@
import os
import sys

from cuda.core.experimental import Device, LaunchConfig, Program, launch
from cuda.core.experimental import Device, LaunchConfig, Program, ProgramOptions, launch

# prepare include
cuda_path = os.environ.get("CUDA_PATH", os.environ.get("CUDA_HOME"))
@@ -44,12 +44,12 @@

# prepare program & compile kernel
dev.set_current()
prog = Program(code, code_type="c++")
mod = prog.compile(
target_type="cubin",
# TODO: update this after NVIDIA/cuda-python#237 is merged
options=(f"-arch=sm_{arch}", "-std=c++17", f"-I{cuda_include_path}"),
prog = Program(
code,
code_type="c++",
options=ProgramOptions(arch=f"sm_{arch}", std="c++17", include_path=cuda_include_path),
)
mod = prog.compile(target_type="cubin")
ker = mod.get_kernel("check_cluster_info")

# prepare launch config
14 changes: 4 additions & 10 deletions cuda_core/examples/vector_add.py
@@ -4,7 +4,7 @@

import cupy as cp

from cuda.core.experimental import Device, LaunchConfig, Program, launch
from cuda.core.experimental import Device, LaunchConfig, Program, ProgramOptions, launch

# compute c = a + b
code = """
@@ -26,15 +26,9 @@
s = dev.create_stream()

# prepare program
prog = Program(code, code_type="c++")
mod = prog.compile(
"cubin",
options=(
"-std=c++17",
"-arch=sm_" + "".join(f"{i}" for i in dev.compute_capability),
),
name_expressions=("vector_add<float>",),
)
program_options = ProgramOptions(std="c++17", arch="sm_" + "".join(f"{i}" for i in dev.compute_capability))
prog = Program(code, code_type="c++", options=program_options)
mod = prog.compile("cubin", name_expressions=("vector_add<float>",))

# run in single precision
ker = mod.get_kernel("vector_add<float>")