cuda.parallel: In-memory caching of cuda.parallel build objects #3216
First file (new; imported later in the diff as `cai`):

@@ -0,0 +1,16 @@
+# Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. ALL RIGHTS RESERVED.
+#
+#
+# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+
+"""
+Utilities for extracting information from `__cuda_array_interface__`.
+"""
+
+import numpy as np
+
+from ..typing import DeviceArrayLike
+
+
+def get_dtype(arr: DeviceArrayLike) -> np.dtype:
+    return np.dtype(arr.__cuda_array_interface__["typestr"])
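For context on what `get_dtype` consumes: `np.dtype` accepts a CUDA Array Interface `typestr` (e.g. "<f4") directly, so the helper is a thin wrapper over that field. A minimal sketch, using a made-up `FakeDeviceArray` stand-in (not part of the PR); any real producer of `__cuda_array_interface__`, such as a CuPy or Numba device array, behaves the same way:

import numpy as np

class FakeDeviceArray:
    # Hypothetical stand-in exposing only the fields get_dtype needs.
    def __init__(self, typestr, shape):
        self.__cuda_array_interface__ = {
            "shape": shape,
            "typestr": typestr,   # e.g. "<f4" is little-endian float32
            "data": (0, False),   # dummy pointer; never dereferenced here
            "version": 3,
        }

arr = FakeDeviceArray("<f4", (1024,))
print(np.dtype(arr.__cuda_array_interface__["typestr"]))  # float32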
Second file (the reduce implementation):

@@ -13,6 +13,9 @@
 from .. import _cccl as cccl
 from .._bindings import get_paths, get_bindings
 from .._caching import cache_with_key
+from ..typing import DeviceArrayLike
+from ..iterators._iterators import IteratorBase
+from .._utils import cai as cai


 class _Op:

Review thread on `from .._utils import cai as cai`:
- Oversight? (delete …
- Fixed in 56f2c61.
@@ -42,12 +45,18 @@ def _dtype_validation(dt1, dt2):

 class _Reduce:
     # TODO: constructor shouldn't require concrete `d_in`, `d_out`:
-    def __init__(self, d_in, d_out, op: Callable, h_init: np.ndarray):
+    def __init__(
+        self,
+        d_in: DeviceArrayLike | IteratorBase,
+        d_out: DeviceArrayLike,
+        op: Callable,
+        h_init: np.ndarray,
+    ):
         d_in_cccl = cccl.to_cccl_iter(d_in)
         self._ctor_d_in_cccl_type_enum_name = cccl.type_enum_as_name(
             d_in_cccl.value_type.type.value
         )
-        self._ctor_d_out_dtype = d_out.dtype
+        self._ctor_d_out_dtype = cai.get_dtype(d_out)
         self._ctor_init_dtype = h_init.dtype
         cc_major, cc_minor = cuda.get_current_device().compute_capability
         cub_path, thrust_path, libcudacxx_path, cuda_include_path = get_paths()

Review thread on `d_in: DeviceArrayLike | IteratorBase,`:
- Note to self: Python 3.7 isn't going to like this.
- Are you reminding yourself to use …
- Instead, I used …

Review thread on `d_out: DeviceArrayLike,`:
- Remark: later on we'll have to support output iterators, like tabulate output iterator and transform output iterator.
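Context for the Python 3.7 note above: `DeviceArrayLike | IteratorBase` is PEP 604 union syntax, which is only evaluable at runtime on Python 3.10+. Purely for illustration (assumptions, not necessarily the fix adopted in this PR), two standard ways to keep such annotations importable on older Python:

# Option 1 (PEP 563): defer annotation evaluation so "|" is never executed.
from __future__ import annotations

# Option 2: spell the union with typing.Union, which works on Python 3.7+.
from typing import Union


class DeviceArrayLike: ...   # stand-ins for the real types
class IteratorBase: ...


def f_pep604(d_in: DeviceArrayLike | IteratorBase): ...       # needs option 1 on < 3.10
def f_union(d_in: Union[DeviceArrayLike, IteratorBase]): ...  # works either way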
@@ -120,9 +129,14 @@ def __del__(self):
         bindings.cccl_device_reduce_cleanup(ctypes.byref(self.build_result))


-def make_cache_key(d_in, d_out, op, h_init):
-    d_in_key = d_in.dtype if hasattr(d_in, "__cuda_array_interface__") else d_in
-    d_out_key = d_out.dtype if hasattr(d_out, "__cuda_array_interface__") else d_out
+def make_cache_key(
+    d_in: DeviceArrayLike | IteratorBase,
+    d_out: DeviceArrayLike,
+    op: Callable,
+    h_init: np.ndarray,
+):
+    d_in_key = d_in if isinstance(d_in, IteratorBase) else cai.get_dtype(d_in)
+    d_out_key = d_out if isinstance(d_out, IteratorBase) else cai.get_dtype(d_out)
     op_key = (op.__code__.co_code, op.__code__.co_consts, op.__closure__)
     h_init_key = h_init.dtype
     return (d_in_key, d_out_key, op_key, h_init_key)
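To see why this keying lets the in-memory cache hit across distinct arrays, here is a small self-contained sketch of the dtype-based keying (iterator inputs, which the real function passes through unchanged, are omitted; `FakeDeviceArray` is invented for illustration rather than taken from the PR):

import numpy as np

class FakeDeviceArray:
    # Hypothetical CAI-only stand-in; only "typestr" is read by the key function.
    def __init__(self, typestr):
        self.__cuda_array_interface__ = {
            "shape": (0,), "typestr": typestr, "data": (0, False), "version": 3
        }

def get_dtype(arr):
    return np.dtype(arr.__cuda_array_interface__["typestr"])

def make_cache_key(d_in, d_out, op, h_init):
    # Arrays are keyed by dtype, not identity, so any int32 -> int32 reduction
    # with the same op and init dtype maps to the same cached build object.
    d_in_key = get_dtype(d_in)
    d_out_key = get_dtype(d_out)
    op_key = (op.__code__.co_code, op.__code__.co_consts, op.__closure__)
    h_init_key = h_init.dtype
    return (d_in_key, d_out_key, op_key, h_init_key)

def add(a, b):
    return a + b

h_init = np.zeros(1, dtype=np.int32)
key1 = make_cache_key(FakeDeviceArray("<i4"), FakeDeviceArray("<i4"), add, h_init)
key2 = make_cache_key(FakeDeviceArray("<i4"), FakeDeviceArray("<i4"), add, h_init)
assert key1 == key2   # different array objects, same key -> build is reused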
@@ -131,7 +145,12 @@ def make_cache_key(d_in, d_out, op, h_init):
 # TODO Figure out `sum` without operator and initial value
 # TODO Accept stream
 @cache_with_key(make_cache_key)
-def reduce_into(d_in, d_out, op: Callable, h_init: np.ndarray):
+def reduce_into(
+    d_in: DeviceArrayLike | IteratorBase,
+    d_out: DeviceArrayLike,
+    op: Callable,
+    h_init: np.ndarray,
+):
     """Computes a device-wide reduction using the specified binary ``op`` functor and initial value ``init``.

     Example:

Review thread on `@cache_with_key(make_cache_key)`:
- Maybe irrelevant to this PR, just for my curiosity: what if `d_in`/`d_out` are non-contiguous 1D arrays? Do we handle the stride somewhere?
- No - it's a good point. I believe we need these to be contiguous. Opened #3223 and will address this in a follow-up.
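The `cache_with_key` decorator itself lives in `.._caching` and is not shown in this diff. The following is only a sketch of the memoization pattern the name implies (an assumption about its behavior, not the PR's actual implementation):

import functools

def cache_with_key(key_func):
    # Memoize the decorated function, using key_func(*args, **kwargs) as the
    # dictionary key instead of the (possibly unhashable) arguments themselves.
    def decorator(func):
        cache = {}

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            key = key_func(*args, **kwargs)
            if key not in cache:
                cache[key] = func(*args, **kwargs)   # build once per key
            return cache[key]

        return wrapper

    return decorator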
Third file (new; defines the `DeviceArrayLike` protocol used above):

@@ -0,0 +1,10 @@
+from typing import Protocol
+
+
+class DeviceArrayLike(Protocol):
+    """
+    Objects representing a device array, having a `.__cuda_array_interface__`
+    attribute.
+    """
+
+    __cuda_array_interface__: dict

Review thread on `class DeviceArrayLike(Protocol):`:
- … would be much more expressive.
- FYI, StridedMemoryView from cuda.core encapsulates both CAI and DLPack.
- I think … The former is a library-neutral implementation of the CAI/DLPack "protocols", while the latter is a true Protocol (same word, two subtly different meanings).
Review comment on `__cuda_array_interface__: dict`:
- FYI this can be replaced by StridedMemoryView once cuda.core becomes a dependency.
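For context on the `Protocol` choice discussed above: structural typing means any object carrying a `__cuda_array_interface__` attribute satisfies `DeviceArrayLike`, with no inheritance required. A short illustration (the `MyDeviceBuffer` class is invented here, not part of the PR):

from typing import Protocol


class DeviceArrayLike(Protocol):
    __cuda_array_interface__: dict


class MyDeviceBuffer:
    # No subclassing of DeviceArrayLike: having the attribute is enough for a
    # static type checker (e.g. mypy) to accept it where DeviceArrayLike is
    # expected.
    def __init__(self, cai: dict):
        self.__cuda_array_interface__ = cai


def typestr_of(arr: DeviceArrayLike) -> str:
    return arr.__cuda_array_interface__["typestr"]


buf = MyDeviceBuffer({"typestr": "<f8", "shape": (16,), "data": (0, False), "version": 3})
print(typestr_of(buf))   # <f8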