v0.5.3: Asynchronous memory operations, NVRTC compilation improvements
Changes since v0.5.2:
Runtime program compilation (NVRTC) improvements
- #379: The compilation log, PTX, cubin or NVVM can now be obtained in a user-provided buffer rather than a self-allocated one
- #388: A builder interface for NVRTC programs
- #386: Add support for nvrtcGetSupportedArchs()
- #375: Support adding arbitrary options when dynamically compiling a CUDA program
- #265: Support for the diag-suppress, diag-error and diag-warn compilation options
Runtime-compilation-related bug fixes
- #391: Fix for a CUDA 10.0 support regression
- #384: Make nvrtc depend on runtime-and-driver
- #376: Rendering compilation options to a string produced an extra space
- #378: The compilation log vector contained a trailing '\0'
- #387: nvrtc.h was included in the wrong file
Other changes
- #390: Avoid a memory leak when getting a CUDA device's name
- #248: Support asynchronous memory freeing (v0.5.2 supported only asynchronous allocation, not freeing)
Caveats
Continuous build testing on Windows is failing on GitHub Actions because CMake fails to detect the NVTX path. Assistance from users on this matter would be appreciated.