
v0.5.3: Asynchronous memory operations, NVRTC compilation improvements

@eyalroz released this on 26 Jul 07:55

Changes since v0.5.2:

Runtime program compilation (NVRTC) improvements

  • #379: The compilation log, PTX, cubin or NVVM can now be written into a user-provided buffer rather than a self-allocated one
  • #388: A builder interface for NVRTC programs
  • #386: Add support for nvrtcGetSupportedArchs()
  • #375: Support adding arbitrary options when dynamically compiling a CUDA program
  • #265: Support for the diag-suppress, diag-error and diag-warn compilation options
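To illustrate how a builder for NVRTC options (#388) might combine architecture targeting, the diagnostic-control options (#265) and arbitrary pass-through options (#375), here is a minimal C++ sketch. The class compilation_options_builder and all of its methods are hypothetical and are not the cuda-api-wrappers API. Only the option spellings (--gpu-architecture=, --define-macro=, --diag-suppress=, --diag-error=, --diag-warn=) follow the syntax in NVRTC's documentation. The c_strings() result is shaped like the options array that nvrtcCompileProgram() accepts.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical builder for NVRTC compilation options (illustration only;
// not the cuda-api-wrappers API). Option spellings follow NVRTC's syntax.
class compilation_options_builder {
public:
    // e.g. 70 -> "--gpu-architecture=compute_70" (a virtual architecture)
    compilation_options_builder& gpu_architecture(int compute_capability) {
        options_.push_back("--gpu-architecture=compute_" + std::to_string(compute_capability));
        return *this;
    }

    // e.g. "N=4" -> "--define-macro=N=4"
    compilation_options_builder& define_macro(const std::string& definition) {
        options_.push_back("--define-macro=" + definition);
        return *this;
    }

    // Diagnostic controls: each takes a comma-separated list of message numbers.
    compilation_options_builder& suppress_diagnostics(const std::vector<int>& numbers) {
        return add_diagnostic_option("--diag-suppress=", numbers);
    }
    compilation_options_builder& diagnostics_as_errors(const std::vector<int>& numbers) {
        return add_diagnostic_option("--diag-error=", numbers);
    }
    compilation_options_builder& diagnostics_as_warnings(const std::vector<int>& numbers) {
        return add_diagnostic_option("--diag-warn=", numbers);
    }

    // Arbitrary options are passed through verbatim.
    compilation_options_builder& extra_option(std::string option) {
        options_.push_back(std::move(option));
        return *this;
    }

    const std::vector<std::string>& options() const { return options_; }

    // Pointers in the form nvrtcCompileProgram() expects for its options array;
    // they stay valid only while the builder is alive and unmodified.
    std::vector<const char*> c_strings() const {
        std::vector<const char*> result;
        result.reserve(options_.size());
        for (const auto& option : options_) {
            result.push_back(option.c_str());
        }
        return result;
    }

private:
    compilation_options_builder& add_diagnostic_option(const char* prefix,
                                                       const std::vector<int>& numbers) {
        if (numbers.empty()) {
            return *this; // nothing to add
        }
        std::string option = prefix;
        for (std::size_t i = 0; i < numbers.size(); ++i) {
            if (i > 0) {
                option += ',';
            }
            option += std::to_string(numbers[i]);
        }
        options_.push_back(std::move(option));
        return *this;
    }

    std::vector<std::string> options_;
};
```

Usage might look like builder.gpu_architecture(70).suppress_diagnostics({177}). The vector returned by c_strings(), together with its size(), would then be handed to nvrtcCompileProgram(). Returning *this from each setter is what makes the chained style possible.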

Runtime-compilation-related bug fixes

  • #391: Fixed a regression in CUDA 10.0 support
  • #384: Make nvrtc depend on runtime-and-driver
  • #376: Rendering compilation options to a string produced an extra space
  • #378: The compilation log vector contained a trailing '\0'
  • #387: nvrtc.h was included in the wrong file
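The two string-handling bugs #376 and #378 come down to small details, sketched below in plain C++. The function names render_options and log_from_buffer are hypothetical, and this illustrates the pitfalls rather than reproducing the library's actual fixes. Options are joined with a separator placed only between items. A log buffer sized using nvrtcGetProgramLogSize(), whose reported size includes the terminating '\0', has that terminator dropped before conversion to std::string.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Join options with single spaces placed only between items, so the result
// has no leading or trailing separator (the #376 pitfall).
std::string render_options(const std::vector<std::string>& options) {
    std::string rendered;
    for (std::size_t i = 0; i < options.size(); ++i) {
        if (i > 0) {
            rendered += ' ';
        }
        rendered += options[i];
    }
    return rendered;
}

// NVRTC reports log sizes including the terminating '\0'. Drop that
// terminator when converting the raw buffer to a std::string (the #378 pitfall).
std::string log_from_buffer(const std::vector<char>& buffer) {
    std::size_t length = buffer.size();
    if (length > 0 && buffer[length - 1] == '\0') {
        --length;
    }
    return std::string(buffer.begin(),
                       buffer.begin() + static_cast<std::vector<char>::difference_type>(length));
}
```

Without the trimming step, the resulting std::string would report a size one larger than the visible text and carry an embedded '\0' at its end.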

Other changes

  • #390: Avoid a memory leak when getting a CUDA device's name
  • #248: Complete support for asynchronous memory allocation: v0.5.2 could only allocate asynchronously, and this release adds asynchronous freeing
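One way to fetch a device name without any heap allocation that could leak (cf. #390) is to let a C-style API fill a fixed-size buffer with automatic storage and then copy the result into a std::string. The CUDA driver API's cuDeviceGetName(char* name, int len, CUdevice dev) follows this buffer-filling pattern. To keep the sketch runnable without a GPU, fake_get_device_name is a hypothetical stand-in for it, and the fix actually applied in #390 may differ.

```cpp
#include <array>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a C-style API such as cuDeviceGetName():
// writes a NUL-terminated name into a caller-supplied buffer of length `len`.
int fake_get_device_name(char* name, int len, int /* device */) {
    if (name == nullptr || len <= 0) {
        return 1; // nonzero mimics an error status code
    }
    const char* source = "Example GPU";
    std::strncpy(name, source, static_cast<std::size_t>(len - 1));
    name[len - 1] = '\0'; // guarantee termination even if truncated
    return 0; // zero mimics a success status code
}

// No heap allocation that must be freed later: the buffer has automatic
// storage, and the result is copied into a std::string that owns its memory.
std::string device_name(int device) {
    std::array<char, 256> buffer{};
    if (fake_get_device_name(buffer.data(), static_cast<int>(buffer.size()), device) != 0) {
        return std::string();
    }
    return std::string(buffer.data()); // copies up to the first '\0'
}
```

Because neither the buffer nor the returned string needs a matching delete, there is no cleanup path to forget.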

Caveats

Continuous build testing on Windows is failing on GitHub Actions because CMake has trouble detecting the NVTX path. Help from users with this issue would be appreciated.