Review CUB util.ptx for CCCL 2.x #3342

Merged 18 commits on Jan 15, 2025

@fbusato fbusato commented Jan 10, 2025

#3289

Description

  • Add deprecation warnings to:
    • CUDA special register usage
    • BFI, IADD3, PRMT, BAR, FMUL_RZ, FFMA_RZ, and ThreadTrap, because they are never used
  • Replace PTX shf/shl (detail namespace) with standard shift operations, which could be beneficial for NVVM optimizations
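
As a rough illustration of the two changes listed above, the following host-compilable C++ sketch marks a legacy helper as deprecated and expresses shifts with the built-in operators instead of inline PTX. All names here (legacy_shift_left, shift_left, shift_right) are hypothetical and are not CUB's actual API; the sketch uses the standard [[deprecated]] attribute rather than whatever macro the library itself uses. The guard on the shift amount is also an assumption: PTX shl/shr clamp shift amounts larger than the register width, whereas in C++ shifting a 32-bit value by 32 or more is undefined behavior, so a plain replacement may need an explicit check when the amount is not known to be in range.

```cpp
#include <cstdint>

// Hypothetical legacy wrapper (not CUB's real function). The standard
// attribute makes the compiler warn at every call site.
[[deprecated("never used; prefer the built-in << operator")]]
inline std::uint32_t legacy_shift_left(std::uint32_t value, std::uint32_t amount)
{
  return amount < 32u ? (value << amount) : 0u;
}

// Plain C++ shift. Because it is an ordinary expression, the compiler can
// constant-fold and combine it with neighboring arithmetic, which is not
// possible for an opaque inline-assembly block.
// Assumption: amounts >= 32 yield 0, mirroring PTX clamping semantics.
constexpr std::uint32_t shift_left(std::uint32_t value, std::uint32_t amount)
{
  return amount < 32u ? (value << amount) : 0u;
}

constexpr std::uint32_t shift_right(std::uint32_t value, std::uint32_t amount)
{
  return amount < 32u ? (value >> amount) : 0u;
}

// The plain versions can be evaluated at compile time.
static_assert(shift_left(1u, 4u) == 16u, "1 << 4 is 16");
static_assert(shift_right(0x80000000u, 31u) == 1u, "top bit moves to bit 0");
static_assert(shift_left(1u, 32u) == 0u, "out-of-range amount is clamped to zero");
```

The static_assert lines show the practical difference: the operator-based helpers are usable in constant expressions, while a PTX-backed helper would not be.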

@fbusato fbusato requested review from a team as code owners January 10, 2025 20:36

@bernhardmgruber bernhardmgruber left a comment

Except for the shift and a typo, LGTM

Review threads (all outdated and resolved):
  • cub/cub/warp/specializations/warp_reduce_shfl.cuh
  • cub/test/test_util.h
  • cub/cub/agent/agent_batch_memcpy.cuh
@fbusato fbusato self-assigned this Jan 11, 2025

🟨 CI finished in 3h 04m: Pass: 97%/78 | Total: 2d 06h | Avg: 41m 41s | Max: 1h 37m | Hits: 216%/12368
  • 🟨 cub: Pass: 94%/38 | Total: 1d 09h | Avg: 53m 05s | Max: 1h 37m | Hits: 219%/3108

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  94%/36  | Total:  1d 07h | Avg: 52m 55s | Max:  1h 37m | Hits: 219%/3108  
      🟩 arm64              Pass: 100%/2   | Total:  1h 52m | Avg: 56m 10s | Max: 57m 13s
    🔍 ctk: 12.6 🔍
      🟩 12.0               Pass: 100%/5   | Total:  4h 53m | Avg: 58m 38s | Max:  1h 07m | Hits: 220%/777   
      🟩 12.5               Pass: 100%/2   | Total:  2h 19m | Avg:  1h 09m | Max:  1h 10m
      🔍 12.6               Pass:  93%/31  | Total:  1d 02h | Avg: 51m 07s | Max:  1h 37m | Hits: 219%/2331  
    🔍 cudacxx: nvcc12.6 🔍
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  1h 57m | Avg: 58m 34s | Max: 58m 58s
      🟩 nvcc12.0           Pass: 100%/5   | Total:  4h 53m | Avg: 58m 38s | Max:  1h 07m | Hits: 220%/777   
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 19m | Avg:  1h 09m | Max:  1h 10m
      🔍 nvcc12.6           Pass:  93%/29  | Total:  1d 00h | Avg: 50m 36s | Max:  1h 37m | Hits: 219%/2331  
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total:  1h 57m | Avg: 58m 34s | Max: 58m 58s
      🔍 nvcc               Pass:  94%/36  | Total:  1d 07h | Avg: 52m 47s | Max:  1h 37m | Hits: 219%/3108  
    🔍 gpu: v100 🔍
      🟩 h100               Pass: 100%/2   | Total: 43m 32s | Avg: 21m 46s | Max: 27m 24s
      🔍 v100               Pass:  94%/36  | Total:  1d 08h | Avg: 54m 50s | Max:  1h 37m | Hits: 219%/3108  
    🚨 jobs: TestGPU 🚨
      🟩 Build              Pass: 100%/31  | Total:  1d 05h | Avg: 57m 10s | Max:  1h 12m | Hits: 219%/3108  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 27m 21s | Avg: 27m 21s | Max: 27m 21s
      🟩 GraphCapture       Pass: 100%/1   | Total: 19m 12s | Avg: 19m 12s | Max: 19m 12s
      🟩 HostLaunch         Pass: 100%/3   | Total:  2h 26m | Avg: 48m 45s | Max:  1h 37m
      🔥 TestGPU            Pass:   0%/2   | Total: 52m 23s | Avg: 26m 11s | Max: 28m 20s
    🔍 std: 20 🔍
      🟩 17                 Pass: 100%/14  | Total: 14h 12m | Avg:  1h 00m | Max:  1h 12m | Hits: 220%/2331  
      🔍 20                 Pass:  91%/24  | Total: 19h 24m | Avg: 48m 32s | Max:  1h 37m | Hits: 216%/777   
    🟨 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  3h 41m | Avg: 55m 20s | Max: 57m 15s
      🟩 Clang15            Pass: 100%/1   | Total: 59m 04s | Avg: 59m 04s | Max: 59m 04s
      🟩 Clang16            Pass: 100%/1   | Total: 53m 08s | Avg: 53m 08s | Max: 53m 08s
      🟩 Clang17            Pass: 100%/1   | Total: 55m 47s | Avg: 55m 47s | Max: 55m 47s
      🟨 Clang18            Pass:  85%/7   | Total:  5h 37m | Avg: 48m 11s | Max: 58m 58s
      🟩 GCC7               Pass: 100%/2   | Total:  1h 50m | Avg: 55m 28s | Max: 56m 58s
      🟩 GCC8               Pass: 100%/1   | Total: 58m 41s | Avg: 58m 41s | Max: 58m 41s
      🟩 GCC9               Pass: 100%/2   | Total:  2h 00m | Avg:  1h 00m | Max:  1h 00m
      🟩 GCC10              Pass: 100%/1   | Total: 55m 24s | Avg: 55m 24s | Max: 55m 24s
      🟩 GCC11              Pass: 100%/1   | Total: 55m 59s | Avg: 55m 59s | Max: 55m 59s
      🟩 GCC12              Pass: 100%/3   | Total:  1h 46m | Avg: 35m 25s | Max:  1h 02m
      🟨 GCC13              Pass:  87%/8   | Total:  6h 09m | Avg: 46m 10s | Max:  1h 37m
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 13m | Avg:  1h 06m | Max:  1h 07m | Hits: 220%/1554  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  2h 21m | Avg:  1h 10m | Max:  1h 12m | Hits: 218%/1554  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 19m | Avg:  1h 09m | Max:  1h 10m
    🟨 cxx_family
      🟨 Clang              Pass:  92%/14  | Total: 12h 06m | Avg: 51m 54s | Max: 59m 04s
      🟨 GCC                Pass:  94%/18  | Total: 14h 37m | Avg: 48m 43s | Max:  1h 37m
      🟩 MSVC               Pass: 100%/4   | Total:  4h 34m | Avg:  1h 08m | Max:  1h 12m | Hits: 219%/3108  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 19m | Avg:  1h 09m | Max:  1h 10m
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 43m 32s | Avg: 21m 46s | Max: 27m 24s
      🟩 90a                Pass: 100%/1   | Total: 23m 17s | Avg: 23m 17s | Max: 23m 17s
    
  • 🟩 thrust: Pass: 100%/37 | Total: 19h 52m | Avg: 32m 13s | Max: 1h 06m | Hits: 215%/9260

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 37m 27s | Avg: 18m 43s | Max: 25m 30s
    🟩 cpu
      🟩 amd64              Pass: 100%/35  | Total: 18h 55m | Avg: 32m 25s | Max:  1h 06m | Hits: 215%/9260  
      🟩 arm64              Pass: 100%/2   | Total: 57m 14s | Avg: 28m 37s | Max: 30m 32s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  3h 04m | Avg: 36m 55s | Max: 59m 04s | Hits: 178%/1852  
      🟩 12.5               Pass: 100%/2   | Total:  1h 48m | Avg: 54m 29s | Max: 54m 31s
      🟩 12.6               Pass: 100%/30  | Total: 14h 58m | Avg: 29m 57s | Max:  1h 06m | Hits: 225%/7408  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 50m 51s | Avg: 25m 25s | Max: 25m 53s
      🟩 nvcc12.0           Pass: 100%/5   | Total:  3h 04m | Avg: 36m 55s | Max: 59m 04s | Hits: 178%/1852  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 48m | Avg: 54m 29s | Max: 54m 31s
      🟩 nvcc12.6           Pass: 100%/28  | Total: 14h 07m | Avg: 30m 16s | Max:  1h 06m | Hits: 225%/7408  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 50m 51s | Avg: 25m 25s | Max: 25m 53s
      🟩 nvcc               Pass: 100%/35  | Total: 19h 01m | Avg: 32m 36s | Max:  1h 06m | Hits: 215%/9260  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  1h 58m | Avg: 29m 40s | Max: 31m 54s
      🟩 Clang15            Pass: 100%/1   | Total: 33m 44s | Avg: 33m 44s | Max: 33m 44s
      🟩 Clang16            Pass: 100%/1   | Total: 29m 35s | Avg: 29m 35s | Max: 29m 35s
      🟩 Clang17            Pass: 100%/1   | Total: 29m 37s | Avg: 29m 37s | Max: 29m 37s
      🟩 Clang18            Pass: 100%/7   | Total:  2h 46m | Avg: 23m 48s | Max: 30m 51s
      🟩 GCC7               Pass: 100%/2   | Total:  1h 05m | Avg: 32m 33s | Max: 32m 37s
      🟩 GCC8               Pass: 100%/1   | Total: 29m 35s | Avg: 29m 35s | Max: 29m 35s
      🟩 GCC9               Pass: 100%/2   | Total:  1h 05m | Avg: 32m 43s | Max: 33m 31s
      🟩 GCC10              Pass: 100%/1   | Total: 33m 05s | Avg: 33m 05s | Max: 33m 05s
      🟩 GCC11              Pass: 100%/1   | Total: 31m 10s | Avg: 31m 10s | Max: 31m 10s
      🟩 GCC12              Pass: 100%/1   | Total: 33m 09s | Avg: 33m 09s | Max: 33m 09s
      🟩 GCC13              Pass: 100%/8   | Total:  2h 52m | Avg: 21m 36s | Max: 34m 11s
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 53m | Avg: 56m 36s | Max: 59m 04s | Hits: 178%/3704  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  2h 41m | Avg: 53m 47s | Max:  1h 06m | Hits: 240%/5556  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 48m | Avg: 54m 29s | Max: 54m 31s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/14  | Total:  6h 18m | Avg: 27m 01s | Max: 33m 44s
      🟩 GCC                Pass: 100%/16  | Total:  7h 10m | Avg: 26m 54s | Max: 34m 11s
      🟩 MSVC               Pass: 100%/5   | Total:  4h 34m | Avg: 54m 54s | Max:  1h 06m | Hits: 215%/9260  
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 48m | Avg: 54m 29s | Max: 54m 31s
    🟩 gpu
      🟩 v100               Pass: 100%/37  | Total: 19h 52m | Avg: 32m 13s | Max:  1h 06m | Hits: 215%/9260  
    🟩 jobs
      🟩 Build              Pass: 100%/31  | Total: 18h 12m | Avg: 35m 14s | Max:  1h 06m | Hits: 178%/7408  
      🟩 TestCPU            Pass: 100%/3   | Total: 51m 59s | Avg: 17m 19s | Max: 36m 43s | Hits: 365%/1852  
      🟩 TestGPU            Pass: 100%/3   | Total: 47m 45s | Avg: 15m 55s | Max: 22m 17s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total: 18m 18s | Avg: 18m 18s | Max: 18m 18s
    🟩 std
      🟩 17                 Pass: 100%/14  | Total:  8h 52m | Avg: 38m 02s | Max: 59m 04s | Hits: 178%/5556  
      🟩 20                 Pass: 100%/21  | Total: 10h 22m | Avg: 29m 38s | Max:  1h 06m | Hits: 272%/3704  
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 13m 25s | Avg: 6m 42s | Max: 11m 15s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 13m 25s | Avg:  6m 42s | Max: 11m 15s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total: 13m 25s | Avg:  6m 42s | Max: 11m 15s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total: 13m 25s | Avg:  6m 42s | Max: 11m 15s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 13m 25s | Avg:  6m 42s | Max: 11m 15s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 13m 25s | Avg:  6m 42s | Max: 11m 15s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 13m 25s | Avg:  6m 42s | Max: 11m 15s
    🟩 gpu
      🟩 v100               Pass: 100%/2   | Total: 13m 25s | Avg:  6m 42s | Max: 11m 15s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 10s | Avg:  2m 10s | Max:  2m 10s
      🟩 Test               Pass: 100%/1   | Total: 11m 15s | Avg: 11m 15s | Max: 11m 15s
    
  • 🟩 python: Pass: 100%/1 | Total: 28m 25s | Avg: 28m 25s | Max: 28m 25s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 28m 25s | Avg: 28m 25s | Max: 28m 25s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 28m 25s | Avg: 28m 25s | Max: 28m 25s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 28m 25s | Avg: 28m 25s | Max: 28m 25s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 28m 25s | Avg: 28m 25s | Max: 28m 25s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 28m 25s | Avg: 28m 25s | Max: 28m 25s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 28m 25s | Avg: 28m 25s | Max: 28m 25s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 28m 25s | Avg: 28m 25s | Max: 28m 25s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 28m 25s | Avg: 28m 25s | Max: 28m 25s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 78)

# Runner
53 linux-amd64-cpu16
11 linux-amd64-gpu-v100-latest-1
9 windows-amd64-cpu16
4 linux-arm64-cpu16
1 linux-amd64-gpu-h100-latest-1-testing

@fbusato fbusato force-pushed the review-cub-util-ptx branch from 96cf032 to 8d44adb Compare January 13, 2025 23:07
@fbusato fbusato requested a review from a team as a code owner January 14, 2025 00:50
@fbusato fbusato requested a review from elstehle January 14, 2025 00:50

🟨 CI finished in 3h 35m: Pass: 97%/78 | Total: 2d 04h | Avg: 40m 43s | Max: 1h 15m | Hits: 189%/12340
  • 🟨 cub: Pass: 94%/38 | Total: 1d 08h | Avg: 51m 25s | Max: 1h 15m | Hits: 112%/3120

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  94%/36  | Total:  1d 06h | Avg: 51m 01s | Max:  1h 15m | Hits: 112%/3120  
      🟩 arm64              Pass: 100%/2   | Total:  1h 57m | Avg: 58m 42s | Max: 58m 43s
    🔍 ctk: 12.6 🔍
      🟩 12.0               Pass: 100%/5   | Total:  4h 59m | Avg: 59m 48s | Max:  1h 09m | Hits: 113%/780   
      🟩 12.5               Pass: 100%/2   | Total:  2h 11m | Avg:  1h 05m | Max:  1h 05m
      🔍 12.6               Pass:  93%/31  | Total:  1d 01h | Avg: 49m 10s | Max:  1h 15m | Hits: 112%/2340  
    🔍 cudacxx: nvcc12.6 🔍
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  2h 05m | Avg:  1h 02m | Max:  1h 04m
      🟩 nvcc12.0           Pass: 100%/5   | Total:  4h 59m | Avg: 59m 48s | Max:  1h 09m | Hits: 113%/780   
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 11m | Avg:  1h 05m | Max:  1h 05m
      🔍 nvcc12.6           Pass:  93%/29  | Total: 23h 18m | Avg: 48m 13s | Max:  1h 15m | Hits: 112%/2340  
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total:  2h 05m | Avg:  1h 02m | Max:  1h 04m
      🔍 nvcc               Pass:  94%/36  | Total:  1d 06h | Avg: 50m 47s | Max:  1h 15m | Hits: 112%/3120  
    🔍 gpu: v100 🔍
      🟩 h100               Pass: 100%/2   | Total: 43m 22s | Avg: 21m 41s | Max: 27m 18s
      🔍 v100               Pass:  94%/36  | Total:  1d 07h | Avg: 53m 05s | Max:  1h 15m | Hits: 112%/3120  
    🚨 jobs: TestGPU 🚨
      🟩 Build              Pass: 100%/31  | Total:  1d 06h | Avg: 58m 56s | Max:  1h 15m | Hits: 112%/3120  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 19m 07s | Avg: 19m 07s | Max: 19m 07s
      🟩 GraphCapture       Pass: 100%/1   | Total: 16m 19s | Avg: 16m 19s | Max: 16m 19s
      🟩 HostLaunch         Pass: 100%/3   | Total: 53m 08s | Avg: 17m 42s | Max: 18m 44s
      🔥 TestGPU            Pass:   0%/2   | Total: 38m 26s | Avg: 19m 13s | Max: 20m 20s
    🔍 std: 20 🔍
      🟩 17                 Pass: 100%/14  | Total: 14h 11m | Avg:  1h 00m | Max:  1h 12m | Hits: 113%/2340  
      🔍 20                 Pass:  91%/24  | Total: 18h 23m | Avg: 45m 57s | Max:  1h 15m | Hits: 110%/780   
    🟨 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  3h 57m | Avg: 59m 16s | Max:  1h 02m
      🟩 Clang15            Pass: 100%/1   | Total:  1h 02m | Avg:  1h 02m | Max:  1h 02m
      🟩 Clang16            Pass: 100%/1   | Total: 59m 45s | Avg: 59m 45s | Max: 59m 45s
      🟩 Clang17            Pass: 100%/1   | Total:  1h 00m | Avg:  1h 00m | Max:  1h 00m
      🟨 Clang18            Pass:  85%/7   | Total:  5h 37m | Avg: 48m 13s | Max:  1h 04m
      🟩 GCC7               Pass: 100%/2   | Total:  1h 58m | Avg: 59m 07s | Max: 59m 19s
      🟩 GCC8               Pass: 100%/1   | Total: 54m 32s | Avg: 54m 32s | Max: 54m 32s
      🟩 GCC9               Pass: 100%/2   | Total:  1h 53m | Avg: 56m 50s | Max: 58m 50s
      🟩 GCC10              Pass: 100%/1   | Total: 59m 10s | Avg: 59m 10s | Max: 59m 10s
      🟩 GCC11              Pass: 100%/1   | Total:  1h 01m | Avg:  1h 01m | Max:  1h 01m
      🟩 GCC12              Pass: 100%/3   | Total:  1h 44m | Avg: 34m 48s | Max:  1h 01m
      🟨 GCC13              Pass:  87%/8   | Total:  4h 29m | Avg: 33m 42s | Max: 58m 43s
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 21m | Avg:  1h 10m | Max:  1h 12m | Hits: 113%/1560  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  2h 23m | Avg:  1h 11m | Max:  1h 15m | Hits: 111%/1560  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 11m | Avg:  1h 05m | Max:  1h 05m
    🟨 cxx_family
      🟨 Clang              Pass:  92%/14  | Total: 12h 37m | Avg: 54m 05s | Max:  1h 04m
      🟨 GCC                Pass:  94%/18  | Total: 13h 00m | Avg: 43m 23s | Max:  1h 01m
      🟩 MSVC               Pass: 100%/4   | Total:  4h 45m | Avg:  1h 11m | Max:  1h 15m | Hits: 112%/3120  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 11m | Avg:  1h 05m | Max:  1h 05m
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 43m 22s | Avg: 21m 41s | Max: 27m 18s
      🟩 90a                Pass: 100%/1   | Total: 26m 18s | Avg: 26m 18s | Max: 26m 18s
    
  • 🟩 thrust: Pass: 100%/37 | Total: 19h 46m | Avg: 32m 04s | Max: 1h 00m | Hits: 215%/9220

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 39m 39s | Avg: 19m 49s | Max: 28m 04s
    🟩 cpu
      🟩 amd64              Pass: 100%/35  | Total: 18h 49m | Avg: 32m 16s | Max:  1h 00m | Hits: 215%/9220  
      🟩 arm64              Pass: 100%/2   | Total: 57m 05s | Avg: 28m 32s | Max: 29m 54s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  3h 06m | Avg: 37m 17s | Max: 56m 27s | Hits: 177%/1844  
      🟩 12.5               Pass: 100%/2   | Total:  1h 45m | Avg: 52m 54s | Max: 54m 46s
      🟩 12.6               Pass: 100%/30  | Total: 14h 54m | Avg: 29m 49s | Max:  1h 00m | Hits: 224%/7376  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 53m 30s | Avg: 26m 45s | Max: 27m 25s
      🟩 nvcc12.0           Pass: 100%/5   | Total:  3h 06m | Avg: 37m 17s | Max: 56m 27s | Hits: 177%/1844  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 45m | Avg: 52m 54s | Max: 54m 46s
      🟩 nvcc12.6           Pass: 100%/28  | Total: 14h 01m | Avg: 30m 02s | Max:  1h 00m | Hits: 224%/7376  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 53m 30s | Avg: 26m 45s | Max: 27m 25s
      🟩 nvcc               Pass: 100%/35  | Total: 18h 53m | Avg: 32m 22s | Max:  1h 00m | Hits: 215%/9220  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  2h 04m | Avg: 31m 05s | Max: 33m 12s
      🟩 Clang15            Pass: 100%/1   | Total: 32m 17s | Avg: 32m 17s | Max: 32m 17s
      🟩 Clang16            Pass: 100%/1   | Total: 30m 09s | Avg: 30m 09s | Max: 30m 09s
      🟩 Clang17            Pass: 100%/1   | Total: 32m 42s | Avg: 32m 42s | Max: 32m 42s
      🟩 Clang18            Pass: 100%/7   | Total:  2h 43m | Avg: 23m 17s | Max: 31m 24s
      🟩 GCC7               Pass: 100%/2   | Total:  1h 02m | Avg: 31m 07s | Max: 31m 30s
      🟩 GCC8               Pass: 100%/1   | Total: 31m 25s | Avg: 31m 25s | Max: 31m 25s
      🟩 GCC9               Pass: 100%/2   | Total:  1h 05m | Avg: 32m 52s | Max: 33m 27s
      🟩 GCC10              Pass: 100%/1   | Total: 34m 35s | Avg: 34m 35s | Max: 34m 35s
      🟩 GCC11              Pass: 100%/1   | Total: 32m 23s | Avg: 32m 23s | Max: 32m 23s
      🟩 GCC12              Pass: 100%/1   | Total: 33m 45s | Avg: 33m 45s | Max: 33m 45s
      🟩 GCC13              Pass: 100%/8   | Total:  2h 54m | Avg: 21m 46s | Max: 36m 34s
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 51m | Avg: 55m 37s | Max: 56m 27s | Hits: 177%/3688  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  2h 32m | Avg: 50m 58s | Max:  1h 00m | Hits: 240%/5532  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 45m | Avg: 52m 54s | Max: 54m 46s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/14  | Total:  6h 22m | Avg: 27m 19s | Max: 33m 12s
      🟩 GCC                Pass: 100%/16  | Total:  7h 14m | Avg: 27m 08s | Max: 36m 34s
      🟩 MSVC               Pass: 100%/5   | Total:  4h 24m | Avg: 52m 50s | Max:  1h 00m | Hits: 215%/9220  
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 45m | Avg: 52m 54s | Max: 54m 46s
    🟩 gpu
      🟩 v100               Pass: 100%/37  | Total: 19h 46m | Avg: 32m 04s | Max:  1h 00m | Hits: 215%/9220  
    🟩 jobs
      🟩 Build              Pass: 100%/31  | Total: 18h 16m | Avg: 35m 22s | Max:  1h 00m | Hits: 177%/7376  
      🟩 TestCPU            Pass: 100%/3   | Total: 50m 58s | Avg: 16m 59s | Max: 36m 20s | Hits: 365%/1844  
      🟩 TestGPU            Pass: 100%/3   | Total: 39m 05s | Avg: 13m 01s | Max: 14m 28s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total: 17m 38s | Avg: 17m 38s | Max: 17m 38s
    🟩 std
      🟩 17                 Pass: 100%/14  | Total:  8h 49m | Avg: 37m 48s | Max: 56m 27s | Hits: 177%/5532  
      🟩 20                 Pass: 100%/21  | Total: 10h 17m | Avg: 29m 25s | Max:  1h 00m | Hits: 271%/3688  
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 10m 00s | Avg: 5m 00s | Max: 7m 49s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 10m 00s | Avg:  5m 00s | Max:  7m 49s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total: 10m 00s | Avg:  5m 00s | Max:  7m 49s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total: 10m 00s | Avg:  5m 00s | Max:  7m 49s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 10m 00s | Avg:  5m 00s | Max:  7m 49s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 10m 00s | Avg:  5m 00s | Max:  7m 49s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 10m 00s | Avg:  5m 00s | Max:  7m 49s
    🟩 gpu
      🟩 v100               Pass: 100%/2   | Total: 10m 00s | Avg:  5m 00s | Max:  7m 49s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 11s | Avg:  2m 11s | Max:  2m 11s
      🟩 Test               Pass: 100%/1   | Total:  7m 49s | Avg:  7m 49s | Max:  7m 49s
    
  • 🟩 python: Pass: 100%/1 | Total: 25m 18s | Avg: 25m 18s | Max: 25m 18s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 25m 18s | Avg: 25m 18s | Max: 25m 18s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 25m 18s | Avg: 25m 18s | Max: 25m 18s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 25m 18s | Avg: 25m 18s | Max: 25m 18s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 25m 18s | Avg: 25m 18s | Max: 25m 18s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 25m 18s | Avg: 25m 18s | Max: 25m 18s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 25m 18s | Avg: 25m 18s | Max: 25m 18s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 25m 18s | Avg: 25m 18s | Max: 25m 18s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 25m 18s | Avg: 25m 18s | Max: 25m 18s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 78)

# Runner
53 linux-amd64-cpu16
11 linux-amd64-gpu-v100-latest-1
9 windows-amd64-cpu16
4 linux-arm64-cpu16
1 linux-amd64-gpu-h100-latest-1-testing

copy-pr-bot bot commented Jan 14, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@fbusato fbusato force-pushed the review-cub-util-ptx branch from ee86833 to 3cf9263 Compare January 14, 2025 18:41
@fbusato fbusato force-pushed the review-cub-util-ptx branch from 3cf9263 to ffe65ca Compare January 14, 2025 19:01

🟩 CI finished in 2h 36m: Pass: 100%/78 | Total: 2d 05h | Avg: 41m 26s | Max: 1h 16m | Hits: 187%/12340
  • 🟩 cub: Pass: 100%/38 | Total: 1d 08h | Avg: 51m 40s | Max: 1h 16m | Hits: 110%/3120

    🟩 cpu
      🟩 amd64              Pass: 100%/36  | Total:  1d 06h | Avg: 51m 14s | Max:  1h 16m | Hits: 110%/3120  
      🟩 arm64              Pass: 100%/2   | Total:  1h 58m | Avg: 59m 16s | Max:  1h 00m
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  4h 53m | Avg: 58m 46s | Max:  1h 02m | Hits: 110%/780   
      🟩 12.5               Pass: 100%/2   | Total:  2h 14m | Avg:  1h 07m | Max:  1h 08m
      🟩 12.6               Pass: 100%/31  | Total:  1d 01h | Avg: 49m 30s | Max:  1h 16m | Hits: 110%/2340  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  1h 55m | Avg: 57m 48s | Max: 58m 44s
      🟩 nvcc12.0           Pass: 100%/5   | Total:  4h 53m | Avg: 58m 46s | Max:  1h 02m | Hits: 110%/780   
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 14m | Avg:  1h 07m | Max:  1h 08m
      🟩 nvcc12.6           Pass: 100%/29  | Total: 23h 39m | Avg: 48m 56s | Max:  1h 16m | Hits: 110%/2340  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  1h 55m | Avg: 57m 48s | Max: 58m 44s
      🟩 nvcc               Pass: 100%/36  | Total:  1d 06h | Avg: 51m 19s | Max:  1h 16m | Hits: 110%/3120  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  3h 48m | Avg: 57m 14s | Max: 58m 46s
      🟩 Clang15            Pass: 100%/1   | Total: 59m 38s | Avg: 59m 38s | Max: 59m 38s
      🟩 Clang16            Pass: 100%/1   | Total: 55m 20s | Avg: 55m 20s | Max: 55m 20s
      🟩 Clang17            Pass: 100%/1   | Total: 55m 59s | Avg: 55m 59s | Max: 55m 59s
      🟩 Clang18            Pass: 100%/7   | Total:  5h 52m | Avg: 50m 24s | Max:  1h 02m
      🟩 GCC7               Pass: 100%/2   | Total:  1h 58m | Avg: 59m 03s | Max: 59m 32s
      🟩 GCC8               Pass: 100%/1   | Total: 54m 01s | Avg: 54m 01s | Max: 54m 01s
      🟩 GCC9               Pass: 100%/2   | Total:  1h 54m | Avg: 57m 17s | Max: 58m 36s
      🟩 GCC10              Pass: 100%/1   | Total:  1h 04m | Avg:  1h 04m | Max:  1h 04m
      🟩 GCC11              Pass: 100%/1   | Total: 55m 17s | Avg: 55m 17s | Max: 55m 17s
      🟩 GCC12              Pass: 100%/3   | Total:  1h 48m | Avg: 36m 09s | Max:  1h 01m
      🟩 GCC13              Pass: 100%/8   | Total:  4h 46m | Avg: 35m 46s | Max:  1h 02m
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 11m | Avg:  1h 05m | Max:  1h 09m | Hits: 112%/1560  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  2h 24m | Avg:  1h 12m | Max:  1h 16m | Hits: 108%/1560  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 14m | Avg:  1h 07m | Max:  1h 08m
    🟩 cxx_family
      🟩 Clang              Pass: 100%/14  | Total: 12h 32m | Avg: 53m 46s | Max:  1h 02m
      🟩 GCC                Pass: 100%/18  | Total: 13h 20m | Avg: 44m 28s | Max:  1h 04m
      🟩 MSVC               Pass: 100%/4   | Total:  4h 35m | Avg:  1h 08m | Max:  1h 16m | Hits: 110%/3120  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 14m | Avg:  1h 07m | Max:  1h 08m
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 46m 32s | Avg: 23m 16s | Max: 26m 55s
      🟩 v100               Pass: 100%/36  | Total:  1d 07h | Avg: 53m 14s | Max:  1h 16m | Hits: 110%/3120  
    🟩 jobs
      🟩 Build              Pass: 100%/31  | Total:  1d 06h | Avg: 58m 06s | Max:  1h 16m | Hits: 110%/3120  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 19m 58s | Avg: 19m 58s | Max: 19m 58s
      🟩 GraphCapture       Pass: 100%/1   | Total: 16m 12s | Avg: 16m 12s | Max: 16m 12s
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 14m | Avg: 24m 47s | Max: 33m 03s
      🟩 TestGPU            Pass: 100%/2   | Total: 51m 34s | Avg: 25m 47s | Max: 28m 18s
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 46m 32s | Avg: 23m 16s | Max: 26m 55s
      🟩 90a                Pass: 100%/1   | Total: 26m 52s | Avg: 26m 52s | Max: 26m 52s
    🟩 std
      🟩 17                 Pass: 100%/14  | Total: 14h 01m | Avg:  1h 00m | Max:  1h 09m | Hits: 111%/2340  
      🟩 20                 Pass: 100%/24  | Total: 18h 41m | Avg: 46m 44s | Max:  1h 16m | Hits: 107%/780   
    
  • 🟩 thrust: Pass: 100%/37 | Total: 20h 30m | Avg: 33m 15s | Max: 1h 06m | Hits: 213%/9220

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 37m 49s | Avg: 18m 54s | Max: 24m 49s
    🟩 cpu
      🟩 amd64              Pass: 100%/35  | Total: 19h 32m | Avg: 33m 30s | Max:  1h 06m | Hits: 213%/9220  
      🟩 arm64              Pass: 100%/2   | Total: 58m 09s | Avg: 29m 04s | Max: 30m 56s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  3h 04m | Avg: 36m 59s | Max:  1h 01m | Hits: 174%/1844  
      🟩 12.5               Pass: 100%/2   | Total:  1h 53m | Avg: 56m 59s | Max: 59m 33s
      🟩 12.6               Pass: 100%/30  | Total: 15h 31m | Avg: 31m 03s | Max:  1h 06m | Hits: 222%/7376  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 56m 09s | Avg: 28m 04s | Max: 29m 00s
      🟩 nvcc12.0           Pass: 100%/5   | Total:  3h 04m | Avg: 36m 59s | Max:  1h 01m | Hits: 174%/1844  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 53m | Avg: 56m 59s | Max: 59m 33s
      🟩 nvcc12.6           Pass: 100%/28  | Total: 14h 35m | Avg: 31m 16s | Max:  1h 06m | Hits: 222%/7376  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 56m 09s | Avg: 28m 04s | Max: 29m 00s
      🟩 nvcc               Pass: 100%/35  | Total: 19h 34m | Avg: 33m 33s | Max:  1h 06m | Hits: 213%/9220  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  2h 07m | Avg: 31m 50s | Max: 33m 06s
      🟩 Clang15            Pass: 100%/1   | Total: 30m 52s | Avg: 30m 52s | Max: 30m 52s
      🟩 Clang16            Pass: 100%/1   | Total: 30m 47s | Avg: 30m 47s | Max: 30m 47s
      🟩 Clang17            Pass: 100%/1   | Total: 30m 15s | Avg: 30m 15s | Max: 30m 15s
      🟩 Clang18            Pass: 100%/7   | Total:  3h 05m | Avg: 26m 27s | Max: 33m 28s
      🟩 GCC7               Pass: 100%/2   | Total:  1h 05m | Avg: 32m 37s | Max: 34m 07s
      🟩 GCC8               Pass: 100%/1   | Total: 31m 08s | Avg: 31m 08s | Max: 31m 08s
      🟩 GCC9               Pass: 100%/2   | Total:  1h 04m | Avg: 32m 09s | Max: 34m 13s
      🟩 GCC10              Pass: 100%/1   | Total: 31m 29s | Avg: 31m 29s | Max: 31m 29s
      🟩 GCC11              Pass: 100%/1   | Total: 34m 36s | Avg: 34m 36s | Max: 34m 36s
      🟩 GCC12              Pass: 100%/1   | Total: 33m 17s | Avg: 33m 17s | Max: 33m 17s
      🟩 GCC13              Pass: 100%/8   | Total:  2h 52m | Avg: 21m 31s | Max: 32m 44s
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 03m | Avg:  1h 01m | Max:  1h 01m | Hits: 175%/3688  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  2h 36m | Avg: 52m 18s | Max:  1h 06m | Hits: 238%/5532  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 53m | Avg: 56m 59s | Max: 59m 33s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/14  | Total:  6h 44m | Avg: 28m 53s | Max: 33m 28s
      🟩 GCC                Pass: 100%/16  | Total:  7h 12m | Avg: 27m 00s | Max: 34m 36s
      🟩 MSVC               Pass: 100%/5   | Total:  4h 40m | Avg: 56m 01s | Max:  1h 06m | Hits: 213%/9220  
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 53m | Avg: 56m 59s | Max: 59m 33s
    🟩 gpu
      🟩 v100               Pass: 100%/37  | Total: 20h 30m | Avg: 33m 15s | Max:  1h 06m | Hits: 213%/9220  
    🟩 jobs
      🟩 Build              Pass: 100%/31  | Total: 18h 42m | Avg: 36m 11s | Max:  1h 06m | Hits: 175%/7376  
      🟩 TestCPU            Pass: 100%/3   | Total: 49m 37s | Avg: 16m 32s | Max: 34m 58s | Hits: 365%/1844  
      🟩 TestGPU            Pass: 100%/3   | Total: 59m 01s | Avg: 19m 40s | Max: 33m 28s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total: 18m 50s | Avg: 18m 50s | Max: 18m 50s
    🟩 std
      🟩 17                 Pass: 100%/14  | Total:  9h 14m | Avg: 39m 37s | Max:  1h 01m | Hits: 175%/5532  
      🟩 20                 Pass: 100%/21  | Total: 10h 38m | Avg: 30m 23s | Max:  1h 06m | Hits: 269%/3688  
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 9m 25s | Avg: 4m 42s | Max: 7m 16s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total:  9m 25s | Avg:  4m 42s | Max:  7m 16s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total:  9m 25s | Avg:  4m 42s | Max:  7m 16s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total:  9m 25s | Avg:  4m 42s | Max:  7m 16s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total:  9m 25s | Avg:  4m 42s | Max:  7m 16s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total:  9m 25s | Avg:  4m 42s | Max:  7m 16s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total:  9m 25s | Avg:  4m 42s | Max:  7m 16s
    🟩 gpu
      🟩 v100               Pass: 100%/2   | Total:  9m 25s | Avg:  4m 42s | Max:  7m 16s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 09s | Avg:  2m 09s | Max:  2m 09s
      🟩 Test               Pass: 100%/1   | Total:  7m 16s | Avg:  7m 16s | Max:  7m 16s
    
  • 🟩 python: Pass: 100%/1 | Total: 29m 05s | Avg: 29m 05s | Max: 29m 05s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 29m 05s | Avg: 29m 05s | Max: 29m 05s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 29m 05s | Avg: 29m 05s | Max: 29m 05s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 29m 05s | Avg: 29m 05s | Max: 29m 05s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 29m 05s | Avg: 29m 05s | Max: 29m 05s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 29m 05s | Avg: 29m 05s | Max: 29m 05s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 29m 05s | Avg: 29m 05s | Max: 29m 05s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 29m 05s | Avg: 29m 05s | Max: 29m 05s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 29m 05s | Avg: 29m 05s | Max: 29m 05s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 78)

# Runner
53 linux-amd64-cpu16
11 linux-amd64-gpu-v100-latest-1
9 windows-amd64-cpu16
4 linux-arm64-cpu16
1 linux-amd64-gpu-h100-latest-1-testing

@bernhardmgruber bernhardmgruber added the cub (For all items related to CUB) and backport branch/2.8.x labels Jan 15, 2025
@fbusato fbusato merged commit 43fb061 into NVIDIA:main Jan 15, 2025
94 checks passed

Git push to origin failed for branch/2.8.x with exitcode 128

bernhardmgruber pushed a commit to bernhardmgruber/cccl that referenced this pull request Jan 15, 2025
@bernhardmgruber bernhardmgruber linked an issue Jan 15, 2025 that may be closed by this pull request
@fbusato fbusato deleted the review-cub-util-ptx branch January 15, 2025 01:29
miscco pushed a commit that referenced this pull request Jan 15, 2025
Labels: backport branch/2.8.x, cub (For all items related to CUB)
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Review and deprecate features from CUB util_ptx.cuh
3 participants