Create a transport interface for switching between AdePT implementations #295

hageboeck · 2024-07-26T12:56:08Z

Add an interface for switching between different AdePT backends. In this way, the AsyncAdePT implementation can share scoring and integration code with example1.
- Split the transport from the integration-related parts by creating a
  new library.
- Create a transport abstraction, so AdePTTrackingManager is independent
  of the transport implementation.
- Add thread and event IDs to the transport interface. These are
  necessary for the async transport implementation.
- Start to enumerate tracks in the tracking manager. This can be used to
  reproducibly seed the AdePT random sequences.
- Add some const declarations for the default AdePT implementation.
- Use a factory funciton to instantiate AdePT. Like this, different
  AdePT implementations can be used without changing code in the tracking
  manager or in AdePTPhysics.
- Replace a few includes with forward declarations.
- Fix device link errors that can show when using a symbol in multiple
  cuda translation units.
Refactor the processing of hits.
- Instead of processing hits by passing a pointer/reference to a HostScoring instance, a loop over iterators to hits is used. In this way, hit scoring is decoupled from the specific implementation of HostScoring, and all classes with the same interface as the original GPUHit can be used for scoring. This is done to facilitate hit scoring in the AsyncExample.
- Move Geant4 objects into the .cpp to make the integration headers simpler.
- Place temporary scoring objects into a struct to go around G4's pool allocators. This prevents a destruction order fiasco (where the pool is gone but the object isn't), and keeps the scoring objects closer in memory. Two objects need to leak, unfortunately, since they are allocated in G4 pools, and the handles don't support them being on a stack.
- Improve const correctness in a few places.
- Add information about threadID and eventID to the scoring template. This information is required for AsyncAdePT to score correctly, but is unused in example1 for now.

These two changes don't seem to impact the run times. Here is a diff of the full sorted output of example 1 with 8 ttbar events on 4 threads (including diffing the energy depositions):

@@ -5104,7 +5092,7 @@
               proton:         7 eKin (GeV):         1720.36 (total)         245.765 (avg)
 Reading cms2018_sd.gdml transiently on CPU for Geant4 ...
                       References : NIM A 506 (2003), 250-303
-Run time: 159.599
+Run time: 160.097
               sigma+:         1 eKin (GeV):         12.1624 (total)         12.1624 (avg)
               sigma-:         1 eKin (GeV):          21.225 (total)          21.225 (avg)
               sigma-:         2 eKin (GeV):          28.236 (total)          14.118 (avg)

phsft-bot · 2024-07-26T13:01:02Z

Can one of the admins verify this patch?

JuanGonzalezCaminero · 2024-10-03T09:03:55Z

After the latest changes, I still get errors when linking libvecgeomcuda_static

hageboeck · 2024-10-03T15:33:03Z

After the latest changes, I still get errors when linking libvecgeomcuda_static

I just did a rebase. The link error for the factory function is hopefully gone, but vecgeom linking has to be tested on Ubuntu. It works on alma9, but that doesn't say much for Ubuntu.

SeverinDiederichs · 2024-10-04T06:27:53Z

Thanks for the update! Unfortunately, linking still fails for me on Ubuntu

/usr/bin/ld: /home/ndiederi/software/vecgeom_install/lib/libvecgeomcuda_static.a(CudaManager.cu.o): in function `__sti____cudaRegisterAll()':
tmpxft_000c241d_00000000-6_CudaManager.cudafe1.cpp:(.text+0x103c): undefined reference to `__cudaRegisterLinkedBinary_5b460372_14_CudaManager_cu_729d03f6_795694'
/usr/bin/ld: /home/ndiederi/software/vecgeom_install/lib/libvecgeomcuda_static.a(UnplacedPolycone.cu.o): in function `__sti____cudaRegisterAll()':
tmpxft_000c2441_00000000-6_UnplacedPolycone.cudafe1.cpp:(.text+0x2119): undefined reference to `__cudaRegisterLinkedBinary_e878b2bb_19_UnplacedPolycone_cu_bc11087f_795729'
/usr/bin/ld: /home/ndiederi/software/vecgeom_install/lib/libvecgeomcuda_static.a(UnplacedPolyhedron.cu.o): in function `__sti____cudaRegisterAll()':
tmpxft_000c247d_00000000-6_UnplacedPolyhedron.cudafe1.cpp:(.text+0x27b4): undefined reference to `__cudaRegisterLinkedBinary_531e658e_21_UnplacedPolyhedron_cu_bc11087f_795802'

From our last conversation, maybe we should remove the dynamically linked vecgeom?

hageboeck · 2024-10-14T14:37:24Z

Thanks for the update! Unfortunately, linking still fails for me on Ubuntu

/usr/bin/ld: /home/ndiederi/software/vecgeom_install/lib/libvecgeomcuda_static.a(CudaManager.cu.o): in function `__sti____cudaRegisterAll()':
tmpxft_000c241d_00000000-6_CudaManager.cudafe1.cpp:(.text+0x103c): undefined reference to `__cudaRegisterLinkedBinary_5b460372_14_CudaManager_cu_729d03f6_795694'
/usr/bin/ld: /home/ndiederi/software/vecgeom_install/lib/libvecgeomcuda_static.a(UnplacedPolycone.cu.o): in function `__sti____cudaRegisterAll()':
tmpxft_000c2441_00000000-6_UnplacedPolycone.cudafe1.cpp:(.text+0x2119): undefined reference to `__cudaRegisterLinkedBinary_e878b2bb_19_UnplacedPolycone_cu_bc11087f_795729'
/usr/bin/ld: /home/ndiederi/software/vecgeom_install/lib/libvecgeomcuda_static.a(UnplacedPolyhedron.cu.o): in function `__sti____cudaRegisterAll()':
tmpxft_000c247d_00000000-6_UnplacedPolyhedron.cudafe1.cpp:(.text+0x27b4): undefined reference to `__cudaRegisterLinkedBinary_531e658e_21_UnplacedPolyhedron_cu_bc11087f_795802'

From our last conversation, maybe we should remove the dynamically linked vecgeom?

Thanks for helping to debug this. In the end, the solution was to link against the dynamic vecgeomcuda, because the static one doesn't contain marks them as undefined.

SeverinDiederichs · 2024-10-14T14:45:19Z

I can confirm that the latest pushed version works flawlessly on Ubuntu

agheata · 2024-10-14T15:05:46Z

The CI builds correctly but fails on example1, we need to understand why:
CUDA ERROR at /mnt/build/sftnight/workspace/AdePT-CI/AdePT/include/AdePT/core/AdePTTransport.cuh[340] : an illegal memory access was encountered

JuanGonzalezCaminero · 2024-10-22T12:17:58Z

All the cuda linking errors are gone for me in the latest version, however I still get the following:

/usr/bin/ld: ../../BuildProducts/lib/libAdePT_G4_integration.so: undefined reference to `AdePTTransportFactory(unsigned int, unsigned int, unsigned int, int, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const*, bool)'

- Refactor the processing of hits. Instead of processing hits by passing a pointer/reference to a HostScoring instance, a loop over iterators to hits is used. In this way, hit scoring is decoupled from the specific implementation of HostScoring, and all classes with the same interface as the original GPUHit can be used for scoring. This facilitates hit scoring for the AsyncTransport implementation. - Move several Geant4 objects into the .cpp to make the integration headers simpler. - Place temporary scoring objects into a struct to go around G4's pool allocators. This prevents a destruction order fiasco (where the pool is gone but the object isn't), and keeps the scoring objects closer in memory. A few objects need to leak, unfortunately, since they are allocated in G4 pools, and the handles don't support them being on a stack. - Improve const correctness in a few places. - Add information about threadID and eventID to the scoring interface. This information is required for AsyncAdePT to score correctly, but is unused in the thread-local transport for now.

- Split the transport from the integration-related parts introducing a new source file. The integration-related parts can be reused in a different transport implementation. - Create a transport abstraction, so AdePTTrackingManager is independent of the transport implementation. - Add thread and event IDs to the transport interface. These are necessary for the async transport implementation. - Start to enumerate tracks in the tracking manager. This can be used to reproducibly seed the AdePT random sequences. - Add some const declarations for the default AdePT implementation. - Use a factory function to instantiate AdePT. Like this, different AdePT implementations can be used without changing code in the tracking manager or in AdePTPhysics. - Replace a few includes with forward declarations. - Fix device link errors that can show when using a symbol in multiple cuda translation units.

hageboeck self-assigned this Jul 26, 2024

hageboeck requested review from agheata and JuanGonzalezCaminero July 26, 2024 12:56

hageboeck force-pushed the AdePTInterface branch 3 times, most recently from ca1421d to 4c08ef7 Compare August 23, 2024 12:44

hageboeck force-pushed the AdePTInterface branch from 4c08ef7 to b9bc15f Compare August 23, 2024 13:33

hageboeck force-pushed the AdePTInterface branch from b9bc15f to 9a94b6f Compare October 2, 2024 15:30

hageboeck force-pushed the AdePTInterface branch from 9a94b6f to 40d7123 Compare October 3, 2024 15:29

hageboeck force-pushed the AdePTInterface branch from 40d7123 to 568635d Compare October 14, 2024 14:30

hageboeck force-pushed the AdePTInterface branch from 568635d to cd1e0c1 Compare October 14, 2024 14:51

hageboeck force-pushed the AdePTInterface branch 3 times, most recently from e7ef590 to 2cf3ac4 Compare October 15, 2024 15:16

hageboeck force-pushed the AdePTInterface branch from 2cf3ac4 to 31035e6 Compare October 31, 2024 15:41

hageboeck force-pushed the AdePTInterface branch from b8091ff to 85f9595 Compare November 14, 2024 09:53

hageboeck added 3 commits November 14, 2024 10:55

[benchmarking] Add a missing include.

8a6a82f

hageboeck force-pushed the AdePTInterface branch from 85f9595 to 8a6a82f Compare November 14, 2024 09:56

agheata approved these changes Nov 26, 2024

View reviewed changes

agheata merged commit c385a73 into apt-sim:master Nov 26, 2024
2 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Create a transport interface for switching between AdePT implementations #295

Create a transport interface for switching between AdePT implementations #295

hageboeck commented Jul 26, 2024 •

edited

Loading

phsft-bot commented Jul 26, 2024

JuanGonzalezCaminero commented Oct 3, 2024

hageboeck commented Oct 3, 2024

SeverinDiederichs commented Oct 4, 2024

hageboeck commented Oct 14, 2024

SeverinDiederichs commented Oct 14, 2024

agheata commented Oct 14, 2024

JuanGonzalezCaminero commented Oct 22, 2024

Create a transport interface for switching between AdePT implementations #295

Create a transport interface for switching between AdePT implementations #295

Conversation

hageboeck commented Jul 26, 2024 • edited Loading

phsft-bot commented Jul 26, 2024

JuanGonzalezCaminero commented Oct 3, 2024

hageboeck commented Oct 3, 2024

SeverinDiederichs commented Oct 4, 2024

hageboeck commented Oct 14, 2024

SeverinDiederichs commented Oct 14, 2024

agheata commented Oct 14, 2024

JuanGonzalezCaminero commented Oct 22, 2024

hageboeck commented Jul 26, 2024 •

edited

Loading