Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create a transport interface for switching between AdePT implementations #295

Merged
merged 3 commits into from
Nov 26, 2024

Conversation

hageboeck
Copy link
Contributor

@hageboeck hageboeck commented Jul 26, 2024

  • Add an interface for switching between different AdePT backends. In this way, the AsyncAdePT implementation can share scoring and integration code with example1.
    • Split the transport from the integration-related parts by creating a
      new library.
    • Create a transport abstraction, so AdePTTrackingManager is independent
      of the transport implementation.
    • Add thread and event IDs to the transport interface. These are
      necessary for the async transport implementation.
    • Start to enumerate tracks in the tracking manager. This can be used to
      reproducibly seed the AdePT random sequences.
    • Add some const declarations for the default AdePT implementation.
    • Use a factory funciton to instantiate AdePT. Like this, different
      AdePT implementations can be used without changing code in the tracking
      manager or in AdePTPhysics.
    • Replace a few includes with forward declarations.
    • Fix device link errors that can show when using a symbol in multiple
      cuda translation units.
  • Refactor the processing of hits.
    • Instead of processing hits by passing a pointer/reference to a HostScoring instance, a loop over iterators to hits is used. In this way, hit scoring is decoupled from the specific implementation of HostScoring, and all classes with the same interface as the original GPUHit can be used for scoring. This is done to facilitate hit scoring in the AsyncExample.
    • Move Geant4 objects into the .cpp to make the integration headers simpler.
    • Place temporary scoring objects into a struct to go around G4's pool allocators. This prevents a destruction order fiasco (where the pool is gone but the object isn't), and keeps the scoring objects closer in memory. Two objects need to leak, unfortunately, since they are allocated in G4 pools, and the handles don't support them being on a stack.
    • Improve const correctness in a few places.
    • Add information about threadID and eventID to the scoring template. This information is required for AsyncAdePT to score correctly, but is unused in example1 for now.

These two changes don't seem to impact the run times. Here is a diff of the full sorted output of example 1 with 8 ttbar events on 4 threads (including diffing the energy depositions):

@@ -5104,7 +5092,7 @@
               proton:         7 eKin (GeV):         1720.36 (total)         245.765 (avg)
 Reading cms2018_sd.gdml transiently on CPU for Geant4 ...
                       References : NIM A 506 (2003), 250-303
-Run time: 159.599
+Run time: 160.097
               sigma+:         1 eKin (GeV):         12.1624 (total)         12.1624 (avg)
               sigma-:         1 eKin (GeV):          21.225 (total)          21.225 (avg)
               sigma-:         2 eKin (GeV):          28.236 (total)          14.118 (avg)

@hageboeck hageboeck self-assigned this Jul 26, 2024
@phsft-bot
Copy link

Can one of the admins verify this patch?

@hageboeck hageboeck force-pushed the AdePTInterface branch 3 times, most recently from ca1421d to 4c08ef7 Compare August 23, 2024 12:44
@JuanGonzalezCaminero
Copy link
Contributor

After the latest changes, I still get errors when linking libvecgeomcuda_static

@hageboeck
Copy link
Contributor Author

After the latest changes, I still get errors when linking libvecgeomcuda_static

I just did a rebase. The link error for the factory function is hopefully gone, but vecgeom linking has to be tested on Ubuntu. It works on alma9, but that doesn't say much for Ubuntu.

@SeverinDiederichs
Copy link
Collaborator

Thanks for the update! Unfortunately, linking still fails for me on Ubuntu

/usr/bin/ld: /home/ndiederi/software/vecgeom_install/lib/libvecgeomcuda_static.a(CudaManager.cu.o): in function `__sti____cudaRegisterAll()':
tmpxft_000c241d_00000000-6_CudaManager.cudafe1.cpp:(.text+0x103c): undefined reference to `__cudaRegisterLinkedBinary_5b460372_14_CudaManager_cu_729d03f6_795694'
/usr/bin/ld: /home/ndiederi/software/vecgeom_install/lib/libvecgeomcuda_static.a(UnplacedPolycone.cu.o): in function `__sti____cudaRegisterAll()':
tmpxft_000c2441_00000000-6_UnplacedPolycone.cudafe1.cpp:(.text+0x2119): undefined reference to `__cudaRegisterLinkedBinary_e878b2bb_19_UnplacedPolycone_cu_bc11087f_795729'
/usr/bin/ld: /home/ndiederi/software/vecgeom_install/lib/libvecgeomcuda_static.a(UnplacedPolyhedron.cu.o): in function `__sti____cudaRegisterAll()':
tmpxft_000c247d_00000000-6_UnplacedPolyhedron.cudafe1.cpp:(.text+0x27b4): undefined reference to `__cudaRegisterLinkedBinary_531e658e_21_UnplacedPolyhedron_cu_bc11087f_795802'

From our last conversation, maybe we should remove the dynamically linked vecgeom?

@hageboeck
Copy link
Contributor Author

Thanks for the update! Unfortunately, linking still fails for me on Ubuntu

/usr/bin/ld: /home/ndiederi/software/vecgeom_install/lib/libvecgeomcuda_static.a(CudaManager.cu.o): in function `__sti____cudaRegisterAll()':
tmpxft_000c241d_00000000-6_CudaManager.cudafe1.cpp:(.text+0x103c): undefined reference to `__cudaRegisterLinkedBinary_5b460372_14_CudaManager_cu_729d03f6_795694'
/usr/bin/ld: /home/ndiederi/software/vecgeom_install/lib/libvecgeomcuda_static.a(UnplacedPolycone.cu.o): in function `__sti____cudaRegisterAll()':
tmpxft_000c2441_00000000-6_UnplacedPolycone.cudafe1.cpp:(.text+0x2119): undefined reference to `__cudaRegisterLinkedBinary_e878b2bb_19_UnplacedPolycone_cu_bc11087f_795729'
/usr/bin/ld: /home/ndiederi/software/vecgeom_install/lib/libvecgeomcuda_static.a(UnplacedPolyhedron.cu.o): in function `__sti____cudaRegisterAll()':
tmpxft_000c247d_00000000-6_UnplacedPolyhedron.cudafe1.cpp:(.text+0x27b4): undefined reference to `__cudaRegisterLinkedBinary_531e658e_21_UnplacedPolyhedron_cu_bc11087f_795802'

From our last conversation, maybe we should remove the dynamically linked vecgeom?

Thanks for helping to debug this. In the end, the solution was to link against the dynamic vecgeomcuda, because the static one doesn't contain marks them as undefined.

@SeverinDiederichs
Copy link
Collaborator

I can confirm that the latest pushed version works flawlessly on Ubuntu

@agheata
Copy link
Contributor

agheata commented Oct 14, 2024

The CI builds correctly but fails on example1, we need to understand why:
CUDA ERROR at /mnt/build/sftnight/workspace/AdePT-CI/AdePT/include/AdePT/core/AdePTTransport.cuh[340] : an illegal memory access was encountered

@hageboeck hageboeck force-pushed the AdePTInterface branch 3 times, most recently from e7ef590 to 2cf3ac4 Compare October 15, 2024 15:16
@JuanGonzalezCaminero
Copy link
Contributor

All the cuda linking errors are gone for me in the latest version, however I still get the following:

/usr/bin/ld: ../../BuildProducts/lib/libAdePT_G4_integration.so: undefined reference to `AdePTTransportFactory(unsigned int, unsigned int, unsigned int, int, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const*, bool)'

- Refactor the processing of hits.
  Instead of processing hits by passing a pointer/reference to a
  HostScoring instance, a loop over iterators to hits is used.
  In this way, hit scoring is decoupled from the specific implementation
  of HostScoring, and all classes with the same interface as the original
  GPUHit can be used for scoring.
  This facilitates hit scoring for the AsyncTransport implementation.
- Move several Geant4 objects into the .cpp to make the integration headers
  simpler.
- Place temporary scoring objects into a struct to go around G4's pool
  allocators. This prevents a destruction order fiasco (where the pool
  is gone but the object isn't), and keeps the scoring objects closer
  in memory.
  A few objects need to leak, unfortunately, since they are allocated in
  G4 pools, and the handles don't support them being on a stack.
- Improve const correctness in a few places.
- Add information about threadID and eventID to the scoring interface.
  This information is required for AsyncAdePT to score correctly, but is
  unused in the thread-local transport for now.
- Split the transport from the integration-related parts introducing a
  new source file. The integration-related parts can be reused in a
  different transport implementation.
- Create a transport abstraction, so AdePTTrackingManager is independent
  of the transport implementation.
- Add thread and event IDs to the transport interface. These are
  necessary for the async transport implementation.
- Start to enumerate tracks in the tracking manager. This can be used to
  reproducibly seed the AdePT random sequences.
- Add some const declarations for the default AdePT implementation.
- Use a factory function to instantiate AdePT. Like this, different
  AdePT implementations can be used without changing code in the tracking
  manager or in AdePTPhysics.
- Replace a few includes with forward declarations.
- Fix device link errors that can show when using a symbol in multiple
  cuda translation units.
@agheata agheata merged commit c385a73 into apt-sim:master Nov 26, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants