Node based DFT #475
Conversation
external/upstream/fetch_mrcpp.cmake
Outdated
@@ -39,7 +39,7 @@ else()
     GIT_REPOSITORY
       https://github.com/MRChemSoft/mrcpp.git
     GIT_TAG
-      f8def0a086da6410e5dd8e078de4f6b6305b6ea3
+      83df62a6b2bd2dec8b94064089ebb8641704b2f8
This must be updated before approval
@@ -24,6 +24,7 @@
   },
   "mpi": {                    # Section for MPI specification
     "bank_size": int,         # Number of MPI ranks in memory bank
+    "omp_threads": int,       # Number of omp threads
I don't understand why the number of OpenMP threads has to appear in the input, to be honest.
It is to keep a way to force the number of threads, as it is set automatically otherwise. For testing performance, for example, one may want to use fewer threads than the maximum.
And the OMP_NUM_THREADS environment variable isn't enough?
For the MPI case, the OMP_NUM_THREADS variable is not used. This is because it is often not set automatically by the system, and even when it is set, it will not have the right value. If we ask the user to set it, they will most probably not choose the optimal value (the optimal value is larger than the number of cores divided by the number of MPI processes, because not all the MPI processes are threaded).
I understand your question: in an earlier version I used OMP_NUM_THREADS, but then I realized that the only case where this was useful is the rare case where you do not want to use all the cores. In the vast majority of practical situations, the risk of picking a non-optimal value was large.
Good that you made that remark, because I had forgotten to update the docs :)
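As a minimal sketch of the logic described above (this is not MRChem's actual code; the function names and the use of a shared-memory communicator split are assumptions made for illustration), the default thread count can be derived from the cores on a node and the MPI ranks placed on it, with the omp_threads input acting as an explicit override:

```cpp
#include <mpi.h>
#include <omp.h>
#include <thread>

// Hypothetical helper, not part of MRChem.
// omp_threads_input <= 0 means "not given in the input": fall back to the heuristic.
int choose_omp_threads(MPI_Comm comm, int omp_threads_input) {
    if (omp_threads_input > 0) return omp_threads_input; // explicit override from the input

    // Count the MPI ranks sharing this node.
    MPI_Comm node_comm;
    MPI_Comm_split_type(comm, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL, &node_comm);
    int ranks_on_node = 1;
    MPI_Comm_size(node_comm, &ranks_on_node);
    MPI_Comm_free(&node_comm);

    int cores = static_cast<int>(std::thread::hardware_concurrency());
    if (cores <= 0) cores = 1;

    // cores / ranks is only a lower bound: ranks that are not threaded
    // (e.g. memory-bank ranks) leave cores free, so the optimal value is
    // somewhat larger than this simple ratio.
    int threads = cores / ranks_on_node;
    return threads > 0 ? threads : 1;
}

// The caller would then apply the chosen value:
void set_omp_threads(MPI_Comm comm, int omp_threads_input) {
    omp_set_num_threads(choose_omp_threads(comm, omp_threads_input));
}
```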
Codecov Report

@@            Coverage Diff             @@
##           master     #475      +/-   ##
==========================================
- Coverage   70.54%   68.66%    -1.89%
==========================================
  Files         195      194        -1
  Lines       15446    15285      -161
==========================================
- Hits        10896    10495      -401
- Misses       4550     4790      +240
…reads and override the default value
Updated mrcpp hashtag
4c51a22 to 2778c2d
Co-authored-by: Roberto Di Remigio Eikås <[email protected]>
The loop over nodes is set as the outer loop in MRDFT. For large systems using MPI, this removes the memory-intensive intermediate Functions (mostly derivatives), and it is also much faster as a by-product.
The code is also much simpler (only one extra method in Functional, instead of the four subclasses for each case).
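A self-contained toy example of the design idea (this is not MRDFT code; the names and the finite-difference "gradient" are invented for illustration): with the loop over nodes as the outer loop, intermediates only ever exist for one node at a time instead of being materialized for the whole function before the functional is evaluated.

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    const int n_nodes = 4, pts_per_node = 1024;
    std::vector<double> rho(n_nodes * pts_per_node);
    for (std::size_t i = 0; i < rho.size(); ++i) rho[i] = std::exp(-1e-3 * static_cast<double>(i));

    double exc = 0.0;
    std::vector<double> grad(pts_per_node);            // node-sized scratch, reused every iteration
    for (int node = 0; node < n_nodes; ++node) {       // outer loop over nodes
        const double *r = &rho[static_cast<std::size_t>(node) * pts_per_node];
        for (int i = 0; i + 1 < pts_per_node; ++i)     // node-local "derivative", never stored globally
            grad[i] = r[i + 1] - r[i];
        grad[pts_per_node - 1] = 0.0;
        for (int i = 0; i < pts_per_node; ++i)         // toy "functional" evaluated node by node
            exc += std::pow(r[i], 4.0 / 3.0) + 1e-2 * grad[i] * grad[i];
    }
    std::printf("toy xc energy: %.6f\n", exc);
    return 0;
}
```

The point of the restructuring is that node-sized buffers replace whole-function intermediates, which is what removes the memory pressure for large MPI runs.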
Also, the rotation of the SAD initial_guess was very slow and a bottleneck. A new rotation is implemented, and the time went down from 200 s to 3 s!
With all the "node_xc" changes in mrcpp and mrchem, the code is much more user friendly. It runs smoothly with 1000 orbitals on Betzy. No need to make special settings at the start to "save" memory. For even larger systems, the O(N^3) terms (diagonalization of Fock matrix, orthonormalization, localization) become a bottleneck and should be addressed (using ELPA for example).
Test: valinomycin (300 orbitals) on Betzy, 4 nodes (can also run on 1 node now):
old:
new: