
Node based DFT #475

Merged: 18 commits into MRChemSoft:master, Feb 21, 2024

Conversation

gitpeterwind (Member) commented Nov 30, 2023

The loop over nodes is now the outer loop in MRDFT. For large systems using MPI, this removes the memory-intensive intermediate functions (mostly derivatives), and as a by-product it is also much faster.
The code is also much simpler: only one extra method in Functional, instead of the four subclasses for each case.
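
A minimal self-contained sketch of the restructuring (hypothetical types and names, not the actual MRDFT API): with the node loop outermost, the per-node intermediates (derivatives, functional inputs) live only for one loop iteration, instead of being stored as full global functions.

```cpp
#include <cstddef>
#include <vector>

// Toy stand-in for the data held by one MRCPP tree node (illustrative only).
struct NodeData { std::vector<double> values; };

// Placeholders for the per-node steps: derivative setup and xcfun evaluation.
NodeData preprocess(const NodeData &rho) { return rho; }
NodeData evaluateFunctional(const NodeData &inp) { return inp; }

// Node loop outermost: intermediates are created and destroyed per node,
// so each worker holds one node's worth of scratch data at a time instead
// of full intermediate function trees.
void buildXCPotential(const std::vector<NodeData> &rhoNodes,
                      std::vector<NodeData> &potNodes) {
    potNodes.resize(rhoNodes.size());
    for (std::size_t n = 0; n < rhoNodes.size(); ++n) {
        NodeData inp = preprocess(rhoNodes[n]);      // per-node derivatives
        potNodes[n] = evaluateFunctional(inp);       // per-node xc evaluation
    }   // inp freed here; no global intermediate trees are kept
}
```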

Also, the rotation of the SAD initial guess was very slow and a bottleneck. A new rotation is implemented, and the time went down from 200 s to 3 s!
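
For context, "rotation" here means forming linear combinations φ'_i = Σ_j U_ji φ_j of the orbitals. A toy sketch of that operation follows (plain arrays standing in for FunctionTrees; this is only the mathematical definition, not the new fast implementation):

```cpp
#include <cstddef>
#include <vector>

using Orbital = std::vector<double>;               // toy stand-in for a FunctionTree
using Matrix  = std::vector<std::vector<double>>;  // rotation matrix U

// out[i] = sum_j U[j][i] * phi[j]
std::vector<Orbital> rotate(const std::vector<Orbital> &phi, const Matrix &U) {
    std::vector<Orbital> out(phi.size(), Orbital(phi[0].size(), 0.0));
    for (std::size_t i = 0; i < out.size(); ++i)
        for (std::size_t j = 0; j < phi.size(); ++j)
            for (std::size_t k = 0; k < phi[j].size(); ++k)
                out[i][k] += U[j][i] * phi[j][k];
    return out;
}
```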

With all the "node_xc" changes in MRCPP and MRChem, the code is much more user-friendly. It runs smoothly with 1000 orbitals on Betzy, with no need for special settings at the start to "save" memory. For even larger systems, the O(N^3) terms (diagonalization of the Fock matrix, orthonormalization, localization) become a bottleneck and should be addressed (using ELPA, for example).

Test: valinomycin (300 orbitals) on Betzy, 4 nodes (can now also run on 1 node):
old:

                           Building XC operator
---------------------------------------------------------------------------
 Precision                                  (rel)              1.00000e-05
---------------------------------------------------------------------------
 Compute rho                     46152 nds         1.41 GB        5.37 sec
 Preprocess input               184672 nds         5.64 GB        0.79 sec
 Evaluate functional            230840 nds         7.04 GB       16.43 sec
 Postprocess potential           92336 nds         2.82 GB        1.72 sec
---------------------------------------------------------------------------
                         Wall time: 2.46008e+01 sec

Memory statistics, in GiB: 190.0

new:

                           Building XC operator
---------------------------------------------------------------------------
 Precision                                  (rel)              1.00000e-05
---------------------------------------------------------------------------
 Compute rho                     46152 nds         1.41 GB        5.39 sec
 Make potential                  46168 nds         1.41 GB        3.03 sec
---------------------------------------------------------------------------
                         Wall time: 8.79213e+00 sec

Memory statistics, in GiB: 64.9 (128.6 after only the XC upgrade, 81.8 after aggressive GenNodes cleaning; the rest is due to the Bank upgrade and large memory chunks)

@gitpeterwind gitpeterwind added the WIP Work in progress label Nov 30, 2023
@gitpeterwind gitpeterwind removed the WIP Work in progress label Dec 9, 2023
@gitpeterwind gitpeterwind requested a review from ilfreddy December 9, 2023 10:10
@gitpeterwind gitpeterwind added WIP Work in progress and removed WIP Work in progress labels Dec 12, 2023
@@ -39,7 +39,7 @@ else()
   GIT_REPOSITORY
     https://github.com/MRChemSoft/mrcpp.git
   GIT_TAG
-    f8def0a086da6410e5dd8e078de4f6b6305b6ea3
+    83df62a6b2bd2dec8b94064089ebb8641704b2f8
Contributor commented:

This must be updated before approval

doc/users/user_inp.rst (review thread, outdated and resolved)
@@ -24,6 +24,7 @@
   },
   "mpi": { # Section for MPI specification
     "bank_size": int, # Number of MPI ranks in memory bank
+    "omp_threads": int, # Number of OpenMP threads
Contributor commented:

I don't understand why the number of OpenMP threads has to appear in the input, to be honest.

gitpeterwind (Member, Author) commented:

It is to keep a way to force the number of threads, since it is otherwise set automatically. For performance testing, for example, one may want to use fewer threads than the maximum.

Contributor commented:

And the OMP_NUM_THREADS environment variable isn't enough?

gitpeterwind (Member, Author) commented:

For the MPI case the OMP_NUM_THREADS variable is not used. It is often not set automatically by the system, and even when it is set, it usually does not have the right value. If we ask users to set it, they will most probably not choose the optimal value. (The optimal value is larger than the number of cores divided by the number of MPI processes, because not all MPI processes are threaded.)

gitpeterwind (Member, Author) commented:

I understand your question: in an earlier version I used OMP_NUM_THREADS, but then I realized that the only case where this was useful is the rare one where you do not want to use all the cores. In the vast majority of practical situations, the risk of picking a non-optimal value was large.
Good that you made that remark, because I had forgotten to update the docs :)
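
A minimal sketch of the behavior described in this thread (hypothetical helper, not MRChem's actual initialization code), assuming a non-positive input value means "choose automatically":

```cpp
#include <omp.h>

// Honor a forced "omp_threads" value from the input; otherwise pick one
// automatically. The heuristic is illustrative only: giving each compute
// rank cores / compute_ranks threads yields more than cores / total_ranks,
// since the bank ranks are not threaded (see the discussion above).
void initThreads(int requested, int cores, int total_ranks, int bank_ranks) {
    int n = requested;
    if (n <= 0) {                                   // automatic choice
        int compute_ranks = total_ranks - bank_ranks;
        n = cores / (compute_ranks > 0 ? compute_ranks : 1);
    }
    omp_set_num_threads(n > 0 ? n : 1);
}
```

For example, on a 128-core node running 16 MPI ranks of which 8 serve the memory bank, this illustrative heuristic gives 128 / 8 = 16 threads per compute rank rather than 128 / 16 = 8.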


codecov bot commented Feb 15, 2024

Codecov Report

Attention: 11 lines in your changes are missing coverage. Please review.

Comparison is base (698e74f) 70.54% compared to head (59a0eb2) 68.66%.

Files                      Patch %    Lines
src/mrdft/MRDFT.cpp        58.33%     10 Missing ⚠️
src/mrdft/Functional.cpp   98.96%     1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #475      +/-   ##
==========================================
- Coverage   70.54%   68.66%   -1.89%     
==========================================
  Files         195      194       -1     
  Lines       15446    15285     -161     
==========================================
- Hits        10896    10495     -401     
- Misses       4550     4790     +240     


gitpeterwind and others added 2 commits February 20, 2024 16:23
@gitpeterwind gitpeterwind merged commit 1e7ea8e into MRChemSoft:master Feb 21, 2024
10 checks passed