Node based DFT #475
Conversation
external/upstream/fetch_mrcpp.cmake
Outdated
@@ -39,7 +39,7 @@ else()
     GIT_REPOSITORY
       https://github.com/MRChemSoft/mrcpp.git
     GIT_TAG
-      f8def0a086da6410e5dd8e078de4f6b6305b6ea3
+      83df62a6b2bd2dec8b94064089ebb8641704b2f8
This must be updated before approval
@@ -24,6 +24,7 @@
   },
   "mpi": {                    # Section for MPI specification
     "bank_size": int,         # Number of MPI ranks in memory bank
+    "omp_threads": int,       # Number of omp threads
I don't understand why the number of OpenMP threads has to appear in the input, to be honest.
It is to keep a way to force the number of threads, as it is set automatically otherwise. For testing performance, for example, one may want to use fewer threads than the maximum.
And the OMP_NUM_THREADS environment variable isn't enough?
For the MPI case, the OMP_NUM_THREADS variable is not used. This is because it is often not set automatically by the system, and even when it is set, it will not have the right value. If we ask the user to set it, they will most probably not choose the optimal value (the optimal value is larger than the number of cores divided by the number of MPI processes, because not all the MPI processes are threaded).
I understand your question: in an earlier version I used OMP_NUM_THREADS, but then I realized that the only case where this was useful is the rare case where you do not want to use all the cores. In the vast majority of practical situations, the risk of picking a non-optimal value was large.
Good that you made that remark, because I had forgotten to update the docs :)
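As a minimal sketch of the logic described above (this is not MRChem's actual code; the function names and the use of a shared-memory communicator split are assumptions made for illustration), the default thread count can be derived from the cores on a node and the MPI ranks placed on it, with the omp_threads input acting as an explicit override:

```cpp
#include <mpi.h>
#include <omp.h>
#include <thread>

// Hypothetical helper, not part of MRChem.
// omp_threads_input <= 0 means "not given in the input": fall back to the heuristic.
int choose_omp_threads(MPI_Comm comm, int omp_threads_input) {
    if (omp_threads_input > 0) return omp_threads_input; // explicit override from the input

    // Count the MPI ranks sharing this node.
    MPI_Comm node_comm;
    MPI_Comm_split_type(comm, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL, &node_comm);
    int ranks_on_node = 1;
    MPI_Comm_size(node_comm, &ranks_on_node);
    MPI_Comm_free(&node_comm);

    int cores = static_cast<int>(std::thread::hardware_concurrency());
    if (cores <= 0) cores = 1;

    // cores / ranks is only a lower bound: ranks that are not threaded
    // (e.g. memory-bank ranks) leave cores free, so the optimal value is
    // somewhat larger than this simple ratio.
    int threads = cores / ranks_on_node;
    return threads > 0 ? threads : 1;
}

// The caller would then apply the chosen value:
void set_omp_threads(MPI_Comm comm, int omp_threads_input) {
    omp_set_num_threads(choose_omp_threads(comm, omp_threads_input));
}
```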
Codecov Report

@@            Coverage Diff             @@
##           master     #475      +/-   ##
==========================================
- Coverage   70.54%   68.66%    -1.89%
==========================================
  Files         195      194        -1
  Lines       15446    15285      -161
==========================================
- Hits        10896    10495      -401
- Misses       4550     4790      +240
…reads and override the default value
Updated mrcpp hashtag
4c51a22 to 2778c2d
Co-authored-by: Roberto Di Remigio Eikås <[email protected]>
The loop over nodes is set as the outer loop in MRDFT. For large systems using MPI, this removes the memory-intensive intermediate Functions (mostly derivatives), and it is also much faster as a by-product.
The code is also much simpler (only one extra method in Functional, instead of the four subclasses for each case).
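A self-contained toy example of the design idea (this is not MRDFT code; the names and the finite-difference "gradient" are invented for illustration): with the loop over nodes as the outer loop, intermediates only ever exist for one node at a time instead of being materialized for the whole function before the functional is evaluated.

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    const int n_nodes = 4, pts_per_node = 1024;
    std::vector<double> rho(n_nodes * pts_per_node);
    for (std::size_t i = 0; i < rho.size(); ++i) rho[i] = std::exp(-1e-3 * static_cast<double>(i));

    double exc = 0.0;
    std::vector<double> grad(pts_per_node);            // node-sized scratch, reused every iteration
    for (int node = 0; node < n_nodes; ++node) {       // outer loop over nodes
        const double *r = &rho[static_cast<std::size_t>(node) * pts_per_node];
        for (int i = 0; i + 1 < pts_per_node; ++i)     // node-local "derivative", never stored globally
            grad[i] = r[i + 1] - r[i];
        grad[pts_per_node - 1] = 0.0;
        for (int i = 0; i < pts_per_node; ++i)         // toy "functional" evaluated node by node
            exc += std::pow(r[i], 4.0 / 3.0) + 1e-2 * grad[i] * grad[i];
    }
    std::printf("toy xc energy: %.6f\n", exc);
    return 0;
}
```

The point of the restructuring is that node-sized buffers replace whole-function intermediates, which is what removes the memory pressure for large MPI runs.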
Also, the rotation of the SAD initial_guess was very slow and a bottleneck. A new rotation is implemented, and the time went down from 200 s to 3 s!
With all the "node_xc" changes in mrcpp and mrchem, the code is much more user friendly. It runs smoothly with 1000 orbitals on Betzy. No need to make special settings at the start to "save" memory. For even larger systems, the O(N^3) terms (diagonalization of Fock matrix, orthonormalization, localization) become a bottleneck and should be addressed (using ELPA for example).
Test: valinomycin (300 orbitals) on Betzy, 4 nodes (can also run on 1 node now):
old:
new: