Multi-GPU Training for Large Datasets #188
Replies: 2 comments · 5 replies
-
10,000 NEB calculations shouldn't pose a problem. We have trained potentials on 180,000 data points with no issues. There are already example files showing how to do multi-GPU training with pytorch-lightning and matgl; please review those. Yes, creating the graphs first before actually running the training would be a good idea, though the latest version of matgl should already not store the structures after conversion.
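For reference, here is a minimal sketch of what such a Lightning multi-GPU run could look like, assuming matgl's `MGLDataset`, `MGLDataLoader`, `M3GNet`, and `PotentialLightningModule` APIs (names and required flags vary between matgl versions; `collate_fn_pes` was called `collate_fn_efs` in older releases, and `load_my_neb_data()` is a hypothetical stand-in for your own parsing code):

```python
import pytorch_lightning as pl
from dgl.data.utils import split_dataset

from matgl.ext.pymatgen import Structure2Graph, get_element_list
from matgl.graph.data import MGLDataset, MGLDataLoader, collate_fn_pes
from matgl.models import M3GNet
from matgl.utils.training import PotentialLightningModule

# Hypothetical helper: returns a list of pymatgen Structures from the NEB
# runs plus matching energy and force labels.
structures, energies, forces = load_my_neb_data()

element_types = get_element_list(structures)
converter = Structure2Graph(element_types=element_types, cutoff=5.0)

# Graphs are built up front, so the pymatgen structures need not stay in
# memory during training. Depending on your matgl version, extra options
# (e.g. line-graph flags) may be required for M3GNet.
dataset = MGLDataset(
    structures=structures,
    converter=converter,
    labels={"energies": energies, "forces": forces},
    threebody_cutoff=4.0,
)
train_data, val_data, test_data = split_dataset(
    dataset, frac_list=[0.9, 0.05, 0.05], shuffle=True, random_state=42
)
train_loader, val_loader, test_loader = MGLDataLoader(
    train_data=train_data,
    val_data=val_data,
    test_data=test_data,
    collate_fn=collate_fn_pes,
    batch_size=32,
    num_workers=1,
)

model = M3GNet(element_types=element_types, is_intensive=False)
lit_model = PotentialLightningModule(model=model, lr=1e-3)

# Lightning spawns one process per GPU and shards the batches via DDP.
trainer = pl.Trainer(max_epochs=100, accelerator="gpu", devices=4, strategy="ddp")
trainer.fit(lit_model, train_dataloaders=train_loader, val_dataloaders=val_loader)
```

With `strategy="ddp"` each of the four processes sees a quarter of the batches per epoch, so the effective batch size is `4 * batch_size`; you may want to scale the learning rate accordingly.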
-
Hi! You mention example files for multi-GPU training, but I can't find any in the repository, as @mstapelberg also said above. Do you possibly have these examples locally or on a private branch that could be shared?
-
Hi there,
I am looking to train a potential on around 10,000 NEB VASP simulations, and I've noticed that running on a single GPU simply doesn't cut it.
I'm fairly new to DGL and Torch, so I was wondering if anyone had suggestions on how to set up multi-GPU training with matgl?
My current idea is to use the matgl functions to create the graphs first (to save memory as well) and then use DDP in torch to spread the dataset over four GPUs, following this example: https://huggingface.co/blog/pytorch-ddp-accelerate-transformers
Is there an existing example that does this? If not, I'm happy to try myself and post a working example, roughly along the lines of the sketch below.
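A rough sketch of that idea, assuming the graphs have already been converted once with matgl's `Structure2Graph` and saved via `dgl.save_graphs` (the file name `neb_graphs.bin`, the label key `"energies"`, and `build_model()` are hypothetical placeholders for your own data and model), launched with `torchrun --nproc_per_node=4 train.py`:

```python
import os

import dgl
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler

# Load the pre-built graphs; dgl.load_graphs returns (graph_list, label_dict).
# "energies" is an assumed key from whenever the graphs were saved.
graphs, label_dict = dgl.load_graphs("neb_graphs.bin")
energies = label_dict["energies"]
dataset = list(zip(graphs, energies))

def collate(batch):
    # Merge individual DGL graphs into one batched graph per minibatch.
    g, e = zip(*batch)
    return dgl.batch(g), torch.stack(e)

# torchrun sets RANK, WORLD_SIZE, and LOCAL_RANK in the environment.
dist.init_process_group("nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

# DistributedSampler shards the dataset across the four ranks.
sampler = DistributedSampler(dataset)
loader = DataLoader(dataset, batch_size=32, sampler=sampler, collate_fn=collate)

model = build_model().to(local_rank)  # hypothetical: e.g. a matgl M3GNet
model = DDP(model, device_ids=[local_rank])

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = torch.nn.MSELoss()

for epoch in range(100):
    sampler.set_epoch(epoch)  # reshuffle the shards each epoch
    for batched_graph, target in loader:
        pred = model(batched_graph.to(local_rank))
        loss = loss_fn(pred, target.to(local_rank))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

dist.destroy_process_group()
```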
Thanks,
Myles