Discussion: training methods for big datasets #159
SmallBearC started this conversation in General
Replies: 1 comment 2 replies
-
One way to save some memory is to do a one-shot conversion of all structures to graphs first, then reload only the graphs and run the training.
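Not from the thread itself, just a rough sketch of that idea: convert every structure once, write each graph to disk, and then stream the saved graphs during training so only the current batch sits in memory. This assumes a PyTorch-style setup; `structure_to_graph` is a hypothetical placeholder for whatever converter your model library provides.

```python
# Sketch of the one-shot conversion idea (assumes PyTorch; structure_to_graph
# is a hypothetical converter -- swap in your library's function).
import os
import torch
from torch.utils.data import Dataset

def precompute_graphs(structures, energies, out_dir):
    """Convert every structure once and save each graph to its own file."""
    os.makedirs(out_dir, exist_ok=True)
    for i, (s, e) in enumerate(zip(structures, energies)):
        graph = structure_to_graph(s)  # hypothetical converter
        torch.save({"graph": graph, "energy": e},
                   os.path.join(out_dir, f"{i}.pt"))

class GraphDataset(Dataset):
    """Loads one precomputed graph per item, so memory use stays bounded."""
    def __init__(self, out_dir):
        self.files = sorted(
            os.path.join(out_dir, f) for f in os.listdir(out_dir)
            if f.endswith(".pt")
        )

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        sample = torch.load(self.files[idx])
        return sample["graph"], sample["energy"]
```

Wrapping `GraphDataset` in a `DataLoader` with `num_workers > 0` then pays the conversion cost once up front while training streams the graphs from disk batch by batch.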
-
I am working with a large dataset of 800,000 structures, which exceeds the memory capacity of my machine, so loading all the data at once is impossible. My current approach is a two-phase process.
In the first phase, I train the model on as much data as my system can handle and save the model's progress. In the second phase, I load the saved model and continue training on a fresh subset of the dataset, this time with a reduced learning rate of 1e-05. I then repeat this process over further subsets until the whole dataset has been covered.
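For illustration only (not the original poster's code), here is a minimal sketch of that chunked fine-tuning loop in PyTorch. `load_chunk` is a hypothetical helper that returns a `DataLoader` for one manageable slice of the 800,000 structures, and the initial learning rate of 1e-3 is an assumed value; only the reduced 1e-5 rate comes from the post.

```python
# Sketch of chunked training with checkpoint resume between dataset slices.
import torch

def train_in_chunks(model, chunk_ids, epochs_per_chunk=5,
                    ckpt_path="model_ckpt.pt"):
    for i, chunk_id in enumerate(chunk_ids):
        loader = load_chunk(chunk_id)  # hypothetical: DataLoader for one slice
        # First chunk trains at an assumed base rate; later chunks resume from
        # the checkpoint with the reduced rate (1e-5, as described in the post).
        lr = 1e-3 if i == 0 else 1e-5
        if i > 0:
            model.load_state_dict(torch.load(ckpt_path))
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)

        for epoch in range(epochs_per_chunk):
            for graphs, targets in loader:
                optimizer.zero_grad()
                loss = torch.nn.functional.mse_loss(model(graphs), targets)
                loss.backward()
                optimizer.step()

        torch.save(model.state_dict(), ckpt_path)  # save progress for next chunk
```

One thing worth checking with this scheme is whether the reduced learning rate is enough to keep the model from forgetting earlier chunks; evaluating on a fixed held-out set after each chunk makes that visible.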
I'd like to hear from anyone who's dealt with massive datasets like mine. If you've tackled similar challenges, please feel free to share your tips, insights, and experiences!