Discussion: training methods for big datasets #159
SmallBearC started this conversation in General
Replies: 1 comment 2 replies
-
One way to save some memory is to do a one-shot conversion of all structures to graphs first, then reload only the graphs and run the training.
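Not from the thread itself, just a rough sketch of that idea: convert every structure once, write each graph to disk, and then stream the saved graphs during training so only the current batch sits in memory. This assumes a PyTorch-style setup; `structure_to_graph` is a hypothetical placeholder for whatever converter your model library provides.

```python
# Sketch of the one-shot conversion idea (assumes PyTorch; structure_to_graph
# is a hypothetical converter -- swap in your library's function).
import os
import torch
from torch.utils.data import Dataset

def precompute_graphs(structures, energies, out_dir):
    """Convert every structure once and save each graph to its own file."""
    os.makedirs(out_dir, exist_ok=True)
    for i, (s, e) in enumerate(zip(structures, energies)):
        graph = structure_to_graph(s)  # hypothetical converter
        torch.save({"graph": graph, "energy": e},
                   os.path.join(out_dir, f"{i}.pt"))

class GraphDataset(Dataset):
    """Loads one precomputed graph per item, so memory use stays bounded."""
    def __init__(self, out_dir):
        self.files = sorted(
            os.path.join(out_dir, f) for f in os.listdir(out_dir)
            if f.endswith(".pt")
        )

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        sample = torch.load(self.files[idx])
        return sample["graph"], sample["energy"]
```

Wrapping `GraphDataset` in a `DataLoader` with `num_workers > 0` then pays the conversion cost once up front while training streams the graphs from disk batch by batch.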
-
I am working with a large dataset of 800,000 structures, which exceeds the memory capacity of my machine, so loading all the data at once is impossible. My current approach is a two-phase process.
In the first phase, I train the model on as much data as my system can handle and save the model's progress. In the second phase, I load the saved model and continue training on a fresh subset of the dataset, this time with a reduced learning rate of 1e-05. I then repeat this process over further subsets until the whole dataset has been covered.
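For illustration only (not the original poster's code), here is a minimal sketch of that chunked fine-tuning loop in PyTorch. `load_chunk` is a hypothetical helper that returns a `DataLoader` for one manageable slice of the 800,000 structures, and the initial learning rate of 1e-3 is an assumed value; only the reduced 1e-5 rate comes from the post.

```python
# Sketch of chunked training with checkpoint resume between dataset slices.
import torch

def train_in_chunks(model, chunk_ids, epochs_per_chunk=5,
                    ckpt_path="model_ckpt.pt"):
    for i, chunk_id in enumerate(chunk_ids):
        loader = load_chunk(chunk_id)  # hypothetical: DataLoader for one slice
        # First chunk trains at an assumed base rate; later chunks resume from
        # the checkpoint with the reduced rate (1e-5, as described in the post).
        lr = 1e-3 if i == 0 else 1e-5
        if i > 0:
            model.load_state_dict(torch.load(ckpt_path))
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)

        for epoch in range(epochs_per_chunk):
            for graphs, targets in loader:
                optimizer.zero_grad()
                loss = torch.nn.functional.mse_loss(model(graphs), targets)
                loss.backward()
                optimizer.step()

        torch.save(model.state_dict(), ckpt_path)  # save progress for next chunk
```

One thing worth checking with this scheme is whether the reduced learning rate is enough to keep the model from forgetting earlier chunks; evaluating on a fixed held-out set after each chunk makes that visible.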
I'd like to hear from anyone who's dealt with massive datasets like mine. If you've tackled similar challenges, please feel free to share your tips, insights, and experiences!