Hey @abhijithneilabraham, thanks for this issue! How would you propose finding the dataset size ahead of time? MDSWriter currently has no knowledge of how large your raw dataset files are, or of how you are iterating over your original dataset...
🚀 Feature Request
The number of shards that would be created, estimated from `size_limit` and the total data size, could be a useful metric.
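As a rough illustration of the estimate, here is a minimal sketch that assumes the serialized size of every sample is known up front. It also assumes a shard is closed as soon as the next sample would push it past `size_limit`, so samples are never split across shards. It ignores per-shard header/index overhead and compression. Under those assumptions, `ceil(total_bytes / size_limit)` is only a lower bound, and simulating the greedy packing gives the actual count. Both helper names are hypothetical and are not part of MDSWriter's API.

```python
import math
from typing import Iterable


def lower_bound_num_shards(total_bytes: int, size_limit: int) -> int:
    """Hypothetical helper: lower bound on shard count, ceil(total / limit)."""
    if size_limit <= 0:
        raise ValueError("size_limit must be positive")
    return math.ceil(total_bytes / size_limit)


def simulate_num_shards(sample_sizes: Iterable[int], size_limit: int) -> int:
    """Hypothetical helper: count shards under greedy packing.

    A new shard is started whenever adding the next sample would exceed
    size_limit. Samples are never split, and header overhead is ignored
    (an assumed model, not MDSWriter's actual code).
    """
    if size_limit <= 0:
        raise ValueError("size_limit must be positive")
    num_shards = 0
    current = 0  # bytes already placed in the open shard
    for size in sample_sizes:
        if num_shards == 0 or current + size > size_limit:
            num_shards += 1  # open a new shard for this sample
            current = 0
        current += size
    return num_shards


if __name__ == "__main__":
    sizes = [60, 60, 60]  # made-up per-sample byte sizes
    print(lower_bound_num_shards(sum(sizes), 100))  # 2
    print(simulate_num_shards(sizes, 100))          # 3
```

The gap between the two numbers in the usage example is why a simple division can undercount. An exact estimate needs the per-sample sizes, which is the information MDSWriter does not have before writing.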
Motivation
If other features, such as resuming an interrupted data conversion, are implemented in the future, they could be built on top of this one.
[Optional] Implementation
Additional context