Why this feature is necessary:
Resolving this would enable us to use larger batches and possibly train with data that hasn't been moved to locally mounted SSDs.
A possible solution is:
The maximum time to get N batches should occur when max_workers=1. Adding more workers should enable us to parallelize batch building, but right now there appears to be some blocking going on.
I have considered the following alternatives:
The issue can likely be traced to the implementation in sup3r/sup3r/preprocessing/batch_queues/abstract.py (line 234 at commit 34760ba).
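One pattern that would produce exactly this behavior is waiting on each future immediately after submitting it, so the pool never has more than one task in flight. The sketch below is only a hypothetical illustration of that pattern, not the actual abstract.py code; sample_batch and queue are placeholder names.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical illustration of a blocking enqueue loop; NOT the sup3r code.
def enqueue_blocking(sample_batch, queue, n_batches, max_workers=10):
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for _ in range(n_batches):
            future = pool.submit(sample_batch)
            # Blocking on the result right after submitting serializes the
            # work, so extra workers only add thread/executor overhead.
            queue.put(future.result())
```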
I have experimented with a few different ways to use workers in this method and have not seen significant improvement. Attempts included using a ThreadPoolExecutor combined with an as_completed loop over futures, and also queueing the futures themselves without an as_completed loop; rough sketches of both are shown below.
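In rough pseudocode, the two attempts looked something like the following sketches; sample_batch and the queue object are stand-ins for the actual sup3r objects, not exact code.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def enqueue_with_as_completed(sample_batch, queue, n_batches, max_workers):
    # Attempt 1: submit all sampling tasks up front, then put results on the
    # queue as each future finishes.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(sample_batch) for _ in range(n_batches)]
        for future in as_completed(futures):
            queue.put(future.result())

def enqueue_futures(sample_batch, queue, n_batches, max_workers):
    # Attempt 2: put the futures themselves on the queue and resolve them on
    # the consumer side with future.result().
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for _ in range(n_batches):
            queue.put(pool.submit(sample_batch))
```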
Additional context
Reproduce this with the following:
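A minimal sketch of this kind of comparison is below; make_batch_queue and the way get_batch is called here are placeholders and assumptions, not the exact code referenced above.

```python
import time

# Stand-in reproduction sketch. make_batch_queue() is a placeholder for
# however the sup3r batch queue is constructed in the real training script.
def time_batches(make_batch_queue, n_batches=16, max_workers=1):
    queue = make_batch_queue(max_workers=max_workers)
    start = time.time()
    for _ in range(n_batches):
        _ = queue.get_batch()  # assumed entry point, per the description above
    elapsed = time.time() - start
    print(f'max_workers={max_workers}: {n_batches} batches in {elapsed:.1f}s '
          f'({elapsed / n_batches:.1f}s per batch)')

# Expectation: more workers should reduce wall time, but currently it does not.
# time_batches(make_batch_queue, max_workers=1)
# time_batches(make_batch_queue, max_workers=10)
```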
I have profiled both of these code blocks with cProfile and see some strange differences in the timing and number of calls for the sample_batch function, but I don't know what to make of those differences. Without moving data to local SSD: with max_workers=1 I see a per-call time of ~40 seconds and 16 calls; with max_workers=10 I see a per-call time of ~400 seconds and 2 calls. Calls to get_batch go from ~20 seconds to ~60 seconds.
profile for max_workers=1:
profile for max_workers=10:
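For reference, profiles like these can be collected with cProfile roughly as follows; time_batches is the placeholder helper from the reproduction sketch above, not a sup3r API.

```python
import cProfile
import pstats

# Collect and inspect a profile for a given worker count.
def profile_run(make_batch_queue, max_workers):
    profiler = cProfile.Profile()
    profiler.enable()
    time_batches(make_batch_queue, n_batches=16, max_workers=max_workers)
    profiler.disable()
    stats = pstats.Stats(profiler).sort_stats('cumulative')
    # Focus on the calls discussed above.
    stats.print_stats('sample_batch|get_batch')
```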
Urgency / Timeframe
Not urgent. max_workers=1 works well currently with training data on local SSD.