getting batches with BatchHandler is slower with max_workers>1 than max_workers=1 #241

Open
bnb32 opened this issue Nov 6, 2024 · 0 comments
bnb32 commented Nov 6, 2024

Why this feature is necessary:
Resolving this would enable us to use larger batches and possibly train with data that hasn't been moved to locally mounted SSDs.

A possible solution is:
The worst-case time to get N batches should be with max_workers=1. Adding more workers should let us parallelize the batch fetching, but right now there appears to be some blocking going on.

I have considered the following alternatives:
The issue can likely be traced to the implementation in

def enqueue_batches(self) -> None:

I have experimented with a few different ways to use workers in this method and have not seen significant improvement. Attempts included using a ThreadPoolExecutor with an as_completed loop over the futures, and also skipping the as_completed loop and queueing the futures themselves; both patterns are sketched below.
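
For reference, a minimal sketch of the two patterns I tried (the sample_batch callable and the batch queue below are stand-ins for the actual BatchHandler internals, not the real attribute names):

from concurrent.futures import ThreadPoolExecutor, as_completed
from queue import Queue

def enqueue_with_as_completed(sample_batch, batch_queue: Queue,
                              n_batches: int, max_workers: int) -> None:
    # Variant 1: submit all sample_batch calls up front and enqueue each
    # result as it finishes.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(sample_batch) for _ in range(n_batches)]
        for future in as_completed(futures):
            batch_queue.put(future.result())

def enqueue_futures_directly(sample_batch, batch_queue: Queue,
                             n_batches: int, max_workers: int) -> None:
    # Variant 2: put the futures themselves on the queue and let the consumer
    # call .result() when it dequeues them. The executor is left open here so
    # pending futures keep running in the background.
    pool = ThreadPoolExecutor(max_workers=max_workers)
    for _ in range(n_batches):
        batch_queue.put(pool.submit(sample_batch))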

Additional context
Reproduce this with the following:

import time

bh = BatchHandler(..., sample_shape=(60, 60, 10),
                  queue_cap=50, batch_size=16,
                  n_batches=16, mode='lazy',
                  max_workers=1)
start = time.time()
batches = list(bh)
print(time.time() - start)

bh = BatchHandler(..., sample_shape=(60, 60, 10),
                  queue_cap=50, batch_size=16,
                  n_batches=16, mode='lazy',
                  max_workers=10)
start = time.time()
batches = list(bh)
print(time.time() - start)

I have profiled both of these code blocks with cProfile and see some strange differences in the timing and number of calls for the sample_batch function, but I don't know what to make of those differences. Without moving data to a local SSD: with max_workers=1 I see a per-call time of ~40 seconds for sample_batch and 16 calls, while with max_workers=10 I see a per-call time of ~400 seconds and 2 calls. Calls to get_batch go from ~20 seconds to ~60 seconds per call.
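
For completeness, the profiles were collected with something along these lines (a sketch; the exact invocation may have differed slightly, and bh is constructed as in the reproduction snippets above):

import cProfile
import pstats

profiler = cProfile.Profile()
profiler.enable()
batches = list(bh)  # bh built with max_workers=1 or max_workers=10 as above
profiler.disable()
pstats.Stats(profiler).sort_stats('cumulative').print_stats(20)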

profile for max_workers=1: (cProfile screenshot: max_workers1_profile)

profile for max_workers=10: (cProfile screenshot: max_workers10)

Urgency / Timeframe
Not urgent. max_workers=1 works well currently with training data on local SSD.

bnb32 added the feature label Nov 6, 2024