fix: Switch to sequential processing in batch_process to resolve thread-safety issues #169
Ticket
https://navalabs.atlassian.net/browse/DST-688
Changes
- Updated batch_process.py to use sequential processing instead of parallel processing

Context for reviewers
This PR addresses the Docker deployment crashes (CPU >1400%) that occurred when processing CSVs with multiple rows. Investigation revealed that the high CPU usage was caused by thread-safety issues in the underlying LiteLLM client libraries when running in parallel.
The fix is simple but effective: we've removed the parallel processing implementation and switched to sequential processing. This change resolves the CPU spike issues we were seeing in Docker deployments.
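For reviewers who want to see the shape of the change, here is a minimal sketch of the sequential approach. It is illustrative only, not the actual contents of batch_process.py: the CSV column name, the _process_question signature, and the return shape are assumptions.

```python
import csv


def _process_question(question: str) -> str:
    # Placeholder: the PR currently only has mock calls to the chat engine.
    return f"mock answer for: {question}"


def batch_process(csv_path: str) -> list[dict]:
    """Process a CSV one row at a time instead of fanning rows out to threads."""
    results = []
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f)
        # Sequential loop: no ThreadPoolExecutor, so the non-thread-safe
        # LiteLLM client code is only ever invoked from a single thread.
        for row in reader:
            answer = _process_question(row["question"])
            results.append({**row, "answer": answer})
    return results
```

The trade-off is throughput: rows are processed one after another, but that is acceptable here because a single well-behaved worker is preferable to parallel workers that drive CPU above 1400% and crash the container.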
Follow-up Items (to be tracked in separate tickets):
- _process_question (currently only has mock calls to the chat engine)

Testing
Tested locally by:
The change eliminates the >1400% CPU spikes previously observed in Docker deployments while maintaining functionality.