change diloco script to not have tail blocking #185

Merged 1 commit on Jan 7, 2025
24 changes: 21 additions & 3 deletions scripts/simulate_multi_node_diloco.sh
@@ -21,15 +21,24 @@ function get_cuda_devices() {
# Array to store PIDs of child processes
child_pids=()

# Function to kill all child processes
# Modified cleanup function to handle tail separately
cleanup() {
echo "Cleaning up child processes..."
local killed=0

# First kill the main processes
for pid in "${child_pids[@]}"; do
if kill -TERM "$pid" 2>/dev/null; then
((killed++))
fi
done

# Kill the tail process if it exists
if [ -n "$tail_pid" ]; then
kill -TERM "$tail_pid" 2>/dev/null
((killed++))
fi

wait
echo "All child processes terminated. Killed $killed processes."
exit
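
The cleanup pattern in this hunk can be tried in isolation. The following is a minimal sketch, not the script from this PR: the `sleep` commands are hypothetical stand-ins for the training processes, `killed_count` is a name introduced here so the result can be inspected afterwards, and the tail process is counted only when the signal is actually delivered (the hunk above increments the counter unconditionally).

```shell
#!/usr/bin/env bash
# Sketch only: sleep commands stand in for the real worker processes.
child_pids=()
tail_pid=""        # stays empty here; no log follower is started in this sketch
killed_count=0     # hypothetical global, used to inspect the result

cleanup() {
    echo "Cleaning up child processes..."
    # Signal the main processes first, counting only successful signals.
    for pid in "${child_pids[@]}"; do
        if kill -TERM "$pid" 2>/dev/null; then
            killed_count=$((killed_count + 1))
        fi
    done
    # Signal the log follower only if one was started.
    if [ -n "$tail_pid" ] && kill -TERM "$tail_pid" 2>/dev/null; then
        killed_count=$((killed_count + 1))
    fi
    # Reap everything that was signalled.
    wait
    echo "All child processes terminated. Killed $killed_count processes."
}

sleep 30 &
child_pids+=($!)
sleep 30 &
child_pids+=($!)

cleanup
```

The sketch uses `killed_count=$((killed_count + 1))` rather than `((killed++))` because the arithmetic command returns a non-zero status when the expression evaluates to 0, which would matter if `set -e` were ever enabled.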
@@ -65,7 +74,16 @@ do
child_pids+=($!)
done

# Start tail in background and store its PID separately
tail -f logs/log0.log &
child_pids+=($!)
tail_pid=$!

# Wait for the main processes only
for pid in "${child_pids[@]}"; do
wait $pid
done

wait
# Once main processes are done, kill the tail process
if [ -n "$tail_pid" ]; then
kill -TERM "$tail_pid"
fi
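
The change above can be illustrated with a small standalone script. This is a sketch under stated assumptions rather than the PR's code: the subshells that sleep and append to a temporary file are hypothetical stand-ins for the training processes, and `log_file` replaces `logs/log0.log`. It shows why the tail PID is kept out of `child_pids`: a bare `wait` would also wait on `tail -f`, which never exits on its own.

```shell
#!/usr/bin/env bash
# Sketch only: short-lived subshells stand in for the real worker processes.
log_file="$(mktemp)"
child_pids=()

for i in 1 2; do
    ( sleep 1; echo "worker $i done" >> "$log_file" ) &
    child_pids+=($!)
done

# Follow the log in the background and store its PID separately.
tail -f "$log_file" &
tail_pid=$!

# Wait for the workers only, one PID at a time.
for pid in "${child_pids[@]}"; do
    wait "$pid"
done

# The workers are done, so stop the log follower and reap it.
if [ -n "$tail_pid" ]; then
    kill -TERM "$tail_pid" 2>/dev/null
    wait "$tail_pid" 2>/dev/null || true
fi

rm -f "$log_file"
```

With a bare `wait` in place of the loop, this script would hang until interrupted, which is the tail blocking the PR title refers to.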