I am trying to train using a multi-GPU setup with DDP (launching with accelerate launch), but I am noticing that the loss values are significantly different from a single GPU setup with the same effective batch size.
I have attached the eval/loss curves below.
In purple (1) is a single-GPU run with per_device_train_batch_size=16.
In blue (2) is a multi-GPU run with 8 GPUs and per_device_train_batch_size=2 (only trained for a few steps).
All other hyperparameters are the same.
I am wondering why the loss values in (2) are so much smaller than in (1)? Any suggestions are much appreciated!
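For reference, here is a minimal sketch of the two configurations being compared, assuming the Hugging Face Trainer API; the script name, output directories, and launch commands are placeholders, not my exact setup:

```python
# Sketch of the two runs (placeholder paths/script names).
from transformers import TrainingArguments

# Run (1), purple: single GPU
#   launched as: accelerate launch --num_processes 1 train.py
args_single_gpu = TrainingArguments(
    output_dir="out-single",
    per_device_train_batch_size=16,  # effective batch = 16 x 1 GPU = 16
)

# Run (2), blue: 8 GPUs with DDP
#   launched as: accelerate launch --num_processes 8 train.py
args_multi_gpu = TrainingArguments(
    output_dir="out-multi",
    per_device_train_batch_size=2,   # effective batch = 2 x 8 GPUs = 16
)
```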