I am trying to train using a multi-GPU setup with DDP (launching with accelerate launch), but I am noticing that the loss values are significantly different from a single GPU setup with the same effective batch size.
I have attached the eval/loss curves below.
In purple (1) is a single-GPU run with per_device_train_batch_size=16.
In blue (2) is a multi-GPU run with 8 GPUs and per_device_train_batch_size=2 (only trained for a few steps).
All other hyperparameters are the same.
I am wondering why the loss values in (2) are so much smaller than in (1)? Any suggestions are much appreciated!
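For reference, here is a minimal sketch of the two configurations being compared, assuming the Hugging Face Trainer API; the script name, output directories, and launch commands are placeholders, not my exact setup:

```python
# Sketch of the two runs (placeholder paths/script names).
from transformers import TrainingArguments

# Run (1), purple: single GPU
#   launched as: accelerate launch --num_processes 1 train.py
args_single_gpu = TrainingArguments(
    output_dir="out-single",
    per_device_train_batch_size=16,  # effective batch = 16 x 1 GPU = 16
)

# Run (2), blue: 8 GPUs with DDP
#   launched as: accelerate launch --num_processes 8 train.py
args_multi_gpu = TrainingArguments(
    output_dir="out-multi",
    per_device_train_batch_size=2,   # effective batch = 2 x 8 GPUs = 16
)
```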