To lower memory requirements during training:

- Use a DeepSpeed config to launch training (refer to `accelerate_configs/deepspeed.yaml` as an example).
- Pass `--precompute_conditions` when launching training.
- Pass `--gradient_checkpointing` when launching training.
- Pass `--use_8bit_bnb` when launching training. Note that this is only applicable to the Adam and AdamW optimizers.
- Do not perform validation/testing. This saves a significant amount of memory and lets you focus solely on training if you're on smaller-VRAM GPUs.
We will continue to add more features that help reduce memory consumption.