To lower memory requirements during training (see the example launch command after this list):

- Use a DeepSpeed config to launch training (refer to `accelerate_configs/deepspeed.yaml` as an example).
- Pass `--precompute_conditions` when launching training.
- Pass `--gradient_checkpointing` when launching training.
- Pass `--use_8bit_bnb` when launching training. Note that this is only applicable to the Adam and AdamW optimizers.
- Do not perform validation/testing. This saves a significant amount of memory that can instead be dedicated to training, which helps if you're on GPUs with smaller VRAM.

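Putting the options above together, a launch command might look like the following sketch. The entry-point script name (`train.py`) and any model/dataset arguments are placeholders that depend on your setup; only the memory-related flags listed above are shown.

```bash
# Sketch of a memory-friendly launch, assuming a training entry point named train.py.
# Substitute your actual script name and add the usual model/dataset arguments.
accelerate launch --config_file accelerate_configs/deepspeed.yaml train.py \
  --precompute_conditions \
  --gradient_checkpointing \
  --use_8bit_bnb
# Omit any validation/testing arguments to save additional memory on smaller-VRAM GPUs.
```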
We will continue to add more features that help to reduce memory consumption.