Examples

The following examples are added

Passes (optimization techniques)

OrtPerfTuning
- Raises known failure exceptions to immediately stop tuning.
- Default values for device and providers_list are based on the accelerator spec.
OrtTransformersOptimization
- Checks that model_type is provided in the pass configs or available in the model attributes. None is invalid.
- fp16 related arguments are better documented.
Introduce LoRA pass for fine-tuning PyTorch models with Low-Rank Adaptation.
Introduce OnnxMatMul4Quantizer pass to quantize ONNX models to 4-bit integers.
Introduce OnnxBnb4Quantization pass to quantize ONNX models to 4-bit data types from bitsandbytes (FP4, NF4).
Onnx external data configuration supports size_threshold and convert_attribute parameters.
Introduce LlamaPyTorchTensorParallel pass to split a Llama model into a tensor parallel distributed PyTorch model.
OnnxConversion
- Support DistributedPyTorchModel.
- use_device and torch_dtype options to specify device ("cpu", "cuda") and data type ("float16", "float32") for the model before conversion.
DeviceSpecificOnnxConversion removed in favor of the OnnxConversion pass with the use_device option.
LoRA/QLoRA
- Support training using ONNX Runtime Training.
- Mixed-precision training when torch_dtype=float16 for numerical stability.
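
As an illustration of the OnnxConversion use_device and torch_dtype options listed above, the sketch below builds a pass entry as a Python dict and checks the two option values. The "type"/"config" nesting and the helper check_conversion_options are assumptions made for this example, not Olive APIs; the allowed values are only the ones named in these notes, and the pass itself may accept others.

```python
import json

# Values named in the release notes for the two options.
ALLOWED_DEVICES = {"cpu", "cuda"}
ALLOWED_DTYPES = {"float16", "float32"}

# Hypothetical pass entry; the "type"/"config" layout is an assumption.
conversion_pass = {
    "type": "OnnxConversion",
    "config": {
        "use_device": "cuda",
        "torch_dtype": "float16",
    },
}


def check_conversion_options(pass_entry):
    """Return a list of problems found in the two options (empty if none)."""
    options = pass_entry.get("config", {})
    problems = []
    device = options.get("use_device")
    if device is not None and device not in ALLOWED_DEVICES:
        problems.append(f"unsupported use_device: {device!r}")
    dtype = options.get("torch_dtype")
    if dtype is not None and dtype not in ALLOWED_DTYPES:
        problems.append(f"unsupported torch_dtype: {dtype!r}")
    return problems


problems = check_conversion_options(conversion_pass)
print(json.dumps(conversion_pass, indent=2))
print("problems:", problems)
```

A config that previously relied on DeviceSpecificOnnxConversion would, under the same assumptions, switch its type to OnnxConversion and set use_device.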

Make engine/evaluator config optional in the olive run config. By default, the user can run optimization without search or evaluation using the simplest pass config.
evaluate_input_model is optional in engine config in no-search mode. It is forced to False when no evaluator is provided.
ort_py_log_severity_level option to control logging level for onnxruntime python logs.
CLI option --tempdir to use a custom directory as the root directory for tempfile.
IO-Binding:
- New method to efficiently bind inputs and outputs to the session using either the CPU or GPU depending on the device.
- shared_kv_buffer option to enable key value buffer sharing between input (past key values) and output (present key values).
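
To show what the optional engine/evaluator config described above could look like in practice, here is a minimal sketch of a run config that contains only an input model and one pass. The top-level keys "input_model" and "passes", the "type"/"config" nesting, and the path "model.pt" are assumptions for illustration and should be checked against the Olive documentation.

```python
import json

# Hypothetical minimal run config: no "engine" and no "evaluators" section.
run_config = {
    "input_model": {
        "type": "PyTorchModel",
        "config": {"model_path": "model.pt"},  # placeholder path
    },
    "passes": {
        "conversion": {"type": "OnnxConversion"},
    },
}

# Write the config out so it can be handed to the workflow runner.
config_text = json.dumps(run_config, indent=2)
print(config_text)
```

With no evaluator in the config, the notes state that evaluate_input_model is forced to False, so the input model would not be evaluated.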

DistributedOnnxModel file structure updated to use resource paths. Can be saved from cache to destination directory.
Introduce DistributedPyTorchModel, which is analogous to DistributedOnnxModel for PyTorch models.
trust_remote_code added to HFConfig model_loading_args.
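
A sketch of where trust_remote_code might sit, assuming model_loading_args is a nested mapping inside an hf_config block. The key "hf_config", the "model_name" field, and the model id "my-org/my-model" are hypothetical and used only for illustration.

```python
import json

# Hypothetical HFConfig fragment; only model_loading_args and
# trust_remote_code come from the release notes.
hf_config = {
    "model_name": "my-org/my-model",  # placeholder model id
    "model_loading_args": {
        "trust_remote_code": True,
    },
}

print(json.dumps({"hf_config": hf_config}, indent=2))
```

Enabling this flag allows custom model code from the model repository to run locally, so it is best reserved for sources you trust.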