Note
Manual device mapping is deprecated in favor of automatic device mapping because manual mapping is prone to user error. The topology system will remain, but will be used only for quantization settings. Please see the device mapping documentation for more information.
Use a simple model topology to configure per-layer ISQ and device mapping with a single YAML file (examples here)!
To support a per-layer mix of ISQ types, Mistral.rs can load a model topology YAML file. This YAML file is formatted as follows:
- Top-level keys are either:
  - A range of layers (start-end) where start < end. start is inclusive and end is exclusive
  - A single layer number
- The topology for the range or layer:
  - An optional key (isq) which maps to a single value, which can be any ISQ type. If not specified, no ISQ is applied to this range of layers.
  - An optional key (device) which maps to a single value, which is one of the below. If not specified, the default loading device will be used.
    - cpu
    - cuda[ORDINAL]
    - metal[ORDINAL]
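As a rough illustration of the key format described above, the following Python sketch turns a top-level key into the set of layer indices it covers. The helper name parse_layer_key is hypothetical and is not part of Mistral.rs; it only mirrors the stated rules (start < end, start inclusive, end exclusive, or a single layer number).

```python
def parse_layer_key(key):
    """Turn a topology key such as "0-8" or "5" into a Python range.

    Hypothetical helper for illustration; not part of mistral.rs.
    """
    text = str(key).strip()
    if "-" in text:
        start_text, end_text = text.split("-", 1)
        start, end = int(start_text), int(end_text)
        if start >= end:
            raise ValueError(f"range {text!r} needs start < end")
        # start is inclusive and end is exclusive, like Python's range().
        return range(start, end)
    layer = int(text)
    return range(layer, layer + 1)


print(list(parse_layer_key("0-8")))  # layers 0 through 7
print(list(parse_layer_key("28")))   # only layer 28
```

Because the end is exclusive, adjacent keys such as 0-8 and 8-16 do not overlap.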
Note that:
- The topology for a range is applied to every layer in that range
- If ranges overlap, the range with the higher end layer takes precedence and overwrites the other for the overlapping layers
- Any layers which are not covered will have no topology mapping. They will inherit any other ISQ (e.g. with --isq/in_situ_quant) set.
- Unless the layer is not covered by the topology, the topology value will override any other ISQ (e.g. with --isq/in_situ_quant).
- The topology device mapping will override any other device mapping.
- When using UQFF, only the device mapping is relevant.
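The precedence rules above can be sketched in a few lines of Python. This is one interpretation for illustration, not the Mistral.rs implementation: the function name resolve_isq and the tuple-keyed dictionary are hypothetical, and the sketch assumes that a covered layer whose entry has no isq key ends up with no ISQ. Device mapping and UQFF are left out.

```python
def resolve_isq(topology, num_layers, global_isq=None):
    """Return the effective ISQ type for each layer index.

    topology maps (start, end) half-open ranges to dicts such as {"isq": "Q4K"}.
    Hypothetical helper for illustration; not part of mistral.rs.
    """
    # Layers not covered by the topology inherit the global ISQ setting.
    effective = [global_isq] * num_layers
    # Apply ranges in order of increasing end layer so that, where ranges
    # overlap, the one with the higher end layer is written last and wins.
    ordered = sorted(topology.items(), key=lambda item: item[0][1])
    for (start, end), settings in ordered:
        for layer in range(start, min(end, num_layers)):
            # Assumption: a covered layer without an isq key gets no ISQ.
            effective[layer] = settings.get("isq")
    return effective


topology = {(0, 4): {"isq": "Q3K"}, (2, 6): {"isq": "Q6K"}}
print(resolve_isq(topology, 8, global_isq="Q4K"))
# → ['Q3K', 'Q3K', 'Q6K', 'Q6K', 'Q6K', 'Q6K', 'Q4K', 'Q4K']
```

Layers 2 and 3 fall in both ranges and take Q6K because that range ends later; layers 6 and 7 are uncovered and fall back to the global value.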
0-8:
  isq: Q3K
  device: cuda[0]
8-16:
  isq: Q4K
  device: cpu
16-24:
  isq: Q6K
# Skip 24-28
28-32:
  isq: Q8_0
  device: cuda[0]
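To make the example file concrete, the sketch below reads it with a tiny standard-library parser and reports which layers have no topology entry. The parser handles only the two-level key/value subset shown here and is illustrative; a real loader would use a proper YAML library. The helper names and the total of 32 layers are assumptions made for the example.

```python
EXAMPLE = """\
0-8:
  isq: Q3K
  device: cuda[0]
8-16:
  isq: Q4K
  device: cpu
16-24:
  isq: Q6K
# Skip 24-28
28-32:
  isq: Q8_0
  device: cuda[0]
"""


def read_topology(text):
    """Parse the two-level subset of YAML used above into {key: settings}."""
    topology = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # drop comments
        if not line.strip():
            continue
        name, _, value = line.strip().partition(":")
        if not line[0].isspace():
            current = topology.setdefault(name, {})  # new range or layer
        elif current is not None:
            current[name] = value.strip()  # isq or device entry
    return topology


def layers_of(key):
    """Expand "start-end" (end exclusive) or a single number to layer indices."""
    start, _, end = key.partition("-")
    if end:
        return range(int(start), int(end))
    return range(int(start), int(start) + 1)


topology = read_topology(EXAMPLE)
covered = {layer for key in topology for layer in layers_of(key)}
uncovered = [layer for layer in range(32) if layer not in covered]  # 32 layers assumed
print(topology["16-24"])  # this range sets only isq
print(uncovered)          # layers left without a topology entry
```

Under these assumptions, layers 24 through 27 are the ones left to any ISQ set outside the topology, and the 16-24 range has no device entry, so it uses the default loading device.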
Model topologies may be applied to all model types.
Note
You should replace --features ... with one of the features specified here, or remove it for pure CPU inference.
cargo run --features ... -- -i plain -m microsoft/Phi-3-mini-128k-instruct -a phi3 --topology topologies/isq.yml
cargo run --features ... -- --port 1234 plain -m microsoft/Phi-3-mini-128k-instruct -a phi3 --topology topologies/isq.yml
Example here.