Quantizer logging and summary info #7718

GregoryComer · 2025-01-17T00:20:12Z

🚀 The feature, motivation and pitch

When quantizing models, I'd like to be able to easily see which operators were actually quantized, as well as how many. This information is visible in the exported_program, but the dump can be very long and difficult to parse at a glance. When changing quantization scheme or doing selective quantization, it's easy to accidentally quantize nothing (or less than expected) due to opaque feature/operator support in the quantization flow. Quantization can "succeed" but do nothing. Users also often just run the exact quantization scheme from the docs and don't have a good way to know what it did.

There are a few ways to solve this. The easiest might be to just add logging to the quantizer, perhaps printing a summary at info level and a warning if nothing is quantized. It could look something like this:

Quantization Summary (XNNPACK Quantizer):
 torch.nn.Linear (static, 4-bit symmetric, groupwise, gs=32): 10 instances
 torch.nn.Conv2d (static, 8-bit symmetric, per-tensor): 12 instances
 ...
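As a rough illustration of how such a summary could be produced, here is a minimal Python sketch. It assumes the quantizer can hand over a list of (operator name, scheme description) pairs for everything it annotated; `format_summary`, `log_summary`, and the logger name are hypothetical and are not part of any existing quantizer API.

```python
import logging
from collections import Counter

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("quantizer_summary_sketch")


def format_summary(quantizer_name, annotated_ops):
    """Build the summary text.

    annotated_ops is an iterable of (op_name, scheme_description) pairs,
    one pair per quantized operator instance (an assumed input format).
    """
    counts = Counter(annotated_ops)  # keeps first-seen order of the pairs
    lines = [f"Quantization Summary ({quantizer_name}):"]
    for (op_name, scheme), n in counts.items():
        noun = "instance" if n == 1 else "instances"
        lines.append(f" {op_name} ({scheme}): {n} {noun}")
    return "\n".join(lines)


def log_summary(quantizer_name, annotated_ops):
    """Emit the summary at info level."""
    logger.info(format_summary(quantizer_name, annotated_ops))
```

Keeping the formatting separate from the logging call would also make it easy to expose the same text through an explicit call later, if that turns out to be useful.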

Quantization parameter information should ideally include key parameters for the quantization scheme, such as static vs dynamic activations or weight only, weight nbits, symmetric vs asymmetric, and per-tensor / per-channel / groupwise (with size). It could also include activation quantization parameters, but these tend to vary less, and it may be better to keep the summary brief.
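One way to keep the per-operator description consistent is to derive it from a small record of the key parameters. The sketch below is only an illustration: `SchemeInfo` and its field names are hypothetical, and a real implementation would presumably read these values from the quantizer's own configuration objects.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SchemeInfo:
    """Hypothetical container for the key parameters named above."""

    activation_mode: str  # "static", "dynamic", or "weight-only"
    weight_bits: int
    symmetric: bool
    granularity: str  # "per-tensor", "per-channel", or "groupwise"
    group_size: Optional[int] = None  # only meaningful for groupwise

    def describe(self) -> str:
        """Render a short, comma-separated description for the summary."""
        symmetry = "symmetric" if self.symmetric else "asymmetric"
        parts = [
            self.activation_mode,
            f"{self.weight_bits}-bit {symmetry}",
            self.granularity,
        ]
        if self.granularity == "groupwise" and self.group_size is not None:
            parts.append(f"gs={self.group_size}")
        return ", ".join(parts)
```

For example, `SchemeInfo("static", 4, True, "groupwise", 32).describe()` yields the same text as the Linear line in the sample summary. Activation parameters are left out here to keep the summary brief, as suggested above.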

In the event that nothing is quantized, we should print a warning, perhaps with module-level granularity: if you specify module-level qparams for nn.Linear and nn.Conv2d but only linears are quantized, it would warn that no Conv2ds were quantized.

[Warning] XNNPACK quantizer did not find any operators to quantize.
or
[Warning] XNNPACK quantizer did not find any Conv2d operators to quantize.
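The module-level check could be sketched roughly as follows. This assumes the quantizer knows which module types the user configured and how many instances of each it actually quantized; `coverage_warnings`, `log_coverage_warnings`, and their arguments are hypothetical names for illustration.

```python
import logging

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("quantizer_warning_sketch")


def coverage_warnings(quantizer_name, requested_types, quantized_counts):
    """Return warning messages for requested module types that were not quantized.

    requested_types: module type names the user configured, e.g. ["Linear", "Conv2d"].
    quantized_counts: mapping from module type name to number of quantized instances.
    """
    quantized = {t for t, n in quantized_counts.items() if n > 0}
    if not quantized:
        # Nothing at all was quantized: one general warning is enough.
        return [f"{quantizer_name} did not find any operators to quantize."]
    return [
        f"{quantizer_name} did not find any {t} operators to quantize."
        for t in requested_types
        if t not in quantized
    ]


def log_coverage_warnings(quantizer_name, requested_types, quantized_counts):
    """Emit each message at warning level."""
    for message in coverage_warnings(quantizer_name, requested_types, quantized_counts):
        logger.warning(message)
```

Collapsing the all-empty case into a single message avoids printing one warning per requested type when the whole flow did nothing.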

We should additionally include debug-level logging to note each quantized operator, as well as cases when an operator isn't quantized due to some constraint.
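A sketch of what the debug-level output might look like, assuming the quantizer records one decision per candidate operator. The tuple layout, the function name, and the example skip reason are all hypothetical and do not describe real XNNPACK constraints.

```python
import logging

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("quantizer_debug_sketch")


def log_decisions(decisions):
    """Log one debug line per candidate operator and return the quantized count.

    decisions is an iterable of (node_name, op_type, skip_reason) tuples;
    skip_reason is None when the operator was quantized (assumed format).
    """
    quantized = 0
    for node_name, op_type, skip_reason in decisions:
        if skip_reason is None:
            quantized += 1
            logger.debug("Quantized %s (%s)", node_name, op_type)
        else:
            logger.debug("Skipped %s (%s): %s", node_name, op_type, skip_reason)
    return quantized
```

These lines stay hidden at the default log level; a user investigating an unexpected summary could enable them with something like `logging.basicConfig(level=logging.DEBUG)`.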

Alternatives

Users can dump the exported_program after conversion. However, the signal-to-noise ratio is low if they just want quantization info.

We could also provide a dedicated call to print the quantization summary, but I think logging by default is a better option, as users don't need to explicitly call it.

We could go further and provide a visual graph-level dump of the quantization, though starting with logging offers most of the benefit for much less development effort.

Additional context

No response

RFC (Optional)

No response

cc @kimishpatel @jerryzh168

GregoryComer added labels: feature (A request for a proper, new feature.), module: quantization, user experience (reducing friction for users) Jan 17, 2025

github-project-automation bot added this to ExecuTorch DevX improvements Jan 17, 2025

github-project-automation bot moved this to To triage in ExecuTorch DevX improvements Jan 17, 2025

GregoryComer changed the title ~~Quantizer logging/visualization~~ Quantizer logging and summary info Jan 17, 2025
