Quantizer logging and summary info #7718

Open
GregoryComer opened this issue Jan 17, 2025 · 0 comments
Labels
feature A request for a proper, new feature. module: quantization user experience reducing friction for users

Comments

GregoryComer (Member) commented Jan 17, 2025

🚀 The feature, motivation and pitch

When quantizing models, I'd like to be able to easily see which operators were actually quantized, as well as how many. It's visible in the exported_program, but it can be very long and difficult to parse at a glance. When changing quantization scheme or doing selective quantization, it's easy to accidentally not quantize anything (or quantize less than expected) due to opaque feature/operator support in the quantization flow. Quantization can "succeed" but do nothing. Users also often just run the exact quantization scheme from the docs and don't have a good way to know what it did.

There are a few ways to solve this. The easiest might be to add logging to the quantizer, perhaps printing a summary at info level and a warning if nothing is quantized. It could look something like this:

Quantization Summary (XNNPACK Quantizer):
 torch.nn.Linear (static, 4-bit symmetric, groupwise, gs=32): 10 instances
 torch.nn.Conv2d (static, 8-bit symmetric, per-tensor): 12 instances
 ...

Quantization parameter information should ideally include key parameters for the quantization scheme, such as static vs dynamic activations or weight only, weight nbits, symmetric vs asymmetric, and per-tensor / per-channel / groupwise (with size). It could also include activation quantization parameters, but these tend to vary less, and it may be better to keep the summary brief.
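A minimal sketch of what the summary aggregation could look like (the record format and function names here are hypothetical, not an existing API):

```python
from collections import Counter

def format_quantization_summary(records, quantizer_name="XNNPACK Quantizer"):
    """Build a one-line-per-op summary from (op_name, scheme_description) tuples.

    `records` would be collected by the quantizer as it annotates the graph;
    identical (op, scheme) pairs are counted together.
    """
    counts = Counter(records)
    lines = [f"Quantization Summary ({quantizer_name}):"]
    for (op, scheme), n in sorted(counts.items()):
        lines.append(f"  {op} ({scheme}): {n} instances")
    return "\n".join(lines)

# Example: two quantized linears and one conv.
print(format_quantization_summary([
    ("torch.nn.Linear", "static, 4-bit symmetric, groupwise, gs=32"),
    ("torch.nn.Linear", "static, 4-bit symmetric, groupwise, gs=32"),
    ("torch.nn.Conv2d", "static, 8-bit symmetric, per-tensor"),
]))
```

The quantizer would emit this string via `logging` at info level once annotation is complete.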

In the event that nothing is quantized, we should print a warning, perhaps with module-level granularity: if you specify module-level qparams for nn.Linear and nn.Conv2d but only linears end up quantized, it would warn that no Conv2d instances were quantized.

[Warning] XNNPACK quantizer did not find any operators to quantize.
or
[Warning] XNNPACK quantizer did not find any Conv2d operators to quantize.

We should additionally include debug-level logging to note each quantized operator, as well as cases when an operator isn't quantized due to some constraint.
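The coverage check behind those warnings could be as simple as comparing requested module types against actual counts (again a hypothetical sketch, not an existing API):

```python
import logging

logger = logging.getLogger("quantizer")

def check_quantization_coverage(requested_types, quantized_counts):
    """Warn when requested module types were not quantized.

    requested_types: module type names the user's config targets.
    quantized_counts: dict mapping module type name -> number quantized.
    Returns the warning messages (also emitted via logging).
    """
    warnings = []
    if not any(quantized_counts.get(t, 0) for t in requested_types):
        # Nothing at all was quantized.
        warnings.append("XNNPACK quantizer did not find any operators to quantize.")
    else:
        # Some types matched; warn per type that did not.
        for t in sorted(requested_types):
            if quantized_counts.get(t, 0) == 0:
                warnings.append(
                    f"XNNPACK quantizer did not find any {t} operators to quantize."
                )
    for msg in warnings:
        logger.warning(msg)
    return warnings
```

Per-operator debug logging (each node quantized, or skipped due to a constraint) would live alongside this in the annotation loop.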

Alternatives

Users can dump the exported_program after conversion. However, the signal-to-noise ratio is low if they just want quantization info.

We could also provide a dedicated call to print the quantization summary, but I think logging by default is a better option, as users don't need to explicitly call it.

We could go further and provide a visual graph-level dump of the quantization, though starting with logging offers most of the benefit for much less development effort.

Additional context

No response

RFC (Optional)

No response

cc @kimishpatel @jerryzh168

@GregoryComer GregoryComer added feature A request for a proper, new feature. module: quantization user experience reducing friction for users labels Jan 17, 2025
@GregoryComer GregoryComer changed the title Quantizer logging/visualization Quantizer logging and summary info Jan 17, 2025