Quantizer logging and summary info #7718
Labels: feature, module: quantization, user experience
🚀 The feature, motivation and pitch
When quantizing models, I'd like to be able to easily see which operators were actually quantized, and how many. This information is visible in the exported_program, but the program can be very long and difficult to parse at a glance. When changing the quantization scheme or doing selective quantization, it's easy to accidentally quantize nothing (or less than expected) because operator and feature support in the quantization flow is opaque. Quantization can "succeed" but do nothing. Users also often just run the exact quantization scheme from the docs and have no good way to know what it did.
There are a few ways to solve this. The easiest might be to just add logging to the quantizer, perhaps printing a summary at info level and a warning if nothing is quantized. It could look something like this:
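A minimal sketch of the info-level summary and the empty-result warning, assuming the quantizer collects one record per annotated node while it runs. `OpRecord`, `log_quantization_summary`, and the logger name `quantizer_summary` are hypothetical names for illustration, not existing ExecuTorch or PyTorch APIs.

```python
import logging
from collections import Counter
from typing import Iterable, NamedTuple

# Hypothetical logger name; a real quantizer would use its own module logger.
logger = logging.getLogger("quantizer_summary")


class OpRecord(NamedTuple):
    """Hypothetical per-node record collected while the quantizer annotates the graph."""
    node_name: str
    op_type: str
    quantized: bool


def log_quantization_summary(records: Iterable[OpRecord]) -> Counter:
    """Log an info-level summary of quantized ops, or a warning if none were quantized."""
    records = list(records)
    counts = Counter(r.op_type for r in records if r.quantized)
    total = sum(counts.values())
    if total == 0:
        logger.warning(
            "Quantization completed, but none of the %d operators were quantized.",
            len(records),
        )
        return counts
    lines = [f"Quantized {total} of {len(records)} operators:"]
    for op_type, count in sorted(counts.items()):
        lines.append(f"  {op_type}: {count}")
    logger.info("\n".join(lines))
    return counts


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    log_quantization_summary([
        OpRecord("linear", "aten.linear.default", True),
        OpRecord("linear_1", "aten.linear.default", True),
        OpRecord("conv2d", "aten.conv2d.default", False),
    ])
```

With the three records in the `__main__` block, this logs "Quantized 2 of 3 operators:" followed by one line per quantized op type (`aten.linear.default: 2`). Returning the counts as well as logging them keeps the summary usable by tests or by a later module-level check.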
Quantization parameter information should ideally include the key parameters of the quantization scheme: static vs. dynamic activation quantization (or weight-only), weight bit width (nbits), symmetric vs. asymmetric, and granularity (per-tensor, per-channel, or groupwise with the group size). It could also include activation quantization parameters, but these tend to vary less, so it may be better to keep the summary brief.
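One way to condense these parameters into a single summary line is sketched below. `QuantSchemeInfo` and its field names are hypothetical, chosen to mirror the list above; the actual quantizer config objects may store this information differently. Activation qparams are deliberately left out, in line with keeping the summary brief.

```python
from dataclasses import dataclass
from typing import Literal, Optional

# Human-readable labels for the (hypothetical) field values below.
_MODE_LABELS = {
    "static": "static activations",
    "dynamic": "dynamic activations",
    "weight_only": "weight-only",
}
_GRANULARITY_LABELS = {"per_tensor": "per-tensor", "per_channel": "per-channel"}


@dataclass(frozen=True)
class QuantSchemeInfo:
    """Hypothetical container for the key scheme parameters shown in the summary."""
    mode: Literal["static", "dynamic", "weight_only"]
    weight_nbits: int
    symmetric: bool
    granularity: Literal["per_tensor", "per_channel", "groupwise"]
    group_size: Optional[int] = None

    def __post_init__(self) -> None:
        # A group size is required for groupwise quantization and meaningless otherwise.
        if (self.granularity == "groupwise") != (self.group_size is not None):
            raise ValueError("group_size must be set exactly when granularity is 'groupwise'")

    def describe(self) -> str:
        """Return a one-line description suitable for an info-level log message."""
        if self.granularity == "groupwise":
            granularity = f"groupwise (group_size={self.group_size})"
        else:
            granularity = _GRANULARITY_LABELS[self.granularity]
        symmetry = "symmetric" if self.symmetric else "asymmetric"
        return (
            f"{_MODE_LABELS[self.mode]}, "
            f"int{self.weight_nbits} {symmetry} weights, {granularity}"
        )
```

For example, `QuantSchemeInfo("weight_only", 4, True, "groupwise", 32).describe()` returns `"weight-only, int4 symmetric weights, groupwise (group_size=32)"`.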
If nothing is quantized, we should print a warning. The warning could also work at module-level granularity: for example, if you specify module-level qparams for nn.Linear and nn.Conv2d but only Linear modules end up quantized, it would warn that no Conv2d modules were quantized.
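The module-level check could be as simple as comparing the module types the user configured against per-type counts of what was actually quantized. This sketch assumes those counts are already available as a mapping keyed by a type name string such as `"nn.Linear"`; `warn_unquantized_module_types` is a hypothetical helper, not an existing API.

```python
import logging
from typing import Iterable, List, Mapping

logger = logging.getLogger("quantizer_summary")  # hypothetical logger name


def warn_unquantized_module_types(
    requested: Iterable[str], quantized_counts: Mapping[str, int]
) -> List[str]:
    """Warn once per configured module type that ended up with no quantized instances.

    `requested` holds the module types given module-level qparams (e.g. "nn.Conv2d");
    `quantized_counts` maps a module type to how many instances were quantized.
    Returns the module types that triggered a warning.
    """
    missing = [m for m in requested if quantized_counts.get(m, 0) == 0]
    for module_type in missing:
        logger.warning(
            "Module-level qparams were specified for %s, but no %s modules were quantized.",
            module_type,
            module_type,
        )
    return missing
```

In the scenario above, `warn_unquantized_module_types(["nn.Linear", "nn.Conv2d"], {"nn.Linear": 3})` logs one warning and returns `["nn.Conv2d"]`.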
We should additionally include debug-level logging to note each quantized operator, as well as cases when an operator isn't quantized due to some constraint.
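The per-operator debug logging could take the shape sketched below: one line per node, with the reason attached when a node is skipped. `log_node_decision` is a hypothetical helper, and the skip reasons are assumed to be supplied by whichever quantizer constraint rejected the node.

```python
import logging
from typing import Optional

logger = logging.getLogger("quantizer_summary")  # hypothetical logger name


def log_node_decision(node_name: str, op_type: str, skip_reason: Optional[str] = None) -> str:
    """Emit one debug-level line for a node and return the message.

    `skip_reason` is None when the node was quantized; otherwise it describes the
    constraint that prevented quantization (provided by the caller).
    """
    if skip_reason is None:
        message = f"Quantized {node_name} ({op_type})"
    else:
        message = f"Skipped {node_name} ({op_type}): {skip_reason}"
    logger.debug(message)
    return message


if __name__ == "__main__":
    # Debug lines only appear once the logging level is lowered to DEBUG.
    logging.basicConfig(level=logging.DEBUG)
    log_node_decision("linear", "aten.linear.default")
    log_node_decision("conv2d", "aten.conv2d.default", "no qparams configured for this module type")
```

Because these messages are emitted at debug level, the default output stays limited to the info-level summary and warnings; users opt into the per-node detail by setting the logger to `DEBUG`.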
Alternatives
Users can dump the exported_program after conversion. However, the signal-to-noise ratio is low if they just want quantization info.
We could also provide a dedicated call to print the quantization summary, but I think logging by default is a better option, as users don't need to explicitly call it.
We could go further and provide a visual graph-level dump of the quantization, though starting with logging offers most of the benefit for much less development effort.
cc @kimishpatel @jerryzh168