This package is aimed at researchers developing new quantization algorithms for neural networks in PyTorch. Much of this functionality is also built into recent versions of PyTorch, but those tools are aimed at users who want a quantized version of an existing network and, in our experience, do not offer the flexibility needed for research.
- Support for Affine and Q-format quantizers with STE back-propagation.
- Easy interface for adding new types of quantizers.
- Low-level functional API for state-less quantization.
- Context-managers for quickly changing quantization mode.
- Various range-observers for monitoring activation ranges.
- Easily create quantizable models by wrapping an entire layer (or a sequence of supported layers).
- Easily convert common image models to quantized versions (including preserving weights).
TorchQuant can be installed as a package via pip:
$ pip install git+https://github.com/camlsys/torchquant.git
However, if you want to use this package as a starting point for developing your own quantization schemes, you can clone this repository directly into your project and install it in editable mode:
$ cd /path/to/your/project
$ git clone https://github.com/camlsys/torchquant.git
$ pip install -e ./torchquant
Our only dependencies are PyTorch, torchvision, and efficientnet_pytorch. The library is tested against the latest versions available at the time of writing.
These functions are stateless and simply quantize a tensor using the passed arguments. They implement the Straight-Through Estimator (STE) for back-propagation and are therefore differentiable; a sketch of the STE mechanism follows the list below.
- Affine quantization: affine_quantize(x, delta, zero_point, n_levels)
- Q-format quantization: qfmt_quantize(x, delta, min_int, max_int)
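As a minimal, illustrative sketch (the standard fake-quantization pattern, not necessarily the library's actual internals), affine quantization with an STE backward pass can be written as:

import torch

def affine_quantize_sketch(x, delta, zero_point, n_levels):
    # Quantize: scale, shift by the zero point, and clamp to the integer grid.
    q = torch.clamp(torch.round(x / delta) + zero_point, 0, n_levels - 1)
    # Dequantize back to a "fake-quantized" floating-point tensor.
    x_hat = (q - zero_point) * delta
    # Straight-Through Estimator: forward returns x_hat, but the backward
    # pass treats the rounding step as the identity function.
    return x + (x_hat - x).detach()

x = torch.randn(4, requires_grad=True)
y = affine_quantize_sketch(x, delta=0.1, zero_point=128, n_levels=256)
y.sum().backward()  # gradients flow to x as if quantization were the identity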
These classes wrap the functional API, together with state such as ranges and num_bits, into an nn.Module. All quantizer objects must inherit from the Quantizer class. For the supplied modules, you must provide a range-observer instance; this may be subject to change.
RangeObserver objects can be used inside Quantizer objects to keep track of tensor ranges (though not every quantizer must use one). The available range observers are:
- BatchMinMax
- ExpAvgMinMax
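As a hedged usage sketch (the import path and exact constructor signature are assumptions; check the Quantizer subclasses for the real interface):

import torch
from torchquant import AffineQuantizer, ExpAvgMinMax  # import path assumed

# The observer tracks a running estimate of the tensor range, which the
# quantizer uses to derive its quantization parameters.
quantizer = AffineQuantizer(8, ExpAvgMinMax())

x = torch.randn(16, 32)
x_q = quantizer(x)  # the observer updates its range, then x is quantized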
These layers provide a higher-level abstraction for applying quantization to a single module or a sequence of commonly used modules. QWrapper wraps an existing sequence of layers, while QOp wraps a single operator that returns a tensor (e.g. addition). The patterns supported by QWrapper are similar to PyTorch's:
- Linear
- Linear + ReLU(6) / Swish
- Conv2d
- Conv2d + ReLU(6) / Swish
- Conv2d + BatchNorm2d
- Conv2d + BatchNorm2d + ReLU(6) / Swish
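For instance, the last pattern corresponds to a layer sequence like the following (a sketch; we assume QWrapper accepts an nn.Sequential or list of supported modules):

import torch.nn as nn

# A Conv2d + BatchNorm2d + ReLU block, one of the fuseable patterns above.
layers = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(),
)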
These fused layers are quantizable, but quantization is turned off by default; you must supply the quantizers as arguments. For example:
wrapper = QWrapper(
    layers,
    weight_quantizer=AffineQuantizer(n_bits, BatchMinMax()),
    acts_quantizer=AffineQuantizer(n_bits, ExpAvgMinMax()),
)

op = QOp(operators.add, acts_quantizer=AffineQuantizer(n_bits, ExpAvgMinMax()))
These modules automatically support our state machine (see below under context managers).
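Once constructed, these behave like ordinary modules. A hedged usage sketch, assuming the layers block above was wrapped and that QOp takes the operands directly (the call signature is an assumption):

x = torch.randn(1, 3, 224, 224)
y = wrapper(x)  # weights and activations are quantized in the forward pass
z = op(y, y)    # the addition's output activation is quantized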
We provide support for converting ResNets and MobileNetV2s from torchvision, and EfficientNets from efficientnet_pytorch, to fused versions. This preserves the full-precision weights.
fused_model = FusedResNet(
    full_precision_model,
    weight_quantizer=lambda module: QuantizerForMyModuleWeights(module),
    acts_quantizer=lambda module: QuantizerForMyModuleActivations(module),
)
Note that your factory is passed the module, so you can perform any setup your quantizer requires. This API may change depending on user feedback.
To change the quantization mode of a QModule, you can do:
with qmodule_state(module, QModuleState.QUANT_AWARE_TRAIN):
    ...

# Alternatively:
set_qmodule_state(module, QModuleState.QUANT_AWARE_TRAIN)
This example shows switching to quantization-aware training, but there are other modes of interest. The full set of modes is described in the QModuleState enum in qmodule.py.
Warning: you must set the mode explicitly. If you just call model.train() or model.eval(), the quantization mode will not change. This gives you finer control, but it is easy to forget.
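For example, when starting quantization-aware training you need to set both modes:

model.train()  # sets the PyTorch training mode only
set_qmodule_state(model, QModuleState.QUANT_AWARE_TRAIN)  # the quantization mode is set separately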
- Documentation generation.
- Per-Channel Quantization.
- BatchNorm Folding.
- Binary Neural Networks.
- Automated graph rewriting with torch.fx.
- Sophisticated research techniques for quantization added as baselines.
- Debugging tooling.
- Integration with other toolkits such as HuggingFace, SpeechBrain, and Flower.