Fixes for fp16 tensors and datatypes in ONNX #156
Merged
QONNX traditionally made a number of assumptions about how quantization is represented: `float32` is used as the carrier datatype, quantization is introduced through explicit `Quant` nodes, and tensors may optionally carry datatype annotations (`InferDataTypes`). Some of these baked-in assumptions cause inflexibility around other datatypes. This PR relaxes a few of these restrictions around `float16` by introducing the following changes (a usage sketch follows the list):

- `ModelWrapper.get_tensor_datatype` previously always returned `FLOAT32` for any tensor without a datatype annotation. Now an additional check is performed: if there is no datatype annotation, the underlying container datatype is inspected so that `FLOAT16` can be returned instead.
- `InferDataTypes` supports preserving `FLOAT16`, either from annotations or from the container datatype.
- `gen_finn_dt_tensor` now returns `np.float16` data if the desired datatype is specified as `FLOAT16`.
- `Quant` nodes preserve `float16` tensors if those were originally provided as inputs/outputs.
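The snippet below is a minimal usage sketch of the relaxed fp16 behaviour described above; it is not taken from the PR itself. The model path `model.onnx` and the tensor name `some_fp16_tensor` are hypothetical placeholders, and it assumes a QONNX model whose tensors are stored with a `float16` container datatype.

```python
# Sketch of the fp16-related behaviour described in this PR (names are placeholders).
import numpy as np
from qonnx.core.modelwrapper import ModelWrapper
from qonnx.core.datatype import DataType
from qonnx.transformation.infer_datatypes import InferDataTypes
from qonnx.util.basic import gen_finn_dt_tensor

# Hypothetical model containing float16 tensors.
model = ModelWrapper("model.onnx")

# With no datatype annotation present, the container datatype (float16) is
# inspected, so this can now return DataType["FLOAT16"] instead of
# unconditionally falling back to FLOAT32.
dt = model.get_tensor_datatype("some_fp16_tensor")

# InferDataTypes preserves FLOAT16, whether it comes from an annotation or
# from the container datatype.
model = model.transform(InferDataTypes())

# gen_finn_dt_tensor produces np.float16 data when FLOAT16 is requested.
x = gen_finn_dt_tensor(DataType["FLOAT16"], (1, 8))
assert x.dtype == np.float16
```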