support fused float8 gemm + bias add in torchao.float8 #1549

vkuzo · 2025-01-11T00:15:49Z

We've had a customer report that the pooler module of the HuggingFace bert-base-cased model is especially sensitive to float8 quantization during training. After some debugging, the evidence suggests that supporting a fused float8 gemm + bias add would help in this case.

Code pointer:

ao/torchao/float8/float8_linear.py

Line 401 in 24a78fe

output = output + self.bias.to(output.dtype)

torch._scaled_mm already accepts a bias argument, so we just need to rewire the Float8Linear code to pass the bias into the gemm instead of adding it to the output afterwards.
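
For illustration only, here is a pure-Python sketch of one plausible reason the fused path could be more accurate. It is not torchao code and does not call torch._scaled_mm. It assumes (the issue does not state this) that the fused kernel adds the bias to a higher-precision accumulator and casts to the output dtype once, while the current code casts the gemm result to bfloat16 first and then adds the bias in bfloat16. The helper names to_bf16, unfused_bias_add, and fused_bias_add are hypothetical.

```python
import struct


def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value (ties to even).

    Emulation only: ignores NaN/inf and overflow handling.
    """
    # Reinterpret the float32 encoding of x as a 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bfloat16 keeps the top 16 bits; round the dropped low 16 bits to nearest even.
    rounding = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + rounding) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def unfused_bias_add(acc: float, bias: float) -> float:
    # Mirrors `output = output + self.bias.to(output.dtype)`:
    # the gemm result is already bfloat16, then the add rounds a second time.
    output = to_bf16(acc)
    return to_bf16(output + to_bf16(bias))


def fused_bias_add(acc: float, bias: float) -> float:
    # Assumed fused behavior: add the bias to the accumulator, round once.
    return to_bf16(acc + bias)


acc, bias = 257.0, 1.0                 # exact sum is 258.0
print(unfused_bias_add(acc, bias))     # 256.0 (two roundings lose the bias)
print(fused_bias_add(acc, bias))       # 258.0 (one rounding keeps it)
```

Near 256 the spacing between adjacent bfloat16 values is 2, so a bias of 1.0 can vanish entirely when the add happens after the cast. Whether this effect explains the pooler sensitivity in the report is not established here.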

vkuzo self-assigned this Jan 11, 2025

jainapurva added the float8 label Jan 13, 2025
