Experimenting with different optimization methods

The model in this experiment is a BERT base model fine-tuned for intent prediction on the CLINC150 dataset (150 intent labels across 10 domains). Techniques to speed up predictions and reduce the memory footprint include:
- Quantization
- Knowledge distillation
- Quantization-aware training
- Pruning
- Graph optimization (with ONNX and ONNX Runtime, ORT)
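
To make three of the techniques above more concrete, here is a minimal, framework-free sketch of the arithmetic behind quantization, pruning, and knowledge distillation. It is a toy illustration on plain Python lists, not the code used in this experiment. The function names (`quantize_int8`, `magnitude_prune`, `distillation_loss`) and the sample values are hypothetical, and a real setup would apply a framework's own tooling to the BERT weights rather than hand-written loops.

```python
import math


def quantize_int8(weights):
    """Symmetric linear quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the int8 representation."""
    return [v * scale for v in quantized]


def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


if __name__ == "__main__":
    weights = [0.5, -1.27, 0.01, 0.9]  # made-up sample weights

    q, scale = quantize_int8(weights)
    print("int8 values:", q)
    print("reconstructed:", dequantize(q, scale))

    print("pruned:", magnitude_prune(weights, 0.5))  # [0.0, -1.27, 0.0, 0.9]

    teacher = [2.0, 0.5, -1.0]  # made-up logits for 3 intents
    student = [1.5, 0.7, -0.8]
    print("distillation loss:", distillation_loss(student, teacher))
```

The quantization step trades a small reconstruction error (at most half of `scale` per weight) for 8-bit storage, pruning trades weights for sparsity, and the distillation loss measures how far a smaller student model's softened predictions are from the teacher's.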