Added notebook to showcase quantization of Sentence Transformers model #955

Merged: echarlaix merged 13 commits into main from ak/sentence_transformers_notebook on Oct 25, 2024


AlexKoff88 (Collaborator)

No description provided.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@helena-intel (Collaborator) left a comment


Thanks @AlexKoff88 this is a great example.

notebooks/openvino/sentence_transformer_quantization.ipynb: three outdated, resolved review threads
# FP32 baseline model
!benchmark_app -m all-MiniLM-L6-v2/openvino_model.xml -shape "input_ids[1,384],attention_mask[1,384],token_type_ids[1,384]" -api sync -niter 200
helena-intel (Collaborator)


This reshapes the model to static shapes, which gives a large improvement, especially for INT8. But most people will not use static shapes in practice, and padding or truncating to 384 tokens is not always desired. IMO it is fairer to compare performance by looping over a dataset (e.g. by modifying the evaluate function to add timings), but then the performance difference is much smaller. If we keep benchmark_app, it would be good to at least explain the static shapes. (Using data_shape instead of shape in benchmark_app does not reshape the model, but you still use the same sequence length for every input, so it is still not a standard use case.)
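A minimal sketch of the alternative suggested in this comment: loop over the evaluation data one sample at a time and record wall-clock latency, so each input keeps its natural length instead of being padded to 384 tokens. `timed_evaluate` and `infer` are hypothetical names, not functions from the notebook or from optimum-intel; `infer` stands in for whatever per-sentence call the notebook's evaluate function makes on the FP32 or INT8 model.

```python
import statistics
import time
from typing import Callable, Dict, Sequence


def timed_evaluate(infer: Callable[[str], object], texts: Sequence[str]) -> Dict[str, float]:
    """Run `infer` once per text, in order, and collect wall-clock latencies.

    `infer` is a hypothetical placeholder for per-sample model inference
    (e.g. encoding one sentence with the FP32 or INT8 model).
    """
    if not texts:
        raise ValueError("texts must be non-empty")
    latencies = []
    for text in texts:
        start = time.perf_counter()
        infer(text)  # input keeps its natural length: no padding to a fixed shape
        latencies.append(time.perf_counter() - start)
    total = sum(latencies)
    return {
        "samples": float(len(latencies)),
        "mean_latency_ms": 1000 * total / len(latencies),
        "median_latency_ms": 1000 * statistics.median(latencies),
        # Synchronous, one sample at a time: throughput = samples / total time.
        "throughput_per_s": len(latencies) / total if total > 0 else float("inf"),
    }


if __name__ == "__main__":
    # Dummy stand-in for a model whose cost grows with input length.
    def fake_infer(text: str) -> int:
        return sum(ord(c) for c in text * 100)

    sentences = ["short", "a somewhat longer sentence", "x" * 200]
    print(timed_evaluate(fake_infer, sentences))
```

Running the same loop once with the FP32 model and once with the INT8 model would give a dynamic-shape comparison to put next to the static-shape benchmark_app numbers.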

AlexKoff88 (Collaborator, Author)


Thanks, Helena. I would disagree in this specific case: the tokenizer truncates the data anyway, so the inputs effectively have a static shape. But I can add an explanation of it.
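To make the static-shape assumption explicit, here is a minimal sketch of what fixing every input to `[1,384]` means at the token level: longer inputs are truncated, shorter ones are padded, and the attention mask marks the real tokens. `to_static_shape` is a hypothetical helper, the pad id of 0 is an assumption (it holds for BERT-style vocabularies), and the plain slicing is a simplification: a real tokenizer called with `truncation=True` keeps the closing special token.

```python
from typing import Dict, List, Sequence

SEQ_LEN = 384  # matches the [1,384] shapes passed to benchmark_app via -shape
PAD_ID = 0     # assumption: pad token id is 0 (true for BERT-style vocabularies)


def to_static_shape(token_ids: Sequence[int], seq_len: int = SEQ_LEN,
                    pad_id: int = PAD_ID) -> Dict[str, List[int]]:
    """Truncate or pad one tokenized sentence to exactly `seq_len` positions."""
    ids = list(token_ids[:seq_len])       # truncation: drop tokens beyond seq_len
    n_real = len(ids)
    ids += [pad_id] * (seq_len - n_real)  # padding: fill the remaining positions
    return {
        "input_ids": ids,
        "attention_mask": [1] * n_real + [0] * (seq_len - n_real),
        "token_type_ids": [0] * seq_len,  # single-segment input
    }


if __name__ == "__main__":
    short = to_static_shape([101, 7592, 102])  # illustrative token ids
    long = to_static_shape(list(range(1000)))
    print(sum(short["attention_mask"]), sum(long["attention_mask"]))  # prints "3 384"
```

The three-token input still occupies all 384 positions, which is the padding cost raised in the review for inputs that are usually short.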

@l-bat (Contributor) commented Oct 24, 2024

Why is the squad dataset needed?

DATASET_NAME = "squad"
dataset = datasets.load_dataset(DATASET_NAME)

@AlexKoff88 (Collaborator, Author)

Thanks @l-bat. Fixed

@AlexKoff88 (Collaborator, Author)

PR is ready.

@echarlaix (Collaborator) left a comment


@echarlaix merged commit fe82729 into main on Oct 25, 2024
23 checks passed
@echarlaix deleted the ak/sentence_transformers_notebook branch on October 25, 2024 08:32
@AlexKoff88 (Collaborator, Author)

Looks great, thanks for the addition @AlexKoff88. Could also be added to https://github.com/huggingface/optimum-intel/blob/v1.20.0/notebooks/openvino/README.md

Will do in a follow-up PR.

6 participants