I see that these arguments are correctly passed to the API and forwarded to the optimum part via the llama-index integration, so the statement that it is not implemented is incorrect.
compile_only has an effect only if the user has compiled the model at least once; it is not expected to bring any benefit on the first usage (on the contrary, it requires additional disk space to store the precompiled model). Possibly this is the reason why you do not see a benefit; if not, then it is a GPU plugin issue, not an issue in openvino notebooks, llama-index, or optimum-intel.
Another possible reason: to avoid recompilation, you need to use the same OpenVINO version on every run. If you update the OpenVINO runtime, even without reconverting the model, the model will be recompiled. Since the LLM notebooks use the nightly package as the default OpenVINO version, the runtime is continuously updated, which may prevent you from seeing the advantage of compile_only.
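A minimal sketch of the workflow described above, assuming the `compile_only` keyword introduced by the referenced PRs and OpenVINO model caching via `CACHE_DIR`; the model directory and cache path are illustrative, and the exact keyword name should be checked against the installed optimum-intel version:

```python
from optimum.intel import OVModelForCausalLM

model_dir = "llm_model_dir"               # hypothetical: an already exported OpenVINO model
ov_config = {"CACHE_DIR": "model_cache"}  # cache directory that stores the compiled blob

# First run: the model is compiled for GPU and the compiled blob is written to
# CACHE_DIR. This run is NOT faster and consumes extra disk space.
model = OVModelForCausalLM.from_pretrained(model_dir, device="GPU", ov_config=ov_config)

# Subsequent runs with the SAME OpenVINO runtime version: compile_only is
# expected to reuse the precompiled blob from CACHE_DIR instead of recompiling.
# Updating the nightly openvino package between runs invalidates the cache.
model = OVModelForCausalLM.from_pretrained(
    model_dir, device="GPU", ov_config=ov_config, compile_only=True
)
```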
Describe the bug
We saw PRs that make compile-only mode work:
huggingface/optimum-intel#873
huggingface/optimum-intel#1101
Then we made the config modification with the latest optimum-intel (1.22.0.dev0+58aec63), but compile-only mode doesn't work in the llm-rag-llamaindex sample (https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/llm-rag-llamaindex).
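A rough sketch of the kind of config modification meant here, assuming the options are forwarded through the llama-index OpenVINO integration's `model_kwargs` to optimum-intel; the model path is illustrative and the `compile_only` entry is our assumption about how the option from the PRs above is passed down:

```python
from llama_index.llms.openvino import OpenVINOLLM

ov_config = {"CACHE_DIR": "model_cache"}

llm = OpenVINOLLM(
    model_id_or_path="llm_model_dir",   # hypothetical: locally exported OpenVINO model
    device_map="GPU",
    model_kwargs={
        "ov_config": ov_config,
        "compile_only": True,           # assumed to reach OVModelForCausalLM.from_pretrained
    },
)
```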
Screenshots
Expected behavior
Compile-only mode works and reduces the memory footprint.
Installation instructions (Please mark the checkbox)
[yes] I followed the installation guide at https://github.com/openvinotoolkit/openvino_notebooks#-installation-guide to install the notebooks.