ollama upload #14
Comments
Hi, we found that the Ollama + GGUF route performs worse than the cloud deployment. For now, we recommend using the Hugging Face Inference Endpoints. We will try uploading to Ollama once local deployment matches online inference in model performance.
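For anyone following along, a minimal sketch of querying such an endpoint with the official `huggingface_hub` client; the endpoint URL and prompt are placeholders, not this project's actual deployment:

```python
from huggingface_hub import InferenceClient

# Placeholder endpoint URL -- substitute the project's real
# Inference Endpoint (or a model id) here.
client = InferenceClient(model="https://<your-endpoint>.endpoints.huggingface.cloud")

# Simple text-generation call against the endpoint.
output = client.text_generation("Hello, world", max_new_tokens=64)
print(output)
```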
Okay, I thought it would be good too, because you wrote in the README that you were unable to upload a non-quantized version to GGUF, and I have already seen many non-quantized fp16 versions on Ollama... (I hope it's correct that fp16 is not quantized, as far as I understood from the context)
Yes, fp16 is not the quantized version, but we trained with bf16, so an fp16 version may not have the expected precision. BF16 support in Ollama is poor and very slow.
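For context, a minimal sketch (assuming PyTorch) of the mismatch: bf16 keeps fp32's 8-bit exponent but only 7 mantissa bits, while fp16 trades exponent range (5 bits) for mantissa precision (10 bits), so casting bf16-trained weights to fp16 can overflow or round differently:

```python
import torch

# bf16 covers fp32's exponent range, but fp16 tops out around 65504,
# so large bf16 activations/weights overflow to inf when cast to fp16.
big = torch.tensor(70000.0, dtype=torch.bfloat16)  # representable in bf16
print(big.to(torch.float16))                       # inf: exceeds fp16 max

# Conversely, small offsets that fp16 represents exactly are lost
# to bf16's coarser 7-bit mantissa.
x = torch.tensor(1.0 + 2**-10, dtype=torch.float16)  # exact in fp16
print(x.to(torch.bfloat16))                          # rounds back to 1.0
```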
Please upload to Ollama; there you can also upload different quantized or non-quantized versions.
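For reference, a sketch of what an Ollama import could look like if a GGUF export becomes available; the file and model names here are hypothetical, not something the project ships:

```shell
# Modelfile (hypothetical path to a local GGUF export):
#   FROM ./model-f16.gguf

ollama create my-model -f Modelfile                       # import the f16 GGUF as-is
ollama create my-model:q4 -f Modelfile --quantize q4_K_M  # optionally quantize on import
```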