ollama upload #14

Open
Xyz00777 opened this issue Jan 23, 2025 · 3 comments

@Xyz00777
Please upload the model to Ollama; there you can also publish different quantized (or unquantized) versions.
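
(For context: if such uploads existed, pulling and running the different variants would look roughly like the sketch below, here via the `ollama` Python client. The model name and quantization tags are placeholders, not real uploads.)

```python
import ollama

# Placeholder tags: Ollama can host several variants of one model under
# different tags, e.g. a 4-bit quantized build and an unquantized fp16 build.
ollama.pull("some-model:q4_K_M")  # hypothetical quantized variant
ollama.pull("some-model:fp16")    # hypothetical unquantized variant

response = ollama.chat(
    model="some-model:fp16",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response["message"]["content"])
```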

@AHEADer (Collaborator) commented Jan 24, 2025

Hi, we found that the Ollama + GGUF route performs worse than the cloud deployment. We currently recommend using the Hugging Face inference endpoints. We will try uploading to Ollama once local deployment matches online inference in model performance.
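
A minimal sketch of that recommended route, assuming a dedicated Hugging Face Inference Endpoint (the endpoint URL and token below are placeholders for your own deployment):

```python
from huggingface_hub import InferenceClient

# Placeholder endpoint URL and token; substitute your own deployment.
client = InferenceClient(
    model="https://your-endpoint.endpoints.huggingface.cloud",
    token="hf_xxx",
)

# Plain text-generation request against the deployed model.
reply = client.text_generation("Hello, who are you?", max_new_tokens=64)
print(reply)
```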

@Xyz00777 (Author)

Okay, I thought it would be good too, because you wrote in the README that you were unable to upload an unquantized version to GGUF, and I have already seen many unquantized fp16 versions on Ollama... (I hope it's correct that fp16 is not quantized, as far as I understood the context.)

@AHEADer (Collaborator) commented Jan 24, 2025

Yes, fp16 is not quantized, but we trained with bf16, so an fp16 version may not have the expected precision. BF16 support in Ollama is poor and very slow.
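
To illustrate why a bf16-to-fp16 conversion can lose precision (a small PyTorch sketch, not from this repo): bf16 keeps fp32's 8-bit exponent, while fp16 has only 5 exponent bits, so values that bf16 represents comfortably can overflow fp16's range:

```python
import torch

# bf16 shares fp32's exponent range; fp16 tops out at 65504.
print(torch.finfo(torch.bfloat16).max)  # ~3.39e38
print(torch.finfo(torch.float16).max)   # 65504.0

# A magnitude that bf16 handles fine overflows to inf in fp16.
x = torch.tensor([1e5], dtype=torch.bfloat16)
print(x.to(torch.float16))  # tensor([inf], dtype=torch.float16)
```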
