ollama upload #14
Comments
Hi, we found that the Ollama + GGUF route performs worse than the cloud deployment. For now, we recommend using the Hugging Face Inference Endpoints. We will try uploading to Ollama once local deployment matches online inference in model performance.
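For anyone following along, a minimal sketch of querying such an endpoint with the official `huggingface_hub` client; the endpoint URL and prompt are placeholders, not this project's actual deployment:

```python
from huggingface_hub import InferenceClient

# Placeholder endpoint URL -- substitute the project's real
# Inference Endpoint (or a model id) here.
client = InferenceClient(model="https://<your-endpoint>.endpoints.huggingface.cloud")

# Simple text-generation call against the endpoint.
output = client.text_generation("Hello, world", max_new_tokens=64)
print(output)
```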
Okay, I thought it would be good too, because you wrote in the README that you were unable to upload a non-quantized version to GGUF, and I have already seen many non-quantized fp16 versions on Ollama... (I hope it's correct that fp16 is not quantized, as far as I understood from the context)
Yes, fp16 is not the quantized version, but we trained with bf16, so an fp16 version may not have the expected precision. BF16 support in Ollama is poor and very slow.
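For context, a minimal sketch (assuming PyTorch) of the mismatch: bf16 keeps fp32's 8-bit exponent but only 7 mantissa bits, while fp16 trades exponent range (5 bits) for mantissa precision (10 bits), so casting bf16-trained weights to fp16 can overflow or round differently:

```python
import torch

# bf16 covers fp32's exponent range, but fp16 tops out around 65504,
# so large bf16 activations/weights overflow to inf when cast to fp16.
big = torch.tensor(70000.0, dtype=torch.bfloat16)  # representable in bf16
print(big.to(torch.float16))                       # inf: exceeds fp16 max

# Conversely, small offsets that fp16 represents exactly are lost
# to bf16's coarser 7-bit mantissa.
x = torch.tensor(1.0 + 2**-10, dtype=torch.float16)  # exact in fp16
print(x.to(torch.bfloat16))                          # rounds back to 1.0
```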
Please upload to Ollama; there you can also upload different quantized or non-quantized versions.
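For reference, a sketch of what an Ollama import could look like if a GGUF export becomes available; the file and model names here are hypothetical, not something the project ships:

```shell
# Modelfile (hypothetical path to a local GGUF export):
#   FROM ./model-f16.gguf

ollama create my-model -f Modelfile                       # import the f16 GGUF as-is
ollama create my-model:q4 -f Modelfile --quantize q4_K_M  # optionally quantize on import
```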