Just use huggingface #6
Sure, use whatever works. This repo is intended to serve as a point of communication about llama, and also as an extra mirror. Note that Facebook has been issuing takedown requests against huggingface llama repositories, so those may get knocked offline.
It's worth noting that those model files have been converted for use with the HF library, so if we take the 7B model files here, according to the authors the model has in fact been converted.
So, supposing we want to use the model files for C++ inference here, I'm not sure it would work.
@loretoparisi Yeah, I'm thinking along the same lines and trying to make sense of it here. There are 8-bit and 4-bit quantized versions, the original weights, and the huggingface versions... I think the C++ inference uses the original weights and converts them. Can this be confirmed? Also, I am currently downloading over IPFS and it is slow. Any thoughts on the model formats with C++, or a way to download the weights faster?
Yes, confirmed. You first convert the weights to ggml FP16 or FP32, then quantize to 4-bit and run inference (CPU only).
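For reference, here is a minimal sketch of that convert → quantize → infer pipeline, driven from Python. The script and binary names (`convert-pth-to-ggml.py`, `quantize`, `main`) and the numeric type codes are assumptions based on early llama.cpp and may differ in newer versions; check the README of your checkout.

```python
# Hypothetical sketch of the pipeline described above, run from the llama.cpp directory.
import subprocess

MODEL_DIR = "./models/7B"  # original released weights (consolidated.*.pth + params.json)
F16_MODEL = f"{MODEL_DIR}/ggml-model-f16.bin"
Q4_MODEL = f"{MODEL_DIR}/ggml-model-q4_0.bin"

# 1. Convert the original PyTorch checkpoints to ggml FP16 ("1" selects f16, "0" f32).
subprocess.run(["python3", "convert-pth-to-ggml.py", MODEL_DIR, "1"], check=True)

# 2. Quantize the FP16 ggml file down to 4-bit (type code "2" = q4_0 in early builds).
subprocess.run(["./quantize", F16_MODEL, Q4_MODEL, "2"], check=True)

# 3. Run CPU-only inference with the quantized model.
subprocess.run(["./main", "-m", Q4_MODEL, "-p", "Hello, my name is", "-n", "64"], check=True)
```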
Ah ok, so you're supposed to get the original released weights and the C++ code converts them? Also, I found a torrent link for the original weights and it's going extremely fast, ETA 3 hours for 235 GB.
Yes, this is exactly what I did with the download here.
You can also use https://huggingface.co/huggyllama; it works with llama.cpp.
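As a rough sketch, you could pull such a mirror locally with `huggingface_hub` and then point llama.cpp's converter at the downloaded folder. The repo id `huggyllama/llama-7b` is an assumption based on the link above; pick whichever size you need.

```python
# Minimal sketch: download a LLaMA checkpoint mirror from the Hugging Face Hub.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="huggyllama/llama-7b")
print("Model files downloaded to:", local_dir)
# From here, run llama.cpp's conversion script against local_dir,
# then quantize as in the sketch above.
```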
All of the models are on huggingface already: https://huggingface.co/decapoda-research
There's even an open, working PR to add support to the transformers lib.
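If that PR is merged into the transformers version you have installed, loading one of the converted checkpoints would look roughly like the sketch below. The class names and the repo id `decapoda-research/llama-7b-hf` are assumptions; the decapoda-research mirrors in particular may be renamed or taken down.

```python
# Rough sketch of loading an HF-converted LLaMA checkpoint with transformers,
# assuming the LLaMA support PR mentioned above has landed in your install.
from transformers import LlamaForCausalLM, LlamaTokenizer

model_id = "decapoda-research/llama-7b-hf"  # assumed repo id, based on the link above
tokenizer = LlamaTokenizer.from_pretrained(model_id)
model = LlamaForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note this loads the HF-format weights directly in Python; for llama.cpp you would still convert and quantize as described earlier in the thread.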