Changes
UI:
Add a new "Branch chat" option to the chat tab.
Add a new "Search chats" menu to the chat tab.
Improve the handling of Markdown lists (#6626). Lists and nested lists now render much better in the UI. Thanks, @mamei16.
Reduce the size of HTML and CSS sent to the UI during streaming. This improves performance and reduces CPU usage.
Optimize the JavaScript to reduce CPU usage during streaming.
Add a horizontal scrollbar to code blocks that are wider than the chat area.
Make responses start faster by removing unnecessary cleanup calls (#6625). This removes a 0.2-second delay for llama.cpp and ExLlamaV2 while also increasing the reported tokens/second.
Add a --torch-compile flag for transformers (improves performance).
Add a "Static KV cache" option for transformers (improves performance). Both options are sketched in code after this list.
Connect XTC, DRY, smoothing_factor, and dynatemp to the ExLlamaV2 loader (non-HF).
Remove the AutoGPTQ loader (#6641). The project was discontinued, and no wheels had been available for a while. GPTQ models can still be loaded through ExLlamaV2.
Streamline the one-click installer by asking NVIDIA users one question instead of two.
Add a --exclude-pattern flag to the download-model.py script (#6542); the filtering idea is sketched after this list. Thanks, @JackCloudman.
Add IPv6 support to the API (#6559); see the socket sketch after this list. Thanks, @BPplays.
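
For illustration, here is a minimal sketch of what the two transformers performance options above roughly map to in PyTorch/transformers code. The model name is only an example, and the web UI's actual wiring may differ:

```python
# Rough sketch of the two transformers performance options above.
# The model name is illustrative; the web UI's actual wiring may differ.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "facebook/opt-125m"  # example model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# --torch-compile: JIT-compile the forward pass into optimized kernels.
# The first generation is slower (compilation warmup), later ones faster.
model.forward = torch.compile(model.forward)

inputs = tokenizer("Hello,", return_tensors="pt")

# "Static KV cache": pre-allocate the key/value cache at a fixed size
# instead of growing it each step, which avoids reallocations and works
# well together with torch.compile.
output = model.generate(**inputs, max_new_tokens=32, cache_implementation="static")
print(tokenizer.decode(output[0], skip_special_tokens=True))
```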
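The --exclude-pattern filtering can be pictured as glob matching over the repository's file list. Below is a hypothetical sketch using Python's fnmatch; the actual matching rules in download-model.py may differ:

```python
# Hypothetical illustration of glob-style exclusion, in the spirit of
# --exclude-pattern; the real logic in download-model.py may differ.
from fnmatch import fnmatch

files = ["model-00001-of-00002.safetensors", "model.onnx", "README.md"]
exclude_pattern = "*.onnx"  # example pattern

to_download = [name for name in files if not fnmatch(name, exclude_pattern)]
print(to_download)  # ['model-00001-of-00002.safetensors', 'README.md']
```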
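And a minimal socket-level sketch of what IPv6 support means for a server: binding to an IPv6 address instead of (or in addition to) an IPv4 one. The port is illustrative, and the project's API server is built on its own HTTP stack:

```python
# Minimal socket-level sketch of IPv6 listening; the API's actual server
# stack handles this differently, and the port below is illustrative.
import socket

with socket.socket(socket.AF_INET6, socket.SOCK_STREAM) as srv:
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("::", 5000))  # "::" accepts connections on all IPv6 interfaces
    srv.listen()
    print("listening on [::]:5000")
```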
Bug fixes
Fix an orjson.JSONDecodeError on page reload.
Fix the font size of lists in chat mode.
Fix CUDA error on MPS backend during API request (#6572). Thanks, @skywinder.
Add a UnicodeDecodeError workaround for modules/llamacpp_model.py (#6040); see the sketch after this list. Thanks, @nclok1405.
Training_PRO fix: add an "if 'quantization_config' in shared.model.config.to_dict()" check (#6640). Thanks, @FartyPants.
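
For context on the UnicodeDecodeError workaround above: streamed llama.cpp output can cut a multi-byte UTF-8 character across chunk boundaries, so a strict decode raises. A common workaround pattern is sketched below with an illustrative byte string; the actual fix may take a different approach:

```python
# Common workaround pattern for UnicodeDecodeError on streamed bytes:
# decode leniently so a truncated multi-byte sequence can't raise.
# The byte string is illustrative; the actual fix may differ.
chunk = b"Hello \xe4\xb8"  # UTF-8 bytes cut off mid-character

try:
    text = chunk.decode("utf-8")  # strict decode raises here
except UnicodeDecodeError:
    text = chunk.decode("utf-8", errors="replace")  # yields 'Hello \ufffd'

print(text)
```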
Backend updates
llama-cpp-python: bump to 0.3.6 (llama.cpp commit f7cd13301c2a88f97073fd119072b4cc92c08df1, January 8, 2025).