
v2.2

@oobabooga released this 09 Jan 21:48
e6eda6a

Changes

  • UI:
    • Add a new "Branch chat" option to the chat tab.
    • Add a new "Search chats" menu to the chat tab.
    • Improve the handling of markdown lists (#6626), greatly improving the rendering of lists and nested lists in the UI. Thanks, @mamei16.
    • Reduce the size of HTML and CSS sent to the UI during streaming. This improves performance and reduces CPU usage.
    • Optimize the JavaScript to reduce the CPU usage during streaming.
    • Add a horizontal scrollbar to code blocks that are wider than the chat area.
  • Make responses start faster by removing unnecessary cleanup calls (#6625). This removes a 0.2-second delay for llama.cpp and ExLlamaV2 while also increasing the reported tokens/second.
  • Add a --torch-compile flag for transformers (improves performance).
  • Add a "Static KV cache" option for transformers (improves performance). See the sketch after this list for how these two options map onto the transformers API.
  • Connect XTC, DRY, smoothing_factor, and dynatemp to the ExLlamaV2 loader (non-HF).
  • Remove the AutoGPTQ loader (#6641). The project was discontinued, and no wheels had been available for a while. GPTQ models can still be loaded through ExLlamaV2.
  • Streamline the one-click installer by asking NVIDIA users one question instead of two.
  • Add a --exclude-pattern flag to the download-model.py script (#6542). Thanks, @JackCloudman.
  • Add IPv6 support to the API (#6559); see the request example after this list. Thanks, @BPplays.
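
For reference, the two new transformers performance options map roughly onto the following calls in the underlying PyTorch/transformers APIs. This is an illustrative sketch rather than the loader's actual code; the model name and generation settings are placeholders.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Placeholder model; any causal LM that supports the static cache works the same way.
    model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).cuda()

    # --torch-compile: compile the model's forward pass for faster decoding.
    model.forward = torch.compile(model.forward)

    # "Static KV cache": pre-allocate the key/value cache at a fixed size so the
    # compiled graph does not need to be re-traced as the sequence grows.
    inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=32, cache_implementation="static")
    print(tokenizer.decode(output[0], skip_special_tokens=True))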
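
The IPv6 support applies to the OpenAI-compatible API. A minimal request over the IPv6 loopback might look like the following sketch, assuming the server was started with the API enabled, bound to an IPv6 address, and listening on the default port 5000:

    import requests

    # [::1] is the IPv6 loopback; port 5000 is assumed to be the default API port.
    url = "http://[::1]:5000/v1/chat/completions"
    payload = {
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 64,
    }
    response = requests.post(url, json=payload, timeout=60)
    print(response.json()["choices"][0]["message"]["content"])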

Bug fixes

  • Fix an orjson.JSONDecodeError on page reload.
  • Fix the font size of lists in chat mode.
  • Fix a CUDA error on the MPS backend during API requests (#6572). Thanks, @skywinder.
  • Add a UnicodeDecodeError workaround for modules/llamacpp_model.py (#6040). Thanks, @nclok1405.
  • Training_PRO fix: add a check for 'quantization_config' in shared.model.config.to_dict() (#6640). Thanks, @FartyPants.

Backend updates

  • llama-cpp-python: bump to 0.3.6 (llama.cpp commit f7cd13301c2a88f97073fd119072b4cc92c08df1, January 8, 2025).