
v2.2

@oobabooga released this 09 Jan 21:48
e6eda6a

Changes

  • UI:
    • Add a new "Branch chat" option to the chat tab.
    • Add a new "Search chats" menu to the chat tab.
    • Improve the handling of markdown lists (#6626), greatly improving the rendering of lists and nested lists in the UI. Thanks, @mamei16.
    • Reduce the size of HTML and CSS sent to the UI during streaming. This improves performance and reduces CPU usage.
    • Optimize the JavaScript to reduce the CPU usage during streaming.
    • Add a horizontal scrollbar to code blocks that are wider than the chat area.
  • Make responses start faster by removing unnecessary cleanup calls (#6625). This removes a 0.2-second delay for llama.cpp and ExLlamaV2 while also increasing the reported tokens/second.
  • Add a --torch-compile flag for transformers (improves performance).
  • Add a "Static KV cache" option for transformers (improves performance). See the sketch after this list for how these two options map onto the transformers API.
  • Connect XTC, DRY, smoothing_factor, and dynatemp to the ExLlamaV2 loader (non-HF).
  • Remove the AutoGPTQ loader (#6641). The project was discontinued, and no wheels had been available for a while. GPTQ models can still be loaded through ExLlamaV2.
  • Streamline the one-click installer by asking NVIDIA users one question instead of two.
  • Add a --exclude-pattern flag to the download-model.py script (#6542). Thanks, @JackCloudman.
  • Add IPv6 support to the API (#6559); see the request example after this list. Thanks, @BPplays.
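
For reference, the two new transformers performance options map roughly onto the following calls in the underlying PyTorch/transformers APIs. This is an illustrative sketch rather than the loader's actual code; the model name and generation settings are placeholders.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Placeholder model; any causal LM that supports the static cache works the same way.
    model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).cuda()

    # --torch-compile: compile the model's forward pass for faster decoding.
    model.forward = torch.compile(model.forward)

    # "Static KV cache": pre-allocate the key/value cache at a fixed size so the
    # compiled graph does not need to be re-traced as the sequence grows.
    inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=32, cache_implementation="static")
    print(tokenizer.decode(output[0], skip_special_tokens=True))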
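
The IPv6 support applies to the OpenAI-compatible API. A minimal request over the IPv6 loopback might look like the following sketch, assuming the server was started with the API enabled, bound to an IPv6 address, and listening on the default port 5000:

    import requests

    # [::1] is the IPv6 loopback; port 5000 is assumed to be the default API port.
    url = "http://[::1]:5000/v1/chat/completions"
    payload = {
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 64,
    }
    response = requests.post(url, json=payload, timeout=60)
    print(response.json()["choices"][0]["message"]["content"])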

Bug fixes

  • Fix an orjson.JSONDecodeError on page reload.
  • Fix the font size of lists in chat mode.
  • Fix a CUDA error on the MPS backend during API requests (#6572). Thanks, @skywinder.
  • Add a UnicodeDecodeError workaround for modules/llamacpp_model.py (#6040). Thanks, @nclok1405.
  • Training_PRO fix: add a check for 'quantization_config' in shared.model.config.to_dict() (#6640). Thanks, @FartyPants.

Backend updates

  • llama-cpp-python: bump to 0.3.6 (llama.cpp commit f7cd13301c2a88f97073fd119072b4cc92c08df1, January 8, 2025).