Implement MiniCpm-O 2.6 #1074

Merged: 22 commits, Jan 22, 2025
12 changes: 10 additions & 2 deletions README.md
@@ -54,10 +54,10 @@ Please submit requests for new models [here](https://github.com/EricLBuehler/mis
./mistralrs-server -i vision-plain -m lamm-mit/Cephalo-Llama-3.2-11B-Vision-Instruct-128k -a vllama
```

- 🌟📷 Run the **Qwen2-VL** Model: [documentation and guide here](docs/QWEN2VL.md)
- 🌟📷 Run the **MiniCPM-O 2.6** Model: [documentation and guide here](docs/MINICPMO_2_6.md)

```
./mistralrs-server -i vision-plain -m Qwen/Qwen2-VL-2B-Instruct -a qwen2vl
./mistralrs-server -i vision-plain -m openbmb/MiniCPM-o-2_6 -a minicpmo
```

- 🤗📷 Run the **Smol VLM** Model: [documentation and guide here](docs/IDEFICS3.md)
@@ -171,6 +171,7 @@ https://github.com/EricLBuehler/mistral.rs/assets/65165915/09d9a30f-1e22-4b9a-90
|Idefics 3|✅| |✅|✅|
|DeepseekV2|✅| |✅| |
|DeepseekV3|✅| |✅| |
|MiniCPM-O 2.6|✅| |✅| |

## APIs and Integrations

@@ -439,6 +440,7 @@ If you do not specify the architecture, an attempt will be made to use the model
- `vllama`
- `qwen2vl`
- `idefics3`
- `minicpmo`

### Supported GGUF architectures

@@ -537,6 +539,8 @@ Please submit more benchmarks via raising an issue!
|Qwen2-VL| | |✅|
|Idefics 3| | |✅|
|Deepseek V2| | |✅|
|Deepseek V3| | |✅|
|MiniCPM-O 2.6| | |✅|

**Device mapping support**
|Model category|Supported|
@@ -566,6 +570,8 @@ Please submit more benchmarks via raising an issue!
|Qwen2-VL| | | |
|Idefics 3| | | |
|Deepseek V2| | | |
|Deepseek V3| | | |
|MiniCPM-O 2.6| | | |

**AnyMoE support**
|Model|AnyMoE|
@@ -588,6 +594,8 @@ Please submit more benchmarks via raising an issue!
|Qwen2-VL| |
|Idefics 3|✅|
|Deepseek V2| |
|Deepseek V3| |
|MiniCPM-O 2.6| |


### Using derivative model
235 changes: 235 additions & 0 deletions docs/MINICPMO_2_6.md
@@ -0,0 +1,235 @@
# MiniCPM-O 2.6 Model: [`openbmb/MiniCPM-o-2_6`](https://huggingface.co/openbmb/MiniCPM-o-2_6)

Mistral.rs supports the MiniCPM-O 2.6 model, with examples in the Rust, Python, and HTTP APIs. ISQ quantization is supported, allowing the model to run with reduced memory requirements.

UQFF quantizations are coming soon.

> [!NOTE]
> Only the vision portion of this model has been implemented. No audio features are supported yet.

The Python and HTTP APIs support sending images as:
- URL
- Path to a local image
- [Base64](https://en.wikipedia.org/wiki/Base64) encoded string

The Rust API takes an image from the [image](https://docs.rs/image/latest/image/index.html) crate.
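For the base64 option, the encoded image is placed directly in the `image_url` field. A minimal sketch of wrapping image bytes in a `data:` URL, a common convention for inlining images (the helper name `to_data_url` is ours for illustration, not a mistral.rs API):

```python
import base64

def to_data_url(image_bytes: bytes, mime: str = "image/jpeg") -> str:
    """Encode raw image bytes as a data URL usable in an image_url field."""
    b64 = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime};base64,{b64}"

# Any image bytes work; a placeholder stands in for a real file here.
url = to_data_url(b"\xff\xd8\xff\xe0fake-jpeg-bytes")
print(url[:30])  # data:image/jpeg;base64,/9j/4GZ
```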

## ToC
- [Interactive mode](#interactive-mode)
- [HTTP server](#http-server)
- [Rust API](#rust)
- [Python API](#python)

## Interactive mode

Mistral.rs supports interactive mode for vision models! It is an easy way to interact with the model.

1) Start up interactive mode with the MiniCPM-O 2.6 model

> [!NOTE]
> You should replace `--features ...` with one of the features specified [here](../README.md#supported-accelerators), or remove it for pure CPU inference.

```
cargo run --features ... --release -- -i --isq Q4K vision-plain -m openbmb/MiniCPM-o-2_6 -a minicpmo
```

2) Say hello!
```
> Hello!
How can I assist you today?
```

3) Pass the model an image and ask a question.

```
> Hello!
How can I assist you today?
> \image https://upload.wikimedia.org/wikipedia/commons/thumb/3/3a/Rosa_Precious_platinum.jpg/220px-Rosa_Precious_platinum.jpg What is this image?
The image shows a close-up view of a rose flower with dew drops on its petals. The rose is in full bloom, with its petals unfolding and displaying vibrant pink coloration. The dew drops on the petals create a delicate, glistening effect, adding to the overall visual appeal of the flower. The background is blurred, focusing attention on the intricate details of the rose.
```


## HTTP server
You can find this example [here](../examples/server/minicpmo_2_6.py).

We support an OpenAI compatible HTTP API for vision models. This example demonstrates sending a chat completion request with an image.

> Note: The image_url may be either a path, URL, or a base64 encoded string.

---

**Image:**
<img src="https://www.nhmagazine.com/content/uploads/2019/05/mtwashingtonFranconia-2-19-18-108-Edit-Edit.jpg" alt="Mount Washington" width="1000" height="666">
<h6><a href = "https://www.nhmagazine.com/mount-washington/">Credit</a></h6>

**Prompt:**
```
What is shown in this image? Write a detailed response analyzing the scene.
```

**Output:**
```
The image shows Mount Washington, the highest peak in the Northeastern United States, located in the White Mountains of New Hampshire. The scene captures the mountain's rugged terrain and varied landscape features.

In the foreground, there are dense forests of coniferous trees, primarily spruce and fir, which are typical of the region's boreal forest ecosystem. The trees are densely packed, indicating a high level of vegetation cover and biodiversity.

Moving upwards, the image reveals rocky outcroppings and boulders scattered across the slope, indicating the mountain's geological history of glacial activity. The presence of these rocks suggests that the area was once covered by ice sheets during the last ice age, which carved out the landscape and left behind a mix of boulders and talus slopes.

In the mid-ground, the image shows a series of ridges and valleys, which are characteristic of the mountain's glacially sculpted terrain. These features were formed by the movement of ice sheets that carved out U-shaped valleys and left behind a series of rounded hills and ridges.

At the summit, there is a prominent observation tower or weather station, which is likely used for scientific research and weather monitoring. The structure is situated at an elevation of approximately 6,288 feet (1,917 meters) above sea level, making it one of the highest points in the region.

The image also captures the atmospheric conditions on Mount Washington, with clouds and mist visible in the background. The mountain's unique location in a region where cold Arctic air meets warm moist air from the Gulf Stream creates a unique microclimate known as the "Home Rule," where extreme weather conditions can occur.

Overall, the image showcases the diverse geological and ecological features of Mount Washington, highlighting its role as a significant natural landmark in the Northeastern United States.
```

---

1) Start the server

> [!NOTE]
> You should replace `--features ...` with one of the features specified [here](../README.md#supported-accelerators), or remove it for pure CPU inference.

```
cargo run --release --features ... -- --port 1234 --isq Q4K vision-plain -m openbmb/MiniCPM-o-2_6 -a minicpmo
```

2) Send a request

```py
from openai import OpenAI

client = OpenAI(api_key="foobar", base_url="http://localhost:1234/v1/")

completion = client.chat.completions.create(
    model="minicpmo_2_6",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://www.nhmagazine.com/content/uploads/2019/05/mtwashingtonFranconia-2-19-18-108-Edit-Edit.jpg"
                    },
                },
                {
                    "type": "text",
                    "text": "(<image>./</image>) What is shown in this image? Write a detailed response analyzing the scene.",
                },
            ],
        },
    ],
    max_tokens=256,
    frequency_penalty=1.0,
    top_p=0.1,
    temperature=0,
)
resp = completion.choices[0].message.content
print(resp)
```

- You can find an example of encoding the [image via base64 here](../examples/server/phi3v_base64.py).
- You can find an example of loading an [image locally here](../examples/server/phi3v_local_img.py).
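Since the note above says `image_url` may also be a base64-encoded string, a minimal sketch of building such a message for a local image (the `image_content` helper is ours for illustration; the actual request call is the same as in the example above):

```python
import base64

def image_content(image_b64: str, prompt: str) -> list:
    """Build the content list for one user turn: the image part first, then
    the prompt prefixed with MiniCPM-O's image placeholder token."""
    return [
        {"type": "image_url", "image_url": {"url": image_b64}},
        {"type": "text", "text": f"(<image>./</image>) {prompt}"},
    ]

# In practice the bytes would come from open("image.jpg", "rb").read().
encoded = base64.b64encode(b"fake-image-bytes").decode("utf-8")
content = image_content(encoded, "What is shown in this image?")
print(content[1]["text"])  # (<image>./</image>) What is shown in this image?
```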

---

## Rust
You can find this example [here](../mistralrs/examples/minicpmo_2_6/main.rs).

```rust
use anyhow::Result;
use mistralrs::{IsqType, TextMessageRole, VisionLoaderType, VisionMessages, VisionModelBuilder};

const MODEL_ID: &str = "openbmb/MiniCPM-o-2_6";

#[tokio::main]
async fn main() -> Result<()> {
    let model = VisionModelBuilder::new(MODEL_ID, VisionLoaderType::MiniCpmO)
        .with_isq(IsqType::Q4K)
        .with_logging()
        .build()
        .await?;

    let bytes = match reqwest::blocking::get(
        "https://cdn.britannica.com/45/5645-050-B9EC0205/head-treasure-flower-disk-flowers-inflorescence-ray.jpg",
    ) {
        Ok(http_resp) => http_resp.bytes()?.to_vec(),
        Err(e) => anyhow::bail!(e),
    };
    let image = image::load_from_memory(&bytes)?;

    let messages = VisionMessages::new().add_image_message(
        TextMessageRole::User,
        "What is depicted here? Please describe the scene in detail.",
        image,
    );

    let response = model.send_chat_request(messages).await?;

    println!("{}", response.choices[0].message.content.as_ref().unwrap());
    dbg!(
        response.usage.avg_prompt_tok_per_sec,
        response.usage.avg_compl_tok_per_sec
    );

    Ok(())
}
```

---

## Python
You can find this example [here](../examples/python/minicpmo_2_6.py).

This example demonstrates loading and sending a chat completion request with an image.

> Note: the image_url may be either a path, URL, or a base64 encoded string.

```py
from mistralrs import Runner, Which, ChatCompletionRequest, VisionArchitecture

MODEL_ID = "openbmb/MiniCPM-o-2_6"

runner = Runner(
    which=Which.VisionPlain(
        model_id=MODEL_ID,
        arch=VisionArchitecture.MiniCpmO,
    ),
)

res = runner.send_chat_completion_request(
    ChatCompletionRequest(
        model="minicpmo_2_6",
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://www.nhmagazine.com/content/uploads/2019/05/mtwashingtonFranconia-2-19-18-108-Edit-Edit.jpg"
                        },
                    },
                    {
                        "type": "text",
                        "text": "(<image>./</image>) What is shown in this image? Write a detailed response analyzing the scene.",
                    },
                ],
            }
        ],
        max_tokens=256,
        presence_penalty=1.0,
        top_p=0.1,
        temperature=0.1,
    )
)
print(res.choices[0].message.content)
print(res.usage)
```

- You can find an example of encoding the [image via base64 here](../examples/python/phi3v_base64.py).
- You can find an example of loading an [image locally here](../examples/python/phi3v_local_img.py).
3 changes: 3 additions & 0 deletions docs/README.md
@@ -13,6 +13,9 @@
- [Llama 3.2 Vision](VLLAMA.md)
- [Qwen2-VL](QWEN2VL.md)
- [Idefics 3 and Smol VLM](IDEFICS3.md)
- [DeepSeek V2](DEEPSEEKV2.md)
- [DeepSeek V3](DEEPSEEKV3.md)
- [MiniCPM-O 2.6](MINICPMO_2_6.md)

## Adapters
- [Docs](ADAPTER_MODELS.md)
4 changes: 2 additions & 2 deletions docs/VLLAMA.md
@@ -129,7 +129,7 @@ completion = client.chat.completions.create(
},
{
"type": "text",
"text": "What is shown in this image? Write a detailed response analyzing the scene.",
"text": "<|image|>What is shown in this image? Write a detailed response analyzing the scene.",
},
],
},
@@ -175,7 +175,7 @@ async fn main() -> Result<()> {
};
let image = image::load_from_memory(&bytes)?;

let messages = VisionMessages::new().add_vllama_image_message(
let messages = VisionMessages::new().add_image_message(
TextMessageRole::User,
"What is depicted here? Please describe the scene in detail.",
image,
39 changes: 39 additions & 0 deletions examples/python/minicpmo_2_6.py
@@ -0,0 +1,39 @@
from mistralrs import Runner, Which, ChatCompletionRequest, VisionArchitecture

MODEL_ID = "openbmb/MiniCPM-o-2_6"

runner = Runner(
    which=Which.VisionPlain(
        model_id=MODEL_ID,
        arch=VisionArchitecture.MiniCpmO,
    ),
)

res = runner.send_chat_completion_request(
    ChatCompletionRequest(
        model="minicpmo_2_6",
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://www.nhmagazine.com/content/uploads/2019/05/mtwashingtonFranconia-2-19-18-108-Edit-Edit.jpg"
                        },
                    },
                    {
                        "type": "text",
                        "text": "(<image>./</image>) What is shown in this image? Write a detailed response analyzing the scene.",
                    },
                ],
            }
        ],
        max_tokens=256,
        presence_penalty=1.0,
        top_p=0.1,
        temperature=0.1,
    )
)
print(res.choices[0].message.content)
print(res.usage)