Implement MiniCpm-O 2.6 #1074

Merged: 22 commits, Jan 22, 2025
12 changes: 10 additions & 2 deletions README.md
@@ -54,10 +54,10 @@ Please submit requests for new models [here](https://github.com/EricLBuehler/mis
./mistralrs-server -i vision-plain -m lamm-mit/Cephalo-Llama-3.2-11B-Vision-Instruct-128k -a vllama
```

- 🌟📷 Run the **Qwen2-VL** Model: [documentation and guide here](docs/QWEN2VL.md)
- 🌟📷 Run the **MiniCPM-O 2.6** Model: [documentation and guide here](docs/MINICPMO_2_6.md)

```
./mistralrs-server -i vision-plain -m Qwen/Qwen2-VL-2B-Instruct -a qwen2vl
./mistralrs-server -i vision-plain -m openbmb/MiniCPM-o-2_6 -a minicpmo
```

- 🤗📷 Run the **Smol VLM** Model: [documentation and guide here](docs/IDEFICS3.md)
@@ -171,6 +171,7 @@ https://github.com/EricLBuehler/mistral.rs/assets/65165915/09d9a30f-1e22-4b9a-90
|Idefics 3|✅| |✅|✅|
|DeepseekV2|✅| |✅| |
|DeepseekV3|✅| |✅| |
|MiniCPM-O 2.6|✅| |✅| |

## APIs and Integrations

@@ -439,6 +440,7 @@ If you do not specify the architecture, an attempt will be made to use the model
- `vllama`
- `qwen2vl`
- `idefics3`
- `minicpmo`

### Supported GGUF architectures

@@ -537,6 +539,8 @@ Please submit more benchmarks via raising an issue!
|Qwen2-VL| | |✅|
|Idefics 3| | |✅|
|Deepseek V2| | |✅|
|Deepseek V3| | |✅|
|MiniCPM-O 2.6| | |✅|

**Device mapping support**
|Model category|Supported|
@@ -566,6 +570,8 @@ Please submit more benchmarks via raising an issue!
|Qwen2-VL| | | |
|Idefics 3| | | |
|Deepseek V2| | | |
|Deepseek V3| | | |
|MiniCPM-O 2.6| | | |

**AnyMoE support**
|Model|AnyMoE|
@@ -588,6 +594,8 @@ Please submit more benchmarks via raising an issue!
|Qwen2-VL| |
|Idefics 3|✅|
|Deepseek V2| |
|Deepseek V3| |
|MiniCPM-O 2.6| |


### Using derivative model
235 changes: 235 additions & 0 deletions docs/MINICPMO_2_6.md
@@ -0,0 +1,235 @@
# MiniCPM-O 2.6 Model: [`openbmb/MiniCPM-o-2_6`](https://huggingface.co/openbmb/MiniCPM-o-2_6)

Mistral.rs supports the MiniCPM-O 2.6 model, with examples in the Rust, Python, and HTTP APIs. ISQ quantization is supported, allowing the model to run with reduced memory requirements.

UQFF quantizations are coming soon.

> [!NOTE]
> Only the vision portion of this model has been implemented. No audio features are supported yet.

The Python and HTTP APIs support sending images as:
- URL
- Path to a local image
- [Base64](https://en.wikipedia.org/wiki/Base64) encoded string

The Rust API takes an image from the [image](https://docs.rs/image/latest/image/index.html) crate.
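For the base64 option, the encoded image is placed directly in the `image_url` field. A minimal sketch of wrapping image bytes in a `data:` URL, a common convention for inlining images (the helper name `to_data_url` is ours for illustration, not a mistral.rs API):

```python
import base64

def to_data_url(image_bytes: bytes, mime: str = "image/jpeg") -> str:
    """Encode raw image bytes as a data URL usable in an image_url field."""
    b64 = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime};base64,{b64}"

# Any image bytes work; a placeholder stands in for a real file here.
url = to_data_url(b"\xff\xd8\xff\xe0fake-jpeg-bytes")
print(url[:30])  # data:image/jpeg;base64,/9j/4GZ
```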

## ToC
- [Interactive mode](#interactive-mode)
- [HTTP server](#http-server)
- [Rust API](#rust)
- [Python API](#python)

## Interactive mode

Mistral.rs supports interactive mode for vision models! It is an easy way to interact with the model.

1) Start up interactive mode with the MiniCPM-O 2.6 model

> [!NOTE]
> You should replace `--features ...` with one of the features specified [here](../README.md#supported-accelerators), or remove it for pure CPU inference.

```
cargo run --features ... --release -- -i --isq Q4K vision-plain -m openbmb/MiniCPM-o-2_6 -a minicpmo
```

2) Say hello!
```
> Hello!
How can I assist you today?
```

3) Pass the model an image and ask a question.

```
> Hello!
How can I assist you today?
> \image https://upload.wikimedia.org/wikipedia/commons/thumb/3/3a/Rosa_Precious_platinum.jpg/220px-Rosa_Precious_platinum.jpg What is this image?
The image shows a close-up view of a rose flower with dew drops on its petals. The rose is in full bloom, with its petals unfolding and displaying vibrant pink coloration. The dew drops on the petals create a delicate, glistening effect, adding to the overall visual appeal of the flower. The background is blurred, focusing attention on the intricate details of the rose.
```


## HTTP server
You can find this example [here](../examples/server/minicpmo_2_6.py).

We support an OpenAI compatible HTTP API for vision models. This example demonstrates sending a chat completion request with an image.

> Note: The image_url may be either a path, URL, or a base64 encoded string.

---

**Image:**
<img src="https://www.nhmagazine.com/content/uploads/2019/05/mtwashingtonFranconia-2-19-18-108-Edit-Edit.jpg" alt="Mount Washington" width="1000" height="666">
<h6><a href = "https://www.nhmagazine.com/mount-washington/">Credit</a></h6>

**Prompt:**
```
What is shown in this image? Write a detailed response analyzing the scene.
```

**Output:**
```
The image shows Mount Washington, the highest peak in the Northeastern United States, located in the White Mountains of New Hampshire. The scene captures the mountain's rugged terrain and varied landscape features.

In the foreground, there are dense forests of coniferous trees, primarily spruce and fir, which are typical of the region's boreal forest ecosystem. The trees are densely packed, indicating a high level of vegetation cover and biodiversity.

Moving upwards, the image reveals rocky outcroppings and boulders scattered across the slope, indicating the mountain's geological history of glacial activity. The presence of these rocks suggests that the area was once covered by ice sheets during the last ice age, which carved out the landscape and left behind a mix of boulders and talus slopes.

In the mid-ground, the image shows a series of ridges and valleys, which are characteristic of the mountain's glacially sculpted terrain. These features were formed by the movement of ice sheets that carved out U-shaped valleys and left behind a series of rounded hills and ridges.

At the summit, there is a prominent observation tower or weather station, which is likely used for scientific research and weather monitoring. The structure is situated at an elevation of approximately 6,288 feet (1,917 meters) above sea level, making it one of the highest points in the region.

The image also captures the atmospheric conditions on Mount Washington, with clouds and mist visible in the background. The mountain's unique location in a region where cold Arctic air meets warm moist air from the Gulf Stream creates a unique microclimate known as the "Home Rule," where extreme weather conditions can occur.

Overall, the image showcases the diverse geological and ecological features of Mount Washington, highlighting its role as a significant natural landmark in the Northeastern United States.
```

---

1) Start the server

> [!NOTE]
> You should replace `--features ...` with one of the features specified [here](../README.md#supported-accelerators), or remove it for pure CPU inference.

```
cargo run --release --features ... -- --port 1234 --isq Q4K vision-plain -m openbmb/MiniCPM-o-2_6 -a minicpmo
```

2) Send a request

```py
from openai import OpenAI

client = OpenAI(api_key="foobar", base_url="http://localhost:1234/v1/")

completion = client.chat.completions.create(
    model="minicpmo_2_6",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://www.nhmagazine.com/content/uploads/2019/05/mtwashingtonFranconia-2-19-18-108-Edit-Edit.jpg"
                    },
                },
                {
                    "type": "text",
                    "text": "(<image>./</image>) What is shown in this image? Write a detailed response analyzing the scene.",
                },
            ],
        },
    ],
    max_tokens=256,
    frequency_penalty=1.0,
    top_p=0.1,
    temperature=0,
)
resp = completion.choices[0].message.content
print(resp)
```

- You can find an example of encoding the [image via base64 here](../examples/server/phi3v_base64.py).
- You can find an example of loading an [image locally here](../examples/server/phi3v_local_img.py).
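Since the note above says `image_url` may also be a base64-encoded string, a minimal sketch of building such a message for a local image (the `image_content` helper is ours for illustration; the actual request call is the same as in the example above):

```python
import base64

def image_content(image_b64: str, prompt: str) -> list:
    """Build the content list for one user turn: the image part first, then
    the prompt prefixed with MiniCPM-O's image placeholder token."""
    return [
        {"type": "image_url", "image_url": {"url": image_b64}},
        {"type": "text", "text": f"(<image>./</image>) {prompt}"},
    ]

# In practice the bytes would come from open("image.jpg", "rb").read().
encoded = base64.b64encode(b"fake-image-bytes").decode("utf-8")
content = image_content(encoded, "What is shown in this image?")
print(content[1]["text"])  # (<image>./</image>) What is shown in this image?
```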

---

## Rust
You can find this example [here](../mistralrs/examples/minicpmo_2_6/main.rs).

```rust
use anyhow::Result;
use mistralrs::{IsqType, TextMessageRole, VisionLoaderType, VisionMessages, VisionModelBuilder};

const MODEL_ID: &str = "openbmb/MiniCPM-o-2_6";

#[tokio::main]
async fn main() -> Result<()> {
    let model = VisionModelBuilder::new(MODEL_ID, VisionLoaderType::MiniCpmO)
        .with_isq(IsqType::Q4K)
        .with_logging()
        .build()
        .await?;

    let bytes = match reqwest::blocking::get(
        "https://cdn.britannica.com/45/5645-050-B9EC0205/head-treasure-flower-disk-flowers-inflorescence-ray.jpg",
    ) {
        Ok(http_resp) => http_resp.bytes()?.to_vec(),
        Err(e) => anyhow::bail!(e),
    };
    let image = image::load_from_memory(&bytes)?;

    let messages = VisionMessages::new().add_image_message(
        TextMessageRole::User,
        "What is depicted here? Please describe the scene in detail.",
        image,
    );

    let response = model.send_chat_request(messages).await?;

    println!("{}", response.choices[0].message.content.as_ref().unwrap());
    dbg!(
        response.usage.avg_prompt_tok_per_sec,
        response.usage.avg_compl_tok_per_sec
    );

    Ok(())
}
```

---

## Python
You can find this example [here](../examples/python/minicpmo_2_6.py).

This example demonstrates loading and sending a chat completion request with an image.

> Note: the image_url may be either a path, URL, or a base64 encoded string.

```py
from mistralrs import Runner, Which, ChatCompletionRequest, VisionArchitecture

MODEL_ID = "openbmb/MiniCPM-o-2_6"

runner = Runner(
    which=Which.VisionPlain(
        model_id=MODEL_ID,
        arch=VisionArchitecture.MiniCpmO,
    ),
)

res = runner.send_chat_completion_request(
    ChatCompletionRequest(
        model="minicpmo_2_6",
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://www.nhmagazine.com/content/uploads/2019/05/mtwashingtonFranconia-2-19-18-108-Edit-Edit.jpg"
                        },
                    },
                    {
                        "type": "text",
                        "text": "(<image>./</image>) What is shown in this image? Write a detailed response analyzing the scene.",
                    },
                ],
            }
        ],
        max_tokens=256,
        presence_penalty=1.0,
        top_p=0.1,
        temperature=0.1,
    )
)
print(res.choices[0].message.content)
print(res.usage)
```

- You can find an example of encoding the [image via base64 here](../examples/python/phi3v_base64.py).
- You can find an example of loading an [image locally here](../examples/python/phi3v_local_img.py).
3 changes: 3 additions & 0 deletions docs/README.md
@@ -13,6 +13,9 @@
- [Llama 3.2 Vision](VLLAMA.md)
- [Qwen2-VL](QWEN2VL.md)
- [Idefics 3 and Smol VLM](IDEFICS3.md)
- [DeepSeek V2](DEEPSEEKV2.md)
- [DeepSeek V3](DEEPSEEKV3.md)
- [MiniCPM-O 2.6](MINICPMO_2_6.md)

## Adapters
- [Docs](ADAPTER_MODELS.md)
4 changes: 2 additions & 2 deletions docs/VLLAMA.md
@@ -129,7 +129,7 @@ completion = client.chat.completions.create(
},
{
"type": "text",
"text": "What is shown in this image? Write a detailed response analyzing the scene.",
"text": "<|image|>What is shown in this image? Write a detailed response analyzing the scene.",
},
],
},
@@ -175,7 +175,7 @@ async fn main() -> Result<()> {
};
let image = image::load_from_memory(&bytes)?;

let messages = VisionMessages::new().add_vllama_image_message(
let messages = VisionMessages::new().add_image_message(
TextMessageRole::User,
"What is depicted here? Please describe the scene in detail.",
image,
39 changes: 39 additions & 0 deletions examples/python/minicpmo_2_6.py
@@ -0,0 +1,39 @@
from mistralrs import Runner, Which, ChatCompletionRequest, VisionArchitecture

MODEL_ID = "openbmb/MiniCPM-o-2_6"

runner = Runner(
    which=Which.VisionPlain(
        model_id=MODEL_ID,
        arch=VisionArchitecture.MiniCpmO,
    ),
)

res = runner.send_chat_completion_request(
    ChatCompletionRequest(
        model="minicpmo_2_6",
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://www.nhmagazine.com/content/uploads/2019/05/mtwashingtonFranconia-2-19-18-108-Edit-Edit.jpg"
                        },
                    },
                    {
                        "type": "text",
                        "text": "(<image>./</image>) What is shown in this image? Write a detailed response analyzing the scene.",
                    },
                ],
            }
        ],
        max_tokens=256,
        presence_penalty=1.0,
        top_p=0.1,
        temperature=0.1,
    )
)
print(res.choices[0].message.content)
print(res.usage)