This is a placeholder for a proper README.

# GenAI.Server

An OnnxRuntime.GenAI based, OpenAI API compatible server. A toy project, not production ready; use it with a grain of salt.

My toy project for hosting and serving OnnxRuntime.GenAI models and experimenting with the OnnxRuntime.GenAI library.

Includes a Jinja port based on the Hugging Face Jinja JS implementation, so the chat prompt template is taken from the model's tokenizer config.
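To illustrate what chat template rendering produces: the server renders the actual Jinja template shipped with the model, but a minimal sketch of a common ChatML-style layout (hard-coded here for illustration only) looks like this:

```python
# Illustrative sketch only: the server renders the real Jinja template
# from the model's tokenizer config; this hard-codes a ChatML-style
# layout to show roughly what a rendered chat prompt looks like.
def render_chatml(messages):
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")  # generation prompt for the model
    return "".join(parts)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
print(render_chatml(messages))
```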

Includes a minimal API implementation with /models and /chat/completions endpoints; no auth yet.
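A hedged sketch of calling the chat completions endpoint from Python. The base URL, port, and model id below are assumptions for illustration; check your docker-compose setup and the /models endpoint for the real values:

```python
import json
import urllib.request

# Assumed base URL; adjust host/port to match your docker-compose setup.
BASE_URL = "http://localhost:8080"

payload = {
    "model": "my-onnx-model",  # hypothetical id; list real ids via GET /models
    "messages": [
        {"role": "user", "content": "Hello!"},
    ],
}

req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# Uncomment once the server is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```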

If this sparks some interest, I may improve it, both documentation- and functionality-wise.

## Build it

```sh
docker-compose build
docker-compose up
```