This is a placeholder for a proper README.
An OnnxRuntime.GenAI-based, OpenAI-API-compatible server. This is my toy project for hosting and serving OnnxRuntime.GenAI models and experimenting with the library. It is not production ready, so use it with a grain of salt.
Includes a Jinja port based on the Hugging Face Jinja JS implementation, so the chat prompt template is taken from the model's tokenizer config.
Includes a minimal API implementation with /models and /chat/completions endpoints; no auth yet.
If this sparks some interest, I may improve it, both in documentation and functionality.
Build and run it:
docker-compose build
docker-compose up
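Once the container is up, the endpoints can be exercised with any OpenAI-compatible client or plain curl. A minimal sketch, assuming the server listens on port 8080 and a model named "my-model" is configured (check docker-compose.yml for the actual port and your model setup):

```shell
# List the models the server is hosting
curl http://localhost:8080/models

# Send a chat completion request using the OpenAI request schema
curl http://localhost:8080/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "my-model",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 64
      }'
```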