```shell
cd /workspace
git clone https://github.com/ilkersigirci/runpod-playground.git
cd /workspace/runpod-playground

# Prepare the .env file
make prepare-env-file

# Initial dependency install
make initial-runpod-install

# Download the model
make download-model

# Start vLLM
make start-vllm

# See vLLM logs
make log-vllm

# Restart vLLM
make restart-vllm

# Start the simple GUI
make gui
```
The API healthcheck is enabled by default; it sends a request to the vLLM server at a fixed interval.
To disable the healthcheck, set `ENABLE_HEALTH_CHECK=0` in the `.env` file.
To send healthcheck failure messages to Microsoft Teams, set `TEAMS_WEBHOOK_URL` in the `.env` file.
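For reference, the healthcheck logic described above can be sketched in Python roughly as follows. This is an illustrative sketch, not the repository's actual implementation: the `/health` route is vLLM's standard liveness endpoint, and the function names and failure message are assumptions.

```python
import json
import urllib.error
import urllib.request


def build_teams_payload(text: str) -> dict:
    """Build the simple 'text' payload accepted by Teams incoming webhooks."""
    return {"text": text}


def check_health(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if the vLLM /health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def notify_teams(webhook_url: str, text: str) -> None:
    """POST a failure message to the configured Teams incoming webhook."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(build_teams_payload(text)).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```

A scheduler (cron, a background thread, etc.) would call `check_health("http://0.0.0.0:8000")` at the fixed interval and invoke `notify_teams` with the `TEAMS_WEBHOOK_URL` value when the check fails.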
To deploy a different model, change the `HF_MODEL_NAME` variable in the `.env` file to the model you want to deploy, following the Hugging Face repository id convention (`organization/model-name`).
You can also change `SERVED_MODEL_NAME` to set the model name used in requests, and `MAX_CONTEXT_LEN` to set the desired context length.
Example: change the default model to CohereForAI/c4ai-command-r-plus-GPTQ with a 40000-token context length:

```shell
make replace-value-in-env-file variable_name=HF_MODEL_NAME new_value=CohereForAI/c4ai-command-r-plus-GPTQ
make replace-value-in-env-file variable_name=MAX_CONTEXT_LEN new_value=40000
```
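After running the two commands above, the relevant lines in `.env` would look like this (a sketch, assuming the variables use the plain `KEY=value` dotenv layout):

```shell
HF_MODEL_NAME=CohereForAI/c4ai-command-r-plus-GPTQ
MAX_CONTEXT_LEN=40000
```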
cURL Examples
Request with a system message, assuming `SERVED_MODEL_NAME=vLLM-Model`:
```shell
curl --request POST \
  --url http://0.0.0.0:8000/v1/chat/completions \
  --header "Content-Type: application/json" \
  --data '{
    "model": "vLLM-Model",
    "messages": [
      { "role": "system", "content": "You are a helpful virtual assistant trained by OpenAI." },
      { "role": "user", "content": "Who are you?" }
    ],
    "temperature": 0.8,
    "stream": false
  }'
```
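The same request can also be sent from Python. Below is a minimal standard-library sketch; the endpoint and payload mirror the curl command above, and the helper names are illustrative rather than part of this repository.

```python
import json
import urllib.request


def build_chat_payload(model, messages, temperature=0.8, stream=False):
    """Assemble an OpenAI-compatible chat completion request body."""
    return {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "stream": stream,
    }


def chat_completion(base_url, payload, timeout=30.0):
    """POST the payload to the vLLM OpenAI-compatible endpoint, return parsed JSON."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())


payload = build_chat_payload(
    "vLLM-Model",
    [
        {"role": "system", "content": "You are a helpful virtual assistant trained by OpenAI."},
        {"role": "user", "content": "Who are you?"},
    ],
)
# With the server running locally:
# response = chat_completion("http://0.0.0.0:8000", payload)
# print(response["choices"][0]["message"]["content"])
```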
Request without a system message, assuming `SERVED_MODEL_NAME=vLLM-Model`: