added nvidia gpu instructions #229

Merged · 1 commit · Apr 10, 2024

Changes from all commits — model_servers/llamacpp_python/README.md
24 changes: 24 additions & 0 deletions model_servers/llamacpp_python/README.md
@@ -43,6 +43,18 @@ To pull the base model service image:

```bash
podman pull quay.io/ai-lab/llamacpp-python-cuda
```

**IMPORTANT!**

To run the CUDA image with GPU acceleration, you need to install the correct [CUDA drivers](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#driver-installation) for your system along with the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#). Please use the links provided to find installation instructions for your system.
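
A quick sanity check that both pieces are in place (assuming a typical Linux install where the driver and the toolkit CLI end up on your `PATH`):

```bash
# The driver is working if nvidia-smi lists your GPU(s)
nvidia-smi

# The Container Toolkit CLI should report its version
nvidia-ctk --version
```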

Once those are installed, you can use the Container Toolkit CLI to discover your NVIDIA device(s) and generate a CDI (Container Device Interface) specification for them:
```bash
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
```
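
To confirm the specification was generated correctly, you can list the device names it exposes; expect entries such as `nvidia.com/gpu=all`:

```bash
nvidia-ctk cdi list
```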

Finally, add `--device nvidia.com/gpu=all` to your `podman run` command so that your container can access the GPU.
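
Before starting the model service, a throwaway container makes a handy smoke test. The `ubuntu` image below is just an illustrative choice, and `--security-opt=label=disable` may be needed on SELinux systems:

```bash
# The CDI hooks inject nvidia-smi into the container, so a stock image works
podman run --rm --device nvidia.com/gpu=all \
  --security-opt=label=disable \
  docker.io/library/ubuntu nvidia-smi -L
```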


### Vulkan (experimental)

The [Vulkan image](../llamacpp_python/vulkan/Containerfile) is experimental, but it can be used to gain partial GPU access on an M-series Mac, significantly speeding up model response time over a CPU-only deployment. This image requires that your podman machine provider is "applehv" and that you use krunkit instead of vfkit. Since these tools are not currently supported by Podman Desktop, this image will remain "experimental".
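
If you still want to try it, the rough shape of the machine setup looks like the sketch below; treat it as an assumption and consult the krunkit documentation for your exact workflow:

```bash
# Select the machine provider (per the note above) before creating the machine
export CONTAINERS_MACHINE_PROVIDER=applehv
podman machine init
podman machine start
```
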
@@ -100,6 +112,18 @@ podman run --rm -it \
llamacpp_python
```

or with the CUDA image:

```bash
podman run --rm -it \
  --device nvidia.com/gpu=all \
  -p 8001:8001 \
  -v Local/path/to/locallm/models:/locallm/models:ro \
  -e MODEL_PATH=models/mistral-7b-instruct-v0.1.Q4_K_M.gguf \
  -e HOST=0.0.0.0 \
  -e PORT=8001 \
  llamacpp_python
```
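
Either way, once the container is running you can check that the service answers. llamacpp_python serves an OpenAI-compatible API, so hitting `/v1/models` is a reasonable (if assumed) health check:

```bash
curl http://localhost:8001/v1/models
```
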
### Multiple Model Service:

To enable dynamic loading and unloading of different models present on your machine, you can start the model service with a `CONFIG_PATH` instead of a `MODEL_PATH`.