Skip to content
@makllama

MaKLlama

MaK(Mac+Kubernetes)llama: running and orchestrating large language models (LLMs) on Kubernetes with Mac nodes.

MaKllama Organization

The following video demonstrates the below steps:

  1. Add a Mac node with Apple-Silicon chip to a Kubernetes cluster (in seconds!).
  2. Manually start Bronze Willow (BW) on the Mac node (top-right terminal).
  3. Deploy tinyllama with 2 replicas.
  4. Access the OpenAI API-compatible endpoint through mods.

Demo

Popular repositories Loading

  1. makllama makllama Public

    MaK(Mac+Kubernetes)llama - Running and orchestrating large language models (LLMs) on Kubernetes with macOS nodes.

    Go 33 3

  2. llama.cpp llama.cpp Public

    Forked from ggerganov/llama.cpp

    LLM inference in C/C++

    C++ 3

  3. containerd containerd Public

    Forked from containerd/containerd

    An open and reliable container runtime

    Go 1

  4. cri cri Public

    Forked from virtual-kubelet/cri

    Go 1 1

  5. .github .github Public

  6. ollama ollama Public

    Forked from ollama/ollama

    Get up and running with Llama 3, Mistral, Gemma, and other large language models.

    Go

Repositories

Showing 10 of 18 repositories
  • ollama Public Forked from ollama/ollama

    Get up and running with Llama 3, Mistral, Gemma, and other large language models.

    makllama/ollama’s past year of commit activity
    Go 0 MIT 8,865 0 0 Updated Jan 10, 2025
  • stable-diffusion.cpp Public Forked from leejet/stable-diffusion.cpp

    Stable Diffusion and Flux in pure C/C++

    makllama/stable-diffusion.cpp’s past year of commit activity
    C++ 0 MIT 342 0 0 Updated Dec 27, 2024
  • exo Public Forked from exo-explore/exo

    Run your own AI cluster at home with everyday devices 📱💻 🖥️⌚

    makllama/exo’s past year of commit activity
    Python 0 GPL-3.0 1,045 0 0 Updated Nov 28, 2024
  • llama.cpp Public Forked from ggerganov/llama.cpp

    LLM inference in C/C++

    makllama/llama.cpp’s past year of commit activity
    C++ 3 MIT 10,447 0 0 Updated Nov 26, 2024
  • llama-cpp-python Public Forked from abetlen/llama-cpp-python

    Python bindings for llama.cpp

    makllama/llama-cpp-python’s past year of commit activity
    Python 0 MIT 1,054 0 0 Updated Nov 26, 2024
  • fastfetch Public Forked from gpustack/fastfetch

    Like neofetch, but much faster because written mostly in C.

    makllama/fastfetch’s past year of commit activity
    C 0 MIT 458 0 0 Updated Nov 19, 2024
  • gpustack Public Forked from gpustack/gpustack

    Manage GPU clusters for running LLMs

    makllama/gpustack’s past year of commit activity
    Python 0 Apache-2.0 92 0 0 Updated Nov 17, 2024
  • vllm Public Forked from vllm-project/vllm

    A high-throughput and memory-efficient inference and serving engine for LLMs

    makllama/vllm’s past year of commit activity
    Python 0 Apache-2.0 5,215 0 0 Updated Oct 16, 2024
  • k8sgpt Public Forked from k8sgpt-ai/k8sgpt

    Giving Kubernetes Superpowers to everyone

    makllama/k8sgpt’s past year of commit activity
    Go 0 Apache-2.0 712 0 0 Updated Sep 24, 2024
  • k8sgpt-operator Public Forked from k8sgpt-ai/k8sgpt-operator

    Automatic SRE Superpowers within your Kubernetes cluster

    makllama/k8sgpt-operator’s past year of commit activity
    Go 0 Apache-2.0 95 0 0 Updated Jul 31, 2024

Top languages

Loading…

Most used topics

Loading…