From 9a0d52880c1f91501bc6e327eaa58c252f418aff Mon Sep 17 00:00:00 2001
From: Rahul Shiv Chand
Date: Sun, 29 Oct 2023 07:06:12 +0530
Subject: [PATCH] Update README.md

---
 README.md | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index b5ce003..8bc8cf6 100644
--- a/README.md
+++ b/README.md
@@ -14,18 +14,25 @@ Link: **https://rahulschand.github.io/gpu_poor/**
 
 ## Features
 
 1. Calculate vRAM memory requirement
+
+image
+
 2. Calculate ~token/s you can get
 
+image
+
 
 ### Purpose
 I made this to check if you can run a particular LLM on your GPU. Useful to figure out the following
 
-1. What quantization I should use to fit any model on my GPU?
-2. What max context length my GPU can handle?
-3. What kind of finetuning can I do? Full? LoRA? QLoRA?
-4. What max batch size I can use during finetuning?
-5. What is consuming my GPU memory? What should I change to fit the LLM on my GPU?
+
+1. What quantization will fit my model on my GPU?
+2. Max context length & batch size my GPU can handle?
+3. Which finetuning? Full? LoRA? QLoRA?
+4. What is consuming my GPU memory? What to change to fit the LLM on my GPU?
+
+
 
 
 The output is the total vRAM & the breakdown of where the vRAM goes (in MB). It looks like below
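
As a rough illustration of the kind of vRAM breakdown the updated Purpose text refers to (weights, KV cache, overhead, reported in MB), here is a minimal Python sketch for a decoder-only transformer at inference time. It is not the calculator's actual formula: the function name, the fp16/4-bit byte sizes, and the flat 500 MB overhead term are illustrative assumptions.

```python
# A minimal sketch (not the calculator's actual code) of how a per-component
# vRAM estimate can be put together for a decoder-only transformer.
# All names and constants below are illustrative assumptions.

def estimate_inference_vram_mb(
    n_params_billion: float,          # model size in billions of parameters
    bytes_per_param: float,           # 2.0 for fp16, ~0.5 for 4-bit quantization
    n_layers: int,
    hidden_size: int,
    context_len: int,
    batch_size: int = 1,
    kv_bytes_per_value: float = 2.0,  # KV cache commonly kept in fp16
) -> dict:
    """Return a rough breakdown of inference vRAM in MB."""
    mb = 1024 ** 2
    # Weights: parameter count x bytes per parameter
    weights = n_params_billion * 1e9 * bytes_per_param / mb
    # KV cache: 2 (keys and values) x layers x hidden size x tokens x batch x bytes
    kv_cache = (2 * n_layers * hidden_size * context_len * batch_size
                * kv_bytes_per_value) / mb
    # Flat allowance for CUDA context / framework overhead (assumed constant)
    overhead = 500.0
    return {
        "weights_mb": round(weights),
        "kv_cache_mb": round(kv_cache),
        "overhead_mb": round(overhead),
        "total_mb": round(weights + kv_cache + overhead),
    }

# Example: a LLaMA-7B-like model (32 layers, hidden size 4096) in 4-bit
# at 2048 context -> roughly 3.3 GB of weights plus about 1 GB of KV cache.
print(estimate_inference_vram_mb(7, 0.5, 32, 4096, 2048))
```

Finetuning adds further terms on top of this (gradients, optimizer states, activations), which is why the full vs. LoRA vs. QLoRA choice in the list above changes the total so much.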