Update README.md
RahulSChand authored Oct 29, 2023
1 parent d10040d commit 9a0d528
Showing 1 changed file with 12 additions and 5 deletions.
README.md (12 additions, 5 deletions)
@@ -14,18 +14,25 @@ Link: **https://rahulschand.github.io/gpu_poor/**
## Features

1. Calculate the vRAM requirement (a back-of-envelope sketch of both estimates follows this list)

<img width="643" alt="image" src="https://github.com/RahulSChand/gpu_poor/assets/16897807/29577394-0efd-42fb-aaf4-282e9a45d5db">

2. Calculate the approximate tokens/s you can get

<img width="647" alt="image" src="https://github.com/RahulSChand/gpu_poor/assets/16897807/77627c9b-5fdd-44cf-8b7d-452ff0563a8a">

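For intuition, here is a rough back-of-envelope version of both estimates. The formulas (weights = parameters × bytes per parameter, a per-token KV-cache term, bandwidth-bound decoding) are common rules of thumb, not the calculator's actual implementation, and all model/GPU numbers in the example are illustrative.

```python
# Back-of-envelope estimates (rules of thumb, not the calculator's exact math).

def weights_gb(n_params_billion: float, bytes_per_param: float) -> float:
    """Memory for model weights: parameters x bytes per parameter."""
    return n_params_billion * 1e9 * bytes_per_param / 1e9

def kv_cache_gb(n_layers: int, hidden_size: int, context_len: int,
                batch_size: int = 1, bytes_per_value: int = 2) -> float:
    """KV cache: 2 (K and V) x layers x hidden size x tokens x bytes."""
    return 2 * n_layers * hidden_size * context_len * batch_size * bytes_per_value / 1e9

def tokens_per_sec(weight_gb: float, bandwidth_gb_s: float) -> float:
    """Decoding is roughly memory-bandwidth bound: each generated token reads
    all weights once, so tokens/s is at most bandwidth / bytes per token."""
    return bandwidth_gb_s / weight_gb

# Example: a 7B model (32 layers, hidden size 4096) in 4-bit on a 24 GB,
# ~1000 GB/s GPU. Real usage is higher (CUDA context, activations, overhead).
w = weights_gb(7, 0.5)                        # ~3.5 GB of weights
kv = kv_cache_gb(32, 4096, context_len=2048)  # ~1.1 GB of fp16 KV cache
print(f"~{w + kv:.1f} GB vRAM, <= ~{tokens_per_sec(w, 1000):.0f} tokens/s")
```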


### Purpose

I made this to check whether you can run a particular LLM on your GPU. It is useful for figuring out the following (a rough worked example follows the list):

1. Which quantization will fit the model on my GPU?
2. What max context length & batch size can my GPU handle?
3. Which kind of finetuning can I do: full, LoRA, or QLoRA?
4. What is consuming my GPU memory, and what should I change to fit the LLM on my GPU?


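As a rough worked example of how these questions reduce to the same arithmetic, the sketch below checks which inference quantization and which finetuning mode would fit a hypothetical 7B model on a 24 GB card. The bytes-per-parameter figures are my own rule-of-thumb assumptions (not the calculator's actual logic), and activations/KV cache are ignored for brevity.

```python
# Which quantization / finetuning setup fits? The byte counts below are
# rough rule-of-thumb assumptions, not the calculator's actual logic.

GPU_GB = 24        # e.g. a 24 GB consumer card
N_PARAMS_B = 7     # model size in billions of parameters

# Approximate bytes per parameter for inference weights.
QUANT_BYTES = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

# Approximate bytes per parameter during finetuning.
FINETUNE_BYTES = {
    "full (fp16 + Adam)": 16.0,  # 2 weights + 2 grads + 8 Adam states + 4 fp32 master
    "LoRA (fp16 base)": 2.5,     # frozen fp16 base + small trainable adapters
    "QLoRA (4-bit base)": 1.0,   # 4-bit base + adapters + dequantization overhead
}

def fits(bytes_per_param: float) -> bool:
    """True if the (weights-only) estimate fits in GPU memory."""
    needed_gb = N_PARAMS_B * 1e9 * bytes_per_param / 1e9
    return needed_gb <= GPU_GB

for name, b in {**QUANT_BYTES, **FINETUNE_BYTES}.items():
    print(f"{name:>20}: {'fits' if fits(b) else 'does NOT fit'} in {GPU_GB} GB")
```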

The tool's output is the total vRAM and a breakdown of where the vRAM goes (in MB). It looks like the example below.

