FitMyLLM — Independent benchmarks for self-hosted AI
14d 17h ago by programming.dev/u/anzo in homelab@programming.dev from www.fitmyllm.com
Check which models you can run and at how many tokens per second. It has examples of many models and quantization levels. Huge resource!
This feels useless. At least for homelabbers, ollama's model page gives more useful info. And if a newbie goes there, they'll be misled.
Also, a lot of people run models on CPUs, and the site doesn't list anything about them at all. For example, I can't fit Gemma 4 on my GPU, but ollama offloads it to the CPU, and even with small GPUs you can get good performance.
And for nearly all small models, it recommends the RTX 5060, which is a very stupid choice.
What do you mean by "small GPU"?
I haven't tried that yet. Do you have any guidance? Or does "small GPU" still mean a GPU over 500€?
By small, I mean outdated GPUs, laptop GPUs, or GPUs with only 4GB or 6GB of VRAM.
Interesting, I just have 8GB of VRAM, unfortunately, so I can't run anything particularly useful for my purpose 😔 The Gemma 4 E4B is quite good, but I'd like to run the 31B one.
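For a rough sense of why a 31B model is a stretch on 8GB, here is a back-of-the-envelope sketch in Python. It only counts the weights (parameters × bits per weight ÷ 8). The 4 bits per weight and the 1.5GB overhead for KV cache and runtime buffers are assumptions for illustration, not measured values, and real 4-bit quantizations usually need somewhat more than 4 bits per weight. The function names are made up for this example.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def gpu_fraction(params_billion: float, bits_per_weight: float,
                 vram_gb: float, overhead_gb: float = 1.5) -> float:
    """Rough share of the weights that could sit in VRAM.

    overhead_gb is an assumed allowance for KV cache and runtime buffers;
    the real figure depends on context length and the runtime.
    """
    weights = weight_memory_gb(params_billion, bits_per_weight)
    usable = max(0.0, vram_gb - overhead_gb)
    return min(1.0, usable / weights)


if __name__ == "__main__":
    # 31B parameters at an assumed 4 bits per weight, on an 8GB card
    print(f"weights: {weight_memory_gb(31, 4):.1f} GB")
    print(f"share that fits on the GPU: {gpu_fraction(31, 4, 8):.0%}")
```

With these assumptions, the weights alone come to about 15.5GB, so well under half would fit on an 8GB card and the rest would have to be offloaded to the CPU and system RAM.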