LocalLLaMA

Opencode llama-server prefill/generation stats plugin

3d 7h ago by fedia.io/u/troed in localllama@sh.itjust.works from codeberg.org

I've just published an Opencode plugin I wrote because I couldn't find one that fit my exact use case. I'm publishing and posting it in case it's useful to someone else too.

When working with local models I need to know why nothing is suddenly happening (besides the blue Cylon bar), but the plugins I found only showed token generation data once something was passed into Opencode. That meant there was no output at all during prefills lasting over a minute.

This plugin uses the /slots endpoint (enabled by default) in llama-server to determine whether the server is currently generating tokens or processing the prompt, along with the current tokens per second (tps) for that activity. Now I can just run llama-server as a daemon and no longer feel the need to inspect its output just to see what's going on.
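For readers curious about the mechanism, here is a minimal Python sketch of the kind of /slots polling described above. It is not the plugin's code (the plugin itself is an Opencode plugin), and it rests on assumptions: the server address `http://127.0.0.1:8080` and the slot fields `id`, `is_processing` and `next_token.n_decoded` are assumed and may differ between llama.cpp builds. The sketch treats a busy slot that has not decoded any token yet as being in prefill, and derives generation speed from the change in decoded tokens between two polls. It leaves out prompt-processing speed, since that would need a prefill-progress field whose name is not assumed here.

```python
from __future__ import annotations

import json
import urllib.request
from dataclasses import dataclass

# Assumption: llama-server is listening on its default local address.
SLOTS_URL = "http://127.0.0.1:8080/slots"


def fetch_slots(url: str = SLOTS_URL) -> list[dict]:
    """Fetch the JSON array of slot objects from llama-server's /slots endpoint."""
    with urllib.request.urlopen(url, timeout=2) as resp:
        return json.load(resp)


@dataclass
class SlotStatus:
    slot_id: int
    phase: str    # "idle", "prefill" or "generating"
    decoded: int  # tokens generated so far in the slot's current task


def classify(slot: dict) -> SlotStatus:
    """Infer what one slot is doing from a single /slots entry.

    Assumed fields (may differ between llama.cpp builds): "id",
    "is_processing" and "next_token" -> "n_decoded". "next_token" is
    accepted either as an object or as a list of objects.
    """
    next_token = slot.get("next_token") or {}
    if isinstance(next_token, list):
        next_token = next_token[0] if next_token else {}
    decoded = int(next_token.get("n_decoded", 0))
    if not slot.get("is_processing", False):
        phase = "idle"
    elif decoded == 0:
        # Busy but nothing decoded yet: assume the prompt is still being processed.
        phase = "prefill"
    else:
        phase = "generating"
    return SlotStatus(int(slot.get("id", -1)), phase, decoded)


def generation_tps(prev: SlotStatus, curr: SlotStatus, dt: float) -> float:
    """Generation speed (tokens/s) of one slot between two polls dt seconds apart."""
    if dt <= 0 or curr.phase != "generating":
        return 0.0
    # A new task resets n_decoded, so clamp negative deltas to zero.
    return max(curr.decoded - prev.decoded, 0) / dt
```

Polling `fetch_slots()` every second or so and feeding consecutive snapshots of each slot (matched by `id`) into `classify()` and `generation_tps()` gives a per-slot status, which also covers the multiple-parallel-slots case.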

It's likely only useful in a single-user scenario, but it has been tested with both single and multiple parallel slots.

Installation:

opencode plugin @troed/oc-ls-stats@latest --global

I needed this just yesterday. Will install it. Thanks.

May I ask: have you noticed whether the prompt processing speeds shown by llama-bench differ greatly from those in llama-server? I see a difference of hundreds of tokens per second.

Qwen 3.6 27B running at 46 tok/s on an RX 9070 XT (llama.cpp + MTP Speculative Decoding is basically magic)

3d 18h ago by ani.social/u/cicadagen in localllama@sh.itjust.works

Oops

4d 11h ago by aussie.zone/u/Eyekaytee in localllama@sh.itjust.works from files.ikt.id.au

North Mini Code v1.0 - a Qwen 3.6 35B MoE alternative

7d 3h ago by fedia.io/u/troed in localllama@sh.itjust.works from huggingface.co

My models don't have reasoning ability in llama-server (build b9543) but do in llama-cli

9d 22h ago by thelemmy.club/u/Schilling2304 in localllama@sh.itjust.works

I Put a Datacenter GPU in My Gaming PC for £200

10d 13h ago by lemy.lol/u/HelloRoot in localllama@sh.itjust.works from blog.tymscar.com

Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency

11d 22h ago by mbin.potato-guy.space/u/potatoguy in localllama@sh.itjust.works from blog.google

Gemma4 12b released with "unified" approach to multi-modality

13d 4h ago by lemmy.ml/u/robber in localllama@sh.itjust.works from huggingface.co

I Tried This Open Source ChatGPT Alternative [Jan AI] on Linux, But Went Back to Ollama

15d 3h ago by lemy.lol/u/cm0002 in localllama@sh.itjust.works from itsfoss.com

Don't skimp on the quant when using MoE

19d 3h ago by fedia.io/u/troed in localllama@sh.itjust.works from unsloth.ai

Infinity-Parser2 - Multimodal Document Parser

19d 20h ago by sh.itjust.works/u/pepperfree in localllama@sh.itjust.works from huggingface.co

Your best local LLM for low-VRAM (6GB)?

26d 6h ago by feddit.org/u/sp3ctre in localllama@sh.itjust.works

DystopiaBench - AI Ethics Stress Test

29d 5h ago by aussie.zone/u/Eyekaytee in localllama@sh.itjust.works from dystopiabench.com

Claude? No. Cucumbers? Yes!

29d 10h ago by aussie.zone/u/SuspiciousCarrot78 in localllama@sh.itjust.works

Llama.cpp MTP Support merged - up to 2.5x speed increase

1mon 1d ago by piefed.zip/u/TheCornCollector in localllama@sh.itjust.works from github.com

Orthrus-Qwen3: up to 7.8× tokens/forward on Qwen3, identical output distribution

1mon 2d ago by mander.xyz/u/BB84 in localllama@sh.itjust.works from github.com

"The cost of running LLMs is just too damn high"

1mon 2d ago by aussie.zone/u/SuspiciousCarrot78 in localllama@sh.itjust.works

Token Speed visualiser

1mon 3d ago by aussie.zone/u/SuspiciousCarrot78 in localllama@sh.itjust.works from mikeveerman.github.io

<8B multilingual models for language learning chatbots

1mon 4d ago by piefed.social/u/XiELEd in localllama@sh.itjust.works

llama.cpp Multi-Model Server Architecture: ASUS Zenbook UM3504DA

1mon 5d ago by lemmy.zip/u/variety4me in localllama@sh.itjust.works

Gemma4 with MTP was released

1mon 12d ago by jlai.lu/u/Mubelotix in localllama@sh.itjust.works from jlai.lu

Good translation models which fit on a smartphone?

1mon 12d ago by piefed.jeena.net/u/jeena in localllama@sh.itjust.works

AI-Editor in LibreOffice Writer?

1mon 16d ago by mander.xyz/u/tristynalxander in localllama@sh.itjust.works

a little locallama game theory ...game

1mon 18d ago by aussie.zone/u/Eyekaytee in localllama@sh.itjust.works

Mistral Medium 3.5 released

1mon 18d ago by aussie.zone/u/Eyekaytee in localllama@sh.itjust.works from mistral.ai

Is there any good general AI agent/workflow platform that isn't vibe-coded?

1mon 19d ago by palaver.p3x.de/u/hendrik in localllama@sh.itjust.works

would you laugh at me if I ran gemma-4-26b on a 4 core Xeon, with 32GB RAM, no GPU?

1mon 19d ago by lemmy.zip/u/variety4me in localllama@sh.itjust.works from codeberg.org

llama.cpp: don't sleep on --split-mode tensor

1mon 20d ago by lemmy.ml/u/robber in localllama@sh.itjust.works from github.com

Noob here: Why is Google making Gemma open-source?

1mon 21d ago by sh.itjust.works/u/Yerbouti in localllama@sh.itjust.works

Which open models are actually good at agentic coding?

1mon 21d ago by lemmy.dbzer0.com/u/hok in localllama@sh.itjust.works

BullshitBench Viewer - BullshitBench measures whether AI models challenge nonsensical prompts instead of confidently answering them, created by Peter Gostev.

1mon 21d ago by aussie.zone/u/Eyekaytee in localllama@sh.itjust.works from petergpt.github.io

Intel B70: llama.cpp SYCL vs llama.cpp OpenVINO vs LLM-Scaler

1mon 21d ago by lemmy.world/u/Fmstrat in localllama@sh.itjust.works

I ran Gemma 26B on 4GB VRAM + 16RAM. 15 t/s on avarage

1mon 24d ago by lemmy.world/u/NAwT in localllama@sh.itjust.works

DeepSeek-V4 Pro (1.6T-A49) and Flash (284B-A13)

1mon 24d ago by piefed.zip/u/TheCornCollector in localllama@sh.itjust.works from huggingface.co

Qwen3.6 27B released

1mon 26d ago by piefed.zip/u/TheCornCollector in localllama@sh.itjust.works from huggingface.co

Qwen3.6 finally makes my Local LlaMa useful

1mon 26d ago by discuss.tchncs.de/u/Bob_Robertson_IX in localllama@sh.itjust.works

Kimi K2.6: Advancing Open-Source Coding

1mon 28d ago by lemmy.ml/u/morrowind in localllama@sh.itjust.works from www.kimi.com

My experience with local LLM

1mon 29d ago by lemmy.ml/u/ntn888 in localllama@sh.itjust.works

Is anyone using the Intel Arc B70 Pro?

2mon 1d ago by lemmy.dbzer0.com/u/pound_heap in localllama@sh.itjust.works from lemmy.dbzer0.com

Qwen3.6-35B-A3B released

2mon 2d ago by piefed.zip/u/TheCornCollector in localllama@sh.itjust.works from huggingface.co

In search of a new self-hosted LLM

2mon 7d ago by lemmy.ml/u/tanka in localllama@sh.itjust.works

Launch of ARC-AGI-3 - next edition of a benchmark for agents

2mon 9d ago by lemmy.ml/u/vermaterc in localllama@sh.itjust.works from www.youtube.com

VOID: Video Object and Interaction Deletion [by Netflix]

2mon 14d ago by lemmy.world/u/General_Effort in localllama@sh.itjust.works from void-model.github.io

Gemma 4 is here

2mon 15d ago by lemmy.ml/u/robber in localllama@sh.itjust.works from huggingface.co

Google releases Gemma 4 open models

2mon 15d ago by aussie.zone/u/Eyekaytee in localllama@sh.itjust.works from ai.google.dev

Claude Code Frontend Accidentally Leaked by Anthropic

2mon 17d ago by sh.itjust.works/u/Canuck in localllama@sh.itjust.works from x.com

[Technical] The Great Silicon Shortage

2mon 19d ago by aussie.zone/u/Eyekaytee in localllama@sh.itjust.works from newsletter.semianalysis.com

KittenTTS v0.8: three new models, smallest under 25 MB

2mon 26m ago by lemdro.id/u/mudkip in localllama@sh.itjust.works from github.com

Introducing Mistral Small 4 | Mistral AI

3mon 2d ago by aussie.zone/u/Eyekaytee in localllama@sh.itjust.works from mistral.ai

LLM Architecture Gallery

3mon 3d ago by lemdro.id/u/mudkip in localllama@sh.itjust.works from sebastianraschka.com
