
I've just published an Opencode plugin I wrote because I couldn't find one that fit my exact use case. Posting it here in case it's useful to someone else too.
When working with local models I want to know why nothing seems to be happening (besides the blue Cylon bar), but the plugins I found only showed token generation stats once output was actually being passed into Opencode. That meant that during prefills lasting over a minute there was no output at all.
This plugin uses the /slots endpoint in llama-server (enabled by default) to deduce whether the server is currently generating tokens or doing prompt processing, along with the current tokens per second for that activity. Now I can just run llama-server as a daemon and no longer feel the need to inspect its output just to see what's going on.
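A minimal Python sketch of the kind of inference described above, under stated assumptions. This is not the plugin's code: the `Sample` fields and the `classify` and `fetch_slots` helpers are hypothetical, and it assumes each /slots poll can be reduced per slot to a "busy" flag plus two cumulative counters (prompt tokens processed, tokens generated). The exact JSON field names differ between llama-server versions, so the mapping from the raw response is left to you. The idea: a busy slot that is emitting new tokens is generating; a busy slot with no new output tokens is most likely still in prefill.

```python
"""Hypothetical sketch: classify llama-server activity from two /slots polls.

Assumption: each poll is reduced per slot to a busy flag plus cumulative
counters for prompt tokens processed and tokens generated. How those map
onto the /slots JSON depends on the llama-server version.
"""
from __future__ import annotations

import json
import urllib.request
from dataclasses import dataclass


@dataclass
class Sample:
    t: float          # wall-clock time of the poll, in seconds
    processing: bool  # is the slot busy at all?
    n_prompt: int     # cumulative prompt tokens processed (hypothetical counter)
    n_decoded: int    # cumulative tokens generated (hypothetical counter)


def fetch_slots(base_url: str = "http://localhost:8080") -> list:
    """Return the raw /slots JSON (8080 is llama-server's default port)."""
    with urllib.request.urlopen(f"{base_url}/slots") as resp:
        return json.load(resp)


def classify(prev: Sample, curr: Sample) -> tuple[str, float]:
    """Return (activity, tokens per second) between two polls of one slot."""
    dt = curr.t - prev.t
    if not curr.processing or dt <= 0:
        return "idle", 0.0
    decoded = curr.n_decoded - prev.n_decoded
    if decoded > 0:
        # New output tokens appeared since the last poll: generating.
        return "generating", decoded / dt
    # Busy but no new output tokens: assume prompt processing (prefill).
    prompt = max(curr.n_prompt - prev.n_prompt, 0)
    return "prompt processing", prompt / dt
```

For example, two polls two seconds apart during a prefill that advanced from 1000 to 3000 prompt tokens give `("prompt processing", 1000.0)`. Because this only compares counters between polls, it reports progress even before anything has been streamed back to the client.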
It's likely only useful in a single-user scenario, but it has been tested with both single and multiple parallel slots.
Installation:
opencode plugin @troed/oc-ls-stats@latest --global
I needed exactly this yesterday. Will install it, thanks.
May I ask: have you noticed whether the prompt processing speeds shown by llama-bench are vastly different from those in llama-server? I see differences of hundreds of tokens per second.