North Mini Code v1.0 - a Qwen 3.6 35B MoE alternative
7d 5h ago by fedia.io/u/troed in localllama@sh.itjust.works from huggingface.co
Since I like having more than one local LLM to switch between when analysing tricky development issues, I decided to try out this new MoE model today. It's a 30B A3B, which means it's basically a drop-in replacement for Qwen 3.6 35B A3B, and the same llama.cpp parameters are suitable.
According to their own published benchmark metrics it's supposed to be slightly worse than Qwen, but so far that's not something I've noticed. It's tuned to work well in Opencode, which is how I'm running it as well.
Try it out and see how it works for you. I know that there are those who would rather use a Canadian than a Chinese model in today's political climate, and it does seem to perform better than Gemma 4, at least for me. Just don't forget to use the PR linked from unsloth's description until it has been merged into main.
Interesting. Looks like I'd currently need to build a special llama.cpp to get it to run on my system, and I think I could get lost for a long time if I start digging down that rabbit hole... so maybe not today, but I'll keep an eye out and give it a try if support lands in main.
Is it doing any better than Qwen at avoiding getting stuck in thinking loops?
This Dockerfile worked for me to build the llama-cpp-turboquant fork: https://huggingface.co/spaces/ai-engineering-at/llama-cpp-turboquant-guide/blob/main/Dockerfile. It should work for upstream too. The Dockerfile I made myself crashed two different machines, but then I found this one and can confirm it works well.
I've got an AMD system so that probably won't work for me, but glad it's working for you and maybe it will help others!
How does the model compare to Qwen and Gemma 4 so far?
Ah, ok, hope it helps someone. I’ll probably try the model this weekend sometime.