BullshitBench Viewer - BullshitBench measures whether AI models challenge nonsensical prompts instead of confidently answering them, created by Peter Gostev.
1mon 21d ago by aussie.zone/u/Eyekaytee in localllama@sh.itjust.works from petergpt.github.io
https://github.com/petergpt/bullshit-benchmark
A very necessary benchmark
This is cool!
- it doesn’t seem fair to draw a single line for Google models that combines Gemma and Gemini. They’re two very different model families.
- for active parameters it’s interesting that it is basically flat, but it does look like MoE models are better
A benchmark that goes open source just becomes part of the training data :-)