BullshitBench Viewer - BullshitBench measures whether AI models challenge nonsensical prompts instead of confidently answering them, created by Peter Gostev.

1mon 21d ago by aussie.zone/u/Eyekaytee in localllama@sh.itjust.works from petergpt.github.io

This is cool!

  • it doesn’t seem fair to draw a single line for Google models that combines Gemma and Gemini. They’re two very different model families.
  • for active parameters, it’s interesting that the trend is basically flat, but it does look like MoE models do better

A benchmark that goes open source just becomes part of the training data :-)