BullshitBench Viewer - BullshitBench measures whether AI models challenge nonsensical prompts instead of confidently answering them, created by Peter Gostev.

1mon 21d ago by aussie.zone/u/Eyekaytee in localllama@sh.itjust.works from petergpt.github.io

https://github.com/petergpt/bullshit-benchmark

A very necessary benchmark

This is cool!

It doesn’t seem fair to draw a single line for Google models that combines Gemma and Gemini. They’re two very different model families.
For active parameters, it’s interesting that the trend is basically flat, but it does look like MoE models do better.

A benchmark that goes open source just becomes part of the training data :-)