Automatic slur alert

5d 15h ago by multiverse.soulism.net/u/Grail in piefed_help@piefed.social

Hey piefed people. I'm an admin and I'd like to receive an automatic report notification when certain phrases (namely, slurs) federate to our website. Is there a setting to do that? I'm looking at the moderation tab of the admin panel and it's just a list of reports. I don't actually get notified about most reports, which is a problem too. I'd like it to work like the em-dash AI reports that come in.

Yeah, I think a bit more automated reporting and a bit of nuance in how to deal with things would be helpful here. It's very rare, but once or twice (not here) I've had conversations that discussed slurs at a level you'd think should be removed automatically. But they were legitimate conversations, and otherwise you end up in the same situation as YouTube creators, where everything needs to be paraphrased in some silly family-friendly way. On the flip side, those words, when used by a new account, are very accurate at detecting annoying trolls. It just sucks once it's done 100% automatically and nobody can talk about violence or racism or the past any more.

Yeah slurs are useful for mods because they help us ban bigots. If someone keeps saying slurs, I don't wanna automatically remove it, I wanna manually ban them.

If you'd like to automatically drop posts and comments that contain slurs, there's a field for that in /admin/federation

I don't want to do that, because I'm concerned about the Scunthorpe problem. I want to manually review it every time and ban the repeat offenders from federating any comments to this instance.

It's always worth thinking about ways a feature can be abused, because if it can be, it will, eventually. And this feature has surveillance vibes, which leads me to these kinds of scenarios:

  • Set my name as a slur, then I get alerted whenever someone says bad things about me and I can jump in and defend myself. So this feature is encouraging hyper-vigilance and the perpetuation of arguments that could just fizzle out if left alone.
  • Set the name of someone I hate as a slur, then I get alerted whenever someone mentions them and I can jump in with my copypasta of slanderous bullshit about them.
  • Admin attention DDoS - post content that deliberately triggers the notifications (but only the keywords you have set), to waste your time or summon you to places they want you to see.

This system doesn't just warn admins about toxicity, it actively draws their attention to (potential) toxicity and expects a response from them. So potentially we're warping their perception of situations ("this community is so toxic I get so many notifs about slurs in it") and putting more work and responsibility on them.

But an automated system can quietly fail and nobody would know. For example, I want to set the word n*rcissist as a slur. But there's a Scunthorpe problem: if the tool scans substrings, it would automatically remove any discussion of narcissistic personality disorder, which has the slur as a substring. And I definitely don't want to ban all discussion of the disorder. I only want to stop people from using the disorder as an insult they can throw around like candy at politicians they don't like.
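
To make the substring worry concrete, here is a minimal sketch comparing substring scanning with whole-word matching. It is not PieFed's code, and I don't know how the filter in /admin/federation actually matches; the function names and the placeholder word "zorp" (standing in for a watched term, with "zorpistic" as the legitimate longer word) are made up for illustration.

```python
import re


def substring_hits(text, phrases):
    """Naive scan: a phrase counts as a hit if it appears anywhere in the text."""
    lowered = text.lower()
    return [p for p in phrases if p.lower() in lowered]


def whole_word_hits(text, phrases):
    """Stricter scan: a phrase only counts when it stands alone as a word."""
    if not phrases:
        return []
    # re.escape keeps punctuation inside a phrase from being read as regex syntax.
    alternatives = "|".join(re.escape(p) for p in phrases)
    pattern = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
    return [m.group(0).lower() for m in pattern.finditer(text)]


# Hypothetical watchlist and sample comments (placeholders, not real terms).
watchlist = ["zorp"]
clinical = "She was diagnosed with zorpistic personality disorder."
insult = "That politician is a total zorp."

print(substring_hits(clinical, watchlist))   # flags the clinical discussion
print(whole_word_hits(clinical, watchlist))  # leaves it alone
print(whole_word_hits(insult, watchlist))    # still catches the insult
```

Whole-word matching has its own gaps: it would miss a plural like "zorps" or a deliberately misspelled variant, so it reduces false positives without replacing manual review.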

I'm going off on a bit of a tangent here.

One of the awkward/awesome things about the fediverse is that it's a commons and yet it is not. As an admin of a small instance, when someone throws around slurs or crosses one of your other red lines, 99% of the time it's going to be on a different instance than yours and probably not involving any of your local users. Yet by keeping your local copy of the offending post you're exposing your people to it and perpetuating the harm it represents, so there is pressure to do something about it.

As admins we want to be able to provide a unique and well-run space to the people we provide our instance to, and that implies acting, to some degree, as an extra layer of moderation on top of that done by other instances and their community mods. But it's really not sustainable to do that for the entire fediverse, so what I've tried to do with PieFed is provide admins with the tools to say "these are the communities I consider to be well run and which I recommend to you, go ahead and enjoy them". That happens through the choice of Topics/Feeds that the admin creates and which communities they put into them. They might put warnings on communities where trouble occurs, or warnings on post links. They might also silence some instances if they're a bit of a grey area, or defederate instances which are completely 'not right for us'. And so on. All of these are the kind of job that needs effort and attention once and then the benefits continue forever. This drains a lot less energy than reactive moderation actions.

Basically, rather than finding new and creative ways to stomp on people and divide them, I'd rather find creative ways to stay connected yet apart enough to avoid triggering each other. We're not there yet - I still feel the need to ban people more often than I'd like - but that's the general direction.