For the unaware, this instance is under a DDoS-like bot attack

18d 23h ago by feddit.online/u/FiniteBanjo in feddit_online@feddit.online

AI-training data scrapers are constantly attacking the feddit.online instance. In the last 24 hours, about 70% of all traffic here came from bots of nefarious intent.

This is a continuation of the problems experienced last weekend, which were temporarily curbed by Cloudflare filtering them out, but they always return eventually.

This is the reason many users have been complaining about poor uptime/stability.

Thanks for understanding. I think the server is functioning normally now. Please let me know if it isn't for you.

It all began a few days ago. It was strange, but all the AI bots started a feeding frenzy on the server. The worst, by far, was coming out of Vietnam. It accounted for at least 60% of the AI traffic. Then Bangladesh took another 25%. In 3rd place was Claude. There was Tencent, Amazon, Meta, GPT, Baidu, Yandex, and more. All the bots you know well were scraping, except Google. It was like a shark feeding frenzy. They all ignored the robots.txt file (except Google).
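
For anyone wondering what "ignoring robots.txt" means in practice: the file is only advisory, and the check has to happen on the crawler's side. A minimal sketch of what a well-behaved bot would do before fetching, using Python's standard `urllib.robotparser`. The rules and URLs below are made up for illustration and are not feddit.online's actual robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, NOT the real feddit.online file.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A polite crawler calls this first and skips the URL when it returns False.
print(is_allowed("GPTBot", "https://feddit.online/post/1"))
print(is_allowed("Mozilla/5.0", "https://feddit.online/post/1"))
```

Nothing on the server enforces this, which is why the blocking ended up happening at the firewall instead.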

The server became unresponsive.

I think I've got them all blocked now at the Cloudflare edge. It appears that 99% of the traffic I'm seeing is real. But I'll keep monitoring. They constantly change how they access servers to break through, so I'm watching.

In the past 24 hours, the firewall has blocked over 700,000 AI bot requests.

I read that Lemmy/PieFed/Mbin are now the darling targets for training AI. Not for knowledge but to learn how to write like a human. So they are all out to read the entire databases of Threadiverse servers.

Feddit has over 2 years of just about every single post and comment from every Threadiverse server in a 130 GB database, so we're apparently a juicy target.

If they really wanted to train on Fediverse datasets, why don't they just set up an ActivityPub-compatible endpoint and subscribe to a fuck ton of communities, users, hashtags, ... and just get it delivered to them for free?!
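
For reference, "subscribing" over ActivityPub boils down to sending a `Follow` activity to the community's inbox. A rough sketch of what that activity looks like, built as a plain dict. The actor, activity ID, and community URLs are hypothetical placeholders, and real servers typically also expect the delivery to be signed, which is not shown here.

```python
import json

AS_CONTEXT = "https://www.w3.org/ns/activitystreams"


def build_follow(actor_url: str, community_url: str, activity_id: str) -> dict:
    """Build an ActivityStreams Follow activity (unsigned, illustration only)."""
    return {
        "@context": AS_CONTEXT,
        "id": activity_id,
        "type": "Follow",
        "actor": actor_url,       # who wants to subscribe
        "object": community_url,  # what they want to follow
    }


# All three URLs are assumed values, not real endpoints.
follow = build_follow(
    actor_url="https://subscriber.example/u/reader",
    community_url="https://feddit.online/c/feddit_online",
    activity_id="https://subscriber.example/activities/follow/1",
)
print(json.dumps(follow, indent=2))
```

Once the follow is accepted, new posts and comments get pushed to the subscriber's inbox, so nobody has to crawl anything.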

AI morons ... do something intelligent!? ... I think you forget who we're talking about here.

Hopefully the AI-friendly instances allow the feeding and end up poisoning the datasets lol

If I had the time, knowledge, & hardware, I'd be setting up honeypots to accomplish exactly that, rotating through new names every so often as they figured out the truth of the current one.

I remember hearing about one type of AI defence where the page has some invisible redirects at the top, which human users won't see but bots will, and those take the crawlers into an infinite maze of constantly, randomly generated redirects.
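
A toy sketch of how such a maze could work, not any particular tool's implementation. Each maze page derives its outgoing links by hashing its own path, so the maze is effectively endless without storing anything. The `/maze/` prefix and function names are made up for illustration.

```python
import hashlib

MAZE_PREFIX = "/maze/"  # hypothetical URL prefix reserved for the trap


def maze_links(path: str, fanout: int = 3) -> list[str]:
    """Derive `fanout` pseudo-random child links from the current path."""
    links = []
    for i in range(fanout):
        # Hash the path plus an index; the same path always yields the same links.
        digest = hashlib.sha256(f"{path}#{i}".encode()).hexdigest()[:12]
        links.append(f"{MAZE_PREFIX}{digest}")
    return links


def maze_page(path: str) -> str:
    """Render a minimal HTML page whose only content is more maze links."""
    anchors = "\n".join(f'<a href="{href}">more</a>' for href in maze_links(path))
    return f"<html><body>\n{anchors}\n</body></html>"


print(maze_page(MAZE_PREFIX + "start"))
```

The entry link would be hidden from human visitors, so only clients that follow every link in the markup wander in.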

What you described is that but more nefarious because it would take the AI owners longer to notice they're just wasting compute time lol.

Any mobile app I try is giving a 403 error right now, and has since at least last night. The main website works though.

Uggh. The firewall is blocking you. Sorry about this.

Are you trying to log in? Or are you already logged in and can no longer browse? Would you be willing to share your IP address with me privately via email at jerry@feddit.online?

UPDATE I made a change to a firewall rule based on my investigation into how some phone apps operate. Does it work now?

Seems like it's still not working. I was already logged in, but I tried logging out and then back in. Login works, but loading posts does not. With Boost it just says 403, but I tried Summit and it shows this: Client error. Code: 403. Message: Rate limit timed out..

I'll email you my IP. I'm happy to help troubleshoot anything with this. I'm a sysadmin by trade, and I'm happy to help out in any way I can, so feel free to throw any testing you'd like at me and I'll assist.

Thank you!

Does your IP address match xxx.154.xxx.xxx? It's the one common to the Cloudflare log entries that came in from both Summit and Boost in the past 24 hours. The logs show that Cloudflare issued a challenge back to the app in response to the request, a challenge your phone app could not perform.

What I think then happened is that Boost assumed the server was broken because of the unexpected response, and it generated a 403 (Cloudflare actually returned a 200). Summit did the same but also made an inaccurate assumption about there being a rate limit. There is no rate limit rule that tripped in the past 24 hours. Unsure why it mentioned it.
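
To illustrate the kind of check that would avoid the misleading error: a sketch of how an API client could tell a challenge page apart from a real API error, assuming the API normally answers with JSON. Cloudflare's documentation describes a `cf-mitigated: challenge` response header on challenge pages; whether it was present on these particular responses is not something I verified, and the function and category names are made up.

```python
def classify_response(status: int, headers: dict[str, str]) -> str:
    """Roughly classify an HTTP response received by an API client."""
    # Header names are case-insensitive, so normalise before looking them up.
    h = {k.lower(): v.lower() for k, v in headers.items()}
    if h.get("cf-mitigated") == "challenge":
        return "challenge"           # interactive challenge the app cannot solve
    if "application/json" not in h.get("content-type", ""):
        return "unexpected-content"  # e.g. an HTML page where JSON was expected
    if status == 429:
        return "rate-limited"
    if status >= 400:
        return "error"
    return "ok"


print(classify_response(200, {"Content-Type": "text/html", "CF-Mitigated": "challenge"}))
print(classify_response(200, {"Content-Type": "application/json"}))
```

An app that distinguishes these cases could tell the user "the server asked for a browser challenge" instead of reporting a 403 or a rate limit that never happened.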

Looking at these log entries, I think I figured out how to change the rule. Can you try it again?

Thank you so much for the help!

Yes, my IP was the one in x.154.x.x. It seems all good now! Thanks for getting it fixed so quick!

Thank you for letting me know! Based on the logs, it wasn't just you. There were about 6 others using Summit and Boost. Unsure about other phone apps. Interestingly, the Voyager app has no problem, so I didn't realize there was an issue because it's what I tested with.

Please reach out if you encounter any other problems. The firewall rules are meant to surgically stop the 290K AI scrapes that have been coming in daily. It's difficult to decide if a request is from an AI agent or a user. I expected some collateral damage. There may be more.

Scrapers make me so mad. Seriously, if you're gonna steal the contents of the fediverse, at least do it via federation so it doesn't disturb admins and users.

And much love to Sailor Jerry for captaining the ship through the storm.

Do you work in computer security?

Ty for this info

You're an artist? Also a developer?

You don't need to be an artist to post here. The topic is the feddit.online server. Anybody can participate.

No, I think anybody can post to this community. I just saw people complaining in some comments which means the majority of them likely didn't read the banner explaining the whole situation.

I am unaffiliated with the owners of feddit.online