Server-side bug with lemmy.world and intermittent authentication.

2y 11mon ago by lemmy.ml/u/idunnololz_test in support

I am currently getting signed out every minute from lemmy.world. This is not a client-side cache issue. I tested making API calls from the command line (with curl) with no cache and the issue still occurs. On one call I get the correct response, on the next I get a 400 telling me I'm not signed in.

I'm primarily testing with the https://lemmy.world/api/v3/user/unread_count API endpoint. I'm not sure if this issue occurs with all endpoints.

Reproduction steps:

  1. Get a lemmy.world JWT token for your account using your desired method (eg. postman).
  2. curl https://lemmy.world/api/v3/user/unread_count?auth={JWT_TOKEN_HERE}
  3. Note the 400 error. If you do not get an error repeat step 2.
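The steps above can also be scripted so you don't have to repeat step 2 by hand. This is only a rough sketch, not an official client: the function names are made up for illustration, and it assumes the same `auth` query parameter as in step 2.

```python
import urllib.error
import urllib.request
from collections import Counter
from typing import Callable


def fetch_status(url: str) -> int:
    """GET the URL and return the HTTP status code, without raising on 4xx/5xx."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 400 and similar; the code is what we want here.
        return err.code


def tally_statuses(url: str, attempts: int,
                   fetch: Callable[[str], int] = fetch_status) -> Counter:
    """Repeat the same request `attempts` times and count each status code seen."""
    return Counter(fetch(url) for _ in range(attempts))
```

Calling `tally_statuses("https://lemmy.world/api/v3/user/unread_count?auth=" + token, 20)` with your own token should show a mix of 200s and 400s if you are affected.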

Edit

This issue only seems to affect lemmy.world so a temporary workaround is to use a different instance for the time being.

Just a quick statement from the admins team to say that we are aware of the issue and yes we are looking into this.

Thank you @idunnololz@lemmy.world for the elaborate report and everyone else for their patience while we try to sort this one out!

Edit: Lemmy was upgraded to 0.18.2

Thank you for all that you do for this place. I am consistently amazed at how quickly y'all are able to resolve issues.

o7

Thank you for making a statement about it!

Sounds like lemmy.world runs on 2 instances and the requests are being load-balanced between those two. On top of that, the JWT secret is probably different between the two instances, causing one to accept the token and the other to reject it.
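To illustrate the theory, here is a toy simulation. It is a sketch under assumptions, not how Lemmy actually works: the token is a simplified HMAC-signed string rather than a real JWT, the secrets are invented, and the balancer is assumed to be plain round-robin.

```python
import hashlib
import hmac
from itertools import cycle
from typing import List, Sequence


def sign(payload: str, secret: bytes) -> str:
    """Issue a simplified token: the payload plus an HMAC-SHA256 signature."""
    sig = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{sig}"


def verify(token: str, secret: bytes) -> bool:
    """Check the signature against the secret held by one backend."""
    payload, _, sig = token.rpartition(".")
    expected = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected)


def simulate(token: str, backend_secrets: Sequence[bytes], requests: int) -> List[int]:
    """Send `requests` calls round-robin across backends; 200 if accepted, else 400."""
    backends = cycle(backend_secrets)
    return [200 if verify(token, next(backends)) else 400 for _ in range(requests)]
```

With a token signed by one secret and two backends holding different secrets, `simulate` alternates between 200 and 400, which is roughly the pattern being reported. A real balancer may not pick backends in strict rotation, so the live results would look more random.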

This is also my theory. I think you’re right on the money here. They probably rotated secrets from yesterday’s hack and forgot to restart both servers.

Does anyone know who can contact the server admins?

Yell real loud in all caps

REAL LOUD

Same problem for me it seems, dunno if I'll even be able to comment. Refuses to stay logged in.

From my tests, it's almost perfectly 50/50 whether any API request you make will yield a 200 (success) or a 400 (not signed in). If you perform an action that takes 3 API requests, your chance of succeeding is (1/2)^3 or 1/8, because only 1 request needs to fail in the chain for the entire action to fail. So, as long as you make single API actions you can maximize your success rate :D
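A quick sketch of that arithmetic, assuming every request succeeds independently with the same probability (the function names are just for illustration):

```python
def chain_success_probability(p_single: float, calls: int) -> float:
    """Probability that all `calls` independent requests succeed."""
    if not 0.0 < p_single <= 1.0:
        raise ValueError("p_single must be in (0, 1]")
    if calls < 0:
        raise ValueError("calls must be non-negative")
    return p_single ** calls


def expected_attempts(p_single: float, calls: int) -> float:
    """Average number of tries of the whole action until one fully succeeds."""
    return 1.0 / chain_success_probability(p_single, calls)
```

For the 3-request case, `chain_success_probability(0.5, 3)` gives 0.125, so on average you would need about 8 tries for the whole action to go through.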

Smells like two instances behind the load balancer, one is fine with the JWT, one is not.

What's an example of something that would take more than one API request?

Signing in. Most websites/apps will probably also grab your unread count, and maybe even your subscription feeds.

Another example is checking your inbox. Lemmy actually has 3 inboxes: mentions, replies and PMs. A lot of websites/apps bundle these three so they will need to check all 3 inboxes via 3 API calls.

Seems like spamming actions also gets it to work eventually. It's a pain in the arse though lol. I made some alt accounts on other instances, but I'm lazy and don't wanna rebuild my subscription feed if I don't have to, so hopefully it gets fixed at some point.

Same issue here, I'm being automatically logged out of my lemmy.world account in Firefox. If I refresh the page even immediately after logging in, I'm automatically logged out.

Yeah. Lemmy.world is currently unusable on the desktop. I don't have that problem in Memmy. Growing pains, but I hope the problem will be fixed soon. Does anyone know if one of the mods in North America is aware of the problem?

I was having trouble in liftoff and the browser. Cleared data and cache from liftoff thinking maybe something got messed up there and now I can't even log back into my .world account 🤷‍♂️ I'll hang here for a bit I guess.

Same here, can't log in again via Liftoff.

I'm choking in desktop browser and in liftoff. Jerboa seems ok. It's weird to me how different clients react differently, I'm not sure how they interact differently.

I’m having to reauthenticate in safari and wefwef every time I load a new page. Furthermore, the login is frequently failing.

Logging in is likely always succeeding. The issue is that whatever app/website you use will make additional API calls afterwards (e.g. fetch posts or fetch unread count). Each of those calls has a 1-in-2 chance to succeed, and if any of them fails, they all fail and you will be booted out.

Lemmy is now an RNG game. We must prayge to rngesus before making any actions.

Schroedinger’s API call.

Okay, so how do we get this fixed? Any way to get admin attention? I think Spaltovic@lemmy.world is probably correct about the cause.

FWIW, I can confirm I'm having this issue as well. The load balancing hypothesis seems sound given the behavior I'm seeing. Definitely making lemmy.world pretty much unusable at this point.

I've been experiencing something similar/related. If I am logged in and open something in a new browser window, it frequently (starting today) shows me as not logged in. If I refresh the page, I'm suddenly logged in. This doesn't feel like an authentication problem as much as a timing issue while loading the page. Or maybe what I'm seeing is an entirely different issue.

Same here, it’s driving me mad. Also did the above and glad to see I’m not alone!

The good news is it only appears to affect lemmy.world. If you have an account on another instance, you should switch to that account for now.

I might do that at this point, with the attention world is getting, it might be smart to have a backup.

At least when you can't log in on one instance you can just login on another. Downtime doesn't mean you have to go do something else anymore!

I'm seeing the same issues on my app, calling login, then immediately using that jwt to fetch the site details and it doesn't give my_user half of the time, and if my app loads far enough to check the unread count I get not_logged_in

Same here, I thought I was going crazy

I can't seem to comment on a couple specific posts on the instance. But as you can see, it works on this one. I am wondering if that's related? I'm not even on my Lemmy.World account and get an unable to post error as soon as I hit the button, like it's not even trying to do anything.

From my experience it's entirely random. You can make 5 actions and all 5 will work. Then have a string of 5 actions where none would work.

Yes I have the issue on Liftoff - can't even log in again. Alt time for now!

Just add me to the list. Jerboa seems to stay logged in for about 75% of interactions.

I'm getting this too, even after clearing cookies and logging in again. I've seen it on multiple devices (Android phone, Linux desktop with Chrome).

Making a new post is a nightmare. I wish the submit button would time out in these instances so you can try again. Right now I'm having to copy and paste into a new tab and hope for the best (but fail, 5 times and counting).

This is happening in the connect app and I thought I was going crazy.

I'm having the same issue... might switch to an alt or something for now.

Lucky I have a .ml account

Is there a way to see Lemmy instances sorted by subscriber count? I might wanna join the smaller ones lol

Lemmy.Fmhy.ml is a sweet spot. It's in the top 10, but further down the list. Active and run by cool admins. No issues from massive user counts (so far). The community is pretty strong and active, so I imagine it's got some staying power.

https://lemmyverse.net

It lets you sort by all kinds of things with regard to instances and communities.

If you go too small, your instance also might die out on you. I think vlemmy just died.

I noticed it a little yesterday, but strangely it seems worse today. I'd cleared cache & cookies after the compromise issue, so I was pretty sure it wasn't to do with not doing that, however I think OP's on the right track with their assessment.

Not really sure why it would suddenly seem worse though, other than maybe something to do with server traffic/activity.

Might be worth noting that the Google password manager doesn't recognize the login page for autofill.

Same here. All Edge, Firefox and Safari keep logging me off. Same with Memmy iOS app.

The issue is server-side, so unfortunately it will not matter what you use. Technically a temporary but terrible fix is to keep retrying on 400s (not signed in) until a 200 (success) is returned. This is terrible because you pretty much never want to retry on 400 errors, since 400 errors are client-side errors (except in this case).
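A minimal sketch of that retry workaround, assuming a hypothetical `fetch` callable that performs one request and returns its status code. The attempt cap is my own addition so a genuinely bad token doesn't loop forever.

```python
from typing import Callable, Tuple


def get_with_retry(fetch: Callable[[], int], max_attempts: int = 5) -> Tuple[int, int]:
    """Call fetch() until it returns something other than 400 or attempts run out.

    Returns (last_status, attempts_used).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status = 400
    for attempt in range(1, max_attempts + 1):
        status = fetch()
        if status != 400:
            # Anything other than the spurious 400 is returned to the caller as-is.
            return status, attempt
    return status, max_attempts
```

This should only be a stopgap while the server is misbehaving. Once the instance is fixed, a 400 means the request really is wrong and retrying will not help.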

Thx for elaborating. Not the answer one wants to hear, but at least I won't continue foolishly to restart browser and clear cache / cookies.

It's the same with the connect app

I was running into this across both my accounts on lemmy.world. Changing my password seems to have resolved it both on the web and in Mlem.

Lovely lmao

I have the same issue, not sure what's the root of the problem

Ah damn. Was wondering what was happening. My lemmy.world account is unusable atm due to the bug. I'm gonna have to figure out what all of my subs were.

There are tools that can help! https://lemmy.ml/post/1875767

I made LASIM - it takes 2 API calls to fetch your subscriptions (1 login, 1 profile), so with lemmy.world being 50/50 on those calls, you might have to try a few times, but once you have them, it will be easy to push them to a new instance.

What happens when you call POST /api/v3/user/login after the 400? Do you get another JWT token or the same one?

Different token each time

Does resetting your password stop it?

No. This issue goes a lot deeper than your login information.

You have actually tried though? Sounds like being kicked off because of a login from another device.

It's based on my understanding of how servers work and my tests. There is obviously always room for error, but I'm like 99% confident I'm right.

Also AFAIK lemmy doesn't kick you out because you signed in elsewhere.

I'm just troubleshooting by eliminating a massive category of possible causes.

I'm asking you to physically try, not your opinion. Please, we have all been there "I don't need to reset my router, I know what I'm doing"

Ok mate, you don't need to be abusive about this, I was acting in good faith to try and help. I just logged into my lemmy.world account and see what you mean, instantly not logged in.

Fwiw, the fix literally yesterday was to change your password in certain app/mobile configurations to force invalidate your old token that was signed by the old key.

At the time I was commenting, I didn't see much information about what was going on, so I just went for the default of asking questions and ruling things out. Didn't realise it was a significant % of people and that they didn't even get a chance to stay logged in for a second. When I talked with them they made zero attempts to even honor a word I said when they could have just corrected me and filled me in about my clear lack of info. This is why people don't help others any more. I've learned my lesson. I'm done.