[Done - for now...] Expect some brief restarts today (Jun 12 CET)
3y 21d ago by lemmy.world/u/ruud in lemmyworld
I'm trying to fix this annoying slowness when posting to larger communities. (Just try replying here...) I'll be doing some restarts of the docker stack and nginx.
Sorry for the inconvenience.
Edit: Well I've changed the nginx from running in a docker container to running on the host, but that hasn't solved the posting slowness..
Thank you Ruud for hosting! Your work is much appreciated.
2 restarts done already :-)
Hmm. I guess the delay in posting is not related to nginx. I now have the same conf as a server that doesn't have this issue.
I'm only familiar with the high-level Lemmy architecture, but could it be related to database indices being rebuilt?
Good luck today lol
Godspeed to you over the coming days man. Really appreciate you putting this together and the extra work it takes when tackling something like this (both being new to the platform and the tech still being in relative infancy) - not to mention the crazy scaling happening. I will definitely be pitching in to help make sure the server stays up!!
Thanks for putting in the time to make this run smoothly
Keep up the good work!
I joined this instance and also mastodon.social, first time using the Fediverse and as excited as confused lol
Thanks for your work on this server!
Hehe, the joys of troubleshooting and profiling. Isn't it fun?
Hmm if it takes too long the fun disappears... ;-)
You got this. <3
I don't have experience scaling Lemmy, but I do have experience scaling stuff in general. I'm sure you've got a few people here who'd be willing to talk things through with you if you get too frustrated.
And don't forget to breathe and step back if you have to. Your well being is more important.
Thank you so much for this amazing instance!
You're welcome!
ruud@lemmy.world have you seen https://lemmy.world/post/72136?
Thanks! Hadn't seen, but changed that (also to 512!) yesterday.
That is definitely a good lead
Something is weird.
I opened this post from the main page "subscribed" listing, but the title showed "I can't find any cannabis cultivation community" while the comments were the same. I initially thought I had opened the wrong post, but the comments were mentioning "Good work Ruud", so I refreshed and that fixed the post's title.
Have you noticed the issue?
I've noticed a couple oddities as well.
- I refresh a page and a completely different page loads instead
- An autorefresh hits the community tab, but it loads up 10 posts from a single community
I'm sure it'll get sorted out eventually lol
It's happened to me a few times as well (not just on this instance, I think it's a bug in Lemmy itself). So far I've not found a reproducible pattern though, so it's a tricky one to bug report effectively.
Yeah, I tried opening it again a few times, no luck yet. Will see if I can figure out any pattern.
I had something similar happen yesterday.
I opened a thread about pokemon, browsed it for a bit, did some stuff in other tabs, and clicked back to the pokemon tab maybe an hour later to browse some more.
The post had changed to one where a user was asking for relaxing game recommendations and it was loading in new comments that seemed to be from that post, but I could still see the comments that had already loaded from the pokemon post when I scrolled down.
When I refreshed it changed back to the pokemon post and only showed comments from that.
Thanks a bunch! I'll be donating for what it's worth. I really like it here.
Since I have you here, if I start my own instance do I absolutely have to use docker? I've never had good experiences with it and would rather just install programs the old-fashioned way
Well if they can create a docker image out of it, you should be able to install it on a VM... but I run it in Docker because it makes everything so easily manageable...
Docker is not necessary, lemm.ee for example is running without docker!
Here is documentation for setting it up: https://join-lemmy.org/docs/en/administration/from_scratch.html
Of course you can fully adapt it to your own use case. The Lemmy backend is a single binary, you don't even need to build it on the same machine which will run it. There's no hard requirement to use nginx or anything like that either - if you understand what this guide is doing, you can replace all the unimportant parts as needed.
Awesome!!! Gonna work on it this weekend. Thank you!
Interesting, thanks for posting
It is possible to do it without docker... but nobody recommends it :)
There is a how-to on how to set up your own instance without docker using ansible: https://join-lemmy.org/docs/en/administration/install_ansible.html
Note that this is just basically a script to deploy lemmy on a remote server. And it uses docker. It just does it for you. (Mostly)
Oh, oof. Didn't look into it much further as the docker solution would have suited me best also. Thanks for the heads-up
This is also only for Debian AFAIK
Technically no, but they put all their update info and support into docker.
Any progress on this? I've been thinking about it too. A couple of ideas:
Too many indexes needing to update when an insert occurs?
Are there any triggers running upon insert?
Unlikely, but could there be a disk write bottleneck? Might be worth running some benchmarks from the VM shell.
I was thinking that as well, it’s like the post gets “checked” or something like that and that gives a timeout of 20secs. It could be an api or database but somehow my spidey sense says this could well be in code. Some extra calls to filter things maybe? Using an external server? Or even the propagation to the others? (Idk how this federation thing connects to the others, could be just that; maybe another server that is the bottleneck) I just found the 20 seconds suspicious given that is the default timeout
Didn't know about the timeout but that makes sense. Would be easy to test by changing the nginx timeout.
Another thought: how many db connections do you have? Could it be starved because there are so many selects happening and it needs to wait for them to finish first?
pg_locks shows alarming periods when lots of locking is holding up activity. Inserts take a pretty long time on the like tables for comments and posts.
Hey, I just want to echo what everyone else is saying - thanks much for hosting + all the efforts to keep things working well. It's appreciated 👍
Well worth any inconvenience, thank you so much for hosting!
Thank you, Ruud!
Hey. From my own experience - Nginx is awesome and fast when it is working, but the more you want from it, the more difficult it becomes.
Give Caddy a try. This reverse proxy has always been excellent for me. It has HTTP3 (QUIC) support, automatic ACME and overall excellent configuration in terms of simplicity and user friendliness.
Caddy is not a good choice if you need a TCP/UDP proxy. It's only an HTTP/HTTPS proxy.
Someone said this about Caddy: "it injects advertising headers into your responses". Is this true? I don't know anything about Caddy but that doesn't sound too good lol (to be fair it could be misinformation).
Never heard about it. It's an open source project, free to use.
In case you want to understand why it's good, check out a Caddyfile example. Just specify something like this:
example.com {
    reverse_proxy backend:1234
}
And that's it! It automatically binds to 0.0.0.0:80, only to redirect to 0.0.0.0:443, and uses ACME to add TLS, all behind the scenes.
Add 1 more line to my given example and it adds compression.
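For anyone wondering what that one extra line might look like: a minimal sketch, assuming the line in question is Caddy's `encode` directive and reusing the same placeholder `example.com` and `backend:1234` from above (both are stand-ins, not real hosts).

```caddyfile
example.com {
    # Compress responses for clients that advertise gzip support
    encode gzip
    reverse_proxy backend:1234
}
```

`encode` also accepts `zstd`, so `encode zstd gzip` should work as well if you want Caddy to offer both.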
I've been using it for my self-hosted stuff for prob 1-2 years and it kept working flawlessly all the time. Very satisfied.
Sounds very cool. Does running with that file also handle the SSL certificate and validation automatically? Or are there extra steps?
Everything is automated. As long as you know how ACME works (port 80, accessible from the internet), everything is done in the background, including TLS (SSL) certificate maintenance.
A minimal config like that will default to provisioning (and periodically renewing) an SSL certificate from Let's Encrypt automatically, and if there are any issues doing so it will try another free CA.
This requires port 80 and/or 443 to be reachable from the general Internet of course, as that's where those CAs are.
There's an optional extra step of putting
{
    email admin@emailprovider.com
}
(with your actual e-mail address substituted) at the top of the config file, so that Let's Encrypt knows who you are and can notify you if there are any problems with your certificates. For example, if any of your certificates are about to expire without being renewed¹, or if they have to revoke certificates due to a bug on their side².
As long as you don't need wildcard certificates³, it's really that easy.
1: I've only had this happen twice: once when I had removed a subdomain from the config (so Caddy did not need to renew), and once when Caddy had "renewed" using the other CA due to network issues while contacting Let's Encrypt.
2: Caddy has code to automatically detect revoked certificates and renew or replace them before it becomes an issue, so you can likely ignore this kind of e-mail.
3: Wildcard certificates are supported, but require an extra line of configuration and adding in a module to support your DNS provider.
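A rough sketch of what the wildcard case in note 3 could look like. Everything specific here is an assumption for illustration: Cloudflare as the DNS provider, a Caddy binary built with the `caddy-dns/cloudflare` module (for example via `xcaddy build --with github.com/caddy-dns/cloudflare`), and an API token in an environment variable I've named `CLOUDFLARE_API_TOKEN`. Other DNS providers have their own modules and their own options.

```caddyfile
*.example.com {
    tls {
        # Use the DNS challenge, which wildcard certificates require.
        # Assumes the Cloudflare DNS module is compiled into Caddy.
        dns cloudflare {env.CLOUDFLARE_API_TOKEN}
    }
    reverse_proxy backend:1234
}
```

With a stock Caddy binary this config will likely fail to load, since the `dns cloudflare` provider is not built in by default.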
Good luck with it.
Thank you! Sounds like a lot of work.
Thanks for hosting.
Lots of slowness still.
username
s out
Lots of instances are getting hugged to death. lemmy.ml and beehaw.org aren't even loading for me anymore.
Great work Ruud!
Somehow I don’t think the slowness when posting or saving is due to the nginx server / reverse proxy running inside the Lemmy container.
I would think it’s related to inserts and updates in the DB, but I haven’t had time to look into it on my instance, sorry!
Edit: wait! Posting and saving is fast now! What did you change? Nicely done! 👍
No it's not! Not for me anyway. Yes I'll be looking into that, but first migrate the server!
Oh wait, that’s because I’m posting to Lemmy.world from my instance. It’s only slow when posting to Lemmy.world from a Lemmy.world user.
With that in mind, it makes me think it has something to do with some insert or update that happens. My local DB is not under load, so my save is fast. Lemmy.world’s DB is under load so the save is slow.
It might not even be the insert/update that is slow. Could be some other insert into another table that gets triggered on save that is the culprit.
Posting seems faster when posting to a non-local federated community. Maybe that's what you experienced?
You are right. Good catch.
As others have said, thanks for hosting. I also have an account at mastodon.world so I had to join here. ;)
thanks for everything!
Just joined but nice to see work happening to keep it working :)
Macapps and formula1. Macapps so far has only committed to the 48 hours, although I’d like to see them stay dark longer. Unsure off the top of my head about formula1, although they do have a discord channel that I’ve jumped on, but it won’t be the same as jumping in the F1 subreddit around race weekends.
There's a formula 1 lemmy. I found it through the search on Jerboa (android app).
Testing reply speed....
Test
Thanks for your work! Really appreciated
Well I’ve changed the nginx from running in a docker container to running on the host, but that hasn’t solved the posting slowness…
I encourage you to install PostgreSQL extension pg_stat_statements to get a better idea of just how overloaded the database can be.
I am finding SQL INSERTS are taking a long time on likes - and you have way more data than I do on your server.
https://github.com/LemmyNet/lemmy/issues/2877#issuecomment-1597654037
Thank you Ruud!
some of the local event subs were pretty good, even if they were kind of spotty. i like experimental electronic music and noise music, and those events aren't usually advertised, so if you want to find something that you don't already know, it's an invaluable resource. those folks are pretty savvy though, so I can see them moving elsewhere.
Good luck :) and thanks
The guy is already providing his instance with his own cash and time, let him sleep at night. It's Lemmy, not a life-critical tool.
Lol. Is there a "choosing beggars" community here yet?
Definitely. The above comment seems to lack a bit of empathy towards the host.
Not possible when you're actively trying to troubleshoot. This instance is running on one server.
Indeed, but I'm sure there's more to it than a simple restart. Moreover, cron won't roll back on its own.
You do realize that the night doesn't fall in the entire world at the same time?
I was going to say this. Realistically they probably need multiple servers to allow failing over and rolling updates if we want 24/7 uptime, and that seems well beyond the scope of this right now.
8 downvotes. Nukes entire account.
Maybe it was u/spez checking out the competition?
It's the only way to be sure