
Recent Performance Issues and an interesting graph

In the last 24 hours you've likely noticed that we've had some performance issues on Blåhaj Lemmy.

The initial issue occurred as a result of our hosting provider having technical problems. We use Hetzner, who provides hosting for approximately a third of the fediverse, so there was widespread chaos above and beyond us.

As of lemmy 0.19, messages are queued rather than silently dropped when an instance is down, so once Hetzner resolved their issues, we had a large backlog of jobs to process. Whilst we were working through the queues, we were operational but laggy, and our messages were an hour or more behind. These queues aren't just posts and replies; they also include votes, so there can be a large volume of them. Each one needs to be remotely verified with the sending instance as we process it, so geographical latency also plays a part.
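To see why latency matters so much, here's a back-of-the-envelope sketch. The round-trip time and backlog size below are illustrative assumptions, not measurements from our instance:

```rust
// Back-of-the-envelope sketch: if every queued activity needs one network
// round trip to the sending instance before the next can be handled, the
// round-trip time caps throughput. All numbers here are illustrative
// assumptions, not real figures from Blåhaj Lemmy.
fn main() {
    let rtt_secs: f64 = 0.3; // assumed round trip to a distant instance
    let backlog: f64 = 500_000.0; // assumed queued activities (posts + votes)

    let throughput_per_sec = 1.0 / rtt_secs;
    let drain_hours = backlog / throughput_per_sec / 3600.0;

    println!(
        "~{:.1} activities/sec -> ~{:.0} hours to clear the backlog",
        throughput_per_sec, drain_hours
    );
}
```

Even a modest 300ms round trip limits a queue like this to a few items per second, which is why a large backlog takes days rather than minutes to clear.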

As you can see from the graph, we are finally through the majority of the queues.

The exception is lemmy.world. Unfortunately, the lemmy platform processes incoming messages from each remote instance sequentially (think of it as one first-in, first-out queue per instance), which means Blåhaj Lemmy can't start on a second lemmy.world message until we've finished processing the first.
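For those curious what "sequential" means in practice, here's a minimal sketch of the idea. This is not Lemmy's actual implementation, just an illustration of one FIFO queue per remote instance:

```rust
// A minimal sketch (not Lemmy's actual code) of per-instance sequential
// processing: one FIFO queue per remote instance, where the next item
// cannot start until the current one has been verified and applied.
use std::collections::{HashMap, VecDeque};

struct Activity {
    id: u64,
}

// Stand-in for the remote verification step: in reality this is a network
// round trip to the sending instance, which is what makes each item slow.
fn verify_and_apply(instance: &str, activity: &Activity) {
    println!("verified activity {} from {}", activity.id, instance);
}

fn main() {
    let mut queues: HashMap<&str, VecDeque<Activity>> = HashMap::new();
    queues
        .entry("lemmy.world")
        .or_default()
        .extend([Activity { id: 1 }, Activity { id: 2 }, Activity { id: 3 }]);

    for (instance, queue) in queues.iter_mut() {
        // Strictly sequential: activity 2 waits for activity 1, and so on.
        while let Some(activity) = queue.pop_front() {
            verify_and_apply(instance, &activity);
        }
    }
}
```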

Due to the size of lemmy.world, they are sending us new queue items almost as fast as our instance can process them, so the queue is coming down, but slowly! In practical terms, this means that lemmy.world communities are going to be several hours behind for the next few days.
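The arithmetic behind "coming down, but slowly" looks roughly like this. Again, all figures are illustrative assumptions rather than real measurements:

```rust
// Sketch of why the lemmy.world backlog shrinks so slowly: the net drain
// rate is our processing rate minus their arrival rate. When the two are
// close, even a moderate backlog takes days. Illustrative numbers only.
fn main() {
    let process_per_sec: f64 = 3.3; // assumed sequential processing rate
    let arrive_per_sec: f64 = 3.0; // assumed rate of new lemmy.world items
    let backlog: f64 = 100_000.0; // assumed current backlog

    let net_drain = process_per_sec - arrive_per_sec;
    let hours = backlog / net_drain / 3600.0;

    println!("net drain {:.1}/sec -> ~{:.0} hours to catch up", net_drain, hours);
}
```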

For those who are interested, there is a detailed technical breakdown of a similar problem currently being experienced by reddthat, which explores the impact of sequential processing and geographical latency.
