/kbin server update - or how the server didn't blow up
Currently, on the main instance, people have created 40191 accounts (+214 marked as deleted). I don't know how many are active because I don't monitor it, but once again, I greet all of you here :) In recent days, the traffic on the website has been overwhelming. It's definitely too much for the basic docker-compose setup, which was primarily designed for development use. I was aware of the possible consequences of the situation happening on Reddit, but I assumed that most people would migrate to one of the Lemmy instances, which already have an established position. I hoped that a few stray enthusiasts would find their way to kbin ;)
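For context, a development-oriented compose file typically runs a single instance of everything, with no caching layer in front. A minimal sketch of that shape (illustrative only - this is not kbin's actual configuration, and the service names and versions are assumptions):

```yaml
# Minimal sketch of a dev-style docker-compose.yml (illustrative only).
version: "3.8"
services:
  php:
    build: .
    volumes:
      - ./:/app                    # source mounted live - handy for development
    depends_on: [db, redis]
  db:
    image: postgres:15
    environment:
      POSTGRES_PASSWORD: example   # placeholder credential
  redis:
    image: redis:7
```

Fine on a laptop; painful under a sudden influx of tens of thousands of users.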
The first step was to upscale the VPS to a higher tier (66.91 EUR). It quickly turned out that this wasn't enough. I had to enable Cloudflare protection just to keep the website responsive, but response times were still very slow. At this stage, the instance was practically unusable. The next step was a full migration to a dedicated server (100 EUR, the current hardware). That can be done relatively quickly, so it only required a 5-minute technical break. Despite the much better specs, things didn't get any better. It became clear that the problem didn't lie there. Server administration really frustrates me. That was the moment when I started looking for help. Or rather, it found me.
A couple of days ago I wrote about how kbin qualified for the Fast Forward program. To be honest, I did it out of pure curiosity and completely forgot about it, because a lot was happening during that time. At the height of the fire, Hannah ( @haubles ) reached out with an offer to help. I outlined the situation (in short: the server is dying, I don't even know what I need, help! ;). She quickly connected us with Vlad ( @vvuksan ) and Renaud ( @renchap ). I was probably too tired because I don't know whether the whole operation lasted 60 minutes or 6 hours, but after a series of precise questions and getting an understanding of the situation, the guys handled the entire job themselves. I love working with experts, and it's not often that you come across people so well-versed in the fediverse. Thanks to Hannah's kindness, we will be staying there a bit longer. Currently, fastly.com handles the caching layer and image processing. Hence those cool moving thumbnails ;)
Things were going well at that point. I could disable Cloudflare protection. Probably thanks to that, many of you are here today, and we got to know each other a bit better :) However, even then, when I tried to enable federation, the server would stop working.
Around the same time, Piotr ( @piotrsikora ), whom I already knew from the Polish fediverse, contacted me. He is the administrator of the Polish Mastodon instance pol.social, operates within the ftdl.pl foundation, and specializes in administering applications with a very similar tech stack. I decided to grant him server access. It only took him a few moments to come back to me with a few tips that allowed us to enable federation. More followed in the subsequent days, and we managed to reach the current level. I think it's not too bad.
Nevertheless, managing the instance has taken up about 60% or more of my time so far, which prevents me from fully focusing on current tasks. That's why I would like to collaborate with Piotr and hand over full care of the server to him. Piotr will also take care of the security side. Now I have to take this much more seriously. We still need to work out the terms of cooperation, but I want you to know the direction I intend to pursue.
We also need to migrate to a new environment, because a single server will sooner or later become insufficient. This time, I want to be prepared for it. This may cause transient issues with the website in the coming days.
The next two updates will still be about project funding (I still can't believe what happened) and moderation. The following ones will be more technical, with descriptions of changes and what contributors are doing on Codeberg. I would like to be here more often, but not as an admin, just as myself.
Thank you all for this.
P.S. In private messages, I also received numerous offers of help that I didn't even have a chance to read and respond to. You are the best!
I hoped that a few stray enthusiasts would find their way to kbin
They did. "A few" as a portion of the whole, where the whole is much larger than you ever expected it would be.
One thing I have learned in my own technical career is that you can't do everything. Even if there were enough time in a day for you to do everything, you can't know everything, at least not well enough to be effective. You have to depend on other experts and delegate to them what you are not qualified to do. This requires a pretty high level of trust, and it means you need to develop people-management skills.
Lots of technical people are short on people-management skills. I don't know where you sit on that spectrum, but you may need to consider bringing on an "overseer," kind of like a project manager, to keep tabs on all of the technical resources involved - yourself included. This will help ensure that concerns are prioritized appropriately, and that communication and messaging about those priorities are consistent and clear.
I've been in that kind of position, and I take a bit of pride in my use of words. I'm happy to give any advice you like.
I haven't looked too closely at how kbin is architected yet, but would it benefit from horizontal scaling? I do full-time development of tooling to administer very large k8s clusters for a company that you've probably interacted with today without knowing it. Not sure if k8s is the right orchestration system for you, but I'd be more than happy to provide some input on a potential migration to k8s (if kbin is a good fit there). I know there's a community on Matrix as well - I'll try to reach out there too, although it may be a bit.
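To make the horizontal-scaling idea concrete: if the web tier is stateless, k8s can run it as a Deployment and scale it just by bumping a replica count. A hedged sketch (every name, image, and variable below is hypothetical, not taken from kbin):

```yaml
# Hypothetical Deployment for a stateless web tier; all names are illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kbin-web
spec:
  replicas: 4                  # scale horizontally by raising this
  selector:
    matchLabels:
      app: kbin-web
  template:
    metadata:
      labels:
        app: kbin-web
    spec:
      containers:
        - name: web
          image: example/kbin-web:latest   # hypothetical image
          ports:
            - containerPort: 80
          env:
            - name: REDIS_URL              # shared session store (assumption)
              value: redis://redis:6379
```

Because sessions would live in a shared Redis rather than on any one node, any replica can serve any request, which is what makes this kind of scaling safe.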
If the post is anything to go by, it's using the included "mostly for dev work only" docker-compose files. It could absolutely be scaled out, since at its core it's just a web app with workers. The app is already configured to use Redis for session storage, so it should be able to go super wide.
The only limitation is how performant you can make your Postgres cluster.
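In compose terms, "going super wide" could be as simple as running several replicas of the stateless pieces against one shared Redis and one Postgres. A hypothetical sketch (the image and the worker command are assumptions - the command reflects a standard Symfony Messenger setup, not kbin's verified config):

```yaml
# Hypothetical scaled-out layout; all names are illustrative.
services:
  web:
    image: example/kbin-web:latest
    deploy:
      replicas: 4              # Compose v2 honors deploy.replicas
  worker:
    image: example/kbin-web:latest
    command: php bin/console messenger:consume async   # typical Symfony worker (assumption)
    deploy:
      replicas: 8
  redis:
    image: redis:7             # single shared session/cache store
  db:
    image: postgres:15         # the single-writer bottleneck noted above
```

Postgres remains the single writer here; beyond that point, the usual levers are connection pooling (e.g. PgBouncer) and read replicas.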