Getting our applications out of the cloud provided the main celebration for our exit, but seeing the actual spend tumble is the prize. See, the only way to get pricing in the cloud down from obscene to merely offensive is through reserved instances. This is where you sign up for a year or more in ad...
Interesting how they have kept their ops team the same but now run an entire datacentre.
Overworked teams? I just can’t see how this is possible.
I'm not defending cloud hosting/costs etc. You generally pay more for the cloud so that you don't have to deal with hardware maintenance and datacentre management. I didn't see this addressed directly in their post, other than keeping the same-size Ops team.
I'm running both physical hardware and cloud stuff for different customers. The problem with maintaining physical hardware is getting a team of people with the relevant skills together, not the actual work - the effort is small enough that you can't justify hiring a dedicated network guy, for example, and the same applies to other specialities, so you need people capable of debugging and maintaining a wide variety of things.
Finding those people was always difficult - and (partly thanks to the cloud stuff) it has become even harder by now.
The actual overhead - even when you're racking the stuff yourself - is minimal. "Put the server in the rack and cable it up" is not hard - my last rack was filled by a high school student in part of an afternoon, after I explained once how to cable and label everything. I didn't need to correct anything - which is a better result than many highly paid people I've worked with...
So paying for remote hands in the DC or - if you're big enough - ordering complete racks with racked and pre-cabled servers takes care of the "put the hardware in" step.
Next step is firmware patching and bootstrapping - that happens automatically via network boot. After that it's provisioning the containers/VMs to run on there - which at this stage isn't different from how you'd provision it in the cloud.
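The "automatically via network boot" step can be sketched as a boot server that hands each machine a script chosen by its MAC address. The commenter doesn't say which tooling they use, so this assumes an iPXE-based setup; the inventory, boot server URL, image paths and `role=` kernel argument are all hypothetical placeholders.

```python
# Hypothetical inventory: MAC address -> role. A real setup would pull this
# from an asset database or DHCP reservations.
INVENTORY = {
    "52:54:00:aa:bb:01": "app",
    "52:54:00:aa:bb:02": "db",
}

# Hypothetical boot server and per-role (kernel, initrd, kernel args).
BOOT_SERVER = "http://boot.example.internal"
IMAGES = {
    "app": ("installer/vmlinuz", "installer/initrd.img", "role=app"),
    "db": ("installer/vmlinuz", "installer/initrd.img", "role=db"),
}


def render_ipxe(mac: str) -> str:
    """Return the iPXE script a machine with this MAC address should run."""
    role = INVENTORY.get(mac.lower())
    if role is None:
        # Unknown hardware: hand control back to the next boot device
        # instead of reinstalling something we don't recognise.
        return f"#!ipxe\necho Unknown MAC {mac}, skipping network install\nexit\n"
    kernel, initrd, args = IMAGES[role]
    return (
        "#!ipxe\n"
        f"kernel {BOOT_SERVER}/{kernel} {args}\n"
        f"initrd {BOOT_SERVER}/{initrd}\n"
        "boot\n"
    )


print(render_ipxe("52:54:00:AA:BB:01"))
```

The installer it boots would then do the firmware and OS bootstrapping. That part isn't shown here and depends entirely on your tooling.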
You do have some minor overhead for hardware monitoring - but you hopefully have some monitoring solution anyway, so adding the hardware to it, and maybe having the DC guys walk past and tell you about any red LEDs, isn't much overhead. If hardware fails you can just fail over to a different system - the cost difference to the cloud is so big that just having those spare systems is worth it.
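The "just fail over to a different system" idea can be sketched as: prefer a healthy primary, otherwise fall back to a pre-racked spare. The `Host` type and its `healthy` flag are hypothetical. A real setup would take health from its existing monitoring and move traffic via a load balancer, DNS or a virtual IP.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Host:
    name: str
    healthy: bool  # in practice, fed by whatever monitoring you already run


def pick_host(primaries: list[Host], spares: list[Host]) -> Host:
    """Return the first healthy primary, else the first healthy spare."""
    for host in primaries + spares:
        if host.healthy:
            return host
    raise RuntimeError("no healthy host available")


# db1 has a red LED; the spare takes over.
active = pick_host([Host("db1", healthy=False)], [Host("db1-spare", healthy=True)])
print(active.name)
```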
I'm not at all surprised by those numbers - about two years ago somebody was considering moving our stuff into the cloud and asked us to do some math. We'd have ended up paying roughly our yearly hardware budget (including the hours spent on working with hardware that we wouldn't have with a cloud) to host just one of our largest servers in the cloud - and we'd have to pay that again every year, while with our own hardware and properly planned maintenance we can let old servers we paid for years ago slowly age out naturally.
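The "pay once vs. pay every year" comparison can be sketched as cumulative cost over time. All figures below are made-up placeholders, not numbers from the comment. The model is deliberately crude: it ignores hardware refreshes, reserved-instance discounts and financing.

```python
def cumulative_costs(years, cloud_per_year, hw_purchase, hw_upkeep_per_year):
    """Yield (year, cloud_total, own_total) for years 1..years.

    Cloud is pure rent. Owned hardware is an up-front purchase plus yearly
    upkeep (power, colo space, staff hours spent on hardware).
    """
    for year in range(1, years + 1):
        cloud_total = cloud_per_year * year
        own_total = hw_purchase + hw_upkeep_per_year * year
        yield year, cloud_total, own_total


def break_even_year(years, cloud_per_year, hw_purchase, hw_upkeep_per_year):
    """First year in which owning is cumulatively no more expensive, else None."""
    for year, cloud, own in cumulative_costs(years, cloud_per_year,
                                             hw_purchase, hw_upkeep_per_year):
        if own <= cloud:
            return year
    return None


# Hypothetical: 100k/year cloud rent vs. 150k of hardware plus 30k/year upkeep.
for year, cloud, own in cumulative_costs(5, 100_000, 150_000, 30_000):
    print(f"year {year}: cloud {cloud:,}  own {own:,}")
print("owning wins from year", break_even_year(5, 100_000, 150_000, 30_000))
```

Every year an already-paid-for server keeps running only adds upkeep, which is the effect the commenter describes with letting old servers age out.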
They're using a third party called Deft to manage the hardware, which, the more I think about it, is a reasonable middle ground between cloud and self-operated.
I haven't seen much info on what that management costs, but it's likely to be leagues less than AWS/GCP.
It’s not just the hardware. “The cloud is expensive” is usually touted by people who don't understand why managed services (like Aurora RDS and OpenSearch, as suggested in the article) ‘cost more than running it themselves’, because they don't account for the management costs.
A database service needs management not only of the hardware (e.g. replacing dead drives) but also of the software (e.g. monitoring cluster performance, tweaking system settings to fit the usage pattern, managing cluster health, etc.). This management takes time from the ops team, often across multiple roles like SysAdmin, DBA, and Ops engineers. The fact that they claim to have moved to their own hardware without bringing new talent onto their ops team makes it questionable whether they actually understand the cost, and whether they're overworking their existing ops team.
"An entire data center" is 8 rented racks in two enterprise data centers (4 racks in each). They're paying $60K/month for racks, cooling, and location.
That's the thing: 'cloud' is just another tool in your toolbox. It's the right tool for some workloads and the wrong one for others. The fact they've shifted the work to their own servers and kept the ops team suggests it was the wrong sort of workload to be in the cloud in the first place.
For a while there was an obsession with moving everything to the cloud, and that was always going to be an expensive mistake in a number of different ways. Hopefully, as the hype dies down, more nuanced decisions will be made. There's a whole gamut of options between all in the cloud and all in the data centre, and when people jump straight from one end to the other I'm put in mind of Hamlet's quote "There are more things in heaven and earth, Horatio, / Than are dreamt of in your philosophy." Understand your workload, understand your business' future plans and their needs, and then make a plan, considering all the tools at your disposal.
I hate the obsession with moving to the cloud, and the obsession with serverless or functions.
Functions are stupid and crazy for anything that is actually used often.
For small utilities, they make a ton of sense, but next time I see an app with millions of requests per day using functions, I'm going to lose my mind.
Years ago I was the senior techie in designing and implementing distributed high performance server systems and what you reminded me of just made my blood start to boil... :/
If there's one thing 3 decades in Tech have taught me, it's that fad-following commonly rules it, even among the supposedly logical (but not really) techies.
Cloud storage and cloud computing became a fad about a decade ago (I still remember the hype repeated by people who had never actually designed distributed systems), so there were tons of people jumping headfirst into it without a plan, for the hype and the seemingly cheaper price (if you didn't think your needs and future evolution through), even though it wasn't the best choice for them.
No doubt we'll see the same kind of fad-following, instead of doing what makes sense for us, with the latest hype train: AI.
What always kept me off the "cloud" (other people's computers) is not only giving up my data but giving up control over what I spend. Corporations lure you in with flashy promises and low prices, then over time the service usually gets worse and the prices go higher and higher. I'm sure the cloud hosting corporations are good at pricing their services very high, but not quite high enough to make most customers cancel.
Lock-in is quite an old strategy in Tech (back in the day Microsoft's dominance was built on it) and apparently every new generation needs to learn their lesson...
Exiting the cloud seems to be useful only in a very narrow set of cases.
For one, you have to be at a large enough scale where buying and hosting your own infra is feasible and cheaper.
Second, you have to give up the ability to scale up or provision hardware almost instantly in response to traffic or other events (which is very common at scale).
Maybe his use case happens to be that very narrow case, but this isn't something I would take as general advice.
Your last paragraph is why we've heavily used the cloud here in rural Canada for years.
Monitoring data is much easier to push into the cloud and read from there than it is hope for a reliable connection to a farm or rural plant.
Services we'd otherwise self-host need to be cloud-hosted, both for uptime and because it was getting ever harder to get a routed IPv4 address from any provider. IPv6 is nice to finally have, but Starlink is the only provider supporting it at all, and only for the last few months at that. Their prefixes change constantly too; come on guys, get your shit together.
Even basic remote access systems require a VPS or a cloud VPN service, as both ends always need to punch out through layers of CGNAT. Now we can finally have one end reachable over IPv6, but the remote user is often trying to connect from an IPv4 CGNAT network... So you still need something in the cloud to punch holes.
Can't believe the IPv6 rollout has been going on for over 20 years.
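The "something in the cloud to punch holes" can be as simple as a dumb relay on a VPS that both ends connect *out* to, so neither side needs an inbound port. This is a minimal sketch built on Python's asyncio streams. It assumes plain TCP, exactly two peers at a time, and no authentication or encryption. In practice you'd more likely run a VPN or an SSH reverse tunnel through the VPS. The port is a hypothetical placeholder.

```python
import asyncio


async def pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Copy bytes from one connection into the other until EOF."""
    try:
        while data := await reader.read(65536):
            writer.write(data)
            await writer.drain()
    except ConnectionError:
        pass  # the peer went away; fall through and tear down
    finally:
        writer.close()


class Relay:
    """Pairs the first two inbound connections and splices them together."""

    def __init__(self) -> None:
        # (reader, writer) of the peer that connected first and is waiting.
        self.waiting = None

    async def handle(self, reader: asyncio.StreamReader,
                     writer: asyncio.StreamWriter) -> None:
        if self.waiting is None:
            # First peer: keep its streams around until the second one arrives.
            self.waiting = (reader, writer)
            return
        other_reader, other_writer = self.waiting
        self.waiting = None
        # Copy in both directions. When one side closes, its pipe closes the
        # other side's connection, which ends the second pipe too.
        await asyncio.gather(
            pipe(reader, other_writer),
            pipe(other_reader, writer),
        )


async def main(host: str = "0.0.0.0", port: int = 9000) -> None:
    """Run the relay forever on a (hypothetical) public VPS port."""
    relay = Relay()
    server = await asyncio.start_server(relay.handle, host, port)
    async with server:
        await server.serve_forever()
```

On the VPS you'd start it with `asyncio.run(main())`. The farm-side box and the remote user then both connect to that port, and neither needs a public IPv4 address.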
This is quite intriguing. But DHH has left so many details out (at least in that post) as pointed out by @breadsmasher@lemmy.world - it makes it difficult to relate to.
On the other hand, like DHH said, one's mileage may vary: it's, in many ways, a case-by-case analysis that companies should do.
I know many businesses shrink the ops team and hire less experienced ops people to save $$$, only to forward those saved $$$ to cloud providers. I can only assume DHH's team is made up of a bunch of experienced, well-paid ops people who can pull off such feats.
Nonetheless, looking forward to, hopefully, a follow up post that lays out some more details. Pray share if you come across it 🙏
This is part of a series of posts he has written about finding out his cloud bill was stupidly high because they run computationally heavy software, and about switching over to colocation. Still, going from 100% cloud to colo and saving that much money is not to be scoffed at.
He does say this is an outlier and that others won't get as much ROI as they have.
There are a number of blog posts with different details about the how/why, etc. I just followed the links in the article to other parts of the series.
I expect the use case where you're spending a decent chunk on cloud infra is more prevalent than you think. I have been convinced for some time now that the costs are high compared to our on-prem. I really like the idea of a "deft"-type hardware management service, where they look after the DCs, hardware and connectivity, and we look after the software.
Hopefully they place their servers at 2x the historical peak flood point, or set up standby zones in different geographies in case there's a power or network outage.
Having your compute in "the cloud" doesn't remove the need for a good backup strategy; it just changes how it works. Yes, disaster recovery from natural disasters should be easier (OVH's fire showed that this may not always be true). But that doesn't cover cases like ransomware, insider threats, data mistakes or any other case where data is corrupted or modified by mistake. You still need a plan for these cases, and cloud-based backups actually make a lot of sense.
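Covering ransomware or a bad data change usually means keeping older restore points, not just the latest copy, since corruption can go unnoticed for a while. One common way to decide which backups to keep is grandfather-father-son (GFS) rotation. The commenter doesn't name it; it's shown here as an illustration. The retention counts are arbitrary defaults.

```python
from datetime import date, timedelta


def _iso_week(d: date) -> tuple:
    return tuple(d.isocalendar()[:2])  # (ISO year, ISO week number)


def _month(d: date) -> tuple:
    return (d.year, d.month)


def backups_to_keep(backup_dates, today, daily=7, weekly=4, monthly=12):
    """Return the backup dates a grandfather-father-son policy would keep.

    Keeps every backup from the last `daily` days, plus the newest backup in
    each of the `weekly` most recent ISO weeks and the `monthly` most recent
    calendar months that have a backup. Everything else may be pruned.
    """
    dates = sorted(set(backup_dates), reverse=True)  # newest first
    keep = {d for d in dates if (today - d).days < daily}

    for period_of, count in ((_iso_week, weekly), (_month, monthly)):
        periods_seen = set()
        for d in dates:
            period = period_of(d)
            if period in periods_seen:
                continue  # already kept a newer backup from this period
            if len(periods_seen) == count:
                break  # older periods fall outside the retention window
            periods_seen.add(period)
            keep.add(d)
    return keep


today = date(2024, 1, 31)
history = [today - timedelta(days=i) for i in range(100)]  # 100 daily backups
kept = backups_to_keep(history, today)
print(sorted(kept))
```

Pruning only decides what to keep. Whether the kept copies are safe from the same ransomware depends on where they are stored, for example offline or write-once storage.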
But just because you put your backups in the cloud doesn't mean your compute should be there as well. There is an advantage in that your Time to Recovery is likely lower with both backups and compute in the same cloud. But is that worth the ongoing cost of running your compute in the cloud? That needs to be considered separately. You also need to consider the cost of running on-prem versus in the cloud. If you have fairly predictable, static loads, it may be cheaper to buy and run servers yourself. For hard-to-predict, elastic loads, the cloud may make more financial sense.
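The static vs. elastic point comes down to this: owned hardware has to be sized for the peak and is paid for whether busy or idle, while on-demand cloud bills only the hours used. The sketch below shows that with made-up prices. It ignores reserved-instance discounts, data transfer and staff time.

```python
HOURS_PER_MONTH = 720  # 30-day month


def monthly_cost_owned(peak_servers: int, cost_per_server_month: float) -> float:
    """Owned hardware is sized for the peak and costs the same busy or idle."""
    return peak_servers * cost_per_server_month


def monthly_cost_cloud(hourly_demand: list, cost_per_server_hour: float) -> float:
    """On-demand cloud bills only the server-hours actually used."""
    return sum(hourly_demand) * cost_per_server_hour


# Made-up prices for illustration only.
OWNED_PER_SERVER_MONTH = 300.0
CLOUD_PER_SERVER_HOUR = 1.0

steady = [10] * HOURS_PER_MONTH                    # flat, predictable load
spiky = [10] * 24 + [1] * (HOURS_PER_MONTH - 24)   # one busy day, quiet otherwise

for name, demand in (("steady", steady), ("spiky", spiky)):
    owned = monthly_cost_owned(max(demand), OWNED_PER_SERVER_MONTH)
    cloud = monthly_cost_cloud(demand, CLOUD_PER_SERVER_HOUR)
    print(f"{name}: owned {owned:,.0f}, cloud {cloud:,.0f}")
```

With these invented prices, the steady load is cheaper on owned hardware (3,000 vs. 7,200 per month). The spiky one is cheaper in the cloud (936 vs. 3,000). Real prices move the crossover but not the shape of the argument.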
As others have said before, there was a period where companies were just going to the cloud for the sole reason that it was the popular thing to do. For some it actually made financial sense. For some, it didn't. The OP's article seems to be the latter.
The cloud isn't just for storage or compute. There are a number of managed services that let you build a full application by snapping together lego building blocks.
For example, pop together a REST API handler, an auth service, a few functions-as-a-service, a database, and a storage service. Then add a static website server. Throw a CDN in front. You've got yourself a dynamic application service that can be accessed globally for a few pennies and can scale up and down without you doing anything. Add multi-zone support and auto-DNS failover and you've got a production-quality, scalable, resilient back-end, for both web and mobile. When it's not being used, it costs very little, and when it goes big, hopefully it means you're doing well. Wrap it all in an infrastructure-as-code script and you can bring all this up in 30 minutes.
To host all that in-house, you would have to buy a lot of equipment, stage it, manage it, add cooling, electricity, security patches, upgrades, security, etc. Now you have part of your business just doing all this instead of focusing on what you do best. I won't bother going into the tax implications of capex vs opex.
This is what the cloud sales people call 'undifferentiated heavy lifting.' There are reasons to have on-prem hardware. For a lot of applications, though, it makes more sense to let someone else take care of all that infrastructure cruft.
That was a data center, not a cloud: the sort of place they are moving to from the cloud.
With a cloud solution, you make sure to use services that are redundant. AWS and Azure build each region (geographical location) with **multiple** interconnected independent data centers (availability zones). High durability is one of the strong use cases for public clouds.
There are use cases for the cloud. I put e-mail in the cloud: ain't nobody got time to deal with providing reliable SMTP or Exchange while keeping spam out. If you have a web app that needs to scale quickly, cloud's the way. If you're a startup with limited capital and you don't want to blow it on a bunch of servers when you're not sure you'll survive more than a year or so, cloud's the way.
But Cloud ISN'T the end-all answer for everything.
If you have a predictable workload, especially one that relies on more expensive cloud services, de-clouding can save you a bundle. Buying hardware can be cheaper than renting it, if only because (think about it) the cloud provider has to buy the same hardware and rent it to you AND make a profit. If you're going to be around a while, and you expect to use a piece of hardware for its full service life, that makes a lot of sense.
There are also many organizations that wish they had some local backups after their cloud service providers lost all their data.
Lesson to learn: Backup properly with offline storage. Tape in a safe, maybe even off-site, etc.
As long as you realize that the "cloud" is someone else's computer, it is a very viable way of hosting your service. However, as your service grows, all those micro services your cloud provider charges you for will grow as well. Eventually you'll get to the point where "data transfer" costs begin to make up >50% of your total cloud spend. At that point (or ideally before) you should have a plan to stop expanding your cloud footprint, because that cost grows geometrically with the size of your cloud data and the number of cloud functions you are using on your data.
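The ">50% of your total cloud spend" check can be sketched as follows. It assumes your billing export can be reduced to line items whose names mark data transfer. The "transfer" prefix and all amounts are made up for illustration. The sketch reports what fraction of the monthly bill is data transfer and flags when it crosses half.

```python
from __future__ import annotations


def transfer_share(bill: dict[str, float]) -> float:
    """Fraction of the total bill spent on data transfer line items."""
    total = sum(bill.values())
    transfer = sum(cost for item, cost in bill.items() if item.startswith("transfer"))
    return transfer / total if total else 0.0


# Hypothetical monthly bill: line item -> USD.
bill = {
    "compute": 4_000.0,
    "storage": 1_000.0,
    "transfer:egress-internet": 4_500.0,
    "transfer:inter-az": 1_500.0,
}

share = transfer_share(bill)
if share > 0.5:
    print(f"data transfer is {share:.0%} of the bill: time for a plan")
```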
Remember Data has Weight. If you don't understand what that means, you aren't ready to make a cost comparison between cloud-hosting and data center hosting.
Relevant passage: "While there are some additional other costs associated with the extra servers, it's relative peanuts in the grand scheme (our ops team stayed the same, for example)" (emphasis mine)
Either the team had free time and weren't being used to capacity, employees aren't doing continuing education during work anymore, or they are being overworked. You don't just magically take on scope and not have labor hours shift at a bare minimum.
I refuse to take anyone at face value when they say they are using physical hardware, setting up automation, and running support on all of that and the labor hours are described as peanuts.
I also wonder what they are doing for security and data privacy.