A malfunction that shut down all of Toyota Motor's assembly plants in Japan for about a day last week occurred because some servers used to process parts orders became unavailable after maintenance procedures, the company said.
Sysadmin pro tip: Keep a 1-10GB file of random data named DELETEME on your data drives. Then if this happens you can get some quick breathing room to fix things.
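For anyone who wants to script the DELETEME trick, here is a minimal sketch. The function names `create_ballast` and `release_ballast` and the chunk size are my own choices, not anything standard. It writes random bytes rather than zeros on the assumption that a compressing or deduplicating filesystem could otherwise shrink the file and leave you with less reserve than you expected.

```python
import os


def create_ballast(path: str, size_bytes: int, chunk_size: int = 1024 * 1024) -> int:
    """Write size_bytes of random data to path and return the bytes written.

    Hypothetical helper: writes in chunks so a multi-GB ballast file
    never has to fit in memory at once.
    """
    written = 0
    with open(path, "wb") as f:
        while written < size_bytes:
            n = min(chunk_size, size_bytes - written)
            f.write(os.urandom(n))
            written += n
        # Push the data to disk so the space is really reserved.
        f.flush()
        os.fsync(f.fileno())
    return written


def release_ballast(path: str) -> bool:
    """Delete the ballast file. Returns False if it was already gone."""
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        return False
```

Usage would be something like `create_ballast("/data/DELETEME", 5 * 1024**3)` once at setup time (the path is just an example), then `release_ballast("/data/DELETEME")` when the disk fills and you need room to work.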
Why not both? Alerting to find issues quickly, a bit of extra storage so you have more options available in case of an outage, and maybe some redundancy for good measure.
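The alerting half doesn't have to be fancy. A rough sketch using only the Python standard library is below. The 80% and 90% thresholds and the names `classify` and `check_disk` are assumptions for illustration; a real setup would send the result to whatever paging or chat system you already use.

```python
import shutil


def classify(percent_used: float, warn_at: float = 80.0, crit_at: float = 90.0) -> str:
    """Map a usage percentage to a status string (thresholds are assumed)."""
    if percent_used >= crit_at:
        return "critical"
    if percent_used >= warn_at:
        return "warning"
    return "ok"


def check_disk(path: str = "/") -> tuple[float, str]:
    """Return (percent used, status) for the filesystem containing path."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if usage.total == 0:
        return 0.0, "ok"
    percent_used = usage.used / usage.total * 100
    return percent_used, classify(percent_used)


if __name__ == "__main__":
    pct, status = check_disk("/")
    print(f"/ is {pct:.1f}% full: {status}")
```

Run it from cron or a systemd timer every few minutes and alert on anything other than `ok`.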
A lot of companies have minimal alerting or no alerting at all. It's kind of wild. I literally have better alerting in my home setup than many companies do lol
I imagine it's a case where if you're knowledgeable, yeah it's free. But if you have to hire knowledgeable people to implement the free solution, you still have to pay those people. And companies love to balk at that!
I think it's that, plus any IT employees they do have wouldn't be allowed to work on it. They'd be busy with other stuff, because companies don't prioritize this until it's too late and they finally see how important it is.
There are cases where a disk fills up quicker than anyone can reasonably react, even if alerts are in place. And sometimes the culprit is something you can't just go and kill.
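One way to make that concrete is to alert on fill rate instead of a fixed percentage. The sketch below estimates time until full from two free-space samples. It assumes a constant fill rate, which is a simplification, and `seconds_until_full` is a made-up name.

```python
from typing import Optional


def seconds_until_full(free_then: float, free_now: float,
                       interval_seconds: float) -> Optional[float]:
    """Estimate seconds until the disk is full from two free-space samples.

    free_then and free_now are free bytes measured interval_seconds apart.
    Returns None when free space is flat or growing (no fill in progress).
    Assumes the fill rate stays constant, which real workloads may not do.
    """
    consumed = free_then - free_now
    if consumed <= 0 or interval_seconds <= 0:
        return None
    rate = consumed / interval_seconds  # bytes per second
    return free_now / rate
```

For example, if a disk went from 100 GB free to 90 GB free in 10 minutes, it has roughly 90 minutes left at that pace, however comfortable the usage percentage still looks.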
Had an issue like that a few years back. A standalone device that was filling up quickly. The poorly designed device could only be flushed via USB sticks. I told them that they had to do it weekly. Guess what they didn't do. Looking back, I should have made it alarm and flash once a week on a timer.
The real pro tip is to segregate the core system and anything on your system that eats up disk space into separate partitions, along with alerting, log rotation, etc. And also to not have a single point of failure in general. Hard to say exactly what went wrong w/ Toyota but they probably could have planned better for it in a general way.
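On the log rotation point: if the thing eating disk is your own application's logs, you can cap them from inside the app. Here is a minimal sketch with Python's standard `logging.handlers.RotatingFileHandler`. The size limits, the log format, and the helper name `make_capped_logger` are assumptions. Most Linux systems would handle this with logrotate or journald limits instead.

```python
import logging
from logging.handlers import RotatingFileHandler


def make_capped_logger(name: str, log_path: str,
                       max_bytes: int = 1_000_000,
                       backup_count: int = 3) -> logging.Logger:
    """Return a logger whose files are capped in size and number.

    Hypothetical helper: keeps the active file plus backup_count old
    files, so total use stays near max_bytes * (backup_count + 1).
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines via the root logger
    handler = RotatingFileHandler(
        log_path, maxBytes=max_bytes, backupCount=backup_count
    )
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger
```

With the defaults above, a runaway log loop tops out at around 4 MB instead of filling the partition.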
It's not going to bring the service back online, but it will keep a full disk from stopping you from doing other things. In some cases SSH won’t work with a full disk.