Edition 1.0.1
Updated August 7, 2023
The above examples are just a few of the ways your code can break, and the list keeps going. Murphy's Law states that "anything that can go wrong, will go wrong." As a programmer, it's your job to identify the scenarios in which your program can fail and then take steps to reduce the likelihood of those scenarios happening. In some cases, though, your program will fail in ways you never could have imagined, which makes those failures hard to plan for.
Here are more examples of ways that your system can fail. Remember, sometimes it’s not just the code itself but other pieces of the system that can fail too.
Bad logic
Unanticipated inputs
Bad configurations
CPU maxed out
Memory leaks
No remaining disk space
Hardware failures
Network failures
Database corruption
User error
DDoS attacks
XSS attacks
SQL injection
Social engineering attacks
Natural disasters
Any combination of these failures can occur at any given time, and you or your team might be responsible for getting the system back online when it does. The risk of some of these disasters can be managed and mitigated ahead of time, which we’ll learn about in How to Manage Risk, but others are much harder to predict or prevent. The best thing you can do is be prepared for anything to happen at any time.
Now, let’s shift our focus to things you should do during an incident that will help you triage, identify, and fix issues as they occur.
So, your code was deployed to production, and now errors are being thrown left and right. What do you do? It can be stressful, especially for someone who doesn't have much experience dealing with production outages. The errors keep coming, you haven't been able to identify the root cause yet, and you don't even know where to begin looking.
First, take a breath. Panicking won't do you any good and will probably make the situation worse, so focus on staying as calm and collected as you can. This is easier said than done the first few times, but it becomes more natural as you gain experience working through production incidents.
Next, be methodical in your approach to identifying the root cause of the issue. There might be a million thoughts racing through your head about what it could be, but if you don't slow down, you won't be able to think clearly. Slow is smooth, smooth is fast. It may sound counterintuitive, but by focusing your attention on one thing at a time, you can often move faster than you would by trying to do too many things at once.