Site icon Kitchen Soap

An Open Letter To Monitoring/Metrics/Alerting Companies

I’d like to open up a dialogue with companies who are selling X-As-A-Service products that are focused on assisting operations and development teams in tracking the health and performance of their software systems.

Note: It’s likely my suggestions below are understood and embraced by many companies already. I know a number of them who are paying attention to all areas I would want them to, and/or make sure they’re not making claims about their product that aren’t genuine. 

Anomaly detection is important. It can’t be overlooked. We as a discipline need to pay attention to it, and continually get better at it.

But for the companies who rely on your value-add selling point(s) as:

the implication is these things will somehow relieve the engineer from thinking or doing anything about those activities, so they can focus on more ‘important’ things. “Well-designed automation will keep people from having to do tedious work”, the cartoon-like salesman says.

Please stop doing this. It’s a lie in the form of marketing material and it’s a huge boondoggle that distracts us away from focusing on what we should work on, which is to augment and assist people in solving problems.

Anomaly detection in software is, and always will be, an unsolved problem. Your company will not solve it. Your software will not solve it. Our people will improvise around it and adapt their work to cope with the fact that we will not always know what and how something is wrong at the exact time we need to know.

My suggestion is to first acknowledge this (that your attempts to detect anomalies perfectly, at the right time, is not possible) when you talk to potential customers. Want my business? Say this up front, so we can then move on to talking about how your software will assist my team of expert humans who will always be smarter than your code.

In other words, your monitoring software should take the Tony Stark approach, not the WOPR/HAL9000 approach.

These are things I’d like to know about how you thought about your product:

 

Stop thinking you’re trying to solve a troubleshooting problem; you’re not.

 

The world you’re trying to sell to is in the business of dynamic fault managementThis means that quite often you can’t just take a component out of service and investigate what’s wrong with it. It means diagnosis involves testing hypotheses that could actually make things a lot worse than they already are. This means that phases of responding to issues have overlapping concerns all at the same time. Things like:

Instead of telling me about how your software will solve problems, show me you’re trying to build a product that is going to join my team as an awesome team member, because I’m going to think about using/buying your service in the same way I think about hiring.

Sincerely,

John Allspaw

 

Exit mobile version