Kitchen Soap

Anyone who has known me well knows that I’m generally not satisfied with skimming the surface of a topic that I feel excited about. So to them, it wouldn’t be a surprise that I’m now working on (yes, while I’m still at Etsy!) a master’s degree.

Since January, I’ve been working with an incredible group as part of the master’s degree program in Human Factors and Systems Safety at Lund University.

This program was initially started by Sidney Dekker, and now is directed by the wicked smart Johan Bergström, whose works I’ve tweeted about before. As a matter of fact, I was able to convince JB to keynote this year’s Velocity Conference in Santa Clara next month on the topic of risk, and I can’t be more excited for it.

So why am I all gaga about this program?

To begin with, I’ve been a huge proponent of learning from other fields of engineering. In particular, I’m interested in how other domains perceive failures, both with foresight and in hindsight: how they aim to prevent them, detect them, recover from them, and learn from them.

The Velocity Conference (and Surge, for that matter) is always filled with narratives of success and failure, and for that I’m grateful.

But I think for me it goes deeper than that.

We’re now in a world where the US State Department calls Twitter to request that their database maintenance be postponed because of political events in the Middle East that could benefit from it being available. It’s also all but a given at this point that Facebook has had an enormous effect on global discourse on wide-ranging topics, with many people pointing to its effects on the Arab Spring.

As we speak, REG-SCI is open for public comment from the SEC. Inside that piece of work is an attempt to shore up the safeguards and preventative measures that exchanges may have to employ to make themselves less vulnerable to perturbations and disturbances that could result in the next large-scale trading surprise, like those that came with the Flash Crash, the BATS IPO event, and the Knight Capital incident.

And yes, here at Etsy we have been re-imagining how commerce is being done on a global scale as well. 🙂

How do we design our systems to be resilient? Are the traditional approaches still working? How will we know when they stop working?
How can we view the “systems” in that sentence to include the socio-technical relationship that organizations have to their service? Their employees? Their investors? The public?
How does the political, regulatory, or commercial environment that our services expect to live in affect their operation? Do they influence the ‘safety’ of those systems?
How do we manage the inevitable trade-offs that we experience when we move from a startup with a “Minimum Viable Product” to a globally-relied-upon service that is expected to always be on?
What are the various ways we can anticipate, monitor, respond to, and learn from our failures and our successes?

All of these questions could be seen as technical in nature, but I’d argue that’s too simplistic. I’m interested in that beautiful and insane boundary between humans and machines, and how that relationship is intertwined in the increasingly complex systems we build over and over again.

My classmates in the program are from the US Forest Service, air traffic control training facilities and towers, Australian mining safety, maritime accident investigation firms, and healthcare, and some are airline pilots as well. They all have worked in high-tempo, high-consequence environments, and I’m finding even more overlap in thinking with them than I ever thought I would.

The notion that the web and internet infrastructures of tomorrow are heavily influenced by the failures of yesterday riddles me with concern and curiosity. Given that I’m now addicted to the critical thinking that many smart folks have been giving the topic for a couple of decades now, I figured that I’m not going to half-ass it, and will instead lean into it as hard as I can.

So expect more writing on the topics of complex systems, human factors, systems safety, Just Culture, and related meanderings, because next year I’ve got a thesis to write. 🙂

Kitchen Soap: Thoughts on systems safety, software operations, and sociotechnical systems.

Always a Student: Operations and Systems Safety
