Reflections on the 6th Resilience Engineering Symposium

I just spent the last week in Lisbon, Portugal at the Resilience Engineering Symposium. Zoran Perkov and I were invited to speak on the topic of software operations and resilience in the financial trading and Internet services worlds, to an audience of practitioners and researchers from all around the globe, in a myriad of industries.

My hope was to start a dialogue about the connections we’ve seen (and to hopefully explore more) between practices and industries, and to catch theories about resilience up to what’s actually happening in these “pressurized and consequential”1 worlds.

I thought I’d put down some of my notes, highlights and takeaways here.

  • In order to look at how resilience gets “engineered” (if that is actually a thing) we have to look at adaptations that people make in the work that they do, to fill in the gaps that show up as a result of the incompleteness of designs, tools, and prescribed practices. We have to do this with a “low commitment to concepts”2 because otherwise we run the risk of starting with a model (OODA? four cornerstones of resilience? swiss cheese? situation awareness? etc.) and then finding data to fill in those buckets. Which can happen unfortunately quite easily, and also: is not actually science.

 

  • While I had understood this before the symposium, I’m now even clearer on it: resilience is not the same as fault-tolerance or “graceful degradation.” Instead, it’s something more, akin to what Woods calls “graceful extensibility.”

 

  • The other researchers and practitioners in ‘safety-critical’ industries were very interested in what approaches such as continuous deployment/delivery might look like in their fields. They saw it as a set of evolutions from waterfall that Internet software has made, evolutions that allow it to be flexible and adaptive in the face of uncertainty about how the high-level system of users, providers, customers, operations, performance, etc. will behave in production. This was their reflection, not my words in their mouths, and I really couldn’t agree more. Validating!

 

  • While financial trading systems and Internet software have some striking similarities, the differences are stark. Zoran and I are both jealous of each other’s worlds in different ways. Also: Zoran can quickly scare the shit out of an audience filled with pension and retirement plans. 🙂

 

  • The lines between words (phases?) such as: design-implementation-operations are blurred in worlds where adaptive cycles take place, largely because feedback loops are the focus (or source?) of the cycles.

 

  • We still have a lot to do in “software operations”3 in that we may be quite good at focusing on and discussing software development and its practices, alongside the computer science concepts that influence those things, but we’re not yet good at exploring what we can find out about our field through the lenses of social science and cognitive psychology. I would like to change that, because I think we haven’t gone far enough in being introspective on those fronts. I think we might currently only be flirting with those areas. Dropping a Conway’s Law here and a cognitive bias there is a good start, but we need to consider that we might not actually know what the hell we’re talking about (yet!). However, I’m optimistic on this front, because our community has both curiosity and a seemingly boundless ability to debate esoteric topics with each other. Now if we can only stop doing it in 140 characters at a time… 🙂

 

  • The term “devops” definitely has analogues in other industries. At the very least, the term brought vigorous nodding as I explained it. Woods used the phrase “throw it over the wall” and it resonated quite strongly with many folks from diverse fields. People from aviation, maritime, patient safety…they all could easily give a story that was analogous to “worked fine in dev, ops problem now” in their worlds. Again, validating.

 

  • There is no Resilience Engineering (or Cognitive Systems Engineering or Systems Safety, for that matter) without real dialogue about real practice in the world. In other words, there is no such thing as purely academic here. Every “academic” here viewed their “laboratories” as cockpits, operating rooms and ERs, control rooms in mission control and nuclear plants, and the bridges of massive ships. I’m left thinking that, for the most part, this community abhors the fluorescent-lit environments of universities. They run toward potential explosions, not away from them. Frankly, I think our field of software has a much larger population of the stereotypical “out-of-touch” computer scientist whose ideas in papers never see the light of production traffic. (Hat tip to Kyle for doing the work of real-world research on what were previously known as academic theories!)

 


 

1 Richard Cook’s words.

2 David Woods’ words. I now know how important this is when connecting theory to practice. More on this topic in a different post!

3 This is what I’m now calling what used to be known as “WebOps” or what some refer to as ‘devops’ to reflect that there is more to software services that are delivered via the Internet than just the web, and I’d like to update my language a bit.

The Infinite Hows (or, the Dangers Of The Five Whys)

(this is also posted on O’Reilly’s Radar blog. Much thanks to Daniel Schauenberg, Morgan Evans, and Steven Shorrock for feedback on this)

Before I begin this post, let me say that this is intended to be a critique of the Five Whys method, not a criticism of the people who are in favor of using it.

This critique I present is hardly original; most of this post is inspired by Todd Conklin, Sidney Dekker, and Nancy Leveson.

The concept of post-hoc explanation (or “postmortems” as they’re commonly known) has, at this point, taken hold in the web engineering and operations domain. I’d love to think that the concepts that we’ve taken from the New View on ‘human error’ are becoming more widely known and that people are looking to explore their own narratives through those lenses.

I think that this is good, because my intent has always been (might always be) to help translate concepts from one domain to another. In order to do this effectively, we need to know also what to discard (or at least inspect critically) from those other domains.

The Five Whys is one such approach that I think we should discard.

This post explains my reasoning for discarding it, and how using it has the potential to be harmful, not helpful, to an organization. Here’s how I intend on doing this: I’m first going to talk about what I think are deficiencies in the approach, suggest an alternative, and then ask you to simply try the alternative yourself.

Here is the “bottom line, up front” gist of my assertions:

“Why?” is the wrong question.

In order to learn (which should be the goal of any retrospective or post-hoc investigation) you want multiple and diverse perspectives. You get these by asking people for their own narratives. Effectively, you’re asking “how?”

Asking “why?” too easily gets you to an answer to the question “who?” (which in almost every case is irrelevant) or “takes you to the ‘mysterious’ incentives and motivations people bring into the workplace.”

Asking “how?” gets you to describe (at least some) of the conditions that allowed an event to take place, and provides rich operational data.

Asking a chain of “why?” assumes too much about the questioner’s choices, and assumes too much about each answer you get. At best, it locks you into a causal chain, which is not how the world actually works. This is a construction that ignores a huge amount of complexity in an event, and it’s the complexity that we want to explore if we have any hope of learning anything.

But It’s A Great Way To Get People Started!

The most compelling argument for using the Five Whys is that it’s a good first step towards doing real “root cause analysis” – my response to that is twofold:

  1. “Root Cause Analysis*” isn’t what you should be doing anyway, and
  2. It’s only a good “first step” because it’s easy to explain and understand, which makes it easy to socialize. The issue with this is that the concepts that the Five Whys depend on are not only faulty, but can be dangerous for an organization to embrace.

If the goal is learning (and it should be), then we should be confident that our method of retrospective learning brings to light data that can be turned into actionable information. The issue with the Five Whys is that it’s tunnel-visioned into a linear and simplistic explanation of how work gets done and events transpire. This narrowing can be incredibly problematic.

In the best case, it can lead an organization to think they’re improving on something (or preventing future occurrences of events) when they’re not.

In the worst case, it can re-affirm a faulty worldview of causal simplification and set up a structure where individuals don’t feel safe in giving their narratives, because either they weren’t asked the right “why?” question or because the answer to a given question pointed to ‘human error’ or individual attributes as causal.

Let’s take an example. In my tutorials at the Velocity Conference in New York, I used an often-repeated straw man to illustrate this:

[Slide: a straw-man Five Whys causal chain, ending at an individual’s attributes]

This is the example of the Five Whys found in the Web Operations book, as well.

This causal chain effectively ends with a person’s individual attributes, not with a description of the multiple conditions that allow an event like this to happen. Let’s look into some of the answers…

“Why did the server fail? Because an obscure subsystem was used in the wrong way.”

This answer is dependent on the outcome. We know that it was used in the “wrong” way only because we’ve connected it to the resulting failure. In other words, we as “investigators” have the benefit of hindsight. We can easily judge the usage of the server because we know the outcome. If we were to go back in time and ask the engineer(s) who were using it: “Do you think that you’re doing this right?” they would answer: yes, they are. We want to know what various influences brought them to think that, which simply won’t fit into the answer of “why?”

The answer also limits the next question that we’d ask. There isn’t any room in the dialogue to discuss things such as the potential to use a server in the wrong way and it not result in failure, or what ‘wrong’ means in this context. Can the server only be used in two ways – the ‘right’ way or the ‘wrong’ way? And does success (or, the absence of a failure) dictate which of those ways it was used? We don’t get to these crucial questions.

“Why was it used in the wrong way? The engineer who used it didn’t know how to use it properly.”

This answer is effectively a tautology, and includes a post-hoc judgement. It doesn’t tell us anything about how the engineer did use the system, which would provide a rich source of operational data, especially for engineers who might be expected to work with the system in the future. Is it really just about this one engineer? Or is it possibly about the environment (tools, dashboards, controls, tests, etc.) that the engineer is working in? If it’s the latter, how does that get captured in the Five Whys?

So what do we find in this chain we have constructed above? We find:

  • an engineer with faulty (or at least incomplete) knowledge
  • insufficient indoctrination of engineers
  • a manager who fouls things up by not being thorough enough in the training of new engineers (indeed: we can make a post-hoc judgement about her beliefs)

If this is to be taken as an example of the Five Whys, then as an engineer or engineering manager, I might not look forward to it, since it focuses on our individual attributes and doesn’t tell us much about the event other than the platitude that training (and convincing people about training) is important.

These are largely answers about “who?” not descriptions of what conditions existed. In other words, by asking “why?” in this way, we’re using failures to explain failures, which isn’t helpful.

If we ask: “Why did a particular server fail?” we can get any number of answers, but one of those answers will be used as the primary way of getting at the next “why?” step. We’ll also lose out on a huge amount of important detail, because remember: you only get one question before the next step.

If instead, we were to ask the engineers how they went about implementing some new code (or ‘subsystem’), we might hear a number of things, like maybe:

  • the approach(es) they took when writing the code
  • what ways they gained confidence (tests, code reviews, etc.) that the code was going to work in the way they expected it before it was deployed
  • what (if any) history of success or failure have they had with similar pieces of code?
  • what trade-offs they made or managed in the design of the new function?
  • how they judged the scope of the project
  • how much (and in what ways) they experienced time pressure for the project
  • the list can go on, if you’re willing to ask more and they’re willing to give more

Rather than judging people for not doing what they should have done, the new view presents tools for explaining why people did what they did. Human error becomes a starting point, not a conclusion. (Dekker, 2009)

When we ask “how?”, we’re asking for a narrative. A story.

In these stories, we get to understand how people work. By going with the “engineer was deficient, needs training, manager needs to be told to train” approach, we might not have a place to ask questions aimed at recommendations for the future, such as:

  • What might we put in place so that it’s very difficult to put that code into production accidentally?
  • What sources of confidence for engineers could we augment?

As part of those stories, we’re looking to understand people’s local rationality. When it comes to decisions and actions, we want to know how it made sense for someone to do what they did. And make no mistake: they thought what they were doing made sense. Otherwise, they wouldn’t have done it.


Again, this thought isn’t original to me. Local rationality (or, as Herb Simon called it, “bounded rationality”) is something that sits firmly atop some decades of cognitive science.

These stories we’re looking for contain details that we can pull on and ask more about, which is critical as a facilitator of a post-mortem debriefing, because people don’t always know what details are important. As you’ll see later in this post, reality doesn’t work like a DVR; you can’t pause, rewind and fast-forward at will along a singular and objective axis, picking up all of the pieces along the way, acting like CSI. Memories are faulty and perspectives are limited, so a different approach is necessary.

Not just “how”

In order to get at these narratives, you need to dig for second stories. Asking “why?” will only get you first stories. These are not only insufficient answers, they can be very damaging to an organization, depending on the context. As a refresher…

From Behind Human Error here’s the difference between “first” and “second” stories of human error:

First Stories | Second Stories
Human error is seen as cause of failure | Human error is seen as the effect of systemic vulnerabilities deeper inside the organization
Saying what people should have done is a satisfying way to describe failure | Saying what people should have done doesn’t explain why it made sense for them to do what they did
Telling people to be more careful will make the problem go away | Only by constantly seeking out its vulnerabilities can organizations enhance safety

 

Now, read again the straw-man example of the Five Whys above. The questions that we ask frame the answers that we will get in the form of first stories. When we ask more and better questions (such as “how?”) we have a chance at getting at second stories.

You might wonder: how did I get from the Five Whys to the topic of ‘human error’? Because once ‘human error’ is a candidate to reach for as a cause (and it will be, because it’s a simple and potentially satisfying answer to “why?”), then you will undoubtedly use it.

At the beginning of my tutorial in New York, I asked the audience this question:

[Slide: the question posed to the audience]

At the beginning of the talk, a large number of people said yes, this is correct. Steven Shorrock (who is speaking at Velocity next week in Barcelona on this exact topic) has written a great article on this way of thinking: If It Weren’t For The People. By the end of my talk, I was able to convince them that this is also the wrong focus of a post-mortem description.

This idea accompanies the Five Whys more often than not, and there are two things that I’d like to shine some light on about it:

Myth of the “human or technical failure” dichotomy

This is dualistic thinking, and I don’t have much to add to this other than what Dekker has said about it (Dekker, 2006):

“Was the accident caused by mechanical failure or by human error? It is a stock question in the immediate aftermath of a mishap. Indeed, it seems such a simple, innocent question. To many it is a normal question to ask: If you have had an accident, it makes sense to find out what broke. The question, however, embodies a particular understanding of how accidents occur, and it risks confining our causal analysis to that understanding. It lodges us into a fixed interpretative repertoire. Escaping from this repertoire may be difficult. It sets out the questions we ask, provides the leads we pursue and the clues we examine, and determines the conclusions we will eventually draw.”

Myth: during a retrospective investigation, something is waiting to be “found”

I’ll cut to the chase: there is nothing waiting to be found, or “revealed.” These “causes” that we think we’re “finding”? We’re constructing them, not finding them. We’re constructing them because we are the ones choosing where (and when) to start asking questions, and where/when to stop asking them. We’ve “found” a root cause when we stop looking. And in many cases, we’ll get lazy and just chalk it up to “human error.”

As Erik Hollnagel has said (Hollnagel, 2009, p. 85):

“In accident investigation, as in most other human endeavours, we fall prey to the What-You-Look-For-Is-What-You-Find or WYLFIWYF principle. This is a simple recognition of the fact that assumptions about what we are going to see (What-You-Look-For), to a large extent will determine what we actually find (What-You-Find).”

More to the point: “What-You-Look-For-Is-What-You-Fix”

We think there is something like the cause of a mishap (sometimes we call it the root cause, or primary cause), and if we look in the rubble hard enough, we will find it there. The reality is that there is no such thing as the cause, or primary cause or root cause. Cause is something we construct, not find. And how we construct causes depends on the accident model that we believe in. (Dekker, 2006)

Nancy Leveson comments on this idea in her excellent book Engineering a Safer World (p. 20):

Subjectivity in Selecting Events

The selection of events to include in an event chain is dependent on the stopping rule used to determine how far back the sequence of explanatory events goes. Although the first event in the chain is often labeled the ‘initiating event’ or ‘root cause’ the selection of an initiating event is arbitrary and previous events could always be added.

Sometimes the initiating event is selected (the backward chaining stops) because it represents a type of event that is familiar and thus acceptable as an explanation for the accident or it is a deviation from a standard [166]. In other cases, the initiating event or root cause is chosen because it is the first event in the backward chain for which it is felt that something can be done for correction.

The backward chaining may also stop because the causal path disappears due to lack of information. Rasmussen suggests that a practical explanation for why actions by operators actively involved in the dynamic flow of events are so often identified as the cause of an accident is the difficulty in continuing the backtracking “through” a human [166].

A final reason why a “root cause” may be selected is that it is politically acceptable as the identified cause. Other events or explanations may be excluded or not examined in depth because they raise issues that are embarrassing to the organization or its contractors or are politically unacceptable.

Learning is the goal. Any prevention depends on that learning.

So if not the Five Whys, then what should you do? What method should you take?

I’d like to suggest an alternative, which is to first accept the idea that you have to actively seek out and protect the stories from bias (and judgement) when you ask people “how?”-style questions. Then you can:

  • Ask people for their story without any replay of data that would supposedly ‘refresh’ their memory
  • Tell their story back to them and confirm you got their narrative correct
  • Identify critical junctures
  • Progressively probe and re-build how the world looked to people inside of the situation at each juncture.

As a starting point for those probing questions, we can look to Gary Klein and Sidney Dekker for the types of questions you can ask instead of “why?”…

Debriefing Facilitation Prompts

(from The Field Guide To Understanding Human Error, by Sidney Dekker)

At each juncture in the sequence of events (if that is how you want to structure this part of the accident story), you want to get to know:

  • Which cues were observed (what did he or she notice/see or did not notice what he or she had expected to notice?)
  • What knowledge was used to deal with the situation? Did participants have any experience with similar situations that was useful in dealing with this one?
  • What expectations did participants have about how things were going to develop, and what options did they think they have to influence the course of events?
  • How did other influences (operational or organizational) help determine how they interpreted the situation and how they would act?

Here are some questions Gary Klein and his researchers typically ask to find out how the situation looked to people on the inside at each of the critical junctures:

Cues
  • What were you seeing?
  • What were you focused on?
  • What were you expecting to happen?

Interpretation
  • If you had to describe the situation to your colleague at that point, what would you have told them?

Errors
  • What mistakes (for example in interpretation) were likely at this point?

Previous knowledge/experience
  • Were you reminded of any previous experience?
  • Did this situation fit a standard scenario?
  • Were you trained to deal with this situation?
  • Were there any rules that applied clearly here?
  • Did any other sources of knowledge suggest what to do?

Goals
  • What were you trying to achieve?
  • Were there multiple goals at the same time?
  • Was there time pressure or other limitations on what you could do?

Taking Action
  • How did you judge you could influence the course of events?
  • Did you discuss or mentally imagine a number of options, or did you know straight away what to do?

Outcome
  • Did the outcome fit your expectation?
  • Did you have to update your assessment of the situation?

Communications
  • What communication medium(s) did you prefer to use? (phone, chat, email, video conference, etc.)
  • Did you make use of more than one communication channel at once?

Help
  • Did you ask anyone for help?
  • What signal brought you to ask for support or assistance?
  • Were you able to contact the people you needed to contact?

For the tutorials I did at Velocity, I made a one-pager of these: http://bit.ly/DebriefingPrompts


Try It

I have tried to outline some of my reasoning on why using the Five Whys approach is suboptimal, and I’ve given an alternative. I’ll do one better and link you to the tutorials that I gave in New York in October, which I think dig deeper into these concepts. They’re in four parts, 45 minutes each.

Part I – Introduction and the scientific basis for post-hoc retrospective pitfalls and learning

Part II – The language of debriefings, causality, case studies, teams coping with complexity

Part III – Dynamic fault management, debriefing prompts, gathering and contextualizing data, constructing causes

Part IV – Taylorism, normal work, ‘root cause’ of software bugs in cars, Q&A

My request is that the next time you would do a Five Whys, you instead ask “how?” or the variations of the questions I posted above. If you think you get more operational data from a Five Whys and are happy with it, rock on.

If you’re more interested in this alternative and the fundamentals behind it, then there are a number of sources you can look to. You could do a lot worse than starting with Sidney Dekker’s Field Guide To Understanding Human Error.

An Explanation

For those readers who think I’m being unnecessarily harsh on the Five Whys approach, I think it’s worthwhile to explain why I feel so strongly about this.

Retrospective understanding of accidents and events is important because how we make sense of the past greatly and almost invisibly influences our future. At some point in the not-so-distant past, the domain of web engineering was about selling books online and making a directory of the web. These organizations and the individuals who built them quickly gave way to organizations that now build cars, spacecraft, trains, aircraft, medical monitoring devices…the list goes on…simply because software development and distributed systems architectures are at the core of modern life.

The software worlds and the non-software worlds have collided and will continue to do so. More and more “life-critical” equipment and products rely on software and even the Internet.

Those domains have had varied success in retrospective understanding of surprising events, to say the least. Investigative approaches that are firmly based on causal oversimplification and the “Bad Apple Theory” of deficient individual attributes (like the Five Whys) have been shown not only to be unhelpful, but to make learning objectively harder, not easier. As a result, people who have made mistakes or been involved in accidents have been fired, banned from their profession, and thrown in jail for some of the very things that you could find in a Five Whys.

I sometimes feel nervous that these oversimplifications will still be around when my daughter and son are older. If they were to make a mistake, would they be blamed as a cause? I strongly believe that we can leave these old ways behind us and do much better.

My goal is not to vilify an approach, but to state explicitly that if the world is to become safer, then we have to eschew this simplicity; it will only get better if we embrace the complexity, not ignore it.

 

Epilogue: The Longer Version For Those Who Have The Stomach For Complexity Theory

The Five Whys approach follows a Newtonian-Cartesian worldview. This is a worldview that is seductively satisfying and compellingly simple. But it’s also false in the world we live in.

What do I mean by this?

There are five ways in which the Five Whys sits firmly in a Newtonian-Cartesian worldview that we should eschew when it comes to learning from past events. This is a Cliff Notes version of “The complexity of failure: Implications of complexity theory for safety investigations”:

First, it is reductionist. The narrative built by the Five Whys sits on the idea that if you can construct a causal chain, then you’ll have something to work with. In other words: to understand the system, you pull it apart into its constituent parts. Know how the parts interact, and you know the system.

Second, it assumes what Dekker has called “cause-effect symmetry” (Dekker, complexity of failure):

“In the Newtonian vision of the world, everything that happens has a definitive, identifiable cause and a definitive effect. There is symmetry between cause and effect (they are equal but opposite). The determination of the ‘‘cause’’ or ‘‘causes’’ is of course seen as the most important function of accident investigation, but assumes that physical effects can be traced back to physical causes (or a chain of causes-effects) (Leveson, 2002). The assumption that effects cannot occur without specific causes influences legal reasoning in the wake of accidents too. For example, to raise a question of negligence in an accident, harm must be caused by the negligent action (GAIN, 2004). Assumptions about cause-effect symmetry can be seen in what is known as the outcome bias (Fischhoff, 1975). The worse the consequences, the more any preceding acts are seen as blameworthy (Hugh and Dekker, 2009).”

John Carroll (Carroll, 1995) called this “root cause seduction”:

The identification of a root cause means that the analysis has found the source of the event and so everyone can focus on fixing the problem. This satisfies people’s need to avoid ambiguous situations in which one lacks essential information to make a decision (Frisch & Baron, 1988) or experiences a salient knowledge gap (Loewenstein, 1993). The seductiveness of singular root causes may also feed into, and be supported by, the general tendency to be overconfident about how much we know (Fischhoff, Slovic, & Lichtenstein, 1977).

That last bit about a tendency to be overconfident about how much we know (in this context, how much we know about the past) is a strong piece of research put forth by Baruch Fischhoff, who originally researched what we now understand to be the Hindsight Bias. Not surprisingly, Fischhoff’s doctoral thesis advisor was Daniel Kahneman (you’ve likely heard of him as the author of Thinking, Fast and Slow), whose research in cognitive biases and heuristics everyone should be at least vaguely familiar with.

The third issue with this worldview (supported by the idea of the Five Whys, and something that follows logically from the earlier points) is that outcomes are treated as foreseeable if you know the initial conditions and the rules that govern the system. The reason you would even construct a serial causal chain like this is that you believe the system’s behavior is predictable in exactly that way.

The fourth issue is the treatment of time as reversible: we can’t look to a causal chain as something that you can fast-forward and rewind, no matter how attractively simple that seems. This is because the socio-technical systems that we work on and work in are complex and dynamic in nature. Deterministic behavior (or, at least, predictability) is something that we look for in software; in complex systems this is a foolhardy search, because emergence is a property of this complexity.

And finally, there is an underlying assumption that complete knowledge is attainable. In other words: we only have to try hard enough to understand exactly what happened. The issue with this is that success and failure have many contributing causes, and there is no comprehensive and objective account. The best that you can do is to probe people’s perspectives at juncture points in the investigation. It is not possible to understand past events in any way that can be considered comprehensive.

Dekker (Dekker, 2011):

As soon as an outcome has happened, whatever past events can be said to have led up to it, undergo a whole range of transformations (Fischhoff and Beyth, 1975; Hugh and Dekker, 2009). Take the idea that it is a sequence of events that precedes an accident. Who makes the selection of the ‘‘events’’ and on the basis of what? The very act of separating important or contributory events from unimportant ones is an act of construction, of the creation of a story, not the reconstruction of a story that was already there, ready to be uncovered. Any sequence of events or list of contributory or causal factors already smuggles a whole array of selection mechanisms and criteria into the supposed ‘‘re’’construction. There is no objective way of doing this—all these choices are affected, more or less tacitly, by the analyst’s background, preferences, experiences, biases, beliefs and purposes. ‘‘Events’’ are themselves defined and delimited by the stories with which the analyst configures them, and are impossible to imagine outside this selective, exclusionary, narrative fore-structure (Cronon, 1992).

Here is a thought exercise: what if we were to try to use the Five Whys for finding the “root cause” of a success?

Why didn’t we have failure X today?

Now this question gets a lot more difficult to have one answer. This is because things go right for many reasons, and not all of them obvious. We can spend all day writing down reasons why we didn’t have failure X today, and if we’re committed, we can keep going.

So if success requires “multiple contributing conditions, each necessary but only jointly sufficient” to happen, then how is it that failure requires only one? The Five Whys, as it’s commonly presented as an approach to improvement (or: learning?), will lead us to believe that not only is just one condition sufficient, but that that condition is a canonical one, to the exclusion of all others.

* RCA, or “Root Cause Analysis” can also easily turn into “Retrospective Cover of Ass”

References

Carroll, J. S. (1995). Incident Reviews in High-Hazard Industries: Sense Making and Learning Under Ambiguity and Accountability. Organization & Environment, 9(2), 175–197. doi:10.1177/108602669500900203

Dekker, S. (2004). Ten questions about human error: A new view of human factors and system safety. Mahwah, N.J: Lawrence Erlbaum.

Dekker, S., Cilliers, P., & Hofmeyr, J.-H. (2011). The complexity of failure: Implications of complexity theory for safety investigations. Safety Science, 49(6), 939–945. doi:10.1016/j.ssci.2011.01.008

Hollnagel, E. (2009). The ETTO principle: Efficiency-thoroughness trade-off: Why things that go right sometimes go wrong. Burlington, VT: Ashgate.

Leveson, N. (2012). Engineering a Safer World. MIT Press.

 

 

Availability: Nuance As A Service

Something has struck me as funny recently about the traditional notion of availability of web applications. With respect to its relationship to revenue, to infrastructure and application behavior, and to fault protection and tolerance, I’m thinking it may be time for a broader adjustment to the industry’s perception of the topic.

These nuances in the definition and effects of availability aren’t groundbreaking. They’ve been spoken about before, but for some reason I’m not yet convinced that they’re widely known or understood.

Impact On Business

What is laid out here is something that’s been parroted for decades: downtime costs companies money and loses them value. Generally speaking, this is obviously correct, and by all means you should strive to design and operate your site with high availability and fault tolerance in mind.

But underneath the binary idea that uptime = good and downtime = bad, the reality is that there’s a lot more detail that deserves exploring.

This irritatingly-designed site has a post about a common equation to help those who are arithmetically challenged:

LOST REVENUE = (GR/TH) x I x H
GR = gross yearly revenue
TH = total yearly business hours
I = percentage impact
H = number of hours of outage
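
Plugged into code, the calculation is trivially simple, which is part of the problem. Here’s a minimal sketch of the same formula, with entirely hypothetical numbers:

```python
def blunt_lost_revenue(gross_yearly_revenue, total_yearly_hours, impact_pct, outage_hours):
    """The blunt estimate: LOST REVENUE = (GR / TH) x I x H."""
    return (gross_yearly_revenue / total_yearly_hours) * impact_pct * outage_hours

# Hypothetical example: a $100M/year business, "open" 24x365, with a 2-hour
# outage assumed to affect 100% of revenue-generating traffic.
loss = blunt_lost_revenue(
    gross_yearly_revenue=100_000_000,  # GR
    total_yearly_hours=24 * 365,       # TH
    impact_pct=1.0,                    # I
    outage_hours=2,                    # H
)
print(f"Blunt estimate of lost revenue: ${loss:,.0f}")  # ~$22,831
```

Note that it treats every hour of the year as interchangeable, which is exactly the bluntness I take issue with below.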

In my mind, this is an unnecessarily blunt measure. I see the intention behind this approach; it’s not meant to be anywhere close to accurate. But modern web operations is now a field where gathering metrics in the hundreds of thousands per second is becoming commonplace, fault-tolerance/protection is a thing we do increasingly well, and graceful degradation techniques are the norm.

In other words: there are a lot more considerations than outage minutes = lost revenue, even if you did have a decent way to calculate it (which, you don’t). Companies selling monitoring and provisioning services will want you to subscribe to this notion.

We can do better than this blunt measure, and I think it’s worth digging in a bit deeper.

“Loss”

Thought experiment: if Amazon.com has a full and global outage for 30 minutes, how much revenue did it “lose”? Using the above rough equation, you can certainly come up with a number, let’s say N million dollars. But how accurate is N, really? Discussions that surround revenue loss are normally designed to motivate organizations to invest in availability efforts, so N only needs to be big and scary enough to provide that motivation. So let’s just say that goal has been achieved: you’re convinced! Availability is important, and you’re a firm believer that You Own Your Own Availability.

Outside of the “let this big number N convince you to invest in availability efforts” I have some questions that surround N:

  • How many potential customers did Amazon.com lose forever, during that outage? Meaning: they tried to get to Amazon.com, with some nonzero intent/probability of buying something, found it to be offline, and will never return there again, for reasons of impatience, loss of confidence, the fact that it was an impulse-to-buy click whose time has passed, etc.
  • How much revenue did Amazon lose during that 30-minute window, versus how much revenue did it simply postpone while it was down, only to be collected later? In other words: upon finding the site down, users will return sometime later to do what they originally intended, which may or may not include buying something or participating in some other valuable activity.
  • How much did that 30 minutes of downtime affect the strength of the Amazon brand, in a way that could be viewed as revenue-affecting? Meaning: are users and potential users now swayed to having less confidence in Amazon because they came to the site only to be disappointed that it’s down, enough to consider alternatives the next time they would attempt to go to the site in the future?

I don’t know the answers to these questions about Amazon, but I do know that at Etsy, those answers depend on some variables:

  • the type of outage or degradation (more on that in a minute),
  • the time of day/week/year
  • how we actually calculate/forecast how those metrics would have behaved during the outage

So, let’s crack those open a bit, and see what might be inside…

Temporal Concerns

Not all time periods can be considered equal when it comes to availability, and the idea of lost revenue. For commerce sites (or really any site whose usage varies with some seasonality) this is hopefully glaringly obvious. In other words:

X minutes of full downtime during the peak hour of the peak day of the year can be worlds apart from Y minutes of full downtime during the lowest hour of the lowest day of the year, traffic-wise.

Take for example a full outage that happens during the peak period of the peak day of the year, and contrast it with one that happens during a lower-traffic period of the year. Let’s say that this graph of purchases shows those two 24-hour periods, indicating when the outages happen:

[Graph: “A Tale of Two Outages” – purchase volume for the two 24-hour periods, with each outage window marked]

The impact time of the outage during the lower-traffic day is actually longer than the one on the peak day, affecting the precious Nines math by a decent margin. And yet: which outage would you rather have, if you had to have one of those? 🙂
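
One rough way to make that contrast concrete is to weight each outage by the traffic (or purchases) that would have happened during its window, rather than counting raw minutes. A minimal sketch, with made-up hourly purchase volumes:

```python
def weighted_impact(hourly_volume, outage_start_hour, outage_hours):
    """Sum the volume that falls inside the outage window (a crude proxy for impact)."""
    return sum(hourly_volume[outage_start_hour : outage_start_hour + outage_hours])

# Hypothetical purchase volumes per hour for a peak day and a slow day.
peak_day = [50, 40, 30, 30, 40, 60, 90, 150, 250, 400, 600, 800,
            900, 850, 700, 600, 500, 450, 400, 350, 250, 150, 100, 70]
slow_day = [10, 8, 6, 6, 8, 12, 18, 30, 50, 80, 120, 160,
            180, 170, 140, 120, 100, 90, 80, 70, 50, 30, 20, 14]

# A 1-hour outage at the peak hour of the peak day vs. a 2-hour outage during
# a quiet stretch of the slow day: the longer outage hurts the Nines more,
# but touches far fewer would-be purchases.
print(weighted_impact(peak_day, outage_start_hour=12, outage_hours=1))  # 900
print(weighted_impact(slow_day, outage_start_hour=2, outage_hours=2))   # 12
```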

Another temporal concern is the distribution and volume of degradation across time: the same total amount of impact, spread thinly enough across the day, approaches what looks like perfect uptime as the length of each individual outage approaches zero.

Dig, if you will, these two outage profiles, across a 24-hour period. The first one has many small outages across the day:

[Graph: outage profile with many small outages spread across a 24-hour period]

and the other has the same amount of impact time, in a single go:

[Graph: outage profile with the same total impact time concentrated in a single outage]

So here we have the same amount of time, but spread out throughout the day. Hopefully, folks will think a bit more beyond the clear “they’re both bad! don’t have outages!” and could investigate how they could be different. Some considerations in this simplified example:

  • Hour of day. Note that the single large outage is “earlier” in the day. Maybe this will affect EU or other non-US users more broadly, depending on the timezone of the original graph. Do EU users have a different expectation or tolerance for outages in a US-based company’s website?
  • Which outage scenario has a greater effect on the user population? If the ‘normal’ behavior is to “get in, buy your thing, and get out” quickly, I could see the many-small-outages profile being preferable to the single large one. If the status quo is some mix of searching, browsing, favoriting/sharing, and then purchasing, I could see the singular constrained outage being preferable.

Regardless, this underscores the idea that not all outages are created equal with respect to impact timing.

Performance

Loss of “availability” can also be seen as an extreme loss of performance. At a particular threshold, given the type of feedback to the user (a fast-failed 404 or browser error, versus a hanging white page and spinning “loading…”) the severity of an event being slow can effectively be the same as a full outage.

Some concerns/thought exercises around this:

  • Where is this latency threshold for your site, for the functionality that is critical for the business?
  • Is this threshold a cliff, or is it a continuous/predictable relationship between performance and abandonment?
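
One way to fold this into measurement is to count requests that exceed such a threshold as effectively failed when computing availability. A minimal sketch (the threshold and the sample data below are hypothetical):

```python
def effective_availability(latencies_ms, errored, latency_threshold_ms=3000):
    """Treat errored *or* too-slow requests as unavailable from the user's perspective."""
    failed = sum(
        1 for latency, is_error in zip(latencies_ms, errored)
        if is_error or latency > latency_threshold_ms
    )
    return 1.0 - (failed / len(latencies_ms))

# Hypothetical sample: mostly fast requests, two hangs, one hard error.
latencies = [120, 95, 110, 4800, 130, 10500, 105, 98]
errors = [False, False, False, False, False, False, True, False]
print(f"{effective_availability(latencies, errors):.3f}")  # 0.625
```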

There’s been much more work on performance’s effects on revenue than on availability’s. The Velocity Conference in 2009 brought the first real production-scale numbers (in the form of a Bing/Google joint presentation, as well as Shopzilla and Mozilla talks) behind how performance affects businesses, and if you haven’t read about it, please do.

Graceful Degradation

Will Amazon (or Etsy) lose sales if all or a portion of its functionality is gone (or sufficiently slow) for a period of time? Almost certainly. But that question is somewhat boring without further detail.

In many cases, modern web sites don’t simply live in an “everything works perfectly” or “nothing works at all” boolean world. (To be sure, neither does the Internet as a whole.) Instead, fault-tolerance and resilience approaches allow features and operations to degrade under a spectrum of failure conditions. Many companies build their applications to have in-flight fault tolerance that degrades the experience in the face of singular failures, as well as making use of “feature flags” (Martin and Jez call them “feature toggles”) which allow specific features to be shut off if they’re causing problems.

I’m hoping that most organizations are familiar with this approach at this point. Just because user registration is broken at the moment, you don’t want to prevent already logged-in users from using the otherwise healthy site, do you? 🙂
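
In code, the feature-flag side of this often boils down to a simple guard around non-critical functionality. Here’s a sketch with made-up names (this isn’t any particular flag library’s API):

```python
# A trivial in-memory flag store; in practice this would be backed by config
# pushes, a database, or a dedicated flag service.
FEATURE_FLAGS = {
    "favoriting": True,
    "user_registration": False,  # flipped off while that subsystem misbehaves
}

def feature_enabled(name: str) -> bool:
    return FEATURE_FLAGS.get(name, False)

def render_listing_page(listing: dict) -> dict:
    page = {"listing": listing}
    # Degrade gracefully: hide the feature rather than erroring the whole page.
    page["show_favorite_button"] = feature_enabled("favoriting")
    return page
```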

But these graceful degradation approaches further complicate the notion of availability, as well as its impact on the business as a whole.

For example: if Etsy’s favoriting feature is not working (because the site’s architecture allows it to fail gracefully without affecting other critical functionality), but checkout is working fine…what is the result? Certainly you might pause before marking down your blunt Nines record.

You might also think: “so what? as long as people can buy things, then favoriting listings on the site shouldn’t be considered in scope of availability.”

But consider these possibilities:

  • What if Favoriting listings was a significant driver of conversions?
  • If Favoriting was a behavior that led to conversions at a rate of X%, what value should X be before ‘availability’ ought to be influenced by such a degradation?
  • What if Favoriting was technically working, but was severely degraded (see above) in performance?

Availability can be a useful metric, but when abused as a silver bullet to inform or even dictate architectural, business priority, and product decisions, there’s a real danger of oversimplifying what are really nuanced concerns.

Bounce-Back and Postponement

As I mentioned above, for sites that have an established community or brand, outages (even full ones) don’t mark an instantaneous amount of ‘lost’ revenue or activity. A nonzero amount of it is simply postponed. This is an area that I think could use a lot more data and research in the industry, much in the same way that the latency/conversion relationship has been investigated.

The over-simplified scenario involves something that looks like this. Instead of the blunt math of “X minutes of downtime = Y dollars of lost revenue”, we can be a bit more accurate if we try just a bit harder. The red is the outage:

[Graph: forecasted versus actual purchases across the day, with the outage period shown in red]

 

So we have some more detail, which is that if we can make a reasonable forecast of what purchases would have done during the time of the outage, then we can make a better-informed estimate of the purchases “lost” during that time period.
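
A minimal sketch of that better-informed estimate, assuming you have a per-minute forecast of purchases and the recorded actuals for the outage window (all of the numbers below are made up):

```python
def estimated_lost_purchases(forecast, actual):
    """Sum forecast-minus-actual over the outage window; minutes where actuals beat
    the forecast are clamped to zero so they don't mask in-window loss."""
    return sum(max(f - a, 0) for f, a in zip(forecast, actual))

# Per-minute purchase counts for a hypothetical 10-minute outage window.
forecast_during_outage = [42, 40, 44, 45, 43, 41, 40, 44, 46, 45]
actual_during_outage   = [41, 38,  0,  0,  0,  0,  0,  2, 30, 44]
print(estimated_lost_purchases(forecast_during_outage, actual_during_outage))  # 275
```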

But is that actually the case?

What we see at Etsy is something different, a bit more like this:

[Graph: post-outage activity exceeding the forecast – a bounce-back spike once the site returns]

Clearly this is an oversimplification, but I think the general behavior comes across. When a site comes back from a full outage, there is an increase in activity as users who were stalled or paused in their behavior by the outage resume it. My assumption is that many organizations see this behavior, but it’s just not being talked about publicly.

The hypothesis that needs more real-world data to support (or deny) it is that, depending on:
  • Position of the outage in the daily traffic profile (start-end)
  • Position of the outage in the yearly season

the bounce-back volume will vary in a reasonably predictable fashion. Namely, as the length of the outage grows, the amount of bounce-back volume shrinks:

[Graph: bounce-back volume shrinking as the length of the outage grows]

What this line of thinking doesn’t capture is how many of those users postponed their activity not until immediately after the outage, but maybe until the next day, because they needed to leave their computer for a meeting at work, or to leave work and commute home.

Intention isn’t entirely straightforward to figure out, but in the cases where you have a ‘fail-over’ page that many CDNs will provide when the origin servers aren’t available, you can get some more detail about what requests (add to cart? submit payment?) came in during that time.

Regardless, availability and its effect on business metrics isn’t as simple as service providers and monitoring-as-a-service companies would have you believe. To be sure, a good amount of this investigation will vary wildly from company to company, but I think it’s well worth taking a look into.

 

Fundamental: Stress-Strain Curves In Web Engineering

I make it no secret that my background is in mechanical engineering. I still miss those days of explicit and dynamic finite element analysis, when I worked for the VNTSC on vehicle crashworthiness studies for the NHTSA.

What was there not to like? Things like cars and airbags and seatbelts and dummies that get crushed, sheared, cracked, and busted in every way, all made of different materials: steel, glass, rubber, even flesh (cadaver studies)…it was awesome.

I’ve made some analogies from the world of statics and dynamics to the world of web operations before (Part I and Part II), and it still sticks in my mind as a fundamental mental model in my every day work: resources that have adaptive capacities have a fundamental relationship between stress and strain. Which is to say, in most systems we encounter, as demand for a given resource increases, the strain on the system (and therefore the adaptive capacity) under load also changes, and in most cases increases.

What do I mean by “resource”? Well, from the materials science world, this is generally a component characterized by its material properties. The textbook example is a bar of metal, being stretched.

À la:

In this traditional case, the “system” is simply a beam or a linkage or a load-bearing something.

But in my extension/abuse of the analogy, simple resources in the first order could be imagined as:

  •    CPU
  •    Memory
  •    Disk I/O
  •    Disk consumption
  •    Network bandwidth

To extend it further (and more realistically, because these resources almost never experience work in isolation of each other) you could think of the resource under load to be any combination of these things. And the system under load may be a webserver. Or a database. Or a caching server.

Captain Obvious says: welcome to the underlying facts-on-the-ground of capacity planning and monitoring. 🙂

To me, this leads to some more questions:

    • What does this relationship look like, between stress and strain?
      • Does it fail immediately, as if it was brittle?
      • Or does it “bend”, proportionally, (as in: request rate versus latency) for some period before failure?
      • If the latter, is the relationship linear, or exponential, or something else entirely?
    • Was this relationship known before the design of the system, and therefore taken into account?
      • Which is to say: what approaches are we using most in predicting this relationship between stress and strain:
        • Extrapolated experimental data from synthetic load testing?
        • Previous real-world production data from similarly-designed systems?
        • Percentage rampups of production load?
        • A few cherry-picked reports on HackerNews combined with hope and caffeine?
    • Will we be able to detect when the adaptive capacity of this system is nearing damage or failure?
    • If we can, what are we planning on doing when we reach those inflections?

The more confidence we have about this relationship between stress and strain, the more prepared we are for the system’s failures and successes.
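
As one small illustration of building that confidence: given synthetic load-test data, you can check whether latency is still growing roughly linearly with request rate or whether the curve has begun to bend. A rough sketch, with made-up measurements:

```python
# Hypothetical load-test results: requests/sec vs. median latency (ms).
rates = [100, 200, 300, 400, 500, 600, 700, 800]
latencies = [45, 48, 52, 57, 66, 85, 140, 310]

def growth_ratios(ys):
    """Ratios of successive latency increases: roughly constant means roughly
    linear; a climbing ratio suggests the knee of the curve is approaching."""
    deltas = [b - a for a, b in zip(ys, ys[1:])]
    return [round(d2 / d1, 2) for d1, d2 in zip(deltas, deltas[1:]) if d1 > 0]

print(growth_ratios(latencies))
# [1.33, 1.25, 1.8, 2.11, 2.89, 3.09] -- the later ratios show the relationship
# bending sharply, well before requests start failing outright.
```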

Now, the analogy of this fundamental concept doesn’t end here. What if the “system” under varying load is an organization? What if it’s your development and operations team? Viewed on a longer scale than a web request, this can be seen as a defining characteristic of a team’s adaptive capacities.

David Woods and John Wreathall discuss this analogy they’ve made in “Stress-Strain Plots as a Basis for Assessing System Resilience”. They describe how they are mapping the state space of a stress-strain plot to an organization’s adaptive capacities and resilience:

Following the conventions of stress-strain plots in material sciences, the y-axis is the stress axis. We will here label the y-axis as the demand axis (D) and the basic unit of analysis is how the organization responds to an increase in D relative to a base level of D (Figure 1). The x-axis captures how the material stretches when placed under a given load or a change in load. In the extension to organizations, the x-axis captures how the organization stretches to handle an increase in demands (S relative to some base).

In the first region – which we will term the uniform response region – the organization has developed plans, procedures, training, personnel and related operational resources that can stretch uniformly as demand varies in this region. This is the on-plan performance area or what Woods (2006) referred to as the competence envelope.

As you can imagine, the fun begins in the part of the relationship above the uniform region. In materials science, this is where plastic deformation begins; it’s the point on the curve at which a resource/component’s structure deforms under the increased stress and can no longer rebound back to its original position. It’s essentially damaged, or its shape is permanently changed in the given context.

They go on to say that in the organizational stress-strain analogy:

In the second region non-uniform stretching begins; in other words, ‘gaps’ begin to appear in the ability to maintain safe and effective production (as defined within the competence envelope) as the change in demands exceeds the ability of the organization to adapt within the competence envelope. At this point, the demands exceed the limit of the first order adaptations built into the plan-ful operation of the system in question. To avoid an accumulation of gaps that would lead to a system failure, active steps are needed to compensate for the gaps or to extend the ability of the system to stretch in response to increasing demands. These local adaptations are provided by people and groups as they actively adjust strategies and recruit resources so that the system can continue to stretch. We term this the ‘extra’ region (or more compactly, the x-region) as compensation requires extra work, extra resources, and new (extra) strategies.

So this is a good general description in Human Factors Researcher language, but what is an example of this non-uniform or plastic deformation in our world of web engineering? I see a few examples.

  • In distributed systems, at the point at which the volume of data and the request (or change) rate of that data are beyond the ability of individual nodes to cope, and a wholesale rehash or fundamental redistribution is necessary. For example, in a typical OneMasterManySlaves approach to database architecture, when the rate of change on the master passes the point where, no matter how many slaves you add (to relieve read load on the master), the data will continually be stale. Common solutions to this inflection point are functional partitioning of the data into smaller clusters, or federating the data amongst shards. In another example, it could be that in a Dynamo-influenced datastore, the N, W, and R knobs need adjusting to adapt to the rate, or the individual nodes’ resources need to be changed (see the small quorum sketch after this list).
  • In Ops teams, when individuals start to discover and compensate for brittleness in the architecture. A common sign of this happening is when alerting thresholds or approaches (active versus passive, aggregate versus individual, etc.) no longer provide the detection needed within an acceptable signal:noise envelope. This compensation can be largely invisible, growing until it’s too late and burnout has settled in.
  • The limits of an underlying technology (or the particular use case for it) are starting to show. An example of this is a single-process server. Low traffic rates pose no problem for software that can only run on a single CPU core; it can adapt to small bursts to a certain extent, and there’s a simple solution to this non-multicore situation: add more servers. However, at some point, the work needed to replace the single-core software with multicore-ready software drops below the amount of work needed to maintain and grow an army of single-process servers. This is especially true in terms of computing efficiency, as in dollars per calculation.
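
For the Dynamo-influenced example above, those N, W, and R knobs trade consistency against availability through a simple quorum relationship. A tiny sketch of the arithmetic (not tied to any particular datastore’s API):

```python
def quorum_properties(n: int, w: int, r: int) -> dict:
    """R + W > N means read and write quorums overlap, so reads see the latest
    acknowledged write; lower W or R trades that away for availability/latency."""
    return {
        "overlapping_quorums": r + w > n,
        "replica_failures_tolerated_for_writes": n - w,
        "replica_failures_tolerated_for_reads": n - r,
    }

print(quorum_properties(n=3, w=2, r=2))  # overlapping quorums, tolerates 1 failed replica
print(quorum_properties(n=3, w=1, r=1))  # faster and more available, but reads can be stale
```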

In other words, the ways a design or team once adapted are no longer valid in this new region of the stress-strain relationship. Successful organizations re-group and increase their ability to adapt to this new present case of demands, and invest in new capacities.

For the general case, the exhaustion of capacity to adapt as demands grow is represented by the movement to a failure point. This second phase is represented by the slope and distance to the failure point (the downswing portion of the x-region curve). Rapid collapse is one kind of brittleness; more resilient systems can anticipate the eventual decline or recognize that capacity is becoming exhausted and recruit additional resources and methods for adaptation or switch to a re-structured mode of operations (Figures 2 and 3). Gracefully degrading systems can defer movement toward a failure point by continuing to act to add extra adaptive capacity.

In effect, resilient organizations recognize the need for these new strategies early on in the non-uniform phase, before failure becomes imminent. This, in my view, is the difference between a team who has ingrained into their perspective what it means to be operationally ready, and those who have not. At an individual level, this is what I would consider to be one of the many characteristics that define a “senior” (or, rather a mature) engineer.

This is the money quote, emphasis is mine:

Recognizing that this has occurred (or is about to occur) leads people in these various roles to actively adapt to make up for the non-uniform stretching (or to signal the need for active adaptation to others). They inject new resources, tactics, and strategies to stretch adaptive capacity beyond the base built into on-plan behavior. People are the usual source of these gap-filling adaptations and these people are often inventive in finding ways to adapt when they have experience with particular gaps (Cook et al., 2000). Experienced people generally anticipate the need for these gap-filling adaptations to forestall or to be prepared for upcoming events (Klein et al., 2005; Woods and Hollnagel, 2006), though they may have to adapt reactively on some occasions after the consequences of gaps have begun to appear. (The critical role of anticipation was missed in some early work that noticed the importance of resilient performance, e.g., Wildavsky, 1988.)

This behavior leads to the extension of the non-uniform space into new uniform spaces, as the team injects new adaptive capacities.

There is a lot more in this particular paper that Woods and Wreathall cover, including:

  • Calibration – How engineering leaders and teams view themselves and their situation, along the demand-strain curve. Do they underestimate or overestimate how close they are to failure points or active adaptations that are indicative of “drift” towards failure?
  • Costs of Continual Adaptation in the X-Region – As the compensations for cracks and gaps in the system’s armor increase, so does the cost. At some point, the cost of restructuring the technology or the teams becomes lower than the continual making-up-for-the-gaps that is happening. (A back-of-the-envelope sketch of this crossover follows this list.)
  • The Law of Stretched Systems – “As an organization is successful, more may be demanded of it (‘faster, better, cheaper’ pressures) pushing the organization to handle demands that will exceed its uniform range. In part this relationship is captured in the Law of Stretched Systems (Woods and Hollnagel, 2006) – with new capabilities, effective leaders will adapt to exploit the new margins by demanding higher tempos, greater efficiency, new levels of performance, and more complex ways of work.”
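
To make that cost crossover concrete, here’s a back-of-the-envelope sketch; the numbers and function names are my own invention, not from the chapter.

```python
# Purely illustrative arithmetic for the "costs of continual adaptation"
# point; the figures below are made up for this sketch.

def cumulative_compensation_cost(hours_per_month: float, months: int) -> float:
    # Engineer-hours spent each month compensating for gaps and brittleness.
    return hours_per_month * months

def breakeven_months(restructure_hours: float, hours_per_month: float) -> float:
    # Beyond this horizon, the one-time restructuring is the cheaper option.
    return restructure_hours / hours_per_month

compensation = 40.0   # hours/month of gap-filling toil (invented)
restructure = 480.0   # one-time hours to re-architect or re-team (invented)

print(breakeven_months(restructure, compensation))        # 12.0 months
print(cumulative_compensation_cost(compensation, 24))     # 960 hours over two years
```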

Overall, I think Woods and Wreathall hit the nail on the head for me. Of course, as with all analogies, this mapping of resilience and adaptive capacity onto stress-strain curves has limits, and they are careful to point those out as well.

My suggestion of course is for you to read the whole chapter. It may or may not be useful for you, but it sure is to me. I mean, I embrace the concept so much that I got it printed on a coffee mug, and I’m thinking of making an Etsy Engineering t-shirt as well. 🙂

Resilience Engineering Part II: Lenses

(this is part 2 of a series: here is part 1)

One of the challenges of building and operating complex systems is that it’s difficult to talk about one facet or component of them without bleeding the conversation into other related concerns. That’s the funky thing about complex systems and systems thinking: components come together to behave in different (sometimes surprising) ways that they never would on their own, in isolation. Everything always connects to everything else, so it’s always tempting to see connections and want to include them in discussion. I suspect James Urquhart feels the same.

So one helpful bit (I’ve found) is Erik Hollnagel’s Four Cornerstones of Resilience. I’ve been using it as a lens with which to discuss and organize my thoughts on…well, everything. I think I’ve included them in every talk, presentation, or maybe even every conversation I’ve had with anyone for the last year and a half. I’m planning on annoying every audience for the foreseeable future by including them, because I can’t seem to shake how helpful they are as a lens into viewing resilience as a concept.

Four Cornerstones of Resilience

The greatest part about the four cornerstones is that they act as a simplification device for discussion. And simplifications don’t come easily when talking about complex systems. The other bit that I like is that they make it straightforward to see how the activities and challenges in each cornerstone relate to the others.

For example: learning is traditionally punctuated by Post-Mortems, and what (hopefully) comes out of PMs? Remediation items. Tasks and facts that can aid:

  • monitoring (example: “we need to adjust an alerting threshold or mechanism to be more appropriate for detecting anomalies”; a tiny sketch of this kind of fix follows this list),
  • anticipation (example: “we didn’t see this particular failure scenario coming before, so let’s update our knowledge on how it came about.”),
  • response (example: “we weren’t able to troubleshoot this issue as quickly as we’d like because communicating during the outage was noisy/difficult, so let’s fix that.”)
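
As a concrete (if simplified) illustration of the first kind of remediation item, here’s a sketch of swapping a hard-coded alert threshold for one derived from recent history. The function and the sample numbers are invented for illustration, not a real monitoring config.

```python
from statistics import mean, stdev

def adaptive_threshold(recent_samples, k=3.0):
    # Alert when the metric exceeds mean + k * stddev of a recent window,
    # rather than a fixed number that goes stale as traffic patterns change.
    return mean(recent_samples) + k * stdev(recent_samples)

recent_error_rates = [0.20, 0.30, 0.25, 0.40, 0.35, 0.30]  # errors/sec, made up
threshold = adaptive_threshold(recent_error_rates)

current = 1.2
if current > threshold:
    print(f"ALERT: error rate {current:.2f} exceeds adaptive threshold {threshold:.2f}")
```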

I’ll leave it as an exercise to imagine how anticipation can then affect monitoring, response, and learning. The point here is that each of the four cornerstones can affect the others in all sorts of ways, and those relationships ought to be explored.

I do think it’s helpful when looking at these pieces to understand that they can exist on a large time window as well as a small one. You might be tempted to view them in the context of infrastructure attributes in outage scenarios; this would be a mistake, because it narrows the perspective to what you can immediately influence. Instead, I think there’s a lot of value in looking at the cornerstones as a lens on the larger picture.

Of course going into each one of these in detail isn’t going to happen in a single epic too-long blog post, but I thought I’d mention a couple of things that I currently think of when I’ve got this perspective in mind.

Anticipation

This is knowing what to expect, and dealing with the potential for fundamental surprises in systems. This involves what Westrum and Adamski called “Requisite Imagination”, the ability to explore the possibilities of failure (and success!) in a given system. The process of imagining scenarios in the future is a worthwhile one, and I certainly think a fun one. The skill of anticipation is one area where engineers can illustrate just how creative they are. Cheers to the engineers who can envision multiple futures and sort them based on likelihood. Drinks all around to those engineers who can take that further and explain the rationale for their sorted likelihood ratings. Whereas monitoring deals in the now, anticipation deals in the future.

At Etsy we have a couple of tools that help with anticipation:

  • Architectural Reviews These are meetings, open to all of engineering, that we hold when there’s a new pattern being used or a new type of technology being introduced, to walk through what it is and why. We gather up the people proposing the idea, and then spend time shooting holes in it, with the goal of making the solution stronger than it might have been on its own. We also entertain what we’d do if things didn’t go according to plan with the idea. We take adopting new technologies very seriously, so this doesn’t happen very often.
  • Go or No-Go Meetings (a.k.a. Operability Reviews) These are where we gather up representative folks (at least someone from Support, Community, Product, and obviously Engineering) to discuss some fundamentals on a public-facing change, and walk through any contingencies that might need to happen. Trick is – in order to get contingencies as part of the discussion, you have to name the circumstances where they’d come up.
  • GameDay Exercises These are exercises where we validate our confidence in production by causing as many horrible things as we can to components while they’re in production. Even asking whether a GameDay is possible sparks enough conversation to be useful, and burning pieces to the ground to see how the system behaves when they do is always a useful task. We want no unique snowflakes, so being able to stand it up as fast as it can burn down is fun for the whole family.

But anticipation isn’t just about thinking along the lines of “what could possibly go wrong?” (although that is always a decent start). It’s also about the organization, and how a team behaves when interacting with the machines. Recognizing when your adaptive capacity is failing is key to anticipation. David Woods has collected some patterns of anticipation worth exploring, many of which relate to a system’s adaptive capacity:

  • Recognize when adaptive capacity is failing – Example: Can you detect when your team’s ability to respond to outages degrades?
  • Recognize the threat of exhausting buffers or reserves – Example: Can you tell when your tolerances for faults are breached? When your team’s workload prevents proactive planning from getting done?
  • Recognize when to shift priorities across goal trade-offs – Example: Can you tell when you’re going to have to switch away from greenfield development and focus on cleaning up legacy infra?
  • Able to make perspective shifts and contrast diverse perspectives that go beyond their nominal position – Example: Can Operations understand the goals of Development, and vice-versa, and support them in the future?
  • Able to navigate interdependencies across roles, activities, and levels – Example: Can you foresee what’s going to be needed from different groups (Finance, Support, Facilities, Development, Ops, Product, etc.) and who in those teams needs to be kept up-to-date with ongoing events?
  • Recognize the need to learn new ways to adapt – Example: Will you know when it’s time to include new items in training incoming engineers, as failure scenarios and ways of working change in the organization and infrastructure?

I’m fascinated by the skill of anticipation, frankly. I spoke at Velocity Europe in Berlin last year on the topic.

Monitoring

This is knowing what to look for, and dealing with the critical in systems. Not just the mechanics of servers and networks and applications, but monitoring in the organizational sense. Anomaly detection and metrics collection and alerting are obviously part of this, and should be familiar to anyone expecting their web application to be operable.

But in addition to this, we’re talking as well about meta-metrics on the operations and activities of both infrastructure and staff.

Things like:

  • How might a team measure its cognitive load during an outage, in order to detect when it is drifting?
  • Are there any gaps that appear in a team’s coordinative or collaborative abilities, over time?
  • Can the organization detect when there are goal conflicts (example: accelerating production schedules in the face of creeping scope) quickly enough to make them explicit and do something about them?
  • What leading or lagging indicators could you use to gauge whether or not the performance demand of a team is beyond what could be deemed “normal” for the size and scale it has? (A toy sketch of one such indicator follows this list.)
  • How might you tell if a team is becoming complacent with respect to safety, when incidents decrease? (“We’re fine! Look, we haven’t had an outage for months!”)
  • How can you confirm that engineers are being ramped up adequately to be productive and adaptive on a team?
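
As a toy illustration of the leading/lagging-indicators question above: trend the average number of pages per on-call engineer per week. Everything here (the data, the threshold for “unsustainable”) is invented; the hard part in practice is deciding what “normal” looks like, not computing the number.

```python
from collections import defaultdict

# (week, engineer, pages received) -- made-up data for illustration.
pages = [
    ("W01", "alice", 4), ("W01", "bob", 6),
    ("W02", "alice", 9), ("W02", "bob", 11),
    ("W03", "alice", 15), ("W03", "bob", 14),
]

totals = defaultdict(int)
engineers = defaultdict(set)
for week, engineer, count in pages:
    totals[week] += count
    engineers[week].add(engineer)

for week in sorted(totals):
    avg = totals[week] / len(engineers[week])
    note = "  <-- creeping toward unsustainable?" if avg > 10 else ""
    print(f"{week}: {avg:.1f} pages per on-call engineer{note}")
```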

Response

This is knowing what to do, and dealing with the actual in systems. Whether or not you’ve anticipated a perturbation or disturbance, as long as you can detect it, then you have something to respond to. How do you respond? Page the on-call engineer? Are you the on-call engineer? Response is fundamental to working in web operations, and differential diagnosis is just as applicable to troubleshooting complex systems as it is in medicine.

Pitfalls in responding to surprising behaviors in complex systems have exotic and novel characteristics. They are the things that make Post-Mortem meetings dramatic; they can often include stories of surprising turns of attention, focus, and results that make troubleshooting more of a mystery than anything else. Dietrich Dörner, in his 1980 article “On The Difficulties People Have In Dealing With Complexity”, gave some characteristics of response in escalating scenarios. These might sound familiar to anyone who has experienced team troubleshooting during an outage:

…[people] tend to neglect how processes develop over time (awareness of rates) versus assessing how things are in the moment.
…[people] have difficulty in dealing with exponential developments (hard to imagine how fast things can change, or accelerate)
…[people] tend to think in causal series (A, therefore B), as opposed to causal nets (A, therefore B and C, therefore D and E, etc.)

I was lucky enough to talk a bit more in detail about Resilient Response In Complex Systems at QCon in London this past year.

Learning

This is knowing what has happened, and dealing with the factual in systems. Everyone wants to believe that their team or group or company has the ability to learn, right? A hallmark of good engineering is empirical observation that results in future behavior changes. Like I mentioned above, this is the place where Post-Mortems usually come into play. At this point I think our field ought to be familiar with Post-Mortem meetings and their general structure and goal: to glean as much information as possible about an incident, an outage, a surprising result, a mistake, etc., and to spread those observations far and wide within the organization in order to prevent them from happening in the future.

I’m obviously a huge fan of Post-Mortems and what they can do to improve an organization’s behavior and performance. But a lesser-known tool for learning is the “near-miss” opportunity we see in normal, everyday work. An engineer performs an action, and realizes later that it was wrong or somehow produced a surprising result. When those happen, we can hold them up high, for all to see and learn from. Did they cause damage? No, that’s why they “missed.”

One of the godfathers of cognitive engineering, James Reason, said that “near-miss” events are excellent learning opportunities for organizations, because they:

  1. Can act like safety “vaccines” for an organization, because they are just a little bit of failure that doesn’t really hurt.
  2. Happen much more often than actual systemic failures, so they provide a lot more data on latent failures.
  3. Are a powerful reminder of hazards, which helps keep the “constant sense of unease” that is needed to provide resilience in a system.

I’ll add that encouraging engineers to share the details of their near-misses has a positive side effect on the culture of the organization. At Etsy, you will see (from time to time) an email to the whole team from an engineer that has the form:

Dear Everybody,

This morning I went to do X, so I did Y. Boy was that a bad idea! Not only did it not do what I thought it was going to, but also it almost brought the site down because of Z, which was a surprise to me. So whatever you do, don’t make the same mistake I did. In the meantime, I’m going to see what I can do to prevent that from happening.

Love,

Joe Engineer

For one, it provides the confirmation that anyone, at any time, no matter their seniority level, can make a mistake or act on faulty assumptions. The other benefit is that it sends the message that admitting to making a mistake is acceptable and encouraged, and that people should feel safe in admitting to these sometimes embarrassing events.

This last point is so powerful that it’s hard to overstate. It’s related to encouraging a Just Culture, something that I wrote about recently over at Code As Craft, Etsy’s Engineering blog.

The last bit I wanted to mention about learning is purposefully not incident-related. One of the basic tenets of Resilience Engineering is that safety is not the absence of incidents and failures; it’s the presence of actions, behaviors, and culture (all along the lines of the four cornerstones above) that causes an organization to be safe. Learning only from failures means that the surface area to learn from is not all that large. To be clear: most organizations see successes much, much more often than they do failures.

One such focus might be changes. If, across 100 production deploys, you had 9 change-related outages, which should you learn from? Should you be satisfied to look at those nine, have postmortems, and then move forward, safe in the idea that you’ve rid yourself of the badness? Or should you also look at the 91 deploys, and gather some hypotheses about why they ended up OK? You can learn from 9 events, or from 91. The argument here is that you’ll be safer by learning from both.

So in addition to learning from why things go wrong, we ought to learn just as much from why things go right. Why didn’t your site go down today? Why aren’t you having an outage right now? This is likely due to a huge number of influences and reasons, all worth exploring.

….

Ok, so I lied. I didn’t expect this to be such a long post. I do find the four cornerstones to be a good lens with which to think and speak about resilience and complex systems. They’re as much a part of my vocabulary now as OODA and CAP are. 🙂

Each necessary, but only jointly sufficient

I thought it might be worth digging in a bit deeper on something that I mentioned in the Advanced Postmortem Fu talk I gave at last year’s Velocity conference.

For complex socio-technical systems (web engineering and operations) there is a myth that deserves to be busted, and that is the assumption that for outages and accidents there is a single unifying event that triggers a chain of events leading to the outage.

This is actually a fallacy, because for complex systems:

there is no root cause.

This isn’t entirely intuitive, because it goes against our nature as engineers. We like to simplify complex problems so we can work on them in a reductionist fashion. We want there to be a single root cause for an accident or an outage, because if we can identify that, we’ve identified the bug that we need to fix. Fix that bug, and we’ve prevented this issue from happening in the future, right?

There’s also a strong tendency in causal analysis (especially in our field, IMHO) to find the single place where a human touched something, and point to that as the “root” cause. Those dirty, stupid humans. That way, we can put the singular onus on “lack of training”, or the infamously terrible label “human error.” This, of course, isn’t a winning approach either.

But, you might ask, what about the “Five Whys” method of root cause analysis? Starting with the outcome and working backwards towards an originally triggering event along a linear chain feels intuitive, which is why it’s so popular. Plus, those Toyota guys know what they’re talking about. But it also falls prey to the same issue with regard to assumptions surrounding complex failures.

As this excellent post in Workplace Psychology rightly points out, the Five Whys has limitations:

An assumption underlying 5 Whys is that each presenting symptom has only one sufficient cause. This is not always the case and a 5 Whys analysis may not reveal jointly sufficient causes that explain a symptom.

There are some other limitations of the Five Whys method outlined there, such as it not being an idempotent process, but the point I want to make here is that linear mental models of causality don’t capture what is needed to improve the safety of a system.

Generally speaking, linear chain-of-events approaches are akin to viewing the past as a line-up of dominoes, and reality with complex systems simply doesn’t work like that. Looking at an accident this way ignores surrounding circumstances in favor of a cherry-picked list of events, validates hindsight and outcome bias, and focuses too much on components and not enough on the interconnectedness of components.

During stressful times (like outages) the people involved with response, troubleshooting, and recovery also often misremember events as they happened, sometimes unconsciously neglecting critical facts and the timing of observations, assumptions, etc. This can obviously affect the results of using a linear accident investigation model like the Five Whys.

However, identifying a singular root cause and a linear chain that stems from it makes things very easy to explain, understand, and document. It can help us feel confident that we’re going to prevent future occurrences of the issue, because there’s just one thing to fix: the root cause.

Even the eminent cognitive engineering expert James Reason’s epidemiological (the “Swiss Cheese”) model exhibits some of these limitations. While it does help capture multiple contributing causes, the mechanism is still linear, which can encourage people to think that if they were only to remove one of the causes (or fix a ‘barrier’ to a cause in the chain) then they’d be protected in the future.

I will, however, point out that having an open and mature process of investigating causality, using any model, is a good thing for an organization, and the Five Whys can help kick-off the critical thinking needed. So I’m not specifically knocking the Five Whys as a practice with no value, just that it’s limited in its ability to identify items that can help bring resilience to a system.

Again, this tendency to look for a single root cause for fundamentally surprising (and usually negative) events like outages is ubiquitous, and hard to shake. When we’re stressed for technical, cultural, or even organizationally political reasons, we can feel pressure to get to resolution on an outage quickly. And when there’s pressure to understand and resolve a (perceived) negative event quickly, we reach for oversimplification. Some typical reasons for this are:

  • Management wants an answer to why it happened quickly, and they might even look for a reason to punish someone for it. When there’s a single root cause, it’s straightforward to pin it on “the guy who wasn’t paying attention” or “is incompetent”
  • The engineers involved with designing/building/operating/maintaining the infrastructure touching the outage are uncomfortable with the topic of failure or mistakes, so the reaction is to get the investigation over with. This encourages oversimplification of the causes and remediation.
  • The failure is just too damn complex to keep in one’s head. Hindsight bias encourages counter-factual thinking (“…if only we paid attention, we could have seen this coming!” or “…we should have known better!”) which pushes us into thinking the cause is simple.

So if there’s no singular root cause, what is there?

I agree with Richard Cook’s assertion that failures in complex systems require multiple contributing causes, each necessary but only jointly sufficient.

Hollnagel, Woods, Dekker and Cook point out in this introduction to Resilience Engineering:

Accidents emerge from a confluence of conditions and occurrences that are usually associated with the pursuit of success, but in this combination—each necessary but only jointly sufficient—able to trigger failure instead.
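
To make “each necessary but only jointly sufficient” concrete, here’s a toy sketch of my own (not from the paper): an outage that only occurs when several otherwise-tolerable conditions line up. Remove any one of them and the outage doesn’t happen, so each is necessary, yet none of them alone deserves the title of “root cause.”

```python
# Hypothetical contributing conditions; each is survivable on its own.
contributing_conditions = {
    "deploy_changed_query_plan": True,
    "cache_hit_rate_degraded": True,
    "marketing_campaign_traffic_spike": True,
    "replica_lag_above_threshold": True,
}

def outage_occurs(conditions):
    # The failure emerges only from the combination.
    return all(conditions.values())

print(outage_occurs(contributing_conditions))  # True: all present together

for name in contributing_conditions:
    counterfactual = {**contributing_conditions, name: False}
    # Each condition is "necessary": take any one away and there's no outage.
    print(f"without {name}: outage = {outage_occurs(counterfactual)}")
```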

Frankly, I think that this tendency to look for singular root causes also comes from how deeply entrenched modern science and engineering are in the tenets of reductionism. So I blame Newton and Descartes. But that’s for another post. 🙂

Because complex systems have emergent behaviors, not resultant ones, it can be put another way:

Finding the root cause of a failure is like finding a root cause of a success.

So what does that leave us with? If there’s no single root cause, how should we approach investigating outages, degradations, and failures in a way that can help us prevent, detect, and respond to such issues in the future?

The answer is not straightforward. In order to truly learn from outages and failures, systemic approaches are needed, and there are a couple of them mentioned below. Regardless of the implementation, most systemic models recognize these things:

  • …that complex systems involve not only technology but organizational (social, cultural) influences, and those deserve equal (if not more) attention in investigation
  • …that fundamentally surprising results come from behaviors that are emergent. This means they can and do come from components interacting in ways that cannot be predicted.
  • …that nonlinear behaviors should be expected. A small perturbation can result in catastrophically large and cascading failures.
  • …human performance and variability are not intrinsically coupled with causes. Terms like “situational awareness” and “crew resource management” are blunt concepts that can mask the reasons why it made sense for someone to act in a way that they did with regards to a contributing cause of a failure.
  • …diversity of components and complexity in a system can augment the resilience of a system, not simply bring about vulnerabilities.

For the real nerdy details, Zahid H. Qureshi’s A Review of Accident Modelling Approaches for Complex Socio-Technical Systems covers the basics of the current thinking on systemic accident models: Hollnagel’s FRAM, Leveson’s STAMP, and Rasmussen’s framework are all worth reading about.

Also appropriate for further geeking out on failure and learning:

Hollnagel’s talk, On How (Not) To Learn From Accidents

Dekker’s wonderful Field Guide To Understanding Human Error

So the next time you read or hear a report with a singular root cause, alarms should go off in your head. In the same way that you shouldn’t ever have root cause “human error”, if you only have a single root cause, you haven’t dug deep enough. 🙂

 

Fault Tolerance and Protection

In yet another post where I point to a paper written from the perspective of another field of engineering about a topic that I think is inherently mappable to the web engineering world, I’ll at least give a summary. 🙂

Every time someone on-call gets an alert, they should always be thinking along these lines:

  • Does this really require me to wake up from sleeping or pause this movie I’m watching, to fix?
  • Can this really not wait until the morning, during office hours?

If the answer is yes to those, then excellent: the machines alerted a human to something that only a human could ever diagnose or fix. There was nothing that the software could have done to rectify the situation. Paging a human was justified.

But for those situations where the answer was “no” to those questions, one might (or should, anyway) think about bolstering the system’s “fault tolerance” or “fault protection.” But how many folks grok the full details of what that means? Does it mean self-healing? Does it mean isolation of errors or unexpected behaviors that fall outside the bounds of normal operating circumstances? Or does it mean both, and if so, how should we approach building this tolerance and protection? The Wikipedia definitions for “fault tolerant systems” and “fault tolerant design” are a very good start for educating yourself on the concepts, but they’re reasonably general in scope.
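
To put those two questions into (toy) code: the sketch below isn’t a real paging system, just the routing decision the questions imply. Anything the software could safely fix itself shouldn’t page anyone, anything that can wait should become a ticket, and only the remainder justifies waking a human.

```python
from dataclasses import dataclass

@dataclass
class Alert:
    name: str
    needs_human_judgment: bool   # could software safely remediate this itself?
    can_wait_until_morning: bool

def route(alert: Alert) -> str:
    if not alert.needs_human_judgment:
        return "auto-remediate"              # restart, failover, shed load, etc.
    if alert.can_wait_until_morning:
        return "open a ticket for office hours"
    return "page the on-call engineer"       # waking a human is justified

print(route(Alert("disk 80% full on a log host", needs_human_judgment=False, can_wait_until_morning=True)))
print(route(Alert("primary database unreachable", needs_human_judgment=True, can_wait_until_morning=False)))
```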

The fact is, designing web systems to be truly fault-tolerant and protective is hard. These are questions that can’t be answered solely within infrastructural bounds; fault-tolerance isn’t selective in its tiering, and it has to be thought of from layer 1 of the network all the way up to the browser.

Now, not every web startup is lucky enough to hire someone from NASA’s Jet Propulsion Lab, who has written software for space vehicles, but we managed to convince Greg Horvath to leave there and join Etsy. He pointed me to an excellent paper, by Robert D. Rasmussen, called “GN&C Fault Protection Fundamentals” and thankfully, it’s a lot less about Guidance, Navigation, and Control and more about fault tolerance and protection strategies, concerns, and implementations.

Some of those concerns, from the paper:

  • Do not separate fault protection from normal operation of the same functions.
  • Strive for function preservation, not just fault protection.
  • Test systems, not fault protection; test behavior, not reflexes.
  • Cleanly establish a delineation of mainline control functions from transcendent issues.
  • Solve problems locally, if possible; explicitly manage broader impacts, if not.
  • Respond to the situation as it is, not as it is hoped to be.
  • Distinguish fault diagnosis from fault response initiation.
  • Follow the path of least regret.
  • Take the analysis of all contingencies to their logical conclusion.
  • Never underestimate the value of operational flexibility.
  • Allow for all reasonable possibilities — even the implausible ones.

The last idea there points to having “requisite imagination” to explore, as fully as possible, the question “What could possibly go wrong?”, which is really just another manifestation of one of the four cornerstones of Resilience Engineering: Anticipation. But that’s a topic for another post.

Here’s Rasmussen’s paper; please go and read it. If you don’t, you’re totally missing out and not keeping up!

Training Organizational Resilience in Escalating Situations

This little ramble of thoughts is related to my upcoming talk at Velocity, but I know I’ll never get to this part at the conference, so I figured I’d post about it here.

Building resilience from a systems point of view means (amongst other things) understanding how your organization deals with failure and unexpected situations. Generally this means having development and operations teams that can work well together under pressure, with fluctuating amounts of uncertainty, bringing their own domain expertise to the table when it matters.

This is what drives some of my favorite Ops candidate interview questions. Knowing Unix commands, network architectures, database behaviors, and scripting languages is obviously required, but comprises only one facet of the gig. The real mettle comes from being able to easily zoom in and out of the whole system under scrutiny, splitting up troubleshooting responsibilities amongst your team (and trusting their results), and differentiating red-herring symptoms from truly related ones. It also comes from things like:

  • Staying away from distracting conversation during the outage response. Nothing kills a TTR like unrelated talk in IRC or a conf call.
  • Trusting your information. This is where the UI challenges of dashboard design can make or break an outage response. “Are those units milli, or mega?”
  • Balancing too much communication and too little amongst team members. Troubleshooting outage verbosity is a fickle mistress.
  • Stomping actions. OneThingAtATime™ methods aren’t easy to stick to, especially when things escalate.
  • Keeping outage fatigue at bay, and recognizing when brains are melting and need to take a break.

To make matters worse, determining causality can be tenuous at best when you’re working with complex systems, so being able to recognize when a failure has a single root cause (hint: with the big outages – almost never) and when it has multiple contributing causes is a skill that isn’t easily gained without seeing a lot of action in the past.

So it’s not a surprise that working well within a team under stressful scenarios is something other fields try to train people for.  Trauma surgeons, FBI agents, military teams, air traffic control, etc. all have drills, exercises, and simulations for teaching these skills, but they are all done within the context of what those escalating situations look like in their specific fields.

So this brings a question that has come up before in my circles:

Can this sort of organizational resilience be taught, within the context of web operations?

GameDay exercises could certainly be one avenue for testing and training team-based outage response, but most of the focus there (at least those discussed publicly by companies who hold GameDay exercises) is testing the infrastructure and application-level components, and even then under controlled conditions and relatively narrow failure modes.

So the confidence-building value of GameDay drills lies elsewhere; they don’t really exercise the cognitive load that real-world failures, like the spectacular Amazon AWS outage recently, can produce on the humans (i.e., the troubleshooting dev and ops teams).

But! Some smart folks have been thinking about this question, at a higher-level:

Is it possible to construct non-contextual and generic drills that can train competencies for this sort of on-the-fly, making-sense-of-unfamiliar-failure-modes, and sometimes disorienting troubleshooting?

From Lund University in Sweden, there’s an excellent article on building organizational resilience in escalating situations, which I believe resulted in a chapter in the Resilience Engineering in Practice book, and which also references another excellent article by David Woods and Emily Patterson called How Unexpected Events Produce An Escalation Of Cognitive And Coordinative Demands.

The parts I want to highlight here are best practices for designing scenarios meant to train these skills. If you’re looking to design a good drill meant to educate and/or train Ops and Devs on what cognitive muscles to develop for handling large-scale outages, this is a pretty damn good list (quoted from both of those sources above):

  • Try to force people beyond their learned roles and routines. The scenario can contain problems that are not solvable within those roles or routines, and forces people to step out of those roles and routines.
  • Contain a number of hidden goals, at various times during the scenario, that people could pursue (e.g. different ways of escaping the situation or de-escalating it), but that they have to vocalize and articulate in order to begin to achieve them (as they cannot do so by themselves).
  • Include potential actions of which the consequences are both important and difficult to foresee (and that might significantly influence people’s ability to control the problem in the near future). This can force people into pro-active thinking and articulation of their expectations of what might happen.
  • Be able to trap people into locking onto one solution that everybody is fixedly working towards. This can be done by garden-pathing: making the escalating problem look initially (with strong cues) like something the crew could already be familiar with, but then letting it depart (with much weaker cues) to see whether the crew is caught on the garden path and lets the situation escalate.
  • Or the scenario, by creating so much cognitive noise in terms of new warnings and events, should be able to trip people into thematic vagabonding—the tendency to redirect attention and change diagnosis with each incoming data piece, which results in a fragmentation of problem-solving.

Think that such a scenario could be constructed?

I want to think so, but of course nothing teaches like the hindsight of a real production outage, eh? 🙂