Resilience

Each necessary, but only jointly sufficient

February 10, 2012

I thought it might be worth digging in a bit deeper on something that I mentioned in the Advanced Postmortem Fu talk I gave at last year’s Velocity conference. For complex socio-technical systems (web engineering and operations) there is a myth that deserves to be busted, and that is the assumption that for outages and [...]


Fault Tolerance and Protection

September 8, 2011

In yet another post where I point to a paper written from the perspective of another field of engineering about a topic that I think is inherently mappable to the web engineering world, I’ll at least give a summary. Every time someone on-call gets an alert, they should always be thinking along these lines: Does [...]


Training Organizational Resilience in Escalating Situations

May 10, 2011

This little ramble of thoughts is related to my upcoming talk at Velocity, but I know I’ll never get to this part at the conference, so I figured I’d post about it here. Building resilience from a systems point of view means (amongst other things) understanding how your organization deals with failure and unexpected [...]
