Multiple Perspectives On Technical Problems and Solutions

12. Aug / Architecture, Complex Systems, Culture, Etsy / 19 Comments

Over the years, a number of people have asked about the details surrounding Etsy’s architecture review process. In this post, I’d like to focus on the architecture review working group’s role in facilitating dialogue about technology decision-making. Part of this is really just about working groups in general (pros, cons, formats, etc.) and another part...

Stress, Strain, and Reminders

This is a photo of the backside of the T-shirt for the operations engineering team at Etsy: This diagram might not come as a surprise to those who know that I come from a mechanical engineering background. But I also wanted to have this on the T-shirt as a reminder (maybe just to myself, but...

Learning from Failure at Etsy

30. Sep / Cognitive Systems Engineering, Complex Systems, Culture, Etsy, Human Factors, Systems Safety / 20 Comments

(This was originally posted on Code As Craft, Etsy’s engineering blog. I’m re-posting it here because it still resonates strongly as I prepare to teach a ‘postmortem facilitator’s course internally at Etsy.) Last week, Owen Thomas wrote a flattering article over at Business Insider on how we handle errors and mistakes at Etsy. I thought...

On Being A Senior Engineer

25. Oct / Culture, Etsy, Human Factors, Random, WebOps / 211 Comments

UPDATE: I’ve added a short section on the topic of sponsorship. I think that there’s a lot of institutional knowledge in our field, especially about what makes for a productive engineer. But while there are a good many books in the management field about “expert” roles and responsibilities of non-technical individual contributors, I don’t...

Etsy’s Chef Repo, 2010

31. Dec / Etsy, WebOps / 1 Comment

Etsy’s Chef Repo, 2010 from jspaw on Vimeo. Delicious InfoViz courtesy of Gource....

MTTR is more important than MTBF (for most types of F)

07. Nov / Culture, Etsy, Flickr, Slides, Talks, WebOps / 26 Comments

UPDATE, 10/17/2017: This post hasn’t aged well, and needs some patching. The title should be “TTR is more important than TBF (for most types of F)” Why? Because taking the statistical mean of TTR or TBF makes absolutely no sense, whatsoever. Incidents and events simply are not comparable in that way, and even if they were, the time...

Go or No-Go: Operability and Contingency Planning (Surge)

03. Nov / Etsy, Slides, Talks, WebOps / 2 Comments

Last month I had the honor of speaking at the Surge Conference in Baltimore, put together by OmniTI. It was a most excellent conference, and the expertise levels were ridiculously high. I count myself lucky to be considered in the same league as the rest of the presenters. I did give a Keynote talk, and I...

Kitchen Soap: Thoughts on systems safety, software operations, and sociotechnical systems.
