Alerting Companies

01. May
/
Cognitive Systems Engineering, Tools, WebOps
/
17 Comments

I’d like to open up a dialogue with companies who are selling X-As-A-Service products that are focused on assisting operations and development teams in tracking the health and performance of their software systems. Note: It’s likely my suggestions below are understood and embraced by many companies already. I know a number of them who are...
Continue reading...

Availability: Nuance As A Service

03. Jan
/
Complex Systems, Random, Resilience, WebOps
/
19 Comments

Something that has struck me funny recently surrounds the traditional notion of availability of web applications. With respect to its relationship to revenue, to infrastructure and application behavior, and fault protection and tolerance, I’m thinking it may be time to get a broader upgrade adjustment to the industry’s perception on the topic. These nuances in the...
Continue reading...

On Being A Senior Engineer

25. Oct
/
Culture, Etsy, Human Factors, Random, WebOps
/
211 Comments

UPDATE: I’ve added a short section on the topic of sponsorship. I think that there’s a lot of institutional knowledge in our field, especially about what makes for a productive engineer. But while there are a good deal of books in the management field about “expert” roles and responsibilities of non-technical individual contributors, I don’t...
Continue reading...

A Mature Role for Automation: Part I

21. Sep
/
Cognitive Systems Engineering, Complex Systems, Human Factors, Tools, WebOps
/
31 Comments

(Part 1 of 2 posts) I’ve been percolating on this post for a long time. Thanks very much to Mark Burgess for reviewing early drafts of it. One of the ideas that permeates our field of web operations is that we can’t have enough automation. You’ll see experience with “building automation” on almost every job...
Continue reading...

Fundamental: Stress-Strain Curves In Web Engineering

10. Sep
/
Cognitive Systems Engineering, Complex Systems, Human Factors, Resilience, WebOps
/
6 Comments

I make it no secret that my background is in mechanical engineering. I still miss those days of explicit and dynamic finite element analysis, when I worked for the VNTSC, working on vehicle crashworthiness studies for the NHTSA. What was there not to like? Things like cars and airbags and seatbelts and dummies and that...
Continue reading...

The Devil’s In The Details

30. Mar
/
Cognitive Systems Engineering, Complex Systems, Naturalistic Decision Making, WebOps
/
9 Comments

I’m a firm believer that context is everything, and that it’s needed in every constructive conversation we want to have as engineers. As a nascent (but adorable) engineering field, we discuss (in blogs, books, meetups, conferences, etc.) success and failure in a number of areas, including the ways in which we work. We don’t just...
Continue reading...

Fault Tolerance and Protection

In yet another post where I point to a paper written from the perspective of another field of engineering about a topic that I think is inherently mappable to the web engineering world, I’ll at least give a summary. 🙂 Every time someone on-call gets an alert, they should always be thinking along these lines:...
Continue reading...

Systems Engineering: A great definition.

18. Jul
/
Complex Systems, Culture, Random, WebOps
/
11 Comments

Ben Rockwood said something last December about the re-emergence of the Systems Engineer and I agree with him, 100%. To add to that, I’d like to quote the excellent NASA Systems Engineering handbook’s introduction. The emphasis is mine: Systems engineering is a methodical, disciplined approach for the design, realization, technical management, operations, and retirement of...
Continue reading...

Training Organizational Resilience in Escalating Situations

10. May
/
Complex Systems, Culture, Resilience, WebOps
/
8 Comments

This little ramble of thoughts are related to my talk at Velocity coming up, but I know I’ll never get to this part at the conference, so I figured I’d post about it here. Building resilience from a systems point of view means (amongst other things) understanding how your organization deals with failure and unexpected...
Continue reading...

Resilience Engineering: Part I

I’ve been drafting this post for a really long time. Like most posts, it’s largely for me to get some thoughts down. It’s also very related to the topic I’ll be talking about at Velocity later this year. When I gave a keynote talk at the Surge Conference last year, I talked about how our...
Continue reading...

12 3 4 Next Page

Kitchen SoapThoughts on systems safety, software operations, and sociotechnical systems.

Kitchen Soap