Kitchen Soap – Annoying To Me.

I can’t tell you how ripped I get when people say things like this:

“cloud computing means getting rid of ops”

If by “ops” you mean “people in data centers racking servers, installing OSes, running cables, replacing broken hardware, etc.” then sure, cloud computing aims to relieve you of those burdens. If you really think ‘ops’ is just that, then you really should put down your Nick Carr book and pay attention to the real world for a change.

The reality is, if your ops team is spending a lot of time doing that, then you’re either:

Too big to use someone *else’s* cloud, because you basically have your own (Yahoo, Amazon, Google, etc.)
Stuck in 1999.

If you deal with any of these things:

handling site issues/incidents
building and maintaining tools to monitor and gather system- and application-level metrics
programming the ability to adapt infrastructure to changing system or application-level conditions (usage, failure, degradation, etc.)
implementing and maintaining deployment systems (code, config management, etc.)
capacity planning (no, really)

then you’re doing “ops”, by my definition. In some environments, these things are done by “developers”. But my definition says those devs are performing ops functions.

Cloud computing isn’t going to make ‘ops’ go away. It’s relieving ops (and dev) of a bunch of pain-in-the-ass things so they can focus on the real work needed. Namely: your application.

Last I checked, clouds don’t perform the tasks listed above, because those things (done right) are application-specific. And while cloud computing enables (in an excellent way) the efficient resource allocation (or de-allocation) for an application, it doesn’t get rid of the need to do the above things.
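
To make “application-specific” concrete, here is a minimal sketch of the capacity planning item from the list above. It is an illustration, not how any particular shop or cloud provider does it: the function name `days_until_ceiling`, the sample numbers, and the idea of tracking one application-level metric (say, peak uploads per minute) against a ceiling you found through load testing are all assumptions made for the example. It fits a straight line to daily peaks and estimates how many days remain before the trend reaches the ceiling.

```python
from typing import Optional, Sequence


def days_until_ceiling(daily_peaks: Sequence[float], ceiling: float) -> Optional[float]:
    """Fit a straight line to daily peak values and estimate how many days
    remain until the trend reaches `ceiling`.

    Returns None when there is no upward trend, and 0.0 when the fitted
    trend has already reached the ceiling.
    """
    n = len(daily_peaks)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")

    xs = range(n)  # day index: 0, 1, 2, ...
    mean_x = sum(xs) / n
    mean_y = sum(daily_peaks) / n

    # Ordinary least-squares slope and intercept.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, daily_peaks))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    if slope <= 0:
        return None  # flat or shrinking usage: no projected crossing

    day_at_ceiling = (ceiling - intercept) / slope
    # Count from the most recent observation (day n - 1).
    return max(0.0, day_at_ceiling - (n - 1))


# Hypothetical data: peak uploads per minute over five days, with a
# ceiling of 200 assumed to come from load testing the application.
peaks = [100, 110, 120, 130, 140]
print(days_until_ceiling(peaks, ceiling=200))  # 6.0
```

The cloud can hand you the instances, but choosing the metric and knowing where its ceiling sits still takes someone who understands the application. A straight-line fit is also about the simplest model there is, and real traffic is rarely this tidy.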

13 Comments

Adam Jacob •

Woot.

Laurie Denness •

I appreciate this rant.

Of course, it depends on what point you are at in your business. We went through a period where we would spend almost every day installing new hardware. Luckily that doesn’t apply so much anymore. I like my desk, datacenters are a horrible environment, and I get to do the above ops work! 😉

Karl Katzke •

Another woot, +1, etc. for this rant.

I do ‘ops’ — which invariably ends up being a lot of ‘glue’ coding to make things work. We’re currently implementing a cloud type system in our (small) datacenter, which we’re referring to as a “self-healing” system so that we don’t have our budgets cut. 😉 It’s essentially made up of virtual containers that are geographically distributed and constantly-or-frequently backed up.

What does it mean for our users? We have 9.however-many-nines-you-want-to-put-here% uptime as long as the campus network is up, and there’s less wait time for new applications to come online because we don’t have to acquire new hardware and develop a disaster recovery plan (among other documentation) every time we address a request.

Mark Nottingham •

You go.

As someone who started life as a sysadmin, I’m always annoyed and amused by the dev-centric attitude that pervades the industry. While the worst offenders believe that they’re at the top of the food chain, the reality is that a good OPS person is worth much more than any given dev — because they understand how the real world works.

This is especially true when you’re talking about networked applications; too many developers want to live in a fantasy world where there’s one address space, binary failure modes, no latency, and concurrency is a distant (and rather new) concern. OPS folk, IME, have none of these delusions.

/rant

Jonas Galvez •

Mark, +1. These are the same people who mistake capacity planning for premature optimization.

Roland Dobbins •

All the admin-type coding and monitoring/provisioning stuff you’re talking about is gradually going to go away, as de facto and then de jure standards emerge, and are implemented in the market.

Make no mistake – ops *will* go away, in stages, over time. Ultimately, enterprises will retain software and database and *maybe* network architects, but the rest will be handled by SP owned-and-operated cloud systems, run by SP personnel.

allspaw • Author

Roland:
Nope. I have a feeling you don’t know what I mean by “ops.”

The above things I list (“ops” things) are application-specific. Monitoring, troubleshooting, capacity planning, deployment…these aren’t just driven by CPU, memory, disk, and network levels measured within pre-defined windows; they’re driven by those things within the *context* of the application. If a service provider has enough in-depth knowledge about my application to perform the above list within the right context, they’re not service providers, they’re *consultants*.

“Ops” consultants.

grant •

Roland: dead wrong, and missing the point.

1. There is no such thing as fire and forget infrastructure.

2. Refining application and systems architecture through growth requires the attention of humans qualified by their experience with it and its components, and this cannot be replaced by a faceless Level III Linux Technical Support Engineer who changes every 8 hours.

If there were such fantastic, mythical gifts from the gods as fire and forget infrastructure to be had, then my phone wouldn’t be ringing every month with calls from people who have hit EC2 instance IO limits and don’t know what to do about it.

If you believe platooning systems engineers is a good idea, you have probably never talked to someone who has had a master InnoDB partition destroyed by one of them who “noticed mysql was down” and “tried to restart it for you” while an index rebuild was in progress. That’s an extreme — but very real — example. This wasn’t a tape monkey. This was a senior “consultant” on a very high profile hosting company’s engineering support staff.

And no, “admin-type” coding isn’t going away. There will be no de facto (though plenty of du jour) “standards” in the monitoring marketplace. Do you honestly think there’s going to be an instrumentation specification implemented consistently across Java, Ruby, Python and PHP? Because that would be a requirement. Dream on, or better yet, go work on making it a reality. But it’s not happening anytime soon. Provisioning? Provisioning WHAT? WHEN? HOW MUCH? And please, do not try and give us that “just in time” / “on demand” crap. It’s only on time if it’s already online when you need it. If you’re looking at your production systems load and thinking “wow, we need more X, Y and Z, let’s go deploy a bunch of additional instances!” then you’ve suffered a failure in operations already, because enough X, Y and Z would have already been there if someone qualified and possessing the requisite application-specific domain expertise were paying attention to operations. Meanwhile your cloud is being blown away by the sudden uptick, and failing in manners wholly indistinguishable from a traditional bunch of dedicated racks, as far as your management and customers are concerned.

So, OPS is going away? Sure. Sales is going away too. And HR. And Marketing. And Finance. And there are rainbows in the heavens with unicorns dancing on them. Yup.

I can see why a guy like John would get ticked off whenever some jackass cargo-cults the notion. It’s ludicrous.

Phil Hollenback •

Chris Siebenmann recently wrote a very interesting related post about the ‘death of the sysadmin’: http://utcc.utoronto.ca/~cks/space/blog/sysadmin/SysadminDeath

Paul Guth •

I think the serious people who talk about NoOps are referring to Ops as a separate organizational entity, not to Ops as a set of functions.

allspaw • Author

Paul: that is my understanding, as well.

The term still sucks. I don’t use ‘operations’ solely as an organizational term, I use it to indicate domain knowledge, experience, and a set of skills that are specific to a field. I suspect many people do.

A startup requiring all design to come from development wouldn’t say that they’re practicing ‘NoDesign’ or ‘NoDev’. It’s a ridiculous term.

Ryan Richards •

Awesome post. Great description of ‘Ops’. ‘Devs’ also experience something similar – if you write code as a career there will always be someone who says “can you fix my pc?”

Mark •

In the development nirvana to come, the least talented developer will be the one responsible for managing the production environment. The most talented developers will focus *rightly* on writing code. Those that are less productive and skilled will do “lesser” tasks that would have traditionally fallen to operations. The pecking order so crucial to maintaining the developer ego will thus be fulfilled anew. Or Cisco could start selling servers along with their network gear and provide a complete datacenter for a low, low price . . . oh wait, never mind.

Comments are closed.
