Kitchen Soap – Automated Control paper by the RAD Lab folks

Wow, how did I miss this until now? In June, some smart people gathered in Barcelona for the First Workshop on Automated Control for Datacenters and Clouds (ACDC09) and jeez it looked like it was a good time, from a glance at the program.

One of the cooler papers is “Automatic exploration of datacenter performance regimes” in which the smart folks over at the RAD Lab at UCB tackle the idea of:

Gathering up real usage metrics in production
Taking that data to feed a resource allocation (“auto-scaling”) controller

The bits about coming up with an exploration policy are where the juicy stuff comes in, building in safety factors driven by external SLAs. You should read the whole thing to see how thoughtful their method is; it takes into account effects such as cold ramping, which you almost never see accounted for in simulated situations. Rock on, RAD Lab: this is the stuff that brings the academic smarts to the real world. Kudos.
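To make the two steps above concrete, here is a minimal sketch of the general idea. It is not the RAD Lab method: it assumes a made-up sample shape of (requests per second, server count, latency), a hypothetical function name `servers_needed`, and a fixed padding factor standing in for an SLA-driven safety margin. It does no exploration and ignores cold ramping entirely.

```python
import math


def servers_needed(samples, sla_latency_ms, forecast_rps, safety_factor=1.25):
    """Estimate a server count for forecast_rps, padded by safety_factor.

    samples: iterable of (requests_per_sec, servers, latency_ms) tuples
    observed in production. This shape is an assumption for illustration.
    """
    # Per-server load for every observation that stayed inside the SLA.
    ok = [rps / n for rps, n, lat in samples if n > 0 and lat <= sla_latency_ms]
    if not ok:
        raise ValueError("no samples met the SLA; cannot estimate capacity")
    # Highest per-server load seen while the SLA still held.
    per_server_capacity = max(ok)
    # Pad the forecast, then round up to whole servers.
    return math.ceil(forecast_rps * safety_factor / per_server_capacity)


samples = [
    (1000, 10, 80),   # 100 rps per server, SLA met
    (1500, 10, 120),  # 150 rps per server, SLA met
    (2000, 10, 300),  # 200 rps per server, SLA violated
]
print(servers_needed(samples, sla_latency_ms=150, forecast_rps=3000))  # 25
```

The fixed `safety_factor` is the crude part. The point of the paper, as I read it, is choosing that margin carefully from real data rather than hard-coding it.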

FYI: I’m not just saying the paper is cool because they cite my book as a resource in it. 🙂

9 Comments

cb •

That does indeed look like a cool conference, and I feel your pain. In Barcelona, no less. Time to read some papers…
Anon •

Your first link to the paper 404’s…
allspaw • Author

Anon: thanks, fixed. 🙂
Kent •

Anyone have a link to the actual paper or is it pay to play only?

Kent
William Louth •

It would be nice if some more detail were published on the effectiveness (rather than the approach), as I have been pushing a few vendors in the grid/cloud computing space to start thinking about using the real-time resource metering data already used for both performance management and cost management for this auxiliary purpose. Instead of pushing tasks/jobs to nodes with currently low CPU usage, use the previous (aggregated) metering profile to select a number of node/queue candidates.

William
allspaw • Author

Huh. William: so you’re suggesting that cloud *operators* use the same auto-scaling methodology, but to manage their own underlying infrastructure? As in, say, Amazon Web Services use this approach when balancing their EC2 nodes?
William Louth •

I think this [real-time metering] can be used up and down the technology stack of a cloud application/service/software. Naturally, what constitutes a metered activity and resource (meter) will depend on the key performance (and cost) indicators. One software activity is another's metered resource (whether it is for billing purposes or not). There are many grid products on the market now that are pushing into the cloud but currently use very poor primitives in determining where best to send work to nodes within a cluster. Having all current system op metrics as meters (tracked to a particular software thread activity rather than process) would enable the grid to reach higher global throughput and better utilization.

I am basically saying let's meter (profile) at a contextual level in terms of the activity, no matter how high or low a level the construct is modeled/defined at. If we can allow meters to permeate all levels, we should have better (intelligent) management and performance.
Charles Sutton •

@John: Thanks for the kind words.

@Kent: The paper is freely available at http://www.cs.berkeley.edu/~bodikp/publications/ACDC09.pdf

