
From the category archives:

Flickr

That was a pretty good time. Saw lots of good and wicked smaht people, and I got a lot of great questions after my talk. The slides are up on slideshare, and here are the PDF slides.

UPDATE: Gil Raphaelli has posted the Python bindings he wrote for libyahoo2, which we use in our Ops IM Bot.

There was something that I left out of my slides, mostly because I didn’t want to distract from the main topic, which was optimization and efficiencies.

While I used our image processing capacity at Flickr as an example of how compilers and hardware can have some significant influence on how fast or efficient you can run, I had wondered what the Magical Cloud™ would do with these differences.

So I took the tests I ran on our own machines and ran them on Small, Medium, Large, Extra Large, and Extra Large (High) instances of EC2, to see what would happen. The results were a bit surprising to me, but I’m sure they’re not surprising to anyone who uses EC2 with any significant amount of CPU demand.

For the testing, I have a script that does some super simple image resizing with GraphicsMagick. It resizes a DSLR photo into 6 different sizes, much in the same way that we do at Flickr in the real world. It does that resizing on about 7 different files, and I timed them all. This is with the most recent version of GraphicsMagick, 1.3.5, with the awesome OpenMP bits in it.
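The original script isn’t included here, so what follows is only a rough sketch of that kind of timing harness. It assumes the `gm` binary is on the PATH; the six widths in `SIZES`, the output naming scheme, and the helper names `resize_commands` and `time_resizes` are all made up for illustration.

```python
import subprocess
import time
from pathlib import Path

# Hypothetical target widths in pixels; the post only says "6 different sizes".
SIZES = [75, 100, 240, 500, 1024, 1600]


def resize_commands(src, out_dir, sizes=SIZES):
    """Build one `gm convert ... -resize` command per target size."""
    src = Path(src)
    out_dir = Path(out_dir)
    commands = []
    for width in sizes:
        # Hypothetical naming scheme: <name>_<width>.<ext>
        dest = out_dir / f"{src.stem}_{width}{src.suffix}"
        commands.append(
            ["gm", "convert", str(src), "-resize", f"{width}x{width}", str(dest)]
        )
    return commands


def time_resizes(sources, out_dir, run=subprocess.check_call):
    """Run every resize for every source file; return wall-clock seconds per file.

    `run` is injectable so the harness can be exercised without GraphicsMagick.
    """
    timings = {}
    for src in sources:
        start = time.perf_counter()
        for cmd in resize_commands(src, out_dir):
            run(cmd)
        timings[str(src)] = time.perf_counter() - start
    return timings
```

Calling `time_resizes(["a.jpg", "b.jpg"], "out")` would shell out to `gm` six times per file and report how long each file took, which is the number you’d compare across machines or instance types.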

Here is the slide of the tests run on different (increasingly faster) dedicated machines:

Faster Image Processing Hardware

and here is the slide that I didn’t include, of the EC2 timings of the same test:

Image Processing on EC2

Now I’m not suggesting that the two graphs should look similar, or that EC2 should be faster. I’m well aware of the shift in perspective when deploying capacity within the cloud versus within your own data center. So I’m not surprised that the fastest test results are on the order of 2x slower on EC2. Application logic and feature design (synchronous versus asynchronous image processing, for example) can take care of these differences, and that could be a welcome trade-off against having to run your own machines.

What I am surprised about is the variation (or lack thereof) across all but the small instances. After I took a closer look at vmstat and top, I realized that the small instances consistently saw about 50-60% of their CPU stolen, the mediums almost always saw zero stolen, and the Large and Extra Large instances saw up to 35% of their CPU stolen during the jobs.
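If you’d rather log the steal number than eyeball it in vmstat or top, here is a minimal sketch of computing it from two samples of the aggregate `cpu` line in Linux’s `/proc/stat`. It assumes a kernel that reports the steal field (the eighth counter on that line); the function names are mine, not from any tool.

```python
import time


def parse_cpu_line(line):
    """Return (total_jiffies, steal_jiffies) from the aggregate 'cpu' line.

    Field order: user nice system idle iowait irq softirq steal [guest ...].
    Guest time is already counted inside user time, so it is left out of
    the total to avoid double counting.
    """
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in fields[1:9]]
    if len(values) < 8:
        raise ValueError("this kernel does not report a steal field")
    return sum(values), values[7]


def steal_percent(before, after):
    """Percentage of CPU time stolen between two samples of the 'cpu' line."""
    total_before, steal_before = parse_cpu_line(before)
    total_after, steal_after = parse_cpu_line(after)
    elapsed = total_after - total_before
    if elapsed <= 0:
        return 0.0
    return 100.0 * (steal_after - steal_before) / elapsed


def sample_steal(interval=5.0, path="/proc/stat"):
    """Take two samples `interval` seconds apart and return the steal percentage."""
    with open(path) as f:
        before = f.readline()
    time.sleep(interval)
    with open(path) as f:
        after = f.readline()
    return steal_percent(before, after)
```

Running `sample_steal()` in a loop while the resize jobs are going should give roughly the same figure as the `st` column in vmstat.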

So, interesting.

{ 7 comments }

Like lots of operations people, we’re quite addicted to data pr0n here at Flickr. We’ve got graphs for pretty much everything, and add graphs all of the time. We’ve blogged about some of how and why we do it.

One thing we’re in the habit of is screenshotting these graphs when things go wrong, right, or indifferent, and adding them to a group on Flickr. I’ve decided to make a public group for these sorts of screenshots, for anyone to contribute to:

http://flickr.com/groups/webopsviz/

Before posting anything here, think about whether you want everyone in the world to see what you’ve got. I’ve made a quick FAQ on the group’s page, but I’ll repeat it here:

Q: What is this?
A: This group is for sharing visualizations of web operations metrics. For the most part, this means graphs of systems and application metrics, from software like ganglia, cacti, hyperic, etc.

Q: Who gets to see this?
A: This is a semi-public group, so don’t post anything you don’t want others to see.
For now, it’ll be for members-only to post and view. Ideally, I think it’d be great to share some of these things publicly.

Q: What’s interesting to post here?
A: Spikes, dips, patterns. Things with colors. Shiny things. Donuts. Ponies.

Q: My company will fire me if I show our metrics!
A: Don’t be dense and post your pageview, revenue, or other super-secret stuff that you think would be sensitive. Your mileage may vary.

So: you’ve got something to brag about? How many requests per second can your awesome new solid-state-disk database do? You got spikes? Post them!

{ 0 comments }

Slides from Velocity

June 25, 2008

Here are the slides from my talk at the Velocity Conference.


Squid patch for making “time” stats more meaningful.

May 22, 2008

Thanks to Mark, squid’s got a patch I’ve been wanting for a gazillion years: time-to-serve statistics that don’t include the client’s location.
http://www.squid-cache.org/bugs/show_bug.cgi?id=2345
Normally, squid’s kept statistics that included the “time” to serve an object, whether it be a HIT, MISS, NEAR HIT, etc. The clock starts for this time when the first headers are received by [...]


Flickr’s hiring a DBA.

January 30, 2008

(Only hardworking supernerds should apply)
We’re looking for an experienced and motivated MySQL DBA to help make things go at Flickr.
Stuff you’ll do:
• Work with engineers on performance tuning, query optimization, index tuning.
• Monitor databases for problems and diagnose where those problems are.
• Work with developers and operations to maintain a scalable, reliable, and robust [...]


Making a site faster by removing machines

August 20, 2007

(well, not really)
A little while ago, in one of our clusters we replaced a boatload of PowerEdge 1425 webserver-class boxes with a much smaller number of HP DL145 G3 quad-core boxes, getting the same amount of oomph from 1/3 the boxes. Not too bad.


Varnish and the state of web caching

December 16, 2006

So there’s lots of excitement around Varnish, which is a caching proxy that is built to be first and foremost a reverse-proxy, as opposed to squid, which does both forward and reverse. Acceleration (reverse-proxying) is obviously important to us at Flickr, as we use squid extensively.


Hats and beards

December 12, 2006

http://flickr.com/photos/allspaw/311471361/
