Kitchen Soap

Thoughts on capacity planning and web operations.


Slides from Web 2.0 Expo

April 29th, 2008 · 3 Comments

Here they are.

Tags: "capacity planning" · talks

Tool update: WTF is inside filesystem cache ?

March 27th, 2008 · 1 Comment

A while back, I said I’d love to have a tool that would let me peek inside the filesystem cache and see which files (or pages of files) are in it. Well, Peter Zaitsev points to the fincore tool, which comes pretty damn close: you give it a file, and it tells you which pages of that file are in core memory.

Rock. Thanks, David Plonka.
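For the curious, the usual way to answer "which pages of this file are cached?" on Linux is to mmap(2) the file without reading it and then ask mincore(2) which pages of that mapping are resident. The Python sketch below does exactly that through ctypes. It is not fincore's actual source; it assumes Linux and a libc that exports `mmap`, `munmap`, and `mincore`, and the helper name `resident_pages` is hypothetical.

```python
import ctypes
import mmap
import os

# Assumption: Linux, where dlopen(NULL) exposes libc's mmap/munmap/mincore.
_libc = ctypes.CDLL(None, use_errno=True)
_libc.mmap.restype = ctypes.c_void_p
_libc.mmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int,
                       ctypes.c_int, ctypes.c_int, ctypes.c_long]
_libc.munmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
_libc.mincore.argtypes = [ctypes.c_void_p, ctypes.c_size_t,
                          ctypes.POINTER(ctypes.c_ubyte)]
_MAP_FAILED = ctypes.c_void_p(-1).value


def _raise_errno():
    err = ctypes.get_errno()
    raise OSError(err, os.strerror(err))


def resident_pages(path):
    """Return the indexes of the file's pages currently in the page cache (hypothetical helper)."""
    size = os.path.getsize(path)
    if size == 0:
        return []
    page = mmap.PAGESIZE
    npages = (size + page - 1) // page
    fd = os.open(path, os.O_RDONLY)
    try:
        # Mapping the file does not read it, so it doesn't pull pages into cache.
        addr = _libc.mmap(None, size, mmap.PROT_READ, mmap.MAP_SHARED, fd, 0)
        if addr is None or addr == _MAP_FAILED:
            _raise_errno()
        try:
            vec = (ctypes.c_ubyte * npages)()
            if _libc.mincore(addr, size, vec) != 0:
                _raise_errno()
            # The low bit of each byte is set when that page is resident.
            return [i for i in range(npages) if vec[i] & 1]
        finally:
            _libc.munmap(addr, size)
    finally:
        os.close(fd)
```

The result is only a snapshot: the kernel can evict or load pages at any moment, so running it twice on a busy box may give different answers.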

Tags: caching · random · tools

WebOps Communication Tools

March 10th, 2008 · 3 Comments

After seeing Jesse’s great post on Radar about quick and easy communication during webops events (never knew about FreeConferenceCall, very cool!), I thought I’d put together a post on some of what we use at Flickr to keep track of ops-related things.

Production Changes/Immediate Issues

We have our configuration management schemes wrapped up in version control, so we can track those changes easily. But sometimes we make changes to production machines that aren’t configuration changes, and those still need to be communicated to everyone on the team.

We have an IM bot that everyone on the team has as a contact. When we make a change, we send the bot an IM, and it immediately parrots the message to everyone on the team. It also logs each message, with the sender’s IM name and a timestamp, to a text file, which we serve as a web page wrapped in the nice YUI bits for easy sorting. If you’re offline, you get the bot’s messages when you log on again.

Examples include taking boxes in and out of a load-balanced pool, restarting Apache, Squid, or MySQL, or even one-off A/B tests of temporary (kernel or application) parameters that might raise a red flag.
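The bot's behavior boils down to three rules: append every announcement to a flat log, fan it out to whoever is online, and queue it for whoever isn't. Here is a minimal Python sketch of just that logic. It assumes nothing about the real bot: the `ChangeBot` class, the tab-separated log format, and the in-memory `inbox` standing in for actual IM delivery are all hypothetical.

```python
from datetime import datetime, timezone


class ChangeBot:
    """Hypothetical sketch of a change-announcement bot; no real IM protocol is used."""

    def __init__(self, log_path, contacts):
        self.log_path = log_path
        self.online = set()
        # Messages held for contacts who were offline when they were sent.
        self.pending = {name: [] for name in contacts}
        # Stand-in for real IM delivery: what each contact has received.
        self.inbox = {name: [] for name in contacts}

    def login(self, name):
        self.online.add(name)
        # Deliver everything the contact missed while offline.
        self.inbox[name].extend(self.pending[name])
        self.pending[name].clear()

    def logout(self, name):
        self.online.discard(name)

    def announce(self, sender, text, now=None):
        now = now or datetime.now(timezone.utc)
        line = f"{now.isoformat()}\t{sender}\t{text}"
        # Append to the flat log that can also be served as a web page.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(line + "\n")
        # Parrot to everyone except the sender; queue for offline contacts.
        for name in self.pending:
            if name == sender:
                continue
            target = self.inbox if name in self.online else self.pending
            target[name].append(line)
        return line
```

Skipping the sender is a judgment call in this sketch; echoing the message back would also work and doubles as a delivery confirmation.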

Ongoing Work

Because we like to keep logs, and because some of the team works remotely, we keep a running commentary on an internal IRC server, which we use for basic day-to-day work. Running our IRC clients under screen means we have weeks, months, and even years of ops conversations about what we’ve done in the scrollback and in the IRC logs.

Code Deployment

Of course, Cal and the other devs have a system in place that logs every code deploy with a username, a timestamp, and a diff against version control showing what changed in that deploy. That makes it easy to correlate system and application metrics with changes to the codebase. (Thank you!)
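Once deploys are logged with timestamps, the core of "which change lines up with this graph wiggle?" is a sorted search: for an anomaly at time t, find the latest deploy at or before t. The Python sketch below assumes deploy records are kept sorted by Unix timestamp; the `Deploy` fields, the `diff_ref` value, and the optional `max_lag` cutoff are hypothetical, not Flickr's actual schema.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Deploy:
    timestamp: int  # Unix seconds
    user: str
    diff_ref: str  # hypothetical pointer to the version-control diff


def deploy_before(deploys, event_ts, max_lag=None):
    """Return the latest deploy at or before event_ts, or None.

    `deploys` must be sorted by timestamp. If max_lag (seconds) is given,
    deploys older than event_ts - max_lag are treated as unrelated.
    """
    times = [d.timestamp for d in deploys]
    i = bisect.bisect_right(times, event_ts)
    if i == 0:
        return None
    candidate = deploys[i - 1]
    if max_lag is not None and event_ts - candidate.timestamp > max_lag:
        return None
    return candidate
```

A match is only a lead, not proof of cause; the diff it points to is where the real investigation starts.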

Tags: tools · webops

Too big to use utility computing ?

February 27th, 2008 · 11 Comments

Dear users of S3, EC2, and other ‘utility’ computing stuffs:

Here’s a crude and completely oversimplified evolution of infrastructure needs of a growing website, with an assumption:

[Figure: Evolution of web infrastructure]

Have you ‘outgrown’ your original use of utility computing? If so, what was the reason? Financial? Technical?

Why I’m asking:

I’m in the process of writing a book on the topic of capacity planning for web architectures, so I’m interested in what you’ve got to say.

Tags: "capacity planning" · webops

Datacenter Operating Systems

February 20th, 2008 · No Comments

I’m probably late in getting to this, but seeing the article in the WSJ about the RAD project made me stop to take a look. It appears to be a collection of different projects, all relating to infrastructure deployment/management and various research topics surrounding it. Looks cool so far.

Tags: "capacity planning" · random

Loving Dashboard Spy.

February 17th, 2008 · 4 Comments

I’m probably very late to this party, but I just discovered Dashboard Spy. Given the amount of “data porn” that folks in webops look at on a daily basis, this sort of stuff is pretty damn interesting.

I especially love the current trend of building ‘business’ dashboards, since they fit quite nicely with infrastructure statistics. Quite often when I need to make capacity justifications, I pull forecasts from both the higher-level metrics (e.g. photos uploaded) and the lower-level metrics (e.g. disk space consumed by photos) and have to marry the two together.

In fact, I love that stuff so much that I’m writing a book about it. :)
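Marrying the two levels can be as simple as forecasting the business metric and then converting it through an observed per-unit cost. The Python sketch below fits a straight line to cumulative photo uploads and multiplies the projected count by the current average bytes per photo. Both the linear trend and the constant per-photo footprint are simplifying assumptions, and `linear_fit` and `disk_forecast` are hypothetical helpers, not a description of Flickr's actual forecasting.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def disk_forecast(days, photos_total, bytes_total, future_day):
    """Project cumulative photos and disk bytes at future_day."""
    # Business-level trend: cumulative photos uploaded vs. day.
    a, b = linear_fit(days, photos_total)
    photos_then = a + b * future_day
    # Infrastructure-level link: average disk bytes per photo so far
    # (assumes the per-photo footprint stays roughly constant).
    bytes_per_photo = bytes_total[-1] / photos_total[-1]
    return photos_then, photos_then * bytes_per_photo
```

With made-up inputs of 100, 200, 300, and 400 photos on days 0 through 3 at 2 MB each, the projection for day 10 is 1,100 photos and about 2.2 GB. Real growth is rarely this linear, so in practice you'd check the fit against recent data before trusting it for a purchase order.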

Tags: "capacity planning" · webops

Flickr’s hiring a DBA.

January 30th, 2008 · 4 Comments

(Only hardworking supernerds should apply)

We’re looking for an experienced and motivated MySQL DBA to help make things go at Flickr.

Stuff you’ll do:
• Work with engineers on performance tuning, query optimization, index tuning.
• Monitor databases for problems and diagnose where those problems are.
• Work with developers and operations to maintain a scalable, reliable, and robust database environment.
• Build database tools and scripts to automate where possible.
• Support MySQL databases for production and development.
• Provide 24×7 escalated on-call support on a pager rotation.

Smarts and experience you’ll need:
• 3-4+ years MySQL experience.
• 2+ years of experience as a MySQL DBA in a high traffic, transactional environment.
• 2+ years working in a LAMP environment, particularly PHP/MySQL.
• Proficient with database performance strategies.
• Proficient tuning MySQL processes and queries.
• Experience administering InnoDB.
• Experience with MySQL replication, both master-slave and master-master.
• Ability to work cooperatively with software engineers and system administrators.
• Excellent communication skills.
• Exceptional problem-solving expertise and attention to detail.
• BS in Computer Science or equivalent.

Super Nerdy Bonus Points For:
• Experience with data sharding and federated architectures.
• Experience with multi-datacenter MySQL replication.
• Experience working in a social media environment.

Ok ? Now, send me your resume!

Tags: flickr · webops

Speaking at Web 2.0 Expo 2008

January 3rd, 2008 · 3 Comments

I’m gonna give a talk on capacity planning for web operations at the Web 2.0 Expo in April. Wondering if I should submit the same sort of talk for the Velocity conference in June. Don’t want to be redundant or anything.


Tags: "capacity planning" · talks · webops

A new place for Web Ops to talk the talk and walk the walk

November 15th, 2007 · 1 Comment

There’s a new conference in town, and it looks to have the really good schmitz. Good work, Jesse and Steve; I’m really looking forward to this.

Tags: talks · webops

Datacenters can suck. Communication can be great.

November 13th, 2007 · 1 Comment

If you consider that you and your users are in some sort of ‘relationship’, then good communication is pretty important. The Rackspace datacenter outage reminds me yet again that we’re lucky to have a handful of servers in more than one datacenter, so we can still communicate with users if we lose one of them.

Desperate times call for desperate measures, and in the case where you lose a DC, having that $9.95/month webhosting account (or whatever) for serving a status/downtime/blog page somewhere else can sound like a bargain.

Tags: "capacity planning" · webops