<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Kitchen Soap &#187; Tools</title>
	<atom:link href="http://www.kitchensoap.com/category/tools/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.kitchensoap.com</link>
	<description>Thoughts on capacity planning and web operations.</description>
	<lastBuildDate>Fri, 25 Jun 2010 04:05:16 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.6</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Meanwhile: More Meta-Metrics</title>
		<link>http://www.kitchensoap.com/2009/10/05/meanwhile-more-meta-metrics/</link>
		<comments>http://www.kitchensoap.com/2009/10/05/meanwhile-more-meta-metrics/#comments</comments>
		<pubDate>Mon, 05 Oct 2009 17:50:26 +0000</pubDate>
		<dc:creator>allspaw</dc:creator>
				<category><![CDATA[Tools]]></category>
		<category><![CDATA[WebOps]]></category>

		<guid isPermaLink="false">http://www.kitchensoap.com/?p=292</guid>
		<description><![CDATA[Like all sane web organizations, we gather metrics about our infrastructure and applications. As many metrics as we can, as often as we can. These metrics, given the right context, help us figure out all sorts of things about our application, infrastructure, processes, and business. Things such as&#8230;
What:
&#8230;did we do before (historical trending, etc)
&#8230;is going [...]]]></description>
			<content:encoded><![CDATA[<p>Like all sane web organizations, we gather metrics about our infrastructure and applications. As many metrics as we can, as often as we can. These metrics, given the right context, help us figure out all sorts of things about our application, infrastructure, processes, and business. Things such as&#8230;</p>
<p>What:</p>
<p style="padding-left: 30px;">&#8230;did we do before (historical trending, etc)<br />
&#8230;is going on right now? (troubleshooting, health, etc.)<br />
&#8230;is coming down the road (capacity planning, new feature adoption, etc.)<br />
&#8230;can we do to make things better (business intelligence, user-behavior, etc.)</p>
<p>All of which, of course, should be considered mandatory in order to help your business increase its awesome. Yay metrics!</p>
<p>Some time ago, Matthias wrote a great <a title="Agile Web Operations" href="http://www.agileweboperations.com/visible-ops-continuous-improvement/" target="_blank">blog post</a> about some of the metrics that can reasonably profile the effectiveness of web operations, taken from the <a title="VisibleOps" href="http://www.itpi.org/home/visibleops.php" target="_blank">ITIL primer, VisibleOps</a>.</p>
<p>In my opinion, there&#8217;s nothing on that list of things that isn&#8217;t valuable, as long as gathering those metrics isn&#8217;t too expensive behaviorally, technically, or organizationally. The topics on that list and the context they live in are fodder for many, many blog posts.</p>
<p>But in the category of historical trending, I&#8217;m more and more fascinated by gathering what I&#8217;ll call &#8220;meta-metrics&#8221;: data about how you respond to the changes your system is experiencing.</p>
<p>One of the best examples of this is gathering information about operational disruptions. Collecting information about how many times your on-call rotation was alerted/paged/woken-up, during what times, and for what service(s) can be enlightening to say the least.  We&#8217;ve been tracking the volume of alerts a lot closer recently, and even with the level of automation we&#8217;ve got at Flickr, it&#8217;s still something you have to keep on top of, especially if you&#8217;re always finding new things to measure and alert on.</p>
<p>Now, ideally, you have an alerting system that only reports conditions that require a human to act. That means every alert is critically important, and you&#8217;re not ignoring or dismissing any pages for reasons that sound like <em>&#8220;oh, that&#8217;s ok, that cluster always does that&#8230;it&#8217;ll clear up, I&#8217;ll just acknowledge the page so I can shut up nagios.&#8221;</em> In other words, our goal is a zero-noise alerting system: <em>all</em> alerts are actionable, none are ignorable, and every one requires a human to troubleshoot or fix. Over time, you push as much of this work as you can to the robots. In the meantime, save humans for the yet-to-be-automated work, or the stuff that isn&#8217;t easily captured by robots.</p>
<p>Why is this important to us? I may be stating the obvious, but it&#8217;s because interrupting humans with alerts that don&#8217;t require action has a mental and physical context-switching cost (especially if the guy on-call was sleeping), and it increases the likelihood of missing a truly critical page in a slew of non-critical ones.</p>
<p>Of course, in the reality of evolving and growing web applications, even if we could reach a 100% noise-free alerting system, it would be impossible to sustain for any extended period of time, because your application, usage, and failure modes are constantly changing. So in the meantime, knowing how your alerts affect the team is worthwhile for us. In fact, I think it&#8217;s important enough to collect and display next to the rest of your metrics, and to expose to the entire dev and ops groups.</p>
<p>Something like this: (made-up numbers)</p>
<div id="attachment_295" class="wp-caption alignnone" style="width: 300px">
	<a href="http://www.kitchensoap.com/wp-content/uploads/2009/10/Alerts-Mockup.png"><img class="size-medium wp-image-295" title="Tracking Critical Alerts" src="http://www.kitchensoap.com/wp-content/uploads/2009/10/Alerts-Mockup-300x206.png" alt="Tracking Critical Alerts " width="300" height="206" /></a>
	<p class="wp-caption-text">Tracking Critical Alerts </p>
</div>
<p>Gathering up info about these alerts should give us a better perspective on where we can improve. So, things like:</p>
<ul>
<li> How many critical alerts are sent on a daily/hourly/weekly basis?</li>
<li> What does a time histogram of the alerts look like? Do you get more or fewer alerts during nighttime or non-peak hours?</li>
<li>How much (if any) correlation is there between critical alerts and:</li>
</ul>
<blockquote style="padding-left: 30px;"><p>- code deploys?<br />
- software upgrades?<br />
- feature launches?<br />
- open API abuse?</p></blockquote>
<ul>
<li> What does a breakdown of the alerts look like, in terms of: host type, service type, and frequency of each in a given time period?</li>
</ul>
<p>and maybe the most important ones:</p>
<ul>
<li> How many of those alerts aren&#8217;t actually critical or demand human attention?</li>
<li> How many of them always self-recover?</li>
<li> How many (and which) don&#8217;t matter in their role context (like, a single node in a load-balanced cluster) and could be turned into an aggregate check?</li>
</ul>
<p>We&#8217;ve built our own stuff to track and analyze these things. My question to the community is: does an open-source tool dedicated to analyzing these metrics exist? I&#8217;m not aware of one. Nagios obviously has host/hostgroup/cluster warning and critical histories, and those can be crunched to find critical alert statistics, but I&#8217;m not aware of anything that does that crunching comprehensively. Of course, until I find one, we&#8217;re just building our own.</p>
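<p>For the Nagios side, that crunching might start by parsing notification lines out of <code>nagios.log</code>. The line layout assumed below (<code>[epoch] SERVICE NOTIFICATION: contact;host;service;state;command;output</code>) is what Nagios writes as far as I know, but check it against your own logs before trusting the numbers. Because Nagios logs one notification line per contact, this sketch de-duplicates on timestamp, host, and service so that one alert paging three people counts once.</p>

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Assumed layout of a service notification in nagios.log (verify locally):
# [1254700800] SERVICE NOTIFICATION: contact;host;service;STATE;command;output
NOTIFICATION_RE = re.compile(
    r"^\[(?P<ts>\d+)\] SERVICE NOTIFICATION: "
    r"(?P<contact>[^;]*);(?P<host>[^;]*);(?P<service>[^;]*);(?P<state>[^;]*);"
)


def critical_pages(lines):
    """Yield (utc_datetime, host, service) once per CRITICAL service notification."""
    seen = set()
    for line in lines:
        m = NOTIFICATION_RE.match(line)
        if not m or m.group("state") != "CRITICAL":
            continue
        key = (m.group("ts"), m.group("host"), m.group("service"))
        if key in seen:  # same alert, another contact notified
            continue
        seen.add(key)
        when = datetime.fromtimestamp(int(m.group("ts")), tz=timezone.utc)
        yield when, m.group("host"), m.group("service")


def summarize(lines):
    """Count deduplicated critical pages per UTC day and per service."""
    pages = list(critical_pages(lines))
    return {
        "per_day": Counter(when.date().isoformat() for when, _, _ in pages),
        "per_service": Counter(service for _, _, service in pages),
    }
```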
<p>Thoughts, lazyweb?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.kitchensoap.com/2009/10/05/meanwhile-more-meta-metrics/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Uncaching bits in filesystem cache</title>
		<link>http://www.kitchensoap.com/2009/07/09/uncaching-bits-in-filesystem-cache/</link>
		<comments>http://www.kitchensoap.com/2009/07/09/uncaching-bits-in-filesystem-cache/#comments</comments>
		<pubDate>Thu, 09 Jul 2009 18:17:26 +0000</pubDate>
		<dc:creator>allspaw</dc:creator>
				<category><![CDATA[Random]]></category>
		<category><![CDATA[Tools]]></category>

		<guid isPermaLink="false">http://www.kitchensoap.com/?p=263</guid>
		<description><![CDATA[Domas has made something that I bet is more useful than most would think: http://mituzas.lt/2009/06/26/uncache/
]]></description>
			<content:encoded><![CDATA[<p>Domas has made something that I bet is more useful than most would think: <a href="http://mituzas.lt/2009/06/26/uncache/" target="_blank">http://mituzas.lt/2009/06/26/uncache/</a></p>
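<p>For the curious, one standard way to ask Linux to drop a file&#8217;s pages from the page cache is <code>posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED)</code>. The Python sketch below does just that; it illustrates the idea and is not necessarily how Domas&#8217;s tool works. The call is only advice: the kernel may leave dirty pages in memory, so the sketch calls <code>fsync</code> first, and it needs Python 3.3+ on a platform that provides <code>os.posix_fadvise</code> (Linux does; macOS does not).</p>

```python
import os


def uncache(path):
    """Advise the kernel to evict a file's pages from the page cache (Linux)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # Flush dirty pages first; DONTNEED may leave dirty pages in memory.
        os.fsync(fd)
        # offset=0, length=0 covers the whole file.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
```

<p>A typical use is evicting a data file before a benchmark so that the run starts with a cold cache.</p>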
]]></content:encoded>
			<wfw:commentRss>http://www.kitchensoap.com/2009/07/09/uncaching-bits-in-filesystem-cache/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Slides for Velocity Talk 2009</title>
		<link>http://www.kitchensoap.com/2009/06/23/slides-for-velocity-talk-2009/</link>
		<comments>http://www.kitchensoap.com/2009/06/23/slides-for-velocity-talk-2009/#comments</comments>
		<pubDate>Tue, 23 Jun 2009 23:39:53 +0000</pubDate>
		<dc:creator>allspaw</dc:creator>
				<category><![CDATA[Culture]]></category>
		<category><![CDATA[Slides]]></category>
		<category><![CDATA[Talks]]></category>
		<category><![CDATA[Tools]]></category>
		<category><![CDATA[WebOps]]></category>
		<category><![CDATA[velocity conference]]></category>
		<category><![CDATA[Web Ops]]></category>

		<guid isPermaLink="false">http://www.kitchensoap.com/?p=257</guid>
		<description><![CDATA[UPDATE: blip.tv has the video of the talk as well, below. Jeez I have some major bed-head.
That was a blast! I had never done a &#8216;duet&#8217; talk before. Here are the slides:
10+ Deploys Per Day: Dev and Ops Cooperation at Flickr
&#8230;and the video of it is here:

]]></description>
			<content:encoded><![CDATA[<p>UPDATE: blip.tv has the video of the talk as well, below. Jeez I have some major bed-head.</p>
<p>That was a blast! I had never done a &#8216;duet&#8217; talk before. Here are the slides:</p>
<div id="__ss_1628368" style="width: 425px; text-align: left;"><a style="font:14px Helvetica,Arial,Sans-serif;display:block;margin:12px 0 3px 0;text-decoration:underline;" title="10+ Deploys Per Day: Dev and Ops Cooperation at Flickr" href="http://www.slideshare.net/jallspaw/10-deploys-per-day-dev-and-ops-cooperation-at-flickr?type=presentation">10+ Deploys Per Day: Dev and Ops Cooperation at Flickr</a><object style="margin:0px" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="355" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="src" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=allspawhammondvelocity2009-090623161942-phpapp01&amp;stripped_title=10-deploys-per-day-dev-and-ops-cooperation-at-flickr" /><param name="allowfullscreen" value="true" /><embed style="margin:0px" type="application/x-shockwave-flash" width="425" height="355" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=allspawhammondvelocity2009-090623161942-phpapp01&amp;stripped_title=10-deploys-per-day-dev-and-ops-cooperation-at-flickr" allowscriptaccess="always" allowfullscreen="true"></embed></object></div>
<div style="width: 425px; text-align: left;">&#8230;and the video of it is here:</div>
<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="640" height="390" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="src" value="http://blip.tv/play/AYGMoH+LqzQ" /><param name="allowfullscreen" value="true" /><embed type="application/x-shockwave-flash" width="640" height="390" src="http://blip.tv/play/AYGMoH+LqzQ" allowfullscreen="true"></embed></object></p>
]]></content:encoded>
			<wfw:commentRss>http://www.kitchensoap.com/2009/06/23/slides-for-velocity-talk-2009/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Slides from Web2.0 Expo 2009. (and somethin else interestin&#8217;)</title>
		<link>http://www.kitchensoap.com/2009/04/03/slides-from-web20-expo-2009-and-somethin-else-interestin/</link>
		<comments>http://www.kitchensoap.com/2009/04/03/slides-from-web20-expo-2009-and-somethin-else-interestin/#comments</comments>
		<pubDate>Fri, 03 Apr 2009 21:21:40 +0000</pubDate>
		<dc:creator>allspaw</dc:creator>
				<category><![CDATA[Flickr]]></category>
		<category><![CDATA[Slides]]></category>
		<category><![CDATA[Talks]]></category>
		<category><![CDATA[Tools]]></category>
		<category><![CDATA[WebOps]]></category>

		<guid isPermaLink="false">http://www.kitchensoap.com/?p=115</guid>
		<description><![CDATA[That was a pretty good time. Saw lots of good and wicked smaht people, and I got a lot of great questions after my talk. The slides are up on slideshare, and here are the PDF slides. 
Operational Efficiency Hacks Web20 Expo2009

UPDATE: Gil Raphaelli has posted his python bindings he [...]]]></description>
			<content:encoded><![CDATA[<p>That was a pretty good time. Saw lots of good and wicked smaht people, and I got a lot of great questions after my talk. The slides are up on <a href="http://www.slideshare.net/jallspaw/operational-efficiency-hacks-web20-expo2009" target="_blank">slideshare</a>, and here are the <a title="Operational Efficiency Hacks Web 2.0 Expo 2009" href="http://kitchensoap.com/talks/OpsHacksWeb20Expo2009-Notes.pdf" target="_blank">PDF slides</a>. <strong><em></em></strong></p>
<div style="width:425px;text-align:left" id="__ss_1245887"><a style="font:14px Helvetica,Arial,Sans-serif;display:block;margin:12px 0 3px 0;text-decoration:underline;" href="http://www.slideshare.net/jallspaw/operational-efficiency-hacks-web20-expo2009?type=presentation" title="Operational Efficiency Hacks Web20 Expo2009">Operational Efficiency Hacks Web20 Expo2009</a><object style="margin:0px" width="425" height="355"><param name="movie" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=opshacksweb20expo2009-090403152449-phpapp02&#038;stripped_title=operational-efficiency-hacks-web20-expo2009" /><param name="allowFullScreen" value="true"/><param name="allowScriptAccess" value="always"/><embed src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=opshacksweb20expo2009-090403152449-phpapp02&#038;stripped_title=operational-efficiency-hacks-web20-expo2009" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="355"></embed></object>
<div style="font-size:11px;font-family:tahoma,arial;height:26px;padding-top:2px;">View more <a style="text-decoration:underline;" href="http://www.slideshare.net/">presentations</a> from <a style="text-decoration:underline;" href="http://www.slideshare.net/jallspaw">John Allspaw</a>.</div>
</div>
<p><strong><em>UPDATE:</em></strong> Gil Raphaelli has <a href="http://g.raphaelli.com/2009/4/2/libyahoo2-python-bindings" target="_blank">posted</a> the Python bindings he wrote for libyahoo2, which we use in our Ops IM bot.</p>
<p>There <em>was</em> something that I left out of my slides, mostly because I didn&#8217;t want to distract from the main topic, which was optimization and efficiencies.</p>
<p>While I used our image processing capacity at Flickr as an example of how compilers and hardware can significantly influence how fast or efficiently you can run, I had wondered what the Magical Cloud™ would do with these differences.</p>
<p>So I took the tests I ran on our own machines and ran them on Small, Medium, Large, Extra Large, and Extra Large (High) EC2 instances to see how they compared. The results were a bit surprising to me, but I&#8217;m sure they won&#8217;t surprise anyone who uses EC2 with any significant amount of CPU demand.</p>
<p>For the testing, I have a script that does some super simple image resizing with GraphicsMagick. It resizes a DSLR photo into 6 different sizes, much in the same way that we do at Flickr for the real world. It does that resizing on about 7 different files, and I timed them all. This is with the most recent version of GraphicsMagick, 1.3.5, with the awesome OpenMP bits in it.</p>
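<p>A harness like the one described can be sketched in Python around the <code>gm convert &lt;in&gt; -resize WxH &lt;out&gt;</code> command. The six target sizes, the output naming, and the per-file timing are assumptions for illustration, not the actual script or Flickr&#8217;s real size list. A geometry like <code>500x500</code> fits the image within a 500x500 box while keeping its aspect ratio. The <code>run</code> parameter exists so that the command building can be tested without GraphicsMagick installed.</p>

```python
import subprocess
import time
from pathlib import Path

# Hypothetical target sizes (longest edge, in pixels); not Flickr's actual list.
SIZES = [75, 100, 240, 500, 1024, 2048]


def resize_commands(src, out_dir, sizes=SIZES):
    """Build one `gm convert` command per target size for a single source image."""
    src = Path(src)
    cmds = []
    for size in sizes:
        dest = Path(out_dir) / f"{src.stem}_{size}{src.suffix}"
        # "NxN" bounds both edges while preserving the aspect ratio.
        cmds.append(["gm", "convert", str(src), "-resize", f"{size}x{size}", str(dest)])
    return cmds


def time_resizes(sources, out_dir, sizes=SIZES, run=subprocess.run):
    """Return wall-clock seconds spent resizing each source file into every size."""
    timings = {}
    for src in sources:
        start = time.perf_counter()
        for cmd in resize_commands(src, out_dir, sizes):
            run(cmd, check=True)  # raises if gm exits non-zero
        timings[str(src)] = time.perf_counter() - start
    return timings
```

<p>Since this GraphicsMagick build uses OpenMP, running the harness with the standard <code>OMP_NUM_THREADS</code> environment variable set to 1 and then to the core count is one way to see how much of the speedup comes from parallelism.</p>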
<p>Here is the slide of the tests run on different (increasingly faster) dedicated machines:</p>
<p style="text-align: center;"><img class="size-medium wp-image-117 aligncenter" title="Faster Image Processing Hardware" src="http://www.kitchensoap.com/wp-content/uploads/2009/04/gm-hardware2-300x213.png" alt="Faster Image Processing Hardware" width="300" height="213" /></p>
<p>and here is the slide that I <em>didn&#8217;t</em> include, of the EC2 timings of the same test:</p>
<p style="text-align: center;"><img class="size-medium wp-image-118 aligncenter" title="Image Processing on EC2" src="http://www.kitchensoap.com/wp-content/uploads/2009/04/gm-ec2-300x213.png" alt="Image Processing on EC2" width="300" height="213" /></p>
<p>Now I&#8217;m not suggesting that the two graphs <strong><em>should</em></strong> look similar, or that EC2 <em>should</em> be faster. I&#8217;m well aware of the shift in perspective when deploying capacity within the cloud versus within your own data center, so I&#8217;m not surprised that the fastest test results are on the order of 2x slower on EC2. Application logic and feature design (synchronous versus asynchronous image processing, for example) can absorb these differences, and the slowdown could be a welcome trade-off against having to run your own machines.</p>
<p>What I am surprised about is the variation (or lack thereof) among all but the small instances. After I took a closer look at vmstat and top, I realized that the small instances consistently saw about 50-60% of their <a href="http://help.rightscale.com/cgi-bin/rightscale.cfg/php/enduser/std_adp.php?p_faqid=28" target="_blank">CPU stolen</a>, the mediums almost always saw zero stolen, and the Large and Extra Large instances saw up to 35% of their CPU stolen during the jobs.</p>
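<p>The &#8220;stolen&#8221; numbers that vmstat (its <code>st</code> column) and top (<code>%st</code>) show come from the <code>steal</code> counter in the aggregate <code>cpu</code> line of <code>/proc/stat</code>. A minimal Python sketch of roughly the same calculation takes two snapshots and divides the steal delta by the total delta. It assumes a Linux kernel that reports at least eight fields on that line; guest time is already included in user time, so only the first eight fields are summed.</p>

```python
import time


def cpu_times(stat_text):
    """Return the counters from the aggregate 'cpu' line of /proc/stat.

    Linux field order: user nice system idle iowait irq softirq steal guest guest_nice
    """
    for line in stat_text.splitlines():
        if line.startswith("cpu "):
            fields = [int(v) for v in line.split()[1:]]
            if len(fields) < 8:
                raise ValueError("kernel does not report steal time")
            return fields
    raise ValueError("no aggregate cpu line found")


def steal_percent(before, after):
    """Percentage of CPU time stolen by the hypervisor between two snapshots."""
    b, a = cpu_times(before), cpu_times(after)
    # guest/guest_nice are already counted in user/nice, so sum only the first 8.
    deltas = [new - old for new, old in zip(a[:8], b[:8])]
    total = sum(deltas)
    return 100.0 * deltas[7] / total if total else 0.0


def sample_steal(interval=1.0, path="/proc/stat"):
    """Measure steal over `interval` seconds, roughly what vmstat's st column shows."""
    with open(path) as f:
        before = f.read()
    time.sleep(interval)
    with open(path) as f:
        after = f.read()
    return steal_percent(before, after)
```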
<p>So, interesting.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.kitchensoap.com/2009/04/03/slides-from-web20-expo-2009-and-somethin-else-interestin/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Speaking at Web2.0 Expo 2009</title>
		<link>http://www.kitchensoap.com/2009/02/19/speaking-at-web20-expo-2009/</link>
		<comments>http://www.kitchensoap.com/2009/02/19/speaking-at-web20-expo-2009/#comments</comments>
		<pubDate>Fri, 20 Feb 2009 01:22:43 +0000</pubDate>
		<dc:creator>allspaw</dc:creator>
				<category><![CDATA[Capacity Planning]]></category>
		<category><![CDATA[Talks]]></category>
		<category><![CDATA[Tools]]></category>

		<guid isPermaLink="false">http://www.kitchensoap.com/?p=106</guid>
		<description><![CDATA[Looks like I&#8217;m gonna talk about even more nerdy things at the Web2.0 Expo in April.


You don’t have to wait for a recession to tighten up your operations. Squeezing more oomph out of your servers (or instances!) is always a good thing, and streamlining how you handle site issues is too. We’ll talk about [...]]]></description>
			<content:encoded><![CDATA[<p>Looks like I&#8217;m gonna talk about even more nerdy things at the <a href="http://www.web2expo.com/webexsf2009/public/schedule/detail/8580" target="_blank">Web2.0 Expo in April.</a></p>
<blockquote>
<div class="en_session_description description">
<p>You don’t have to wait for a recession to tighten up your operations. Squeezing more oomph out of your servers (or instances!) is always a good thing, and streamlining how you handle site issues is too. We’ll talk about what we’ve been doing at Flickr to get more out of less from both our machines and our humans.</p>
<p>Capacity Hacks: diagonal scaling, tuning opportunities, and some other stupid performance tricks.</p>
<p>Ops “runbook” Hacks: Server and process self-healing, application-level measurement, ops communication tools, and some worst-case scenario tricks to have in your back pocket.</p></div>
</blockquote>
]]></content:encoded>
			<wfw:commentRss>http://www.kitchensoap.com/2009/02/19/speaking-at-web20-expo-2009/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Web Ops Visualizations Group on Flickr</title>
		<link>http://www.kitchensoap.com/2008/12/16/web-ops-visualizations-group-on-flickr/</link>
		<comments>http://www.kitchensoap.com/2008/12/16/web-ops-visualizations-group-on-flickr/#comments</comments>
		<pubDate>Tue, 16 Dec 2008 18:19:10 +0000</pubDate>
		<dc:creator>allspaw</dc:creator>
				<category><![CDATA[Flickr]]></category>
		<category><![CDATA[Tools]]></category>
		<category><![CDATA[Web Ops]]></category>

		<guid isPermaLink="false">http://www.kitchensoap.com/?p=79</guid>
		<description><![CDATA[Like lots of operations people, we&#8217;re quite addicted to data pr0n here at Flickr. We&#8217;ve got graphs for pretty much everything, and add graphs all of the time. We&#8217;ve blogged about some of how and why we do it.
One thing we&#8217;re in the habit of is screenshotting these graphs when things go wrong, right, or [...]]]></description>
			<content:encoded><![CDATA[<p>Like lots of operations people, we&#8217;re quite addicted to data pr0n here at Flickr. We&#8217;ve got graphs for pretty much everything, and add graphs all of the time. We&#8217;ve <a href="http://code.flickr.com/blog/2008/10/27/counting-timing/" target="_blank">blogged</a> <a href="http://code.flickr.com/blog/2008/10/13/flickr-digs-ganglia/" target="_blank">about</a> some of how and why we do it.</p>
<p>One thing we&#8217;re in the habit of is screenshotting these graphs when things go wrong, right, or indifferent, and adding them to a group on Flickr. I&#8217;ve decided to make a public group for this sort of screenshot, for anyone to contribute to:</p>
<p style="text-align: center;"><a href="http://flickr.com/groups/webopsviz/" target="_blank">http://flickr.com/groups/webopsviz/</a></p>
<p>Before posting anything there, think about whether you want everyone in the world to see what you&#8217;ve got. I&#8217;ve made a quick FAQ on the group&#8217;s page, but I&#8217;ll repeat it here:</p>
<blockquote><p><strong>Q: What is this?</strong><br />
A: This group is for sharing visualizations of web operations metrics. For the most part, this means graphs of systems and application metrics, from software like Ganglia, Cacti, Hyperic, etc.</p>
<p><strong>Q: Who gets to see this?</strong><br />
A: This is a semi-public group, so don&#8217;t post anything you don&#8217;t want others to see.<br />
For now, only members can post and view. Ideally, I think it&#8217;d be great to share some of these things publicly.</p>
<p><strong>Q: What&#8217;s interesting to post here?</strong><br />
A: Spikes, dips, patterns. Things with colors. Shiny things. Donuts. Ponies.</p>
<p><strong>Q: My company will fire me if I show our metrics!</strong><br />
A: Don&#8217;t be dense: don&#8217;t post your pageview, revenue, or other super-secret stuff that you think would be sensitive. Your mileage may vary.</p></blockquote>
<p>So: you&#8217;ve got something to brag about? How many requests per second can your awesome new solid-state-disk database do? You got spikes? Post them!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.kitchensoap.com/2008/12/16/web-ops-visualizations-group-on-flickr/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Code Swarm for Config Management</title>
		<link>http://www.kitchensoap.com/2008/10/21/code-swarm-for-config-management/</link>
		<comments>http://www.kitchensoap.com/2008/10/21/code-swarm-for-config-management/#comments</comments>
		<pubDate>Wed, 22 Oct 2008 01:46:52 +0000</pubDate>
		<dc:creator>allspaw</dc:creator>
				<category><![CDATA[Tools]]></category>
		<category><![CDATA[WebOps]]></category>
		<category><![CDATA["automated infrastructure" webops tools config management]]></category>

		<guid isPermaLink="false">http://www.kitchensoap.com/?p=65</guid>
		<description><![CDATA[Gil Raphaelli, one of the guys on our Flickr Ops team, put together a Code Swarm animation for the configuration/deployment management tool we use at Flickr to manage our infrastructure. Myles Grant did this for our bug reporting system as well. Check it out:

Our automated config management system is called Gemstone, but conceptually you can [...]]]></description>
			<content:encoded><![CDATA[<p>Gil Raphaelli, one of the guys on our Flickr Ops team, put together a <a href="http://vis.cs.ucdavis.edu/~ogawa/codeswarm/" target="_blank">Code Swarm</a> animation for the configuration/deployment management tool we use at Flickr to manage our infrastructure. Myles Grant did this for our <a href="http://flickr.com/photos/mylesdgrant/2610882541/" target="_blank">bug reporting system</a> as well. Check it out:</p>
<p><object type="application/x-shockwave-flash" width="500" height="375" data="http://www.flickr.com/apps/video/stewart.swf?v=61761" classid="clsid:D27CDB6E-AE6D-11cf-96B8-444553540000"><param name="flashvars" value="intl_lang=en-us&amp;photo_secret=ff23fc7881&amp;photo_id=2920452511&amp;show_info_box=true"></param><param name="movie" value="http://www.flickr.com/apps/video/stewart.swf?v=61761"></param><param name="bgcolor" value="#000000"></param><param name="allowFullScreen" value="true"></param><embed type="application/x-shockwave-flash" src="http://www.flickr.com/apps/video/stewart.swf?v=61761" bgcolor="#000000" allowfullscreen="true" flashvars="intl_lang=en-us&amp;photo_secret=ff23fc7881&amp;photo_id=2920452511&amp;flickr_show_info_box=true" height="375" width="500"></embed></object></p>
<p>Our automated config management system is called Gemstone, but conceptually you can think of it as a pretty extensible SystemImager/Puppet/cfengine-style system. In the animation, the dots are changes made by the ops person shown. The legend is:</p>
<ul>
<li><em><strong>transforms</strong></em>: which clusters should have which packages, files, actionable scripts, etc.</li>
<li><em><strong>raw</strong></em>: actual files, like apache/memcached/squid configs, which get munged depending on which cluster they&#8217;re in</li>
<li><em><strong>conf</strong></em>: which boxes/clusters are subsets or supersets of which clusters</li>
<li><strong><em>code</em></strong>: ops-written tools/utilities</li>
<li><strong><em>Misc</em></strong>: stuff that doesn&#8217;t fit into the above. <img src='http://www.kitchensoap.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /></li>
</ul>
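<p>To make the legend concrete, here is a purely hypothetical Python sketch of those concepts: a <code>conf</code>-style hierarchy where clusters extend other clusters, <code>transforms</code>-style package lists inherited down that hierarchy, and a <code>raw</code> file whose <code>$placeholders</code> are munged per cluster with <code>string.Template</code>. None of these names or structures come from Gemstone, and the sketch assumes the hierarchy has no cycles.</p>

```python
from dataclasses import dataclass, field
from string import Template


@dataclass
class Cluster:
    name: str
    parents: list = field(default_factory=list)   # "conf": clusters this one extends
    packages: set = field(default_factory=set)    # "transforms": what members should have
    settings: dict = field(default_factory=dict)  # values used to munge "raw" files


def resolve(name, clusters):
    """Merge packages and settings from all ancestors; the child's own settings win.

    Assumes the cluster hierarchy has no cycles.
    """
    cluster = clusters[name]
    packages, settings = set(), {}
    for parent in cluster.parents:
        parent_packages, parent_settings = resolve(parent, clusters)
        packages |= parent_packages
        settings.update(parent_settings)
    packages |= cluster.packages
    settings.update(cluster.settings)
    return packages, settings


def render_raw(raw_text, name, clusters):
    """Fill $placeholders in a raw config file with the cluster's resolved settings."""
    _, settings = resolve(name, clusters)
    return Template(raw_text).substitute(settings)
```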
]]></content:encoded>
			<wfw:commentRss>http://www.kitchensoap.com/2008/10/21/code-swarm-for-config-management/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Why we use GraphicsMagick</title>
		<link>http://www.kitchensoap.com/2008/09/02/why-we-use-graphicsmagick/</link>
		<comments>http://www.kitchensoap.com/2008/09/02/why-we-use-graphicsmagick/#comments</comments>
		<pubDate>Tue, 02 Sep 2008 15:47:28 +0000</pubDate>
		<dc:creator>allspaw</dc:creator>
				<category><![CDATA[Tools]]></category>
		<category><![CDATA[graphicsmagick image]]></category>

		<guid isPermaLink="false">http://www.kitchensoap.com/?p=51</guid>
		<description><![CDATA[Speed: http://www.graphicsmagick.org/www/BENCHMARKS.html
Also, it looks like the GM devs are working on getting OpenMP (parallelism) put into GM processing, which will be a huge boon for multicore boxes. Yay!
]]></description>
			<content:encoded><![CDATA[<p>Speed: <a href="http://www.graphicsmagick.org/www/BENCHMARKS.html" target="_blank">http://www.graphicsmagick.org/www/BENCHMARKS.html</a></p>
<p>Also, it looks like the GM devs are working on getting OpenMP (parallelism) put into GM processing, which will be a huge boon for multicore boxes. Yay!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.kitchensoap.com/2008/09/02/why-we-use-graphicsmagick/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Tool update: WTF is inside filesystem cache?</title>
		<link>http://www.kitchensoap.com/2008/03/27/tool-update-wtf-is-inside-filesystem-cache/</link>
		<comments>http://www.kitchensoap.com/2008/03/27/tool-update-wtf-is-inside-filesystem-cache/#comments</comments>
		<pubDate>Thu, 27 Mar 2008 13:04:14 +0000</pubDate>
		<dc:creator>allspaw</dc:creator>
				<category><![CDATA[Caching]]></category>
		<category><![CDATA[Random]]></category>
		<category><![CDATA[Tools]]></category>

		<guid isPermaLink="false">http://www.kitchensoap.com/2008/03/27/tool-update-wtf-is-inside-filesystem-cache/</guid>
		<description><![CDATA[A while back, I said I&#8217;d love to have a tool that would allow me to peek inside filesystem cache and tell me what files (or pages of files) are inside. Well, Peter Zaitsev points to the fincore tool, which comes pretty damn close: you give it a file, and it will tell you which pages [...]]]></description>
			<content:encoded><![CDATA[<p>A while <a href="http://www.kitchensoap.com/2007/01/26/two-tools-that-i-would-love-more-than-anything/" target="_blank">back</a>, I said I&#8217;d love to have a tool that would allow me to peek inside filesystem cache and tell me what files (or pages of files) are inside. Well, Peter Zaitsev <a href="http://www.mysqlperformanceblog.com/2008/03/18/the-tool-ive-been-waiting-for-years/" target="_blank">points</a> to the <a href="http://net.doit.wisc.edu/~plonka/fincore/" target="_blank">fincore</a> tool, which comes pretty damn close: you give it a file, and it will tell you which pages of that file are in core memory.</p>
<p>Rock. Thanks, David Plonka.</p>
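<p>The underlying Linux mechanism for this kind of peeking is the <code>mincore(2)</code> system call: map the file, ask the kernel which of the mapped pages are resident, and count them. Here is a minimal Python/ctypes sketch of that idea. It is not David Plonka&#8217;s fincore, it assumes Linux on a 64-bit platform (where <code>off_t</code> matches <code>c_long</code>), and it reports only counts rather than per-page offsets.</p>

```python
import ctypes
import ctypes.util
import mmap
import os

# Bind the libc calls we need (assumes Linux, 64-bit off_t == c_long).
libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
libc.mmap.restype = ctypes.c_void_p
libc.mmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int,
                      ctypes.c_int, ctypes.c_int, ctypes.c_long]
libc.mincore.restype = ctypes.c_int
libc.mincore.argtypes = [ctypes.c_void_p, ctypes.c_size_t,
                         ctypes.POINTER(ctypes.c_ubyte)]
libc.munmap.restype = ctypes.c_int
libc.munmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
MAP_FAILED = ctypes.c_void_p(-1).value


def _raise_errno():
    err = ctypes.get_errno()
    raise OSError(err, os.strerror(err))


def fincore(path):
    """Return (resident_pages, total_pages) for a file, using mincore(2)."""
    size = os.path.getsize(path)
    if size == 0:
        return 0, 0  # an empty file cannot be mapped
    page = mmap.PAGESIZE
    total = (size + page - 1) // page
    fd = os.open(path, os.O_RDONLY)
    try:
        addr = libc.mmap(None, size, mmap.PROT_READ, mmap.MAP_SHARED, fd, 0)
        if addr == MAP_FAILED:
            _raise_errno()
        try:
            vec = (ctypes.c_ubyte * total)()  # one status byte per page
            if libc.mincore(addr, size, vec) != 0:
                _raise_errno()
            # The low bit of each byte says whether that page is resident.
            resident = sum(b & 1 for b in vec)
        finally:
            libc.munmap(addr, size)
    finally:
        os.close(fd)
    return resident, total
```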
]]></content:encoded>
			<wfw:commentRss>http://www.kitchensoap.com/2008/03/27/tool-update-wtf-is-inside-filesystem-cache/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>WebOps Communication Tools</title>
		<link>http://www.kitchensoap.com/2008/03/10/webops-communication-tools/</link>
		<comments>http://www.kitchensoap.com/2008/03/10/webops-communication-tools/#comments</comments>
		<pubDate>Mon, 10 Mar 2008 13:36:44 +0000</pubDate>
		<dc:creator>allspaw</dc:creator>
				<category><![CDATA[Tools]]></category>
		<category><![CDATA[WebOps]]></category>

		<guid isPermaLink="false">http://www.kitchensoap.com/2008/03/10/webops-communication-tools/</guid>
		<description><![CDATA[After seeing Jesse&#8217;s great post on Radar (never knew about FreeConferenceCall, very cool!) about the quick and easy webops event communications, I thought I might put a post together on some of what we&#8217;re using at Flickr to keep track of things ops-related.
Production Changes/Immediate Issues
We have our configuration management schemes wrapped up in version control, [...]]]></description>
			<content:encoded><![CDATA[<p>After seeing Jesse&#8217;s great <a href="http://radar.oreilly.com/archives/2008/03/webops-paging-notification-conference-call-bridge-operations-velocity.html" target="_blank">post</a> on Radar (never knew about <a href="http://www.freeconferencecall.com/" target="_blank">FreeConferenceCall</a>, very cool!) about the quick and easy webops event communications, I thought I might put a post together on some of what we&#8217;re using at Flickr to keep track of things ops-related.</p>
<p><strong>Production Changes/Immediate Issues</strong></p>
<p>We have our configuration management schemes wrapped up in version control, so we can track changes there easily. But sometimes we make changes to production machines that aren&#8217;t configuration changes, and those need to be communicated to everyone on the team.</p>
<p>We have an IM bot that everyone has as a contact. When we make any change, we simply send the bot an IM, which it parrots to everyone on the team as it happens and logs to a text file with the IM name and a timestamp. We also serve that log as a webpage, wrapped up in the nice <a href="http://developer.yahoo.com/yui/" target="_blank">YUI</a> bits for easy sorting. If you&#8217;re offline, you&#8217;ll get the bot&#8217;s messages when you log on again.</p>
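<p>The logging half of a bot like that is small. Here is a hypothetical Python sketch (not Flickr&#8217;s bot; the IM protocol side is left out entirely) that appends each change as a tab-separated line of timestamp, IM name, and message, returns the text the bot would rebroadcast, and reads the file back as rows that a sortable webpage could render.</p>

```python
from datetime import datetime, timezone


def log_change(path, im_name, message, now=None):
    """Append one production change to a plain-text log and return the line
    the bot would rebroadcast to the team."""
    now = now or datetime.now(timezone.utc)
    # Tabs or newlines in the message would break the one-line-per-change layout.
    flat = message.replace("\t", " ").replace("\n", " ")
    entry = "\t".join([now.strftime("%Y-%m-%d %H:%M:%S"), im_name, flat])
    with open(path, "a", encoding="utf-8") as log:
        log.write(entry + "\n")
    return f"[{im_name}] {message}"


def read_changes(path):
    """Parse the log back into (timestamp, im_name, message) tuples."""
    with open(path, encoding="utf-8") as log:
        return [tuple(line.rstrip("\n").split("\t", 2)) for line in log if line.strip()]
```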
<p>Examples of this would be taking boxes in and out of a load-balanced pool, restarting apache or squid or MySQL, or even one-off A/B testing of any temporary (kernel or application) parameters that may or may not raise a red flag.</p>
<p><strong>Ongoing Work</strong></p>
<p>Because we like to keep logs, and because we have some guys on the team working remotely, we keep a running commentary on an internal IRC server. We use this for basic day-to-day work. Running it under screen means that we have weeks, months, and even years of ops conversations about what we&#8217;ve done in the scrollback and in the IRC logs.</p>
<p><strong>Code Deployment</strong></p>
<p>Of course <a href="http://iamcal.com" target="_blank">Cal</a> and the other devs have a system in place that logs all new code deploys with a username and a timestamp, and a diff to the version control bits that show what changed in that deploy, which makes things easy to correlate system and application metrics with changes made to the codebase. (thank you)</p>
]]></content:encoded>
			<wfw:commentRss>http://www.kitchensoap.com/2008/03/10/webops-communication-tools/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>

