<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Meanwhile: More Meta-Metrics</title>
	<atom:link href="http://www.kitchensoap.com/2009/10/05/meanwhile-more-meta-metrics/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.kitchensoap.com/2009/10/05/meanwhile-more-meta-metrics/</link>
	<description>Thoughts on capacity planning and web operations.</description>
	<lastBuildDate>Thu, 02 Feb 2012 06:18:46 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
	<item>
		<title>By: prasana</title>
		<link>http://www.kitchensoap.com/2009/10/05/meanwhile-more-meta-metrics/comment-page-1/#comment-7737</link>
		<dc:creator>prasana</dc:creator>
		<pubDate>Mon, 19 Oct 2009 22:40:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.kitchensoap.com/?p=292#comment-7737</guid>
		<description>@admob we have since the summer started mining nagios alert logs across different dimensions,
1. application [easily done in our env because we pointed them to different contactgroups]
2. colo
as with any of this, data visualization [and a good one at that] is a worthwhile investment and has yielded for us the result that we get to focus on things besides just the high runners.

great post and really like your take on the metrics as a way of doing the &quot;showme&quot; and leading to a better engineered system. thanks and keep up the good work.</description>
		<content:encoded><![CDATA[<p>@admob we have since the summer started mining nagios alert logs across different dimensions,<br />
1. application [easily done in our env because we pointed them to different contactgroups]<br />
2. colo<br />
as with any of this, data visualization [and a good one at that] is a worthwhile investment and has yielded for us the result that we get to focus on things besides just the high runners.</p>
<p>great post and really like your take on the metrics as a way of doing the &#8220;showme&#8221; and leading to a better engineered system. thanks and keep up the good work.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Daily Links #111 &#124; CloudKnow</title>
		<link>http://www.kitchensoap.com/2009/10/05/meanwhile-more-meta-metrics/comment-page-1/#comment-7727</link>
		<dc:creator>Daily Links #111 &#124; CloudKnow</dc:creator>
		<pubDate>Sat, 10 Oct 2009 16:58:35 +0000</pubDate>
		<guid isPermaLink="false">http://www.kitchensoap.com/?p=292#comment-7727</guid>
		<description>[...] John Allspaw: More Meta-Metrics [...]</description>
		<content:encoded><![CDATA[<p>[...] John Allspaw: More Meta-Metrics [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dan Ackerson</title>
		<link>http://www.kitchensoap.com/2009/10/05/meanwhile-more-meta-metrics/comment-page-1/#comment-7724</link>
		<dc:creator>Dan Ackerson</dc:creator>
		<pubDate>Wed, 07 Oct 2009 18:10:32 +0000</pubDate>
		<guid isPermaLink="false">http://www.kitchensoap.com/?p=292#comment-7724</guid>
		<description>Thanks for the trackback, John!

It seems like we have all the fundamental tools for monitoring server problems (nagios), site outages (pingdom) and release metrics (cruisecontrol). What we&#039;re missing is your &quot;meta-metrics&quot; layer that provides another layer of abstraction on top of these, performing some intelligent data-mining to give us better insights into how we are _really_ doing operations-wise.

I&#039;ve got a damn Excel pivottable in my head for some reason : cruisecontrol release metrics vs. nagios alerts vs. pingdom warnings ... now just add some Splunk trending to the mix. Very sexy tool indeed!</description>
		<content:encoded><![CDATA[<p>Thanks for the trackback, John!</p>
<p>It seems like we have all the fundamental tools for monitoring server problems (nagios), site outages (pingdom) and release metrics (cruisecontrol). What we&#8217;re missing is your &#8220;meta-metrics&#8221; layer that provides another layer of abstraction on top of these, performing some intelligent data-mining to give us better insights into how we are _really_ doing operations-wise.</p>
<p>I&#8217;ve got a damn Excel pivottable in my head for some reason : cruisecontrol release metrics vs. nagios alerts vs. pingdom warnings &#8230; now just add some Splunk trending to the mix. Very sexy tool indeed!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tom Cignarella</title>
		<link>http://www.kitchensoap.com/2009/10/05/meanwhile-more-meta-metrics/comment-page-1/#comment-7723</link>
		<dc:creator>Tom Cignarella</dc:creator>
		<pubDate>Tue, 06 Oct 2009 17:55:07 +0000</pubDate>
		<guid isPermaLink="false">http://www.kitchensoap.com/?p=292#comment-7723</guid>
		<description>At Clickability we do something similar where we track and publish a monthly metric on &quot;total alerts sent&quot; and &quot;number of nights on-call was woken up&quot;. Clearly the latter is critical to keeping an operations team sane and happy. We&#039;ve not yet gone down the path of further analyzing the results; rather, we spend our time making sure that every single alert is an actionable item that requires human intervention. If it does not, we create automation or set it to not page.

I would love to be able to utilize an open source page that digs deeper in Nagios to understand alerts better. Thank you for suggesting it.</description>
		<content:encoded><![CDATA[<p>At Clickability we do something similar where we track and publish a monthly metric on &#8220;total alerts sent&#8221; and &#8220;number of nights on-call was woken up&#8221;. Clearly the latter is critical to keeping an operations team sane and happy. We&#8217;ve not yet gone down the path of further analyzing the results; rather, we spend our time making sure that every single alert is an actionable item that requires human intervention. If it does not, we create automation or set it to not page.</p>
<p>I would love to be able to utilize an open source page that digs deeper in Nagios to understand alerts better. Thank you for suggesting it.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Abe Hassan</title>
		<link>http://www.kitchensoap.com/2009/10/05/meanwhile-more-meta-metrics/comment-page-1/#comment-7722</link>
		<dc:creator>Abe Hassan</dc:creator>
		<pubDate>Mon, 05 Oct 2009 20:54:28 +0000</pubDate>
		<guid isPermaLink="false">http://www.kitchensoap.com/?p=292#comment-7722</guid>
		<description>At Six Apart we&#039;ve put together a bunch of scripts that analyze the Nagios logs for alert notifications; we have a couple twiddles to only display critical alerts, or only nighttime alerts, or both. We get a nightly report with a breakdown of top alerts, broken down by host (and then by service within that host) and a breakdown by service (and then by host -- useful for seeing if a check is broken or misconfigured, or if there was some more widespread problem).

It gives me a great view into not just what people are woken up for, but the general pulse of our systems overall.  (I&#039;d be happy to clean it up and share it with the world ... perhaps I should.) And really, at some point, something that sits in a &quot;warning&quot; state for an entire weekend -- never actually paging someone -- is just as actionable and just as important as a false positive critical alert: it either needs to be silenced or it&#039;s indicative of a real problem.</description>
		<content:encoded><![CDATA[<p>At Six Apart we&#8217;ve put together a bunch of scripts that analyze the Nagios logs for alert notifications; we have a couple twiddles to only display critical alerts, or only nighttime alerts, or both. We get a nightly report with a breakdown of top alerts, broken down by host (and then by service within that host) and a breakdown by service (and then by host &#8212; useful for seeing if a check is broken or misconfigured, or if there was some more widespread problem).</p>
<p>It gives me a great view into not just what people are woken up for, but the general pulse of our systems overall.  (I&#8217;d be happy to clean it up and share it with the world &#8230; perhaps I should.) And really, at some point, something that sits in a &#8220;warning&#8221; state for an entire weekend &#8212; never actually paging someone &#8212; is just as actionable and just as important as a false positive critical alert: it either needs to be silenced or it&#8217;s indicative of a real problem.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Vladimir Vuksan</title>
		<link>http://www.kitchensoap.com/2009/10/05/meanwhile-more-meta-metrics/comment-page-1/#comment-7721</link>
		<dc:creator>Vladimir Vuksan</dc:creator>
		<pubDate>Mon, 05 Oct 2009 20:21:38 +0000</pubDate>
		<guid isPermaLink="false">http://www.kitchensoap.com/?p=292#comment-7721</guid>
		<description>That is certainly a sore point. Lots of (most?) places will alert for everything and anything without much consideration for the human component. What&#039;s even worse, if there is ever a major outage, even more alerts will be added, draining the human capital even further.

In the past I have mostly tried to identify those alerts that are truly critical, i.e. are worth waking someone up for; however, the idea of graphing them is an interesting one. I don&#039;t believe there are any tools, whether open source or commercial, that are available for such a purpose. I&#039;d definitely be interested in one.</description>
		<content:encoded><![CDATA[<p>That is certainly a sore point. Lots of (most?) places will alert for everything and anything without much consideration for the human component. What&#8217;s even worse, if there is ever a major outage, even more alerts will be added, draining the human capital even further.</p>
<p>In the past I have mostly tried to identify those alerts that are truly critical, i.e. are worth waking someone up for; however, the idea of graphing them is an interesting one. I don&#8217;t believe there are any tools, whether open source or commercial, that are available for such a purpose. I&#8217;d definitely be interested in one.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

