One of my apprehensions in moving to New York from San Francisco was a common concern: why would I move from the ‘epicenter’ of the web to a place where it’s not? There’s been lots written about startup hub cities, and innovative web metro areas, but the fact of the matter is that New York hasn’t historically been a hotbed of web growth and innovation. Not compared to the Bay Area or Seattle, anyway.
I do, of course, think this has been changing recently. The punch line is that I obviously did take the job, despite my misgivings about not being surrounded by people who are constantly thinking about my industry. One of the reasons I got over not being in the ‘epicenter’ is that Fred Wilson and Albert Wenger did an insanely good job of convincing me it was a good idea.
Another reason is that I think Etsy is basically a Bay Area company that just happens to be in Brooklyn. I mean that as a compliment.
So while I always had some inkling of what ‘epicenter of the web’ means, I was never really sure how that could be measured. Indeed.com has indirectly measured it by the number of job listings. O’Reilly did something similar for the number of startup jobs in 2006.
Number of jobs is interesting, but I thought it might be fun to measure it by locations of headquarters as seen through the lens of monthly unique users. So, I took the Quantcast “Top 100” sites, found the latitude and longitude of the headquarters of each site via Crunchbase’s API, as well as other bits around the web, and Aaron helped out with the excellent Modest Maps to make this:
Quantcast Top 100 plotted on U.S. Map, radius = monthly uniques
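For the curious, here is a rough sketch of the idea in Python. It is not the code we actually used (the real map was built with Modest Maps), just a minimal illustration under some assumptions: a plain Web Mercator projection onto a square world image, and a circle radius that scales linearly with monthly uniques, as in the caption above. The `Site` class, the function names, the 40-pixel maximum radius, and the example sites with their numbers are all made up for illustration. Only the approximate city coordinates are real.

```python
import math
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    lat: float            # headquarters latitude, in degrees
    lon: float            # headquarters longitude, in degrees
    monthly_uniques: int  # monthly unique visitors


def mercator_xy(lat, lon, width, height):
    """Project lat/lon onto pixel coordinates of a Web Mercator world map."""
    x = (lon + 180.0) / 360.0 * width
    lat_rad = math.radians(lat)
    merc = math.log(math.tan(math.pi / 4 + lat_rad / 2))
    # Pixel y grows downward, so northern latitudes get smaller y values.
    y = height / 2 - width * merc / (2 * math.pi)
    return x, y


def circle_radius(uniques, max_uniques, max_radius_px=40.0):
    """Radius scales linearly with uniques; the biggest site gets max_radius_px."""
    return max_radius_px * uniques / max_uniques


def layout(sites, width=1024, height=1024):
    """Return (name, x, y, radius) tuples ready to hand to a drawing library."""
    max_uniques = max(s.monthly_uniques for s in sites)
    circles = []
    for s in sites:
        x, y = mercator_xy(s.lat, s.lon, width, height)
        circles.append((s.name, x, y, circle_radius(s.monthly_uniques, max_uniques)))
    return circles


if __name__ == "__main__":
    # Placeholder sites and traffic numbers, NOT the real Quantcast data.
    sites = [
        Site("example-sf.com", 37.77, -122.42, 100_000_000),
        Site("example-nyc.com", 40.71, -74.01, 50_000_000),
    ]
    for name, x, y, r in layout(sites):
        print(f"{name}: x={x:.1f} y={y:.1f} radius={r:.1f}")
```

One design note: with radius proportional to uniques, a site with twice the traffic gets four times the circle area, so the big sites look even more dominant than they are. Scaling the radius by the square root of uniques would make area proportional to traffic instead.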
Like I said, this doesn’t change my thoughts about the new job, or what I think ‘epicenter of the web’ means. But, still interesting, dontcha think?
UPDATE: Here’s a link to the raw data: http://spreadsheets.google.com/pub?key=tLwD1C5mghn9U3XJj_yqyjw&output=html
If there’s anything wrong, lemme know.
I guess I’m late on getting to this, but How Complex Systems Fail by Richard Cook is excellent.
Let me start with this: I don’t think I can overstate how right-on this paper is, with respect to the challenges, solutions, observations, and concerns involved with operating a medium to large web infrastructure. I found this via @benjaminblack, and I agree with him 100%: this should be considered required reading for anyone in our industry. I’m not sure if Cook ever thought that his paper would apply to web infrastructure, but I think it can and does. Please take 30 minutes right now and read it.
There are a number of salient points in the paper that I’d like to comment on. Again, this is through the lens of failures of complex systems as it pertains to web operations:
7) Post-accident attribution to a ‘root cause’ is fundamentally wrong.
I’m going to guess that this point may be viewed as controversial under the prevailing webops wisdom, which holds that post-mortems are for sure necessary, even though their content may or may not be effective in preventing similar types of failure. I do value the process of a post-mortem, because I think the human element of understanding complex failures is important, and doing whatever you can to put safety measures in place is good, modulo what is said in section #16 of the paper. I believe that even a rudimentary process of “5 Whys” has value. But at the same time, I also think there is something in the spirit of this paragraph: there is a danger in standing behind a single underlying cause when there are systemic failures involved. Doing this can lead to the false belief that you’ve got this mode covered, you’ve found the silver bullet that made the whole mountain crumble, and jeez what a relief because that will never bite us again.
14) Change introduces new forms of failure.
I totally agree with this point. However, I often see this as a rallying point for operations teams to say “No!” to change, when instead they should be working alongside development (and product owners) with a goal of reducing the risk of failure associated with each change. I do not believe that ‘release early, release often’ in and of itself can reduce that risk. I believe that the real (and only) way to do this is both technical and cultural. But I’ve spoken about this before.
16) Safety is a characteristic of systems and not of their components
Emphasis on “Safety cannot be purchased or manufactured; it is not a feature that is separate from the other components of the system.” Real safety comes from smart people doing smart things to the entire shebang, not the individual guts.
And I think the point I love the most, with all of my heart:
18) Failure free operations require experience with failure.
Fear is a strong emotion. I believe it can be used as a strong motivator for ensuring safety in the face of constant change, instead of a reason to push back on the very idea of change. Embrace fear of outages and degradation. Use it to guide your architecture, your code, your infrastructure. So lean into it.
There are a lot of great points in the paper, and I could go on, but you get the idea.