Book review: Site Reliability Engineering

Google has been working for about 2 years on a book about Site Reliability Engineering, the discipline and organization that keeps Google’s large-scale systems running smoothly. “Site Reliability Engineering” was finally published last week. It spans some 500 pages, and offers a rare inside glimpse into how Google actually works. The authors are remarkably open, naming technologies and projects, and explaining how systems work. There may not be source code, but there’s lots here that can be implemented outside Google. That makes it a great read for startups expecting to scale and small-to-medium tech companies that want to up their reliability game.

Mike will be a Googler

I spent about 3 months interviewing with a number of companies in Canada and the US, and I was lucky enough that list included an interview with Google’s Site Reliability Engineering team. I went down to the Mountain View campus again in December for an on-site interview. Although the process was daunting, I made a good enough impression that they’ve invited me to join the SRE team in Mountain View when I graduate this May as a Systems Engineer.