Google spent about two years working on a book about Site Reliability Engineering, the discipline and organization that keeps Google’s large-scale systems running smoothly. “Site Reliability Engineering” was finally published last week. It spans some 500 pages and offers a rare inside glimpse into how Google actually works. The authors are remarkably open, naming technologies and projects and explaining how the systems work. The book may not include source code, but plenty of what it describes can be implemented outside Google. That makes it a great read for startups expecting to scale and for small-to-medium tech companies that want to up their reliability game.
Posted: Apr 7, 2016