News for the ‘Work’ Category
Book review: Site Reliability Engineering
Google has been working for about 2 years on a book about Site Reliability Engineering, the discipline and organization that keeps Google's large-scale systems running smoothly. "Site Reliability Engineering" was finally published last week. It spans some 500 pages, and offers a rare inside glimpse into how Google actually works. The authors are remarkably open, naming technologies and projects, and explaining how systems work. There may not be source code, but there's lots here that can be implemented outside Google. That makes it a great read for startups expecting to scale and small-to-medium tech companies that want to up their reliability game.
Diagnosing performance degradation under adverse circumstances
[This post is a few years old and was never published. Recently, I was reminded about memcached slab imbalance, which in turn reminded me of this post.]
At work, we encountered a sudden and precipitous performance regression on one particular page of a legacy application. It's a Perl web application, running under mod_perl, using ModPerl::RegistryLoader to compile scripts at server startup, and Apache::DBI to provide persistent database connections.
Our users suddenly began complaining about one particular page being "three times slower than normal." Later examination of the Apache logs showed a 20x(!!) slowdown.
Investigating this performance problem was interesting because we didn't have good access to required data, and our technology choices slowed us down or completely prevented us from collecting it. Although we solved the mystery, the experience had several important lessons.
Mike will be a Googler
I spent about 3 months interviewing with a number of companies in Canada and the US, and I was lucky enough that the list included an interview with Google's Site Reliability Engineering team. I went down to the Mountain View campus again in December for an on-site interview. Although the process was daunting, I made a good enough impression that they've invited me to join the SRE team in Mountain View as a Systems Engineer when I graduate this May.
I'll be relocating in June, and beginning work in July. If you're looking for a shared apartment in San Francisco starting in July, let's talk. I'd like to avoid sharing with someone I don't know, so if we talk now, I'll know you by then.
Automating server build-out with Module::Build
At Pythian, we have one application that is composed of several components, the deployment of which needs to conform to our slightly peculiar server setup. Until recently, this required manually deploying each component. I did this a couple weeks ago, and it took me something like 40 hours to figure out and complete. As I went, I started reading up on Module::Build, trying to figure out how to automate as much as possible. It turns out that this core module gives us a surprisingly powerful tool for customized deployment. First, it will help to understand a few aspects of how our code is deployed. (more…)
On studying programming and programmers
In university, we do a lot of waterfall in courses with project work. It isn't the kind of thing a student would do (to) themselves, so professors feel obligated to give us that experience in class. Research shows that both businesses and recent graduates wish agile development methodologies had been taught in university and college, but course content always lags behind.
At Pythian, we used a mostly-Scrum methodology, with all the benefits and challenges that entails. (more…)
In Tips & tricks from my 4 months at Pythian, I showed how to give a symlink a new target atomically. I wasn't aware of any module to encapsulate that, so I quickly put together
This module is useful because it eliminates the need to know how to do this safely - simply
and you get a drop-in replacement for CORE::symlink. It creates a temporary symlink (using File::Temp to get a unique pathname) pointing to your new target, then moves it into place with a rename call. On POSIX systems, the rename system call guarantees atomicity.
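The temporary-symlink-then-rename technique described above can be sketched in a few lines. This is only an illustration of the idea, not the module's actual code: `atomic_symlink` is a hypothetical helper name, and the `.tmp-symlink-` prefix and the `release-1`/`release-2` targets are made up for the demo.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Spec;
use File::Temp qw(mktemp tempdir);

# Hypothetical helper (not the released module's API): point $link at
# $target, replacing any existing symlink at $link in a single step.
sub atomic_symlink {
    my ($target, $link) = @_;

    # Build the temporary name in the same directory as the final link,
    # so that rename() never has to cross a filesystem boundary.
    my $template = File::Spec->catfile(dirname($link), '.tmp-symlink-XXXXXXXX');
    my $tmp = mktemp($template);    # returns an unused name; creates nothing

    symlink($target, $tmp)
        or die "Cannot create temporary symlink $tmp: $!";

    unless (rename($tmp, $link)) {
        my $err = $!;
        unlink $tmp;                # do not leave the temporary link behind
        die "Cannot rename $tmp to $link: $err";
    }
    return 1;
}

# Demo in a scratch directory: create a link, then retarget it.
my $dir  = tempdir(CLEANUP => 1);
my $link = File::Spec->catfile($dir, 'current');

atomic_symlink('release-1', $link);
atomic_symlink('release-2', $link);
print readlink($link), "\n";
```

Because the temporary link lives next to the final one, anything reading `$link` sees either the old target or the new one, never a missing link. That guarantee rests on POSIX rename semantics, so the sketch makes no promises elsewhere.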
I put it on PrePAN to get some advice. I have no clue what that'll do on any non-POSIX systems that have symlinks (if the OS doesn't do symlinks, I can't help you). Is a rename call universally atomic? If not, how can I detect those platforms, and provide that atomic guarantee some other way?
I didn't get any feedback, so I chose to simply release the module. It's now on CPAN. Enjoy!
Tips & tricks from my 4 months at Pythian
After working with Yanick Champoux on a few little Perl projects here and there, we finally met face-to-face at YAPC::NA last summer. A few months later, when I was looking for a co-op position, I immediately thought of Pythian. (more…)
SSL security in HTTP::Tiny
I was asked to add SSL support to a client library, while also moving from home-grown manual HTTP code to a proper module. HTTP::Tiny was ideal because it is pure-Perl, a core module since 5.14 (so it'll be maintained), and it's just one .pm file, making it easy to ship.
An application server that supported SSL was provided for testing purposes, but the SSL certificate didn't match the hostname - HTTP::Tiny correctly rejected connections. I needed to be able to control the settings sent to the underlying IO::Socket::SSL object used for the encrypted connection so I could turn off security features for testing. As I worked on that, David Golden offered invaluable feedback, which greatly improved the design of the features added to HTTP::Tiny.
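As a rough sketch of what relaxing verification for a test server can look like: current releases of HTTP::Tiny accept `verify_SSL` and `SSL_options` in the constructor, and `SSL_options` is passed through to IO::Socket::SSL. The helper name `build_test_client` and the timeout value are assumptions for illustration, and settings like these belong in test code only.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Hypothetical helper: build a client for a test server whose certificate
# does not match its hostname. Never use these settings in production.
sub build_test_client {
    return HTTP::Tiny->new(
        verify_SSL  => 0,    # do not validate the server certificate
        SSL_options => {
            # Handed to IO::Socket::SSL when a connection is made;
            # 0x00 is the value of SSL_VERIFY_NONE.
            SSL_verify_mode => 0x00,
        },
        timeout => 10,
    );
}

my $http = build_test_client();
printf "verify_SSL=%d SSL_verify_mode=%d\n",
    ($http->verify_SSL ? 1 : 0),
    $http->SSL_options->{SSL_verify_mode};
```

A request then goes through the usual `$http->get($url)` call. Keeping the relaxed settings inside one clearly named helper makes it harder for them to leak into production code paths.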
My 4 months at DRDC
Earlier this year, I posted a short entry about having accepted a job offer from Defence Research & Development Canada (DRDC). Over the past four months, I've had a great supervisor, a relaxed workplace, and challenging work. As you might imagine, working at a defence research lab is quite different from anything I'd done previously. I knew that was going to be the case, but I was still surprised at how little of my prior knowledge applied to The Real World Of Real Work. But first, what awesome, classified, doomsday devices did I get to work on? Well... (more…)
Heading to YAPC::NA 2011
Last year, I stumbled across presentingperl.org and discovered Yet Another Perl Conference, and the other hackathons, meetups, and workshops that the Perl community organizes all over the world. I immediately wanted to attend, but I wasn't able to arrange to go. This year, I'm lucky to be working at DRDC for the summer, and they've given me time off to attend YAPC::NA 2011 in Asheville, NC. (more…)
Co-op at DRDC
I was late to join the job search process for my first co-op work term, but I was invited for several interviews. One of the most interesting jobs I applied for was a specific project with Defence Research and Development Canada. One of DRDC's task groups is building a fake submarine for training purposes. The sub has mocked systems ranging from a sheet of paper saying "here are some buttons" up to a sophisticated computer simulation of real submarine systems. They were looking for someone to do the simulated sonar system. The project involves everything from soliciting requirements, finding and analyzing off-the-shelf software products (including FLOSS!) to meet those requirements, installing & integrating the system, doing quality assurance, and documenting the entire process.
I was lucky enough to be interviewed for the position, and I received an offer. I've happily accepted, and assuming the background check clears, I'll begin in May.
The location is very convenient for me, and it'll be an exciting and challenging project to work on. I'm sure it'll provide an excellent opportunity to both apply and expand my knowledge over the course of the summer. I'm excited to get started!