Book review: Site Reliability Engineering

Google has been working for about 2 years on a book about Site Reliability Engineering, the discipline and organization that keeps Google’s large-scale systems running smoothly. “Site Reliability Engineering” was finally published last week. It spans some 500 pages, and offers a rare inside glimpse into how Google actually works. The authors are remarkably open, naming technologies and projects, and explaining how systems work. There may not be source code, but there’s lots here that can be implemented outside Google. That makes it a great read for startups expecting to scale and small-to-medium tech companies that want to up their reliability game.

Diagnosing performance degradation under adverse circumstances

[This post is a few years old and was never published. Recently, I was reminded about memcached slab imbalance, which in turn reminded me of this post.]

At work, we encountered a sudden and precipitous performance regression on one particular page of a legacy application. It’s a Perl web application, running under mod_perl, using ModPerl::RegistryLoader to compile scripts at server startup, and Apache::DBI to provide persistent database connections.

Our users suddenly began complaining about one particular page being “three times slower than normal.” Later examination of the Apache logs showed a 20x(!!) slowdown.

Investigating this performance problem was interesting because we didn’t have good access to required data, and our technology choices slowed us down or completely prevented us from collecting it. Although we solved the mystery, the experience had several important lessons.

CSRF vulnerability at CloudAtCost.com

CloudAtCost.com provides low-cost virtual machines hosted in Canada. panel.cloudatcost.com is the management interface, where customers can install OS images, access a console, etc. A cross-site request forgery (CSRF) vulnerability was discovered in this web application. If a customer could be tricked into visiting a crafted URL while logged in, an attacker could change the victim’s password, gaining access to the management interface. In turn, this grants root access on all the victim’s VMs and the ability to wipe or reinstall them, and potentially lets the attacker spend the victim’s money on CloudAtCost products and services.
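For readers unfamiliar with the usual defence, here is a minimal sketch of a per-session CSRF token, written in Python for illustration. It is not CloudAtCost's code or their fix; `SERVER_SECRET`, `issue_csrf_token`, and `is_valid_csrf_token` are hypothetical names. The idea is that a state-changing request (such as a password change) is rejected unless it carries a token the attacker's crafted URL cannot know.

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side secret; a real application would load it from configuration.
SERVER_SECRET = secrets.token_bytes(32)


def issue_csrf_token(session_id: str) -> str:
    """Derive a token bound to one session; embed it in every state-changing form."""
    return hmac.new(SERVER_SECRET, session_id.encode(), hashlib.sha256).hexdigest()


def is_valid_csrf_token(session_id: str, submitted: str) -> bool:
    """Accept the request only if the submitted token matches this session's token."""
    expected = issue_csrf_token(session_id)
    # compare_digest runs in constant time; bytes are used so any input string is accepted.
    return hmac.compare_digest(expected.encode(), submitted.encode())
```

A request forged from another site arrives with the victim's session cookie but without the token, so `is_valid_csrf_token` returns `False` and the password change is refused.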

Legal issues in computer security research

This Thursday, I gave a talk at AtlSecCon 2014. The weather threw a wrench in the organizers' plans, but they managed to pull off a solid conference. Unfortunately, the talks weren’t recorded this year. The slides are posted on SpeakerDeck, and are embedded below the fold.

I also reprised this talk at NSLUG, and recorded audio, now posted on SoundCloud, and also embedded below the fold.

Finally: late last year, I wrote 3 posts exploring Canada’s computer crime laws (1, 2, 3) which were initial versions of work that eventually became two papers I submitted this semester for a directed studies course. If you were interested in those posts, I’ve embedded the final PDFs below. The talk is a condensed version of that work.

Recovering from Heartbleed

Heartbleed is a critical vulnerability in OpenSSL revealed yesterday. I’m not sure it could be more serious: it allows an attacker to connect to your server and use the TLS heartbeat extension to obtain 64k of server memory (and do it again to get another 64k and again and…) – while leaving no traces in logs. That server memory might include primary key material (private keys), secondary key material (usernames and passwords), and collateral (memory addresses, canaries used to detect overflow, etc.).

Mike will be a Googler

I spent about 3 months interviewing with a number of companies in Canada and the US, and I was lucky enough that the list included Google’s Site Reliability Engineering team. I went down to the Mountain View campus again in December for an on-site interview. Although the process was daunting, I made a good enough impression that they’ve invited me to join the SRE team in Mountain View as a Systems Engineer when I graduate this May.

Upgrading encrypted Android devices

If you encrypt your Android device, the standard over-the-air (OTA) upgrades don’t work, because /sdcard can’t be mounted in recovery.

Exploring Canada's computer crime laws: Part 3

Since the exceptions in copyright law for encryption and security research don’t apply if you’re doing anything criminal, I next looked at the Criminal Code [PDF].

Exploring Canada's computer crime laws: Part 2

Since the exceptions in copyright law for encryption and security research don’t apply if you’re doing anything criminal, I next looked at the Criminal Code [PDF].

Exploring Canada's computer crime laws: Part 1

As someone with an interest in technology, security, and the legal issues surrounding them, I often watch relevant legal cases with interest. Typically, those cases come from the United States. The CFAA has been in the news frequently of late, and not always in a good light. I was pleased to see Zoe Lofgren’s proposed changes, which try to make the law less draconian.

This is typical for Canada – we often see more about American news on topics like this than Canadian. I realized that I really didn’t know what the law in Canada said about so-called computer crimes, although I’ve often wondered. A while back, I took an afternoon to do some reading. I was not happy when that afternoon ended. This is part one of a three-part series on what I found.

How to run a question period

Many different kinds of events involve a presenter giving a speech, and often taking questions. Unfortunately, question periods are often a problem – for both the presenter and the audience. Here are some thoughts on making it better.

Validating SSL certificates for IRC bouncers

IRC bouncers are sort of like a proxy. Your bouncer stays online, connected to IRC, all the time, and then you connect to the bouncer using a normal IRC client. I connect to my bouncer with an SSL-encrypted connection, but I hadn’t been validating the certificate until now. Validating the SSL certificate is critical for thwarting man-in-the-middle (MITM) attacks.

In a MITM attack, the victim connects to the attacker, thinking it is the service they want to talk to (the IRC bouncer in this case). The attacker then forwards the connection to the service. Both connections might use SSL, but in the middle, the attacker can see the plaintext. They can simply eavesdrop, or modify the data flowing in both directions. SSL is supposed to prevent that, but if you don’t validate the certificate, then you don’t know who you’re talking to. I want to know I’m really talking to my IRC bouncer, so let’s figure out how to validate that certificate.
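One common way to validate a self-signed certificate is to pin its fingerprint. The Python sketch below shows that approach. It is an illustration only, not necessarily the method used in this post or supported by any particular IRC client. `connect_pinned`, `fingerprint_sha256`, and `normalize_fingerprint` are hypothetical helper names, and the expected fingerprint is assumed to have been obtained over a trusted channel (for example, read directly on the bouncer host).

```python
import hashlib
import hmac
import socket
import ssl


def fingerprint_sha256(der_cert: bytes) -> str:
    """Return the SHA-256 fingerprint of a DER-encoded certificate as lowercase hex."""
    return hashlib.sha256(der_cert).hexdigest()


def normalize_fingerprint(text: str) -> str:
    """Accept 'AB:CD:...' or 'abcd...' and return plain lowercase hex."""
    return text.replace(":", "").strip().lower()


def connect_pinned(host: str, port: int, expected_fingerprint: str) -> ssl.SSLSocket:
    """Open a TLS connection and refuse it unless the peer certificate matches the pin."""
    context = ssl.create_default_context()
    # CA validation is switched off on purpose: trust comes from the pinned fingerprint.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    tls = context.wrap_socket(socket.create_connection((host, port)), server_hostname=host)
    der = tls.getpeercert(binary_form=True)
    actual = fingerprint_sha256(der) if der else ""
    if not hmac.compare_digest(actual, normalize_fingerprint(expected_fingerprint)):
        tls.close()
        raise ssl.SSLError("certificate fingerprint mismatch: possible MITM")
    return tls
```

A man-in-the-middle has to present its own certificate, whose fingerprint will not match the pinned value, so the connection is dropped before any IRC traffic is sent.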

Introducing Hack::Natas

Last Monday, I presented my solutions to the Natas server-side security war games at NSLUG. Afterwards, I spent some time cleaning up my code, and I’ve now published it to CPAN as Hack::Natas, which comes with modules and scripts to solve levels 15 and 16 in an automated way, plus walkthroughs for all the levels up to 17 written in Markdown (those are almost the same as my blog posts, so you’re not missing out by looking at only one or the other).

Presenting my Natas solutions at NSLUG

Last Monday, I presented my solutions to the Natas server-side security war games at my local Linux users' group.

I recorded my talk, but it didn’t turn out well. I was using Google Hangouts for the first time, and I forgot that it only records the windows you tell it to. In the video, embedded below the fold, there’s a lot of talking about windows that were being projected, but which didn’t get recorded. Still, the audio counts for something, and you can see what I’m talking about much of the time.

SSL configuration on nginx

This SSL configuration for nginx achieves an A on the SSL Labs tool. It’s what this server currently uses.

Server-side security war games: Part 16

This is the last level. We’re challenged with an improved version of level 9 – they’ve added additional “sanitation” to keep us out.

    if(preg_match('/[;|&`\'"]/',$key)) {
        print "Input contains an illegal character!";
    } else {
        passthru("grep -i \"$key\" dictionary.txt");
    }
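To see what the blacklist does and does not catch, here is a small Python sketch that mirrors the character class from the `preg_match` call above. `is_rejected` is a hypothetical helper, and this only models the filter, not the rest of the level. Note that `$`, `(` and `)` are not in the list, and the key is placed inside double quotes in a shell command, where the shell still expands `$(...)`.

```python
import re

# Same character class as the PHP check: ; | & ` ' "
BLOCKED = re.compile(r"[;|&`'\"]")


def is_rejected(key: str) -> bool:
    """Return True if the filter would print 'Input contains an illegal character!'."""
    return BLOCKED.search(key) is not None


if __name__ == "__main__":
    for candidate in ["hello", "a; ls", "a | cat", "$(id)"]:
        verdict = "rejected" if is_rejected(candidate) else "passes"
        print(f"{candidate!r}: {verdict}")
```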

Server-side security war games: Part 15

We’re nearly at the end! This is the 2nd-last level.

We know there is a users table, with columns “username” and “password”. This time, the code just checks that the username exists. There’s no way to print out the data we want. Instead, we’ll have to do something cleverer.
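When the only feedback is a yes/no answer, one generic technique is to recover a secret one character at a time by asking whether it starts with a guessed prefix. The Python sketch below illustrates that idea against a local stand-in oracle. It is an assumption that this is the kind of approach the level calls for, and it is not the author's published solution. `recover_secret` and `make_local_oracle` are hypothetical names, and no real server is contacted.

```python
import string
from typing import Callable

# Assumed alphabet: letters and digits only.
ALPHABET = string.ascii_letters + string.digits


def recover_secret(oracle: Callable[[str], bool], max_length: int = 64) -> str:
    """Extend a known prefix one character at a time using a yes/no oracle."""
    known = ""
    for _ in range(max_length):
        for ch in ALPHABET:
            if oracle(known + ch):
                known += ch
                break
        else:
            # No character extends the prefix, so the secret is complete.
            break
    return known


def make_local_oracle(secret: str) -> Callable[[str], bool]:
    """Stand-in for the remote check: answers whether the secret starts with the prefix."""
    return lambda prefix: secret.startswith(prefix)
```

For a secret of length n this needs at most n times the alphabet size queries, which is far fewer than guessing whole values.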

Server-side security war games: Part 14

In level 14, we see a more traditional username & password form. Let’s check the source code to see if there are holes we can slip through.

Server-side security war games: Part 13

This is level 13. Looks like they claim to only accept image files, in order to close the flaw we used previously. I bet we can get around that restriction just like we did when they disallowed certain characters in the search term. Let’s examine the code.

Here’s the new part of the code:

    if (! exif_imagetype($_FILES['uploadedfile']['tmp_name'])) {
        echo "File is not an image";
    }
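PHP's `exif_imagetype` decides by reading the first bytes of the file and checking them against known image signatures. The Python sketch below is a simplified stand-in for that kind of check, not PHP's implementation. `sniff_image_type` is a hypothetical helper and only three formats are covered. It shows why a signature check alone says nothing about what follows the header.

```python
from typing import Optional

# Leading "magic" bytes for a few common image formats.
SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def sniff_image_type(data: bytes) -> Optional[str]:
    """Return the image type implied by the leading bytes, or None if unrecognised."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return None


if __name__ == "__main__":
    # A valid GIF header followed by arbitrary content still looks like a GIF.
    print(sniff_image_type(b"GIF89a" + b"<?php /* anything */ ?>"))
    print(sniff_image_type(b"<?php /* anything */ ?>"))
```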

Server-side security war games: Part 12

In level 12, we’re given a file upload form. Let’s take a look at the code that processes input.