Problems with in-memory buffers

Perl does lots of things that make life easier, from postfix conditional and looping constructs to DWIM-infused language design. One of the interesting things I discovered when learning to open a file was that Perl can treat a scalar as an in-memory file:

my $string = "many\nlines\nof\ntext";
open my $in, '<', \$string;
while (<$in>) { print }

In one of my test suites, I needed a filehandle to print to, but I didn’t actually need or want anything to be printed to STDOUT, STDERR, or to any actual file on disk – so I opted to open an in-memory buffer. I also needed the filehandle to have the “raw” binmode, so I specified it right in the open statement:

open my $out, '>:raw', \my $mem;

Imagine my surprise when I noticed that the test suite was creating files named SCALAR(0x...) – a string familiar to anyone who has learned about Perl references. I knew intuitively what was happening – the third argument to open was being stringified, and since it is a reference to a scalar, that string will be SCALAR(0x...). Well, that’s not supposed to happen, so let’s unleash the power of Perl and git on Perl, and see if we can figure out what’s going on.
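To make the stringification concrete, here is a minimal sketch of what a scalar reference looks like when it is used as a string. The variable names are only illustrative, and the hexadecimal address differs from run to run:

```perl
use strict;
use warnings;

# A reference to a scalar, like the third argument to open above.
my $ref = \my $mem;

# Interpolating the reference stringifies it to "SCALAR(0x...)".
my $name = "$ref";

print "$name\n";         # the address part varies between runs
print ref($ref), "\n";   # prints "SCALAR"
```

If open treats that string as a filename rather than noticing the reference, a file with exactly that name is what ends up on disk.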

My first experiment was to see if this was version-dependent – and it was! Because perlbrew makes it so easy to install multiple perls, I have a ton of them, so I added a check for files with a name like that to the test suite, and used perlbrew exec prove t/FATAL.t to see where it failed, and where it passed. Perls before 5.14.0 were failing, and 5.14.0 and newer were passing. I checked the perldelta for 5.14.0, and didn’t see this bug mentioned.
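The check I added was along these lines. This is a rough sketch rather than the actual contents of t/FATAL.t: the helper name stray_scalar_files is hypothetical, and it works in a scratch directory so that any stray file is easy to spot and clean up:

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

# Hypothetical helper: run the suspect open inside a scratch directory and
# return the names of any SCALAR* files it left behind.
sub stray_scalar_files {
    my $orig = getcwd();
    my $dir  = tempdir(CLEANUP => 1);
    chdir $dir or die "Cannot chdir to $dir: $!";

    # On an affected perl this creates a file named after the stringified
    # reference instead of opening the in-memory buffer.
    open my $out, '>:raw', \my $mem or die "Cannot open in-memory buffer: $!";
    close $out;

    my @stray = glob('SCALAR*');

    # Leave the scratch directory so File::Temp can remove it at exit.
    chdir $orig or die "Cannot chdir back to $orig: $!";
    return @stray;
}

my @stray = stray_scalar_files();
print @stray ? "not ok - created: @stray\n" : "ok - no stray files\n";
```

Running something like this under each installed perl shows which versions leave a file behind.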

Perlbrew gave me a good result – only one switch from failing to passing – but didn’t provide much granularity, so my next stop was to use git bisect to figure out where the switch from buggy to fixed happened.

Git bisect does a binary search through your git history, helping you to quickly find where a bug was introduced or fixed. But I didn’t really want to write a script to compile perl and then test it. Luckily, the Perl source tree now comes with Porting/bisect.pl, a front end to Porting/bisect-runner.pl, which does the heavy lifting. All you have to do is give it a snippet of Perl code to test:

bisect.pl --expect-fail -e '1 // 2' # When did this stop being an error?

So, I wrote a little script to run my example open statement and check whether any files that look like SCALAR(0x...) were created on disk. After experimenting with the right options to use with bisect.pl (major thanks to doy!), I got it running the right search, and left to eat dinner.

../perl/Porting/bisect.pl --start=v5.10.0 --expect-fail -e 'unlink glob("SCALAR*"); open my $out, ">:raw", \my $mem; my @files = glob("SCALAR*"); unlink @files; exit scalar @files;'

On returning, I had my answer:

HEAD is now at c0888ac Use PerlIOBase_open for pop, utf8 and bytes layers
good - non-zero exit from ./perl -Ilib -e unlink glob("SCALAR*"); open my $out, ">:raw", \my $mem; my @files = glob("SCALAR*"); unlink @files; exit scalar @files;
ecfd064986aef03b9b7d8b56da83ae37ff435721 is the first bad commit
commit ecfd064986aef03b9b7d8b56da83ae37ff435721
Author: Leon Timmermans <fawaka@gmail.com>
Date:   Thu Jan 20 22:06:38 2011 +0100
 
    Fixes 'raw' layer for perl#80764
 
    Made a ':raw' open do what it advertises to do (first open the file,
    then binmode it), instead of leaving off the top layer.
 
:100644 100644 a89bf8b87a91ec7beefc643bed35d7d627ebc044 3ce31d182a75cb4b46af508f70b603dee890c331 M	perlio.c
bisect run success
That took 2898 seconds

So, it looks like fixing this bug with the :raw IO layer also fixed a bug with opening filehandles to in-memory buffers. Thanks, Leon!

I had asked Yanick Champoux about this at work, and he volunteered that he was 43% sure it was the IO layer – and he was right! The bug is fixed now, but if you need to use the :raw IO layer with a filehandle to an in-memory buffer on a Perl prior to this fix, just open the filehandle, and apply the binmode afterwards.
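In code, the workaround looks roughly like this. It is a minimal sketch: the helper name open_raw_buffer and the sample data are mine, not part of any module:

```perl
use strict;
use warnings;

# Hypothetical helper: open a plain in-memory handle first, then apply
# the :raw layer as a separate step instead of in the open mode.
sub open_raw_buffer {
    my ($buffer_ref) = @_;
    open my $fh, '>', $buffer_ref or die "Cannot open in-memory buffer: $!";
    binmode $fh, ':raw'           or die "Cannot apply :raw: $!";
    return $fh;
}

my $mem = '';
my $out = open_raw_buffer(\$mem);

# Anything printed to the handle lands in $mem, byte for byte.
print {$out} "line one\r\nline two\n";
close $out or die "Cannot close buffer: $!";

printf "%d bytes captured\n", length $mem;
```

Because the mode passed to open is a bare '>', the code path touched by the :raw fix is never exercised, so this form should behave the same on perls from before and after the fix.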