Perl programmers are probably all aware of the utf8 pragma, which tells perl that your source code is encoded as UTF-8. It is a common stumbling block for new programmers, who might expect utf8 to make filehandles use UTF-8 by default, to automagically decode incoming data, and to ensure outgoing data is all UTF-8 as well. Sadly, that’s not the case.
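To make that concrete, here is a minimal sketch using only core modules. It assumes the script itself is saved as UTF-8, and the variable names are made up for illustration. The literal is affected by the utf8 pragma, but data read through a plain filehandle is not:

```perl
use strict;
use warnings;
use utf8;                        # the source code below is UTF-8
use Encode qw(encode);

my $literal = "é";               # one character, thanks to `use utf8`
my $bytes   = encode('UTF-8', $literal);    # two bytes: 0xC3 0xA9

# Read the bytes back through a plain in-memory filehandle: nothing is decoded.
open my $raw, '<', \$bytes or die "open failed: $!";
my $undecoded = <$raw>;
close $raw;

# Read them again, this time asking for an encoding layer explicitly.
open my $dec, '<:encoding(UTF-8)', \$bytes or die "open failed: $!";
my $decoded = <$dec>;
close $dec;

printf "literal: %d, raw read: %d, decoded read: %d\n",
    length $literal, length $undecoded, length $decoded;
```

The raw read comes back as two bytes even though utf8 is in effect, which is exactly the surprise described above.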
However, one of the great things about perl5i is that it turns on Unicode. All of it.
UTF-8 source code, yes, but it also sets the default IO layers (aka “disciplines”, just to be extra fancy) so your opens automatically behave like:
open my $fh, '<:encoding(UTF-8)', 'file-to-read';
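For comparison, this is roughly how you could get default layers by hand with the core open pragma. It is a sketch of the idea, not necessarily how perl5i does it internally, and the temporary file is only there to have something to read:

```perl
use strict;
use warnings;
use open qw(:std :encoding(UTF-8));   # default layers for open() and the STD handles
use File::Temp qw(tempfile);

# Write the two UTF-8 bytes for e-acute to a scratch file, as raw bytes.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} "\xC3\xA9\n";
close $tmp_fh or die "close failed: $!";

# No layer in the mode: the default set by `use open` applies.
open my $fh, '<', $path or die "open failed: $!";
my $line = <$fh>;
close $fh;
chomp $line;

printf "read %d character(s), code point U+%04X\n", length $line, ord $line;
```

Because the open pragma is lexically scoped, this only affects opens in the file or block where it appears.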
It also decodes @ARGV. These are really useful things to do by default so you never have to think or worry about them, and Michael Schwern deserves credit for making it happen. I’ve split that code (which is a surprisingly small amount) into a new standalone pragma, utf8::all.
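If you are wondering what handling @ARGV involves, here is a hand-rolled sketch with the core Encode module. The helper name decode_args is hypothetical and the argument list is simulated; the point of the pragma is that you don’t have to write this yourself:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode a list of byte strings as UTF-8.
sub decode_args {
    my @args = @_;    # copy, so the caller's values are left untouched
    return map { decode('UTF-8', $_) } @args;
}

# Simulate what the shell hands over for "cafe" with an acute e: five bytes.
my @raw_args     = ("caf\xC3\xA9");
my @decoded_args = decode_args(@raw_args);

printf "before: %d bytes, after: %d characters\n",
    length $raw_args[0], length $decoded_args[0];
```

In a real script you would apply the same map to @ARGV itself, before any option parsing.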
The first release is already up on the CPAN. I’d welcome bug reports, and suggestions for anything else where we can turn on UTF-8 by default (if there is anything more).
I’m also excited about the Perl 5.14.0-RC1 that was released today. It includes, among other goodies, a feature called unicode_strings, which causes Unicode semantics to be used for all string operations within its scope. Awesome! That’ll be a huge headache removed for people who can require 5.14. I look forward to it!
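Here is a small sketch of the kind of difference it makes, assuming a perl recent enough to know the feature and no locale in effect. The same one-byte string is uppercased with the feature off and then on:

```perl
use strict;
use warnings;

my $e_acute = "\xE9";    # e-acute, stored as a single native byte

# Without the feature, uc() leaves the non-ASCII byte alone.
my $without = do { no feature 'unicode_strings'; uc $e_acute };

# With the feature, uc() applies Unicode rules and yields U+00C9.
my $with = do { use feature 'unicode_strings'; uc $e_acute };

printf "without: %02X, with: %02X\n", ord $without, ord $with;
```

The feature is lexically scoped, so it can be switched on per file or per block.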