Perl's utf8 and file names on Windows
I've found that I have far fewer problems dealing with Japanese text
if I always put use utf8; at the top of my Perl scripts. See
Switching Win32::OLE to UTF-8 for more about this. However, on a Japanese
Windows OS this creates problems, because directory and file names may
contain kanji or kana. For example, the Desktop directory on
English-language Windows becomes
C:\Documents and Settings\ben\デスクトップ\
on a Japanese Windows PC. Although Windows uses Unicode internally, unfortunately, as far as Perl is concerned, if I want to manipulate files, I have to pass the file name as CP932 (code page 932) encoded text.
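To make the difference concrete, here is a minimal sketch comparing the directory name as a character string with the same name encoded as CP932 and as UTF-8. The variable names are made up for illustration; the only module used is Encode.

```perl
#! perl
use warnings;
use strict;
use utf8;                       # string literals below are read as characters
use Encode 'encode';

my $name = 'デスクトップ';       # "Desktop" in katakana, six characters

# The bytes the file functions want on a Japanese Windows PC.
my $cp932 = encode ('cp932', $name);
# For comparison, the bytes the same name occupies in a UTF-8 source file.
my $utf8 = encode ('UTF-8', $name);

printf "characters:  %d\n", length $name;
printf "CP932 bytes: %d\n", length $cp932;
printf "UTF-8 bytes: %d\n", length $utf8;
```

Each of these kana takes two bytes in CP932 and three in UTF-8, so the two byte strings are not interchangeable.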
However, if I have use utf8; at the top of my Perl file, the file
itself has to be saved as UTF-8, and a directory name written into it
in a different encoding such as CP932 causes all kinds of problems
with editing.
The way I have solved this problem is to keep use utf8; at the top of
the file, but also to add use Encode 'encode'; and encode the
directory names into CP932 before sending them to functions like
open. For example, here is how to make a directory called test on the
desktop:
    #! perl
    use warnings;
    use strict;
    use utf8;
    use Encode 'encode';
    # Turn the path from characters into the CP932 bytes that mkdir needs.
    my $desktopdir = encode ('cp932', 'C:\\Documents and Settings\\ben\\デスクトップ\\');
    my $newdir = $desktopdir."test";
    mkdir $newdir or die "Can't make directory '$newdir': $!\n";
Notice the doubled backslashes. Inside single quotes, a backslash only
has to be doubled when it is followed by another backslash or by an
apostrophe, as at the end of the path; doubling the others is
harmless. Also notice that I use single quotes around the value
of $desktopdir, so that the backslashes are not treated as escape
sequences.
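The quoting rules can be checked with a short sketch like the following. It uses an ASCII-only path so that it does not depend on the encoding of the file, and the variable names are only for illustration.

```perl
#! perl
use warnings;
use strict;

# Doubling every backslash, or only the one before the closing
# apostrophe, gives the same string inside single quotes.
my $all_doubled = 'C:\\Documents and Settings\\ben\\';
my $minimal     = 'C:\Documents and Settings\ben\\';
print "single-quoted forms are equal\n" if $all_doubled eq $minimal;

# Four backslashes in the source leave two in the string.
my $two = 'C:\\Documents';
my $four = 'C:\\\\Documents';
print "four backslashes add one character\n"
    if length ($four) == length ($two) + 1;

# Inside double quotes, \b is an escape sequence (backspace), so the
# backslash and the "b" collapse into a single character.
my $single_quoted = 'C:\ben';
my $double_quoted = "C:\ben";
print "double quotes change the string\n"
    if $single_quoted ne $double_quoted;
```

This is why the path goes in single quotes: in double quotes a sequence such as \b after Settings would silently turn into a control character.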