Perl's utf8 and file names on Windows

I've found that I have far fewer problems dealing with Japanese text if I always have use utf8; in my Perl scripts. See Switching Win32::OLE to UTF-8 for more about this. However, on a Japanese Windows OS this creates problems, because directory and file names may contain kanji or kana. For example, the Desktop directory on English-language Windows becomes

C:\Documents and Settings\ben\デスクトップ\

on a Japanese Windows PC. Although Windows uses Unicode internally, Perl's file functions unfortunately expect the file name in the system code page, so if I want to manipulate files, I have to pass the file name as CP932 (code page 932) encoded text.

However, if I have use utf8; at the top of my Perl file, the file has to be saved as UTF-8, so typing a CP932-encoded directory name into it causes all kinds of problems with editing.
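
To make the mismatch concrete, here is a small sketch that encodes the directory name デスクトップ both ways and compares the lengths. It only uses encode from the Encode module and does not touch the file system, so it should run on any OS; the variable names are just for illustration.

```perl
#! perl
use warnings;
use strict;
use utf8;                     # this source file is saved as UTF-8
use Encode 'encode';

# With "use utf8;", this literal is a string of six characters.
my $name = 'デスクトップ';

# The same name as bytes in each encoding.
my $cp932_bytes = encode ('cp932', $name);
my $utf8_bytes  = encode ('UTF-8', $name);

printf "characters:  %d\n", length $name;
printf "CP932 bytes: %d\n", length $cp932_bytes;
printf "UTF-8 bytes: %d\n", length $utf8_bytes;
```

The two byte strings have different lengths and different contents, so a name written as UTF-8 in the source file cannot be handed to the file functions unchanged.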

The way I have solved this problem is to keep use utf8; at the top of the file, but also to

use Encode 'encode';

to encode the directory names, which are written as UTF-8 in the source file, into CP932 before passing them to functions like open. For example, here is how to make a directory called test on the desktop:

#! perl
use warnings;
use strict;
use utf8;                # this source file is saved as UTF-8
use Encode 'encode';
# Encode the path into CP932 bytes before passing it to file functions.
my $desktopdir =
    encode ('cp932', 'C:\\Documents and Settings\\ben\\デスクトップ\\');
my $newdir = $desktopdir."test";
mkdir $newdir or die "Can't make directory '$newdir': $!\n";

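Notice the double backslashes. Inside single quotes, a backslash has to be doubled when it is followed by another backslash or by the closing apostrophe, and doubling every backslash is always safe. Also notice that I use single quotes around the path assigned to $desktopdir, in order to avoid sequences such as \b being interpreted as escapes, as they would be inside double quotes.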
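
If there are many file names to deal with, the encode call could be wrapped in a small helper, with a matching decode for names that come back from the system. The following is only a sketch: to_fs and from_fs are names I made up for illustration, and 'cp932' assumes a Japanese Windows system. It does a round trip in memory and does not touch the file system.

```perl
#! perl
use warnings;
use strict;
use utf8;
use Encode qw(encode decode);

# Assumption: the system code page is CP932 (Japanese Windows).
my $fs_encoding = 'cp932';

# Hypothetical helper: character string -> bytes for open, mkdir, etc.
sub to_fs
{
    my ($path) = @_;
    return encode ($fs_encoding, $path);
}

# Hypothetical helper: bytes from the system -> character string.
sub from_fs
{
    my ($bytes) = @_;
    return decode ($fs_encoding, $bytes);
}

my $desktopdir = 'C:\\Documents and Settings\\ben\\デスクトップ\\';
my $round_trip = from_fs (to_fs ($desktopdir));
print $round_trip eq $desktopdir ? "round trip ok\n" : "round trip failed\n";
```

With helpers like these, the mkdir line in the example above would become something like mkdir to_fs ($desktopdir."test"), and the rest of the script could keep working with ordinary character strings.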


Copyright © Ben Bullock 2009-2023. All rights reserved. For comments, questions, and corrections, please email Ben Bullock (benkasminbullock@gmail.com) or use the discussion group at Google Groups.