Unicode numbers are not numeric in Perl

This page demonstrates that Perl does not treat non-ASCII Unicode digits as numbers, even though they match the regular expression \d.

#!perl
use warnings;
use strict;
use utf8;
use Scalar::Util 'looks_like_number';
my $count = 1;
# Fullwidth digit one, U+FF11.
my $ff11 = '１';
my $warned;
if ($ff11 =~ /\d/) {
    print "ok $count\n";
}
else {
    print "not ok $count\n";
}
$count++;
# Catch warnings.
$SIG{__WARN__} = sub { $warned = "@_"; };
if ($ff11 >= 1) {
    print "ok $count\n";
}
else {
    print "not ok $count\n";
}
$count++;
if ($warned) {
    print "not ok $count - warning '$warned'\n";
}
else {
    print "ok $count - no warnings\n";
}
$count++;
if (looks_like_number ($ff11)) {
    print "ok $count\n";
}
else {
    print "not ok $count\n";
}
print "1..$count\n";

ok 1
not ok 2
not ok 3 - warning 'Argument "\x{ff11}" isn't numeric in numeric ge (>=) at /usr/home/ben/lemoda/perl/perl-numeric/ff11.pl line 19.
'
not ok 4
1..4

Thus, when checking whether a string can safely be used in arithmetic, it is better to match digits with [0-9] than with \d.
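The difference between the two patterns can be sketched as follows. The helper names is_ascii_digits and is_ascii_digits_a are made up for this illustration. The second one assumes Perl 5.14 or later, where the /a modifier restricts \d to the ASCII digits.

```perl
use strict;
use warnings;

# Hypothetical helper: true only for a non-empty string of ASCII digits.
sub is_ascii_digits {
    my ($string) = @_;
    return $string =~ /\A[0-9]+\z/ ? 1 : 0;
}

# The same check written with \d and the /a modifier, which
# restricts \d to the ASCII digits 0-9 (needs Perl 5.14 or later).
sub is_ascii_digits_a {
    my ($string) = @_;
    return $string =~ /\A\d+\z/a ? 1 : 0;
}

# Compare an ASCII "1" with the fullwidth digit one, U+FF11.
for my $input ('1', "\x{ff11}") {
    printf "U+%04X: plain \\d %d, [0-9] %d, \\d with /a %d\n",
        ord ($input),
        ($input =~ /\A\d+\z/ ? 1 : 0),
        is_ascii_digits ($input),
        is_ascii_digits_a ($input);
}
```

For the fullwidth digit, only the plain \d column should be 1. Both stricter checks reject it, so either can serve as a guard before arithmetic.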

For example, line 106 of Lingua::EN::Numericalize validates numbers using \d and then does arithmetic on them. This fails if the input string contains a non-ASCII digit such as the fullwidth '１' in the example program above.
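This kind of failure can be reproduced in a few lines. The two functions below are hypothetical and are not taken from Lingua::EN::Numericalize; they only imitate the pattern of validating with a regular expression and then doing arithmetic.

```perl
use strict;
use warnings;

# Hypothetical validator: accepts anything matching \d, then adds one.
sub add_one_loose {
    my ($input) = @_;
    return unless $input =~ /\A\d+\z/;
    # A fullwidth digit passes the check above but numifies to 0.
    # The warning is switched off here only to keep the demo quiet.
    no warnings 'numeric';
    return $input + 1;
}

# Hypothetical validator: accepts ASCII digits only.
sub add_one_strict {
    my ($input) = @_;
    return unless $input =~ /\A[0-9]+\z/;
    return $input + 1;
}

my $fullwidth = "\x{ff11}";
my $loose = add_one_loose ($fullwidth);
my $strict = add_one_strict ($fullwidth);
print "loose: ", defined $loose ? $loose : 'rejected', "\n";
print "strict: ", defined $strict ? $strict : 'rejected', "\n";
```

The loose version returns 1 rather than 2 for the fullwidth digit one, because the string is treated as 0 in arithmetic. The strict version rejects the input before any arithmetic happens.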


Copyright © Ben Bullock 2009-2024. All rights reserved. For comments, questions, and corrections, please email Ben Bullock (benkasminbullock@gmail.com) or use the discussion group at Google Groups.