Unicode numbers are not numeric in Perl
This page demonstrates that Perl does not treat non-ASCII Unicode
digits as numbers in arithmetic, even though they match the regular
expression \d.
    #!perl
    use warnings;
    use strict;
    use utf8;
    use Scalar::Util 'looks_like_number';
    my $count = 1;
    # Fullwidth digit one, U+FF11.
    my $ff11 = '１';
    my $warned;
    if ($ff11 =~ /\d/) {
        print "ok $count\n";
    }
    else {
        print "not ok $count\n";
    }
    $count++;
    # Catch warnings.
    $SIG{__WARN__} = sub { $warned = "@_"; };
    if ($ff11 >= 1) {
        print "ok $count\n";
    }
    else {
        print "not ok $count\n";
    }
    $count++;
    if ($warned) {
        print "not ok $count - warning '$warned'\n";
    }
    else {
        print "ok $count - no warnings\n";
    }
    $count++;
    if (looks_like_number ($ff11)) {
        print "ok $count\n";
    }
    else {
        print "not ok $count\n";
    }
    print "1..$count\n";
Running the program produces the following output:

    ok 1
    not ok 2
    not ok 3 - warning 'Argument "\x{ff11}" isn't numeric in numeric ge (>=) at /usr/home/ben/lemoda/perl/perl-numeric/ff11.pl line 19.
    '
    not ok 4
    1..4
Thus, when checking whether a string can safely be used in arithmetic,
it's better to match digits with [0-9] than with \d.
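The difference between the two character classes can be seen in a minimal sketch like the one below. The helper names is_unicode_digits, is_ascii_digits and is_ascii_digits_a are hypothetical and chosen only for this illustration. The third helper assumes Perl 5.14 or later, where the /a modifier restricts \d to the ASCII digits.

```perl
use strict;
use warnings;

# Hypothetical helper: \d accepts any Unicode decimal digit,
# including U+FF11 (FULLWIDTH DIGIT ONE).
sub is_unicode_digits {
    my ($s) = @_;
    return $s =~ /\A\d+\z/ ? 1 : 0;
}

# Hypothetical helper: [0-9] accepts only the ten ASCII digits.
sub is_ascii_digits {
    my ($s) = @_;
    return $s =~ /\A[0-9]+\z/ ? 1 : 0;
}

# Hypothetical helper: the same restriction written with the /a
# modifier (assumes Perl 5.14 or later).
sub is_ascii_digits_a {
    my ($s) = @_;
    return $s =~ /\A\d+\z/a ? 1 : 0;
}

my %samples = (
    'ASCII 42' => '42',
    'U+FF11'   => "\x{FF11}",
);

# Print labels rather than the strings themselves, to avoid
# "Wide character" warnings on output.
for my $label (sort keys %samples) {
    my $s = $samples{$label};
    printf "%-8s  \\d: %d  [0-9]: %d  \\d with /a: %d\n",
        $label,
        is_unicode_digits ($s),
        is_ascii_digits ($s),
        is_ascii_digits_a ($s);
}
```

Either of the last two checks rejects the fullwidth digit before it reaches any arithmetic. Which one to use is a matter of taste and of the minimum Perl version that has to be supported.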
For example, take Lingua::EN::Numericalize. Line 106 validates numbers
using \d and then does arithmetic on them. However, this fails if the
input string contains non-ASCII Unicode digits like the '１' (U+FF11)
in the above example program.
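The validate-then-calculate pattern described above can be sketched as follows. This is not the code from Lingua::EN::Numericalize. The functions add_one_unsafe and add_one_safe are hypothetical and exist only to illustrate the pitfall. The "numeric" warning is switched off inside the unsafe version so that the wrong result is visible without the warning shown earlier.

```perl
use strict;
use warnings;

# Hypothetical function: validates with \d, then does arithmetic.
sub add_one_unsafe {
    my ($s) = @_;
    # \d+ lets U+FF11 (FULLWIDTH DIGIT ONE) through.
    return unless $s =~ /\A\d+\z/;
    # Silence "isn't numeric" for this demonstration only.
    no warnings 'numeric';
    # A string Perl cannot parse as a number is treated as 0 here.
    return $s + 1;
}

# Hypothetical function: validates with [0-9], then does arithmetic.
sub add_one_safe {
    my ($s) = @_;
    # [0-9]+ rejects U+FF11, so the arithmetic is never reached.
    return unless $s =~ /\A[0-9]+\z/;
    return $s + 1;
}

my $ff11 = "\x{FF11}";

my $unsafe = add_one_unsafe ($ff11);
my $safe = add_one_safe ($ff11);

print "unsafe: ", (defined $unsafe ? $unsafe : 'rejected'), "\n";
print "safe: ", (defined $safe ? $safe : 'rejected'), "\n";
```

The unsafe version accepts the fullwidth digit and silently computes 0 + 1, while the safe version rejects the input and leaves the caller to decide what to do with it.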
Copyright © Ben Bullock 2009-2024. All
rights reserved.
For comments, questions, and corrections, please email
Ben Bullock
(benkasminbullock@gmail.com) or use the discussion group at Google Groups.