What characters match a regular expression?

The following Perl script tells you what characters match a regular expression.

For example, how many characters match \s or \d?


This is the source code for the script above. To use it, substitute your regular expression into the argument of count_match on the final line.

use warnings;
use strict;
use Unicode::UCD 'charinfo';
binmode STDOUT, ":utf8";

#line 33 "what-matches.cgi"

sub count_match
{
    my ($re) = @_;
    my $overflow;
    # Print a maximum of $max_chars characters.
    my $max_chars = 50;
    my $total_characters = 0;
    # All the Unicode characters we're allowed. Found by trial and
    # error.
    for my $n (0x00 .. 0xD7FF, 0xE000 .. 0xFDCF, 0xFDF0 .. 0xF0000) {
        my $lowbytes = $n % 0x10000;
        # Skip the non-characters U+xxFFFE and U+xxFFFF.
        if ($lowbytes == 0xFFFF || $lowbytes == 0xFFFE) {
            next;
        }
        if (chr ($n) =~ /$re/) {
            if ($total_characters < $max_chars) {
                # Look up the character's name; unassigned code
                # points have no entry and are shown as "?".
                my $name = "?";
                my $charinfo = charinfo ($n);
                if ($charinfo) {
                    $name = $charinfo->{name};
                }
                printf "%04X: '%s' %s\n", $n, chr $n, $name;
            } elsif (! $overflow) {
                $overflow = 1;
                print "Printing only first $max_chars.\n";
            }
            # Count every match, including those not printed.
            $total_characters++;
        }
    }
    print "\n$total_characters characters match.\n";
}

# Example call; substitute your own regular expression here.
count_match (qr/\s/);
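
If only the totals are of interest, the same loop can be reduced to a counter. The following is a minimal sketch rather than part of the original script: the helper name count_only and the three sample expressions are assumptions made for illustration, and the code point ranges are copied from the listing above.

```perl
use warnings;
use strict;

# count_only is a hypothetical helper, not part of the original
# script. It scans the same code point ranges and returns the number
# of characters which match the regular expression $re.
sub count_only
{
    my ($re) = @_;
    my $count = 0;
    for my $n (0x00 .. 0xD7FF, 0xE000 .. 0xFDCF, 0xFDF0 .. 0xF0000) {
        my $lowbytes = $n % 0x10000;
        # Skip the non-characters U+xxFFFE and U+xxFFFF.
        next if $lowbytes == 0xFFFF || $lowbytes == 0xFFFE;
        $count++ if chr ($n) =~ $re;
    }
    return $count;
}

# Compare a few expressions side by side.
for my $pair (['\s', qr/\s/], ['\d', qr/\d/], ['[0-9]', qr/[0-9]/]) {
    my ($label, $re) = @$pair;
    printf "%-6s %d\n", $label, count_only ($re);
}
```

The count for [0-9] is always 10, whereas the counts for \s and \d depend on the Unicode data shipped with the Perl in use.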

Note that answers differ slightly depending on Perl version, since the underlying Unicode character database changed between Perl 5.8 and 5.10.
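
To see which versions apply on a particular machine, something like the following sketch may help. It assumes only the standard Unicode::UCD module, whose UnicodeVersion function reports the version of the Unicode Character Database bundled with the running Perl.

```perl
use warnings;
use strict;
use Unicode::UCD;

# Report the running Perl version and the version of the Unicode
# Character Database it was built with.
printf "Perl version:    %vd\n", $^V;
printf "Unicode version: %s\n", Unicode::UCD::UnicodeVersion ();
```

Comparing these two lines between machines is one way to check whether a difference in the counts comes from the Unicode data.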

Copyright © Ben Bullock 2009-2017. All rights reserved. For comments, questions, and corrections, please email Ben Bullock (benkasminbullock@gmail.com) or use the discussion group at Google Groups.