What characters match a regular expression?

The following Perl script tells you what characters match a regular expression.

For example, how many characters match \s or \d?

what-matches.pl

This is the source code for the script above. To use it, replace the argument of count_match on the final line with your regular expression.

#!/usr/local/bin/perl
use warnings;
use strict;
use Unicode::UCD 'charinfo';
binmode STDOUT, ":utf8";

sub count_match
{
    my ($re)=@_;
    my $overflow;
    # Print a maximum of $max_chars characters.
    my $max_chars = 50;
    my $total_characters = 0;
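    # The code points to test: everything up to U+F0000 except the
    # surrogates (U+D800 to U+DFFF) and the noncharacters U+FDD0 to
    # U+FDEF. These ranges were found by trial and error.
    for my $n (0x00 .. 0xD7FF, 0xE000 .. 0xFDCF, 0xFDF0 .. 0xF0000)  {
        # Skip the noncharacters ending in FFFE and FFFF in each plane.
        my $lowbytes = $n % 0x10000;
        if ($lowbytes == 0xFFFF || $lowbytes == 0xFFFE) {
            next;
        }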
        if (chr ($n) =~ /$re/) {
            if ($total_characters < $max_chars) {
                my $name = "?";
                my $charinfo = charinfo ($n);
                if ($charinfo) {
                    $name = $charinfo->{name};
                }
                printf "%04X: '%s' %s\n", $n, chr $n, $name;
            } elsif (! $overflow) {
                $overflow = 1;
                print "Printing only first $max_chars.\n";
            }
            $total_characters++;
        }
    }
    print "\n$total_characters characters match.\n";
}
# Example: report the characters matching \d. Substitute your own
# regular expression here.
count_match (qr/\d/);
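
To compare several regular expressions, it may be handier to get the count back as a number than to print it. The following is a minimal sketch of that idea, not part of what-matches.pl: the helper name count_only and its list-of-ranges argument are assumptions made for illustration. Unlike count_match, it does not skip surrogates or noncharacters, so the caller has to choose the ranges.

```perl
#!/usr/bin/env perl
use warnings;
use strict;

# Hypothetical helper, not part of what-matches.pl: return the number
# of code points in the given ranges which match $re. Each range is an
# array reference of the form [$start, $end], both ends inclusive.
sub count_only
{
    my ($re, @ranges) = @_;
    my $count = 0;
    for my $range (@ranges) {
        my ($start, $end) = @$range;
        for my $n ($start .. $end) {
            if (chr ($n) =~ $re) {
                $count++;
            }
        }
    }
    return $count;
}

# Restrict the search to ASCII to keep the example fast.
my @ascii = ([0x00, 0x7F]);
printf "\\d matches %d ASCII characters.\n", count_only (qr/\d/, @ascii);
printf "\\s matches %d ASCII characters.\n", count_only (qr/\s/, @ascii);
```

To cover the same characters as count_match, pass the three ranges from its loop instead of @ascii, and add a check for code points ending in FFFE and FFFF.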

Note that the answers differ slightly depending on the version of Perl, since each Perl release bundles its own version of the Unicode Character Database. For example, the database changed between Perl 5.8 and 5.10.
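
When comparing results from two machines, it may help to check which Unicode version each Perl uses. This is a small sketch using the UnicodeVersion function of Unicode::UCD, the same module the script already loads for charinfo.

```perl
#!/usr/bin/env perl
use warnings;
use strict;
use Unicode::UCD;

# $^V is the version of the running perl. UnicodeVersion () returns the
# version of the Unicode Character Database which that perl was built
# with.
printf "Perl %vd uses Unicode %s.\n", $^V, Unicode::UCD::UnicodeVersion ();
```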

Copyright © Ben Bullock 2009-2017. All rights reserved. For comments, questions, and corrections, please email Ben Bullock (benkasminbullock@gmail.com) or use the discussion group at Google Groups.