A mod_speling using Perl and CGI

This is an example of using Text::Fuzzy to provide an alternative to mod_speling with a Perl CGI script. If this CGI script is called something like "misspelt-web-page.cgi" and put in the top directory, it can be used to handle "Not Found" errors with a .htaccess file containing the line

ErrorDocument 404 /misspelt-web-page.cgi

The redirection script is as follows:

#!/home/ben/software/install/bin/perl
use warnings;
use strict;
use Text::Fuzzy;

# The directory of files served by the web server.

my $web_root = '/usr/local/www/data';

# If the query is "http://www.example.com/abc/xyz.html", $path_info is
# "abc/xyz.html".

my $path_info = $ENV{REQUEST_URI};

if (! defined $path_info) {
    fail ("No path info");
}

if ($0 =~ /$path_info/) {
    fail ("Don't redirect to self");
}

# This is the list of files under the main page.

my @allfiles = get_all_files ($web_root, '');

# This is our spelling search engine.

my $tf = Text::Fuzzy->new ($path_info);

my $nearest = $tf->nearest (\@allfiles, max => 5);

if (defined $nearest) {
    redirect ($allfiles[$nearest]);
}
else {
    fail ("Nothing like $path_info was found");
}
exit;

# Read all the files under "$root/$dir". This is recursive. The return
# value is an array containing all files found.

sub get_all_files
{
    my ($root, $dir) = @_;
    my @allfiles;
    my $full_dir = "$root/$dir";
    if (! -d $full_dir) {
        fail ("$full_dir is not a directory");
    }
    opendir DIR, $full_dir or fail ("Can't open directory $full_dir: $!");
    my @files = grep !/^\./, readdir DIR;
    closedir DIR or fail ("Can't close $full_dir: $!");
    for my $file (@files) {
        my $dir_file = "$dir/$file";
        my $full_file = "$root/$dir_file";
        if (-d $full_file) {
            push @allfiles, get_all_files ($root, $dir_file);
        }
        else {
            push @allfiles, $dir_file;
        }
    }
    return @allfiles;
}

# Print a "permanent redirect" to the respelt name, then exit.

sub redirect
{
    my ($url) = @_;
    print <<EOF;
Status: 301
Location: $url

EOF
    exit;
}

# Print an error message for the sake of the requester, and print a
# message to the error log, then exit.

sub fail
{
    my ($error) = @_;
    print <<EOF;
Content-Type: text/plain

$error
EOF
    # Add the name of the program and the time to the error message,
    # otherwise the error log will get awfully confusing-looking.
    my $time = scalar gmtime ();
    print STDERR "$0: $time: $error\n";
    exit;
}

(download)

For the sake of this web server, the script is named with a suffix ".pl".

The script works by building a list of all the files in the directory, and then searching through this list using a Text::Fuzzy object for the nearest match. If it finds a match, it prints a redirection header which sends the user's browser to the matching file.


Copyright © Ben Bullock 2009-2014. All rights reserved. For comments, questions, and corrections, please email Ben Bullock (benkasminbullock@gmail.com). / Privacy / Disclaimer