Perl XS tutorial - digest version

This is an abridged version of the XS tutorial which is supplied with Perl.


Example 1: Hello world

The first example prints "Hello world".

Run h2xs -A -n test. This creates a directory test and files Makefile.PL, lib/test.pm, test.xs, and t/test.t. test.xs looks like this:

#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"

#include "ppport.h"

MODULE = test               PACKAGE = test

Edit test.xs to add

void
hello()
CODE:
    printf("Hello, world!\n");

to the end. Run perl Makefile.PL:

$ perl Makefile.PL
Checking if your kit is complete...
Looks good
Writing Makefile for test
$

This creates a file called Makefile. Run the command "make":

$ make
cp lib/test.pm blib/lib/test.pm
perl xsubpp  -typemap typemap  test.xs > test.xsc && mv test.xsc test.c
Please specify prototyping behavior for test.xs (see perlxs manual)
cc -c     test.c
Running Mkbootstrap for test ()
chmod 644 test.bs
rm -f blib/arch/auto/test/test.so
cc  -shared -L/usr/local/lib test.o  -o blib/arch/auto/test/test.so
chmod 755 blib/arch/auto/test/test.so
cp test.bs blib/arch/auto/test/test.bs
chmod 644 blib/arch/auto/test/test.bs
Manifying blib/man3/test.3pm
$

Now we run the extension. Create a file called hello containing

#!/usr/bin/perl
use ExtUtils::testlib;
use test;
test::hello();

Make hello executable with chmod +x hello, and run it:

$ ./hello
Hello, world!
$

Example 2: Odd or even

This extension returns 1 if a number is even, and 0 if the number is odd.

Add the following to the end of test.xs from example one:

int
is_even(input)
    int input
CODE:
    RETVAL = (input % 2 == 0);
OUTPUT:
    RETVAL

Run "make" again. Replace the contents of the test script t/test.t, which h2xs generated, with

# 3 is the number of tests.
use Test::More tests => 3;
use test;

is (test::is_even(0), 1);
is (test::is_even(1), 0);
is (test::is_even(2), 1);

Run it by typing make test:

$ make test
PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
t/test.t .. ok   
All tests successful.
Files=1, Tests=3,  0 wallclock secs ( 0.03 usr  0.02 sys +  0.02 cusr  0.02 csys =  0.09 CPU)
Result: PASS
$

Files and directories

h2xs starts extensions. It creates Makefile.PL, which generates Makefile, and lib/test.pm and test.xs, which contain the extension. test.xs is the C part, and test.pm tells Perl how to load the extension.

Running make creates a directory blib for compiled output. make test invokes perl so that it finds the extension files in blib. To test an extension, use make test, or run the test file using

perl -I blib/lib -I blib/arch t/test.t

Without the -I options, the test script will fail to run, or, if another version of the extension is installed, it will use that version instead of the one which was meant to be tested.

Example 3: Rounding numbers

This takes an argument and sets it to its rounded value. To the end of test.xs, add

void
round(arg)
    double  arg
    CODE:
    if (arg > 0.0) {
            arg = floor(arg + 0.5);
    } else if (arg < 0.0) {
            arg = ceil(arg - 0.5);
    } else {
            arg = 0.0;
    }
    OUTPUT:
    arg

Add '-lm' to the line containing 'LIBS' in Makefile.PL:

'LIBS'      => ['-lm'],   # e.g., '-lm'

This links the extension against the C maths library, which contains floor and ceil. Change the number of tests in test.t to "8",

use Test::More tests => 8;

and add the following tests:

my $i;
$i = -1.5; test::round($i); is( $i, -2.0 );
$i = -1.1; test::round($i); is( $i, -1.0 );
$i = 0.0; test::round($i);  is( $i,  0.0 );
$i = 0.5; test::round($i);  is( $i,  1.0 );
$i = 1.2; test::round($i);  is( $i,  1.0 );

Run perl Makefile.PL, make, then make test. It should print out that eight tests have passed.
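
The branches in round implement rounding half away from zero, which can be checked in plain C before building the extension. The following is a minimal sketch, not part of the extension: the name round_half_away is hypothetical, and it replaces floor and ceil with a truncating cast (equivalent here, because floor is applied only to positive values and ceil only to negative ones) so that it compiles without the maths library.

```c
#include <stdio.h>

/*
 * Round half away from zero, following the branches of the round XSUB.
 * Assumption of this sketch: arg fits in a long long. The XSUB itself
 * uses floor and ceil, which have no such limit.
 */
double round_half_away(double arg)
{
    if (arg > 0.0) {
        /* arg + 0.5 is positive, so truncation equals floor */
        return (double)(long long)(arg + 0.5);
    } else if (arg < 0.0) {
        /* arg - 0.5 is negative, so truncation equals ceil */
        return (double)(long long)(arg - 0.5);
    }
    return 0.0;
}

/* Print the rounded value of each input used in the tests. */
void show_rounding(void)
{
    const double inputs[] = { -1.5, -1.1, 0.0, 0.5, 1.2 };
    size_t i;
    for (i = 0; i < sizeof inputs / sizeof inputs[0]; i++) {
        printf("%4.1f -> %4.1f\n", inputs[i], round_half_away(inputs[i]));
    }
}
```

For the five test inputs, round_half_away gives -2, -1, 0, 1 and 1, which are the values the tests expect.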

Input and Output Parameters

Parameters of the XSUB are specified after the function's return type and name. The output parameters are listed at the end of the function, after OUTPUT:. RETVAL tells Perl to send this value back as the return value of the XSUB function. In Example 3, the return value was placed in the original variable which was passed in, so it and not RETVAL was listed in the OUTPUT: section.

Xsubpp translates XS into C. Its rules to convert from Perl data types, such as "scalar" or "array", to C data types such as int, or char, are found in a file called a "typemap". This has three parts. The first part maps C types to a name which corresponds to Perl types. The second part contains C code which xsubpp uses for input parameters. The third part contains C code which xsubpp uses for output parameters.

For example, look at a portion of the C file created for the extension, test.c:

XS(XS_test_round); /* prototype to pass -Wmissing-prototypes */
XS(XS_test_round)
{
#ifdef dVAR
    dVAR; dXSARGS;
#else
    dXSARGS;
#endif
    if (items != 1)
       croak_xs_usage(cv,  "arg");
    {
        double  arg = (double)SvNV(ST(0));
#line 30 "test.xs"
    if (arg > 0.0) {
            arg = floor(arg + 0.5);
    } else if (arg < 0.0) {
            arg = ceil(arg - 0.5);
    } else {
            arg = 0.0;
    }
#line 137 "test.c"
        sv_setnv(ST(0), (double)arg);
        SvSETMAGIC(ST(0));
    }
    XSRETURN_EMPTY;
}

In the typemap file, doubles are of type T_DOUBLE. In the INPUT section of typemap, an argument that is T_DOUBLE is assigned to the variable arg by calling SvNV, then casting its value to double, then assigning that to arg. In the OUTPUT section, arg is passed to sv_setnv to be passed back to the calling subroutine. (ST(0) is discussed in "The argument stack").

XS file structure

The lines before MODULE = in the XS file are C. Xsubpp just copies them. Parts after MODULE = are XSUB functions. Xsubpp translates them to C.

Simplifying XSUBs

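The full XS tutorial supplied with Perl contains an example, omitted from this digest, which wraps a C function foo declared in a header file. The second part of its XS file contained the following description of an XSUB: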

double
foo(a,b,c)
    int             a
    long            b
    const char *    c
    OUTPUT:
    RETVAL

In contrast with "Example 1: Hello world", "Example 2: Odd or even" and "Example 3: Rounding numbers", this description does not contain code for what is done during a call to foo(). Even if a CODE section is added to this XSUB:

double
foo(a,b,c)
    int             a
    long            b
    const char *    c
    CODE:
    RETVAL = foo(a,b,c);
    OUTPUT:
    RETVAL

the generated C code is almost identical: xsubpp works out the CODE: section from the first two lines of the XSUB description. If no CODE: section is specified, the OUTPUT: section can be removed as well: xsubpp sees that it needs to generate a function call, and autogenerates the OUTPUT: section too. Thus the XSUB can be

double
foo(a,b,c)
    int             a
    long            b
    const char *    c

This can also be done for

int
is_even(input)
    int     input
    CODE:
    RETVAL = (input % 2 == 0);
    OUTPUT:
    RETVAL

of "Example 2: Odd or even", if a C function int is_even(int input) is supplied. As in "XS file structure", this may be placed in the first part of the .xs file:

int
is_even(int arg)
{
    return (arg % 2 == 0);
}

If this is in the first part of the xs file, before MODULE = , the XS part need only be

int
is_even(input)
    int     input

XSUB arguments

When arguments to routines in the .xs file are specified, three things are passed for each argument listed. The first is the order of that argument relative to the others (first, second, third). The second is the type of argument (int, char*). The third is the calling convention for the argument in the call to the library function.

Suppose two C functions with similar declarations, for example

int string_length (char *s);
int upper_case_char (char *cp);

operate differently on the argument: string_length inspects the characters pointed to by s without changing their values, but upper_case_char manipulates what cp points to. From Perl, these functions are used in a different manner.

Tell xsubpp which is which by replacing the * before the argument by &. An ampersand, &, means that the argument should be passed to a library function by its address. In the example,

int
string_length(s)
    char * s

but

int
upper_case_char(cp)
    char & cp

For example, consider:

int
foo(a,b)
    char & a
    char * b

The first Perl argument to this function is treated as a char and assigned to a, and its address is passed into foo. The second Perl argument is treated as a string pointer and assigned to b. The value of b is passed into the function foo. The call to foo that xsubpp generates looks like this:

foo (& a, b);
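
The difference between the two calling conventions can be seen in plain C. In the sketch below the function bodies are hypothetical (the text gives only the declarations), as are the return value of upper_case_char and the helper call_upper_case_char, which stands in for the argument-passing step of a generated wrapper and is not real xsubpp output.

```c
#include <ctype.h>

/* Reads the string that s points to; nothing is modified. */
int string_length(char *s)
{
    int n = 0;
    while (s[n] != '\0') {
        n++;
    }
    return n;
}

/*
 * Writes through cp: the single char it points to is upper-cased in place.
 * Returning 1 if the char changed and 0 otherwise is an assumption of
 * this sketch.
 */
int upper_case_char(char *cp)
{
    char before = *cp;
    *cp = (char)toupper((unsigned char)*cp);
    return *cp != before;
}

/*
 * What "char & cp" asks for: hold the char in a local variable and
 * pass the address of that variable to the library function.
 */
char call_upper_case_char(char c)
{
    char cp = c;              /* value converted from the caller's argument */
    upper_case_char(&cp);     /* passed by address, like foo (& a, b) */
    return cp;
}
```

string_length is given the pointer itself, whereas upper_case_char is given the address of a single char that the wrapper owns.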

The argument stack

In the generated C code, there are references to ST(0), ST(1) and so on. ST is a macro that points to the nth argument on the argument stack. ST(0) is thus the first argument on the stack and therefore the first argument passed to the XSUB, ST(1) is the second argument, and so on.

The list of arguments to the XSUB in the .xs file tells xsubpp which argument corresponds to which position on the argument stack (i.e., the first one listed is the first argument, and so on). These must be listed in the same order as the function expects them.

The actual values on the argument stack are pointers to the values passed in. When an argument is listed as being an OUTPUT value, its corresponding value on the stack (i.e., ST(0) if it was the first argument) is changed. Verify this by looking at the C code generated for Example 3. The code for round contains lines that look like this:

double arg = (double) SvNV(ST(0));
/* Round the contents of the variable arg */
sv_setnv (ST(0), (double)arg);

The arg variable is initially set by taking the value from ST(0), then is stored back into ST(0) at the end of the routine.
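
The effect of storing back into ST(0) can be modelled in plain C. This is only a toy model under loud assumptions: ToySV, toy_stack, TOY_ST, toy_round and call_toy_round are all hypothetical names, the "scalar" holds nothing but a double, and none of this is how Perl implements its stack. It shows only that, because the stack holds pointers, writing through a stack slot changes the caller's variable.

```c
#include <stddef.h>

/* Toy stand-in for a Perl scalar: it just holds a double. Not the real SV. */
typedef struct {
    double nv;
} ToySV;

/* Toy argument stack: an array of pointers to the caller's scalars. */
static ToySV *toy_stack[8];
#define TOY_ST(n) (toy_stack[(n)])

/* Toy XSUB: read slot 0, round the value, store it back through the pointer. */
static void toy_round(void)
{
    double arg = TOY_ST(0)->nv;          /* plays the role of SvNV(ST(0)) */

    /* Round half away from zero; a truncating cast is used in place of
     * floor and ceil so that the sketch needs no maths library. */
    arg = (double)(long long)(arg < 0.0 ? arg - 0.5 : arg + 0.5);

    TOY_ST(0)->nv = arg;                 /* plays the role of sv_setnv(ST(0), arg) */
}

/* Put a pointer to the caller's scalar on the stack, then call the toy XSUB. */
double call_toy_round(double value)
{
    ToySV caller_var = { value };

    toy_stack[0] = &caller_var;
    toy_round();
    toy_stack[0] = NULL;                 /* do not keep a pointer to a dead local */
    return caller_var.nv;                /* the caller's variable was changed */
}
```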

XSUBs are also allowed to return lists, not just scalars. This must be done by manipulating stack values ST(0), ST(1), etc. See perlxs.

XSUBs are also allowed to avoid automatic conversion of Perl function arguments to C function arguments. See perlxs. Some people prefer manual conversion by inspecting ST(i) even in the cases when automatic conversion will do, arguing that this makes the logic of an XSUB call clearer. Compare with "Simplifying XSUBs".

Example 4: Returning an array

This example illustrates working with the argument stack. The previous examples all returned a single value; this one returns an array, using the statfs system call.

Return to the test directory. Add the following to the top of test.xs, after #include "XSUB.h":

#include <sys/vfs.h>

or

#include <sys/param.h>
#include <sys/mount.h>

depending on your operating system (read "man statfs" for the correct details for your version of Unix).

Add to the end:

void
statfs(path)
    char *  path
    INIT:
    int i;
    struct statfs buf;

    PPCODE:
    i = statfs(path, &buf);
    if (i == 0) {
            XPUSHs(sv_2mortal(newSVnv(buf.f_bavail)));
            XPUSHs(sv_2mortal(newSVnv(buf.f_bfree)));
            XPUSHs(sv_2mortal(newSVnv(buf.f_blocks)));
            XPUSHs(sv_2mortal(newSVnv(buf.f_bsize)));
            XPUSHs(sv_2mortal(newSVnv(buf.f_ffree)));
            XPUSHs(sv_2mortal(newSVnv(buf.f_files)));
            XPUSHs(sv_2mortal(newSVnv(buf.f_type)));
    } else {
            XPUSHs(sv_2mortal(newSVnv(errno)));
    }

In test.t, change the number of tests from 8 to 10, and add

@a = test::statfs("/blech");
ok( scalar(@a) == 1 && $a[0] == 2 );
@a = test::statfs("/");
is( scalar(@a), 7 );

This routine returns a different number of values depending on whether the call to statfs succeeds. If there is an error, the error number is returned as a single-element array. If the call is successful, then a 7-element array is returned.

INIT: says to place the code following it immediately after the argument stack is decoded. PPCODE: tells xsubpp that the xsub manages the return values put on the argument stack by itself.

To place values to be returned to the caller onto the stack, use the series of macros that begin with XPUSH. There are five different versions, for placing integers, unsigned integers, doubles, strings, and Perl scalars on the stack. In the example, a Perl scalar was placed onto the stack.

The values pushed onto the return stack of the XSUB are "mortal" SVs. They are made "mortal" so that once their values are copied by the calling program, the SVs that held the returned values can be deallocated. If they were not mortal, then they would continue to exist after the XSUB routine returned, but would not be accessible, causing a memory leak.
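
The system call that the XSUB wraps can be tried directly in C, which also shows where the single error value comes from. This is a minimal sketch assuming a Linux system with <sys/vfs.h>; the function name fs_free_blocks and its choice of two fields are hypothetical, and other systems may need the alternative headers mentioned above.

```c
#include <errno.h>
#include <sys/vfs.h>   /* Linux; elsewhere <sys/param.h> and <sys/mount.h> may be needed */

/*
 * Call statfs on path. On success, store the block size and the number
 * of free blocks and return 0. On failure, return errno, which is the
 * value the XSUB pushes as its only return value.
 */
int fs_free_blocks(const char *path, double *bsize, double *bfree)
{
    struct statfs buf;

    if (statfs(path, &buf) != 0) {
        return errno;
    }
    *bsize = (double)buf.f_bsize;
    *bfree = (double)buf.f_bfree;
    return 0;
}
```

For a path that does not exist, such as "/blech", statfs fails with ENOENT. That constant is 2 on Linux, which is the value the test above compares against; the number could differ on another system.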

Example 5: Arrays and hashes

This example takes an array reference as input, and returns a reference to an array of hash references:

my $stats = multi_statfs (['/', '/usr/']);
my $usr_bfree = $stats->[1]->{f_bfree};

It is based on "Example 4: Returning an array". It takes a reference to an array of filenames as input, calls statfs for each file name, and returns a reference to an array of hashes containing the data for each of the filesystems.

In the test directory add the following code to the end of test.xs:

SV *
multi_statfs(paths)
    SV * paths
INIT:
    /* The return value. */
    AV * results;
    /* The number of paths in "paths". */
    I32 numpaths = 0;
    int i, n;
    
    /* Check that paths is a reference, then check that it is an
    array reference, then check that it is non-empty. */

    if ((! SvROK(paths))
    || (SvTYPE(SvRV(paths)) != SVt_PVAV)
    || ((numpaths = av_len((AV *)SvRV(paths))) < 0))
    {
        XSRETURN_UNDEF;
    }

    /* Create the array which holds the return values. */

    results = (AV *) sv_2mortal ((SV *) newAV ());

CODE:
    for (n = 0; n <= numpaths; n++) {
        HV * rh;
        STRLEN l;
        struct statfs buf;

        /* Get the nth value from array "paths". */

        char * fn = SvPV (*av_fetch ((AV *) SvRV (paths), n, 0), l);

        i = statfs (fn, &buf);
        if (i != 0) {
            av_push (results, newSVnv (errno));
            continue;
        }

        /* Create a new hash. */

        rh = (HV *) sv_2mortal ((SV *) newHV ());

        /* Store the numbers in rh, under the given names. */

        hv_store(rh, "f_bavail", 8, newSVnv(buf.f_bavail), 0);
        hv_store(rh, "f_bfree",  7, newSVnv(buf.f_bfree),  0);
        hv_store(rh, "f_blocks", 8, newSVnv(buf.f_blocks), 0);
        hv_store(rh, "f_bsize",  7, newSVnv(buf.f_bsize),  0);
        hv_store(rh, "f_ffree",  7, newSVnv(buf.f_ffree),  0);
        hv_store(rh, "f_files",  7, newSVnv(buf.f_files),  0);
        hv_store(rh, "f_type",   6, newSVnv(buf.f_type),   0);

        av_push(results, newRV((SV *)rh));                       
    }
    RETVAL = newRV((SV *)results);
OUTPUT:
    RETVAL

Add to test.t, increasing the number of tests by two:

$results = test::multi_statfs([ '/', '/blech' ]);
ok( ref $results->[0] );
ok( ! ref $results->[1] );

This function does not use a typemap. Instead, it accepts one SV* (scalar) parameter, and returns an SV*. These scalars are populated within the code. Because it only returns one value, there is no need for a PPCODE: directive, only CODE: and OUTPUT: directives.

When dealing with references, it is important to handle them with caution. The INIT: block first checks that SvROK returns true, which indicates that paths is a valid reference. It then verifies that the object referenced by paths is an array, using SvRV to dereference paths, and SvTYPE to discover its type. As an added test, it checks that the array referenced by paths is non-empty, using av_len, which returns -1 if the array is empty. The XSRETURN_UNDEF macro aborts the XSUB and returns the undefined value if any of these three conditions is not met.

We manipulate several arrays in this XSUB. An array is represented internally by a pointer to AV. The functions and macros for manipulating arrays are similar to the functions in Perl: av_len returns the highest index in an AV*, much like $#array; av_fetch fetches a scalar value from an array, given its index; av_push pushes a scalar value onto the array, extending it if necessary.

Specifically, we read pathnames one at a time from the input array, and store the results in an output array (results) in the same order. If statfs fails, the element pushed onto the return array is the value of errno after the failure. If statfs succeeds, the value pushed onto the return array is a reference to a hash containing some of the information in the statfs structure.

It would be possible (and a small performance win) to pre-extend the return array before pushing data into it, since we know how many elements we will return:

av_extend(results, numpaths);

We are performing only one hash operation in this function, which is storing a new scalar under a key using hv_store. A hash is represented by an HV* pointer. Like arrays, the functions for manipulating hashes from an XSUB mirror the functionality available from Perl. See perlguts and perlapi for details.
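
The third argument of hv_store is the length of the key, which the XSUB above counts by hand. A plain C sketch, using no Perl API, that checks those hand-counted lengths; the macro KEY_LEN, the table key_checks and the function count_length_mismatches are hypothetical names introduced for illustration.

```c
#include <string.h>

/* Length of a string literal without its terminating NUL, known at compile time. */
#define KEY_LEN(literal) ((int)(sizeof(literal) - 1))

/* Each key together with the hand-counted length passed to hv_store. */
struct key_check {
    const char *key;
    int length_in_xsub;
};

static const struct key_check key_checks[] = {
    { "f_bavail", 8 },
    { "f_bfree",  7 },
    { "f_blocks", 8 },
    { "f_bsize",  7 },
    { "f_ffree",  7 },
    { "f_files",  7 },
    { "f_type",   6 },
};

/* Return the number of keys whose hand-counted length differs from strlen. */
int count_length_mismatches(void)
{
    int mismatches = 0;
    size_t i;

    for (i = 0; i < sizeof key_checks / sizeof key_checks[0]; i++) {
        if ((int)strlen(key_checks[i].key) != key_checks[i].length_in_xsub) {
            mismatches++;
        }
    }
    return mismatches;
}
```

A macro in the style of KEY_LEN lets the compiler count the characters, so the key and its length cannot drift apart when a key is renamed.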

To create a reference, use newRV. An AV* or an HV* can be cast to type SV* in this case. This allows taking references to arrays, hashes and scalars with the same function. Conversely, the SvRV function always returns an SV*, which may need to be cast to the appropriate type if it is something other than a scalar (check with SvTYPE).

Example 6: Passing open files

To create a wrapper around fputs, the XS file need only contain

#define PERLIO_NOT_STDIO 0
#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"

#include <stdio.h>

MODULE = test               PACKAGE = test

int
fputs(s, stream)
    char *          s
    FILE *          stream
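
The XSUB above has no CODE: section, so its two arguments go straight to the C library function and its result is returned. A plain C sketch of that underlying call, outside Perl; the helper name put_string and its 1/0 return convention are assumptions of this sketch, not part of the extension.

```c
#include <stdio.h>

/*
 * Call fputs as a wrapper would once it has a FILE * in hand.
 * fputs returns a non-negative value on success and EOF on error;
 * this helper reduces that to 1 for success and 0 for failure.
 */
int put_string(const char *s, FILE *stream)
{
    return fputs(s, stream) != EOF;
}
```

The XSUB returns the int from fputs unchanged, so a Perl caller would need to compare the result against EOF's value rather than rely on it being true or false.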

SEE ALSO

perlguts

This documents functions such as SvNV which convert Perl scalars into C doubles.

perlapi (see http://perldoc.perl.org/perlapi.html)

This lists functions such as croak used for error handling. Beware that the version of the file on the http://search.cpan.org website is not formatted correctly.

perlxs

This is the "official" documentation for Perl XS.

perlmod

This is the "official" documentation for Perl modules.

h2xs

This documents h2xs.

NOTES

make

This tutorial assumes that the "make" program that Perl is configured to use is called make. Instead of running "make" in the examples, a substitute may be required. The command perl -V:make gives the name of the substitute program.

AUTHOR

Jeff Okamoto, reviewed and assisted by Dean Roehrich, Ilya Zakharevich, Andreas Koenig, and Tim Bunce. PerlIO material contributed by Lupe Christoph, with some clarification by Nick Ing-Simmons. Changes for h2xs as of Perl 5.8.x by Renee Baecker.

This digest web version (http://www.lemoda.net/xs/perlxstut/) was edited from that found in the Perl distribution by Ben Bullock.



This document is an edited version of part of the Perl distribution and may be copied, modified and redistributed under the licence terms of Perl itself, the GNU General Public Licence or the Perl Artistic licence.

For comments, questions, and corrections, please email Ben Bullock (benkasminbullock@gmail.com).