Perl and XS: Example 1: Geometry

Here is a C function:

#include "hypotenuse.h"
#include <math.h>

double hypotenuse (double x, double y)
{
    return sqrt (x*x + y*y);
}


and its prototype

double hypotenuse (double x, double y);


We want to call Geometry::hypotenuse() from Perl to invoke the above C code.

Make a directory Geometry and a file Geometry/Geometry.xs as follows:

#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"

#include "ppport.h"

#include "hypotenuse.h"

MODULE = Geometry               PACKAGE = Geometry              

PROTOTYPES: ENABLE

double
hypotenuse(x, y)
        double  x
        double  y


The first three #includes give access to the Perl C API. The fourth #include includes ppport.h, which ensures backward compatibility. See Devel::PPPort. The next #include gives access to hypotenuse.h.

MODULE and PACKAGE are XS directives. They specify the module and package for our xsubs. A module is a file containing Perl code, and a package is a Perl namespace.

XSLoader needs to know the file that contains the library, and the Perl namespace in which to install the xsubs. The MODULE directive tells it the file, and the PACKAGE directive tells it the namespace.

The PACKAGE directive names a Perl package, like Geometry or Align::NW. xsubpp then generates code to install xsubs in that package.

An .xs file may contain multiple MODULE and PACKAGE directives. MODULE and PACKAGE directives should always appear together, as shown above. All MODULE directives in an .xs file should name the same module. PACKAGE directives can name different packages as necessary to place different xsubs into different Perl packages, analogous to the use of package statements in ordinary Perl code.

The PROTOTYPES directive tells xsubpp whether or not to install our xsubs with prototypes. The PROTOTYPES directive goes below the MODULE directive.

XS Routines

After the PROTOTYPES directive comes an XS routine. It specifies the return type (double) of the target routine, the name (hypotenuse) of the target routine, and the name and type of each parameter to the target routine. The newlines are significant, but the indentation is not.

The name of the XS routine is hypotenuse. Xsubpp derives the name of the Perl routine from the name of the XS routine. In this example, xsubpp also determines the name of the target routine from the name of the XS routine. Later on, we'll see examples where the target routine has a different name than the XS routine.

The Perl part of the module lives in lib/Geometry.pm. It needs to contain the instructions that load the XS part of the module. Create a directory lib, and in it make the module file Geometry.pm:

package Geometry;
our $VERSION = '0.01';
require XSLoader;
XSLoader::load('Geometry', $VERSION);
1;


For building the module, make a file Makefile.PL

use ExtUtils::MakeMaker;

WriteMakefile (
    NAME => 'Geometry',
    VERSION => '0.01',
    OBJECT => 'Geometry.o hypotenuse.o',
);


Copy the C files into the directory:

$ cp ../hypotenuse.c .
$ cp ../hypotenuse.h .

Create ppport.h by running

perl -MDevel::PPPort -e 'Devel::PPPort::WriteFile ()'

Now we can build the module:

$ perl Makefile.PL
$ make

perl Makefile.PL writes a Makefile. make then follows that Makefile: it runs xsubpp to translate Geometry.xs to Geometry.c, has the C compiler compile Geometry.c and hypotenuse.c to Geometry.o and hypotenuse.o, and has the linker combine the object files into a library that Perl can load.

Here is Geometry.c, edited a bit for clarity.

#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"

#include "hypotenuse.h"

XS(XS_Geometry_hypotenuse)
{
    dXSARGS;
    if (items != 2)
        croak("Usage: Geometry::hypotenuse(x, y)");
    {
        double  x = (double)SvNV(ST(0));
        double  y = (double)SvNV(ST(1));
        double  RETVAL;

        RETVAL = hypotenuse(x, y);
        ST(0) = sv_newmortal();
        sv_setnv(ST(0), (double)RETVAL);
    }
    XSRETURN(1);
}

XS(boot_Geometry)
{
    dXSARGS;
    char* file = __FILE__;

    XS_VERSION_BOOTCHECK ;

    newXSproto("Geometry::hypotenuse", XS_Geometry_hypotenuse, file, "$$");
    XSRETURN_YES;
}

Geometry.c is an ordinary C source file, suitable for compilation. It looks strange because it is written with XS macros. Let's decode the macros and see how it works.

The #includes are passed through unchanged from the .xs file. The C compiler will need them.

XS_Geometry_hypotenuse is the actual xsub that is generated by xsubpp. The xsub name is pasted together from

the token XS
the name given in the PACKAGE directive
the name of the XS routine
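
The pasting rule above can be sketched in plain C. This is only an illustration: xsub_symbol is a hypothetical helper, not part of xsubpp or the Perl API, and the handling of "::" (each colon becoming an underscore, so that a package like Align::NW yields Align__NW) is an assumption about xsubpp's behaviour that is worth checking against the generated .c file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Illustrative helper, not part of xsubpp or the Perl API.
 * Builds "XS_" + package + "_" + routine into `out`.
 * Assumption: every ':' in the package name becomes '_'.
 * Returns 0 on success, -1 if `out` is too small.
 */
int xsub_symbol(char *out, size_t out_size,
                const char *package, const char *routine)
{
    int n;
    char *p;

    assert(out != NULL && package != NULL && routine != NULL);

    n = snprintf(out, out_size, "XS_%s_%s", package, routine);
    if (n < 0 || (size_t)n >= out_size)
        return -1;

    /* Colons are not legal in C identifiers, so map them to underscores. */
    for (p = out; *p != '\0'; p++) {
        if (*p == ':')
            *p = '_';
    }
    return 0;
}
```

Calling xsub_symbol(buf, sizeof buf, "Geometry", "hypotenuse") leaves XS_Geometry_hypotenuse in buf, which matches the name in the listing above.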

The XS() macro declares XS_Geometry_hypotenuse with the return type and parameters that Perl expects an xsub to have. These are not the parameters to hypotenuse(); we will get those from the Perl stack.

dXSARGS is another XS macro; it declares some local variables that the xsub needs.

One of the locals declared by dXSARGS is items; this gives the number of arguments that were passed to the xsub on the Perl stack. As declared, hypotenuse() requires 2 arguments; the xsub emits a usage message if hypotenuse() is called from Perl with the wrong number of arguments.

Next comes the code that extracts arguments from the Perl stack

double  x = (double)SvNV(ST(0));
double  y = (double)SvNV(ST(1));

ST() is an XS macro that accesses an argument on the Perl stack: ST(0) is the first argument, ST(1) is the second, and so on.

Perl passes parameters by reference, so the things on the stack are pointers to the underlying scalars. SvNV takes a pointer to a scalar and returns the value of that scalar as a floating point number. xsubpp adds a (double) typecast to quiet the C compiler, and assigns that value to a local variable: x for ST(0) and y for ST(1).
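
To make "returns the value of that scalar as a floating point number" concrete, here is a toy model in plain C. The names toy_sv and toy_SvNV are invented for this sketch; the real SvNV handles many more cases (integers, references, magic, overloading), and using strtod for the string case is only an approximation of Perl's string-to-number rules.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for a Perl scalar; real SVs are far richer than this. */
typedef struct {
    int         is_number;  /* nonzero if nv holds the value */
    double      nv;         /* numeric value */
    const char *pv;         /* string value, or NULL to model undef */
} toy_sv;

/* Rough model of SvNV: take a pointer to a scalar, return a double. */
double toy_SvNV(const toy_sv *sv)
{
    assert(sv != NULL);
    if (sv->is_number)
        return sv->nv;
    if (sv->pv == NULL)
        return 0.0;               /* undef is treated as 0 */
    return strtod(sv->pv, NULL);  /* leading numeric prefix, else 0 */
}
```

In this model, a scalar holding the string "4" and a scalar holding the number 4 both come out as the double 4.0, which is why hypotenuse can be called from Perl with either.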

xsubpp also declares a local variable to hold the return value of the subroutine.

double  RETVAL;

This variable is always named RETVAL, but it is declared with whatever type the subroutine returns.

With x, y, and RETVAL set up, xsubpp can generate a call to the target routine. xsubpp emits the name of the XS routine as the name of the target routine.

RETVAL = hypotenuse(x, y);

This is a C subroutine call. The next two lines return the value to Perl.

ST(0) = sv_newmortal();
sv_setnv(ST(0), (double)RETVAL);

Return values go on the Perl stack, starting at ST(0). sv_newmortal creates a new mortal scalar value. Like any scalar, it has an initial value of undef (Perl's "undefined value"). sv_setnv sets the value of the scalar to the value that was returned from hypotenuse.

Finally, the XSRETURN(1) macro tells the interpreter how many values we are returning on the Perl stack: in this case, one.

boot

boot_Geometry is the subroutine that XSLoader calls to install the xsubs in the Geometry module. The subroutine name is pasted together from

the token boot
the name given in the MODULE directive

To install an xsub, boot_Geometry calls

newXSproto("Geometry::hypotenuse", XS_Geometry_hypotenuse, file, "$$");

newXSproto is an entry point in the Perl C API. Its arguments are

the name of a Perl subroutine
a pointer to a C subroutine
the name of a C source file
a Perl subroutine prototype

newXSproto installs the C subroutine XS_Geometry_hypotenuse as an xsub for the Perl routine Geometry::hypotenuse. It supplies a prototype, because we specified PROTOTYPES: ENABLE in the .xs file. The source file name is provided so that Perl can report it in error messages.

The name of the Perl routine is constructed from

the name given in the PACKAGE directive
the name of the XS routine

xsubpp only generates one boot routine per module. The boot routine makes one call to newXSproto for each xsub in the module.
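
A boot routine is, in essence, a series of registrations in a name-to-function table. The sketch below models that idea in plain C. Everything prefixed with toy_ is invented for illustration; the real newXSproto creates a Perl code value in the symbol table rather than filling a fixed-size array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A toy xsub takes no arguments; real xsubs receive interpreter state. */
typedef void (*toy_xsub)(void);

typedef struct {
    const char *perl_name;  /* e.g. "Geometry::hypotenuse" */
    toy_xsub    code;       /* C function to run */
    const char *file;       /* source file, for error messages */
    const char *proto;      /* Perl prototype, e.g. "$$" */
} toy_entry;

#define TOY_MAX_SUBS 16
static toy_entry toy_table[TOY_MAX_SUBS];
static size_t    toy_count = 0;

/* Model of newXSproto(): record one name -> function mapping. */
int toy_newXSproto(const char *perl_name, toy_xsub code,
                   const char *file, const char *proto)
{
    assert(perl_name != NULL && code != NULL);
    if (toy_count == TOY_MAX_SUBS)
        return -1;
    toy_table[toy_count].perl_name = perl_name;
    toy_table[toy_count].code = code;
    toy_table[toy_count].file = file;
    toy_table[toy_count].proto = proto;
    toy_count++;
    return 0;
}

/* Model of Perl resolving a subroutine name at call time. */
const toy_entry *toy_find(const char *perl_name)
{
    size_t i;
    for (i = 0; i < toy_count; i++) {
        if (strcmp(toy_table[i].perl_name, perl_name) == 0)
            return &toy_table[i];
    }
    return NULL;
}

static void toy_XS_Geometry_hypotenuse(void) { /* body omitted */ }

/* Model of boot_Geometry: one registration call per xsub in the module. */
int toy_boot_Geometry(void)
{
    return toy_newXSproto("Geometry::hypotenuse",
                          toy_XS_Geometry_hypotenuse, "Geometry.c", "$$");
}
```

Before toy_boot_Geometry runs, a lookup of Geometry::hypotenuse finds nothing; afterwards it finds the C function, which mirrors why the module must be loaded before its xsubs can be called.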

Test

To test our work, create a file Geometry/t/Geometry.t

use Geometry;
use Test::More tests => 1;
is (Geometry::hypotenuse (3, 4), 5);


Then do

$ perl Makefile.PL
$ make test

The output will look something like this:

PERL_DL_NONLAZY=1 /home/ben/software/install/bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
t/Geometry.t .. ok   
All tests successful.
Files=1, Tests=1,  0 wallclock secs ( 0.02 usr  0.02 sys +  0.02 cusr  0.02 csys =  0.08 CPU)
Result: PASS

hypotenuse() has a simple signature; given that signature, xsubpp can generate code to call it. In more complex cases, we have to write some of the code ourselves. XS provides directives that allow us to supply C code directly, instead of relying on xsubpp. In the examples below, we'll use these to take over progressively more control from xsubpp.

Here is another target routine, in a file called r2p.c:

#include "r2p.h"
#include <math.h>

double r2p(double x, double y, double *theta)
{
    *theta = atan2(y, x);
    return sqrt(x*x + y*y);
}


and its prototype in r2p.h

double r2p(double x, double y, double *theta);


r2p converts rectangular to polar coordinates. It returns two values: a magnitude and an angle. The magnitude is the return value of the subroutine; the angle is returned in a third parameter, passed by address. If we write the XS routine as

double
r2p(x, y, theta)
        double  x
        double  y
        double  theta

then xsubpp will treat theta as an input parameter. It will initialize it from the Perl stack, and won't return a value in it. Instead, we write the XS routine as

double
r2p(x, y, theta)
        double  x
        double  y
        double  theta = NO_INIT
        CODE:
                RETVAL = r2p(x, y, &theta);
        OUTPUT:
        RETVAL
        theta

The NO_INIT directive suppresses initialization from the Perl stack. The CODE directive tells xsubpp that we will supply C code to call the target routine. xsubpp still declares RETVAL for us, but we have to assign the return value to it. The call to r2p is

RETVAL = r2p(x, y, &theta);

This is not an XS directive; it is a C statement, and will be passed through to the C compiler. Therefore, it ends with a semicolon.

The OUTPUT directive lists values that are to be copied back to Perl scalars. The order in which we list them doesn't matter; xsubpp knows where each value goes. We need to return both RETVAL and theta.

Here is the xsub that xsubpp generates for this XS routine.

XS(XS_Geometry_r2p)
{
    dXSARGS;
    if (items != 3)
        croak("Usage: Geometry::r2p(x, y, theta)");
    {
        double  x = (double)SvNV(ST(0));
        double  y = (double)SvNV(ST(1));
        double  theta;
        double  RETVAL;
                RETVAL = r2p(x, y, &theta);
        sv_setnv(ST(2), (double)theta);
        SvSETMAGIC(ST(2));
        ST(0) = sv_newmortal();
        sv_setnv(ST(0), (double)RETVAL);
    }
    XSRETURN(1);
}

It looks very much like the xsub for hypotenuse. xsubpp declares theta for us, so that we can pass its address to r2p. It also generates these lines to return theta to Perl

sv_setnv(ST(2), (double)theta);
SvSETMAGIC(ST(2));

It knows to assign theta to ST(2), because we declared theta as the 3rd parameter to r2p. SvSETMAGIC ensures that the scalar at ST(2) will be created, if necessary. It must be created, for example, if it is a non-existent array or hash value.

Test

We can add r2p to the Geometry module. Copy r2p.c and r2p.h into the module directory and add r2p.o to the OBJECT list in Makefile.PL. Add an

#include "r2p.h"

line and the XS code shown above to Geometry.xs. Add

my $theta;
my $r = Geometry::r2p(3, 4, $theta);
print "$r, $theta\n";

to Geometry.t. Now do

$ perl Makefile.PL
$ make
$ make test

The output should be

1..1
ok 1
5
5, 0.927295218001612

r2p_list

In the examples above, the Perl routine and the target routine have essentially the same name and signature. However, this isn't necessary. For example, in Perl, it would be more natural to call a routine like r2p as

($r, $theta) = r2p_list($x, $y);

We can obtain this calling sequence with this XS routine

void
r2p_list(x, y)
        double  x
        double  y
        PREINIT:
        double  r;
        double  theta;
        PPCODE:
                r = r2p(x, y, &theta);
        EXTEND(SP, 2);
        PUSHs(sv_2mortal(newSVnv(r    )));
        PUSHs(sv_2mortal(newSVnv(theta)));

There are a few differences between this XS routine and the one that we wrote above for r2p.

The name of the XS routine doesn't match the name of the target routine. xsubpp doesn't need the name of the target routine, because we are supplying the code to call the target routine. xsubpp still uses the name of the XS routine to derive the name of the Perl routine.

The return type of r2p_list is void. This doesn't mean that r2p_list doesn't return anything. Rather, it tells xsubpp that we will supply the code to return values to Perl. Therefore, xsubpp doesn't declare RETVAL for us.

The PREINIT directive gives us a place to declare C variables. Without it, xsubpp might emit executable C code before our variable declarations, which is a syntax error in C. We declare two C variables: r and theta.

The PPCODE directive is similar to the CODE directive. It tells xsubpp that we will supply both the C code to call r2p and the PP code to return values to Perl. PP code is push/pop code: code that manipulates the Perl stack directly, in the style of the interpreter's own pp_ functions.

The C code to call r2p is

r = r2p(x, y, &theta);

and the PP code to return values to Perl is

EXTEND(SP, 2);
PUSHs(sv_2mortal(newSVnv(r    )));
PUSHs(sv_2mortal(newSVnv(theta)));

The EXTEND macro allocates space on the stack for 2 scalars, and the PUSHs macros push the scalars onto the stack. The PP macros are passed through to the C compiler, so they end with semicolons, like any other line of C code.

The xsub that xsubpp generates is

XS(XS_Geometry_r2p_list)
{
    dXSARGS;
    if (items != 2)
        croak("Usage: Geometry::r2p_list(x, y)");
    SP -= items;
    {
        double  x = (double)SvNV(ST(0));
        double  y = (double)SvNV(ST(1));
        double  r;
        double  theta;
                r = r2p(x, y, &theta);
        EXTEND(SP, 2);
        PUSHs(sv_2mortal(newSVnv(r    )));
        PUSHs(sv_2mortal(newSVnv(theta)));
        PUTBACK;
        return;
    }
}

xsubpp emits code to extract our arguments from the Perl stack, as before. It passes our C variable declarations and our subroutine call through unchanged. It also passes our PP code through.

The biggest difference between XS_Geometry_r2p and XS_Geometry_r2p_list is the stack management. XS_Geometry_r2p uses an XSRETURN(1) macro call to return one value on the stack. XS_Geometry_r2p_list lowers SP by the number of input parameters, and then issues a PUTBACK macro before returning.

I don't actually understand what any of the stack macros do. I wrote the glue routines shown above by following the examples in perlxs. The macros are defined in /usr/local/lib/perl5/version/architecture/CORE/*.h (where version and architecture depend on the installation), but when I tried reading them, I quickly got lost in a maze of #defines, #ifdefs, typedefs, and internal Perl data structures.

Lacking a principled understanding of Perl stack management, you can't actually write PP code: all you can do is follow working examples, as I have. The examples in perlxs appear to be adequate for most xsubs.

r2p_open

We saw above that the target routine needn't have the same calling sequence as the Perl routine. In fact, we don't need a target routine at all. Once we have a CODE or a PPCODE directive in our XS code, we can put any C code in the XS routine.

In r2p_open, we dispense with the r2p routine, and compute r and theta in open code.

void
r2p_open(x, y)
        double  x
        double  y
        PREINIT:
        double  r;
        double  theta;
        PPCODE:
                r     = sqrt(x*x + y*y);
                theta = atan2(y, x);
        EXTEND(SP, 2);
        PUSHs(sv_2mortal(newSVnv(r    )));
        PUSHs(sv_2mortal(newSVnv(theta)));

Here is the xsub that xsubpp emits. It looks just like the xsub for r2p_list, except for the lines that compute r and theta.

XS(XS_Geometry_r2p_open)
{
    dXSARGS;
    if (items != 2)
        croak("Usage: Geometry::r2p_open(x, y)");
    SP -= items;
    {
        double  x = (double)SvNV(ST(0));
        double  y = (double)SvNV(ST(1));
        double  r;
        double  theta;
                r     = sqrt(x*x + y*y);
                theta = atan2(y, x);
        EXTEND(SP, 2);
        PUSHs(sv_2mortal(newSVnv(r    )));
        PUSHs(sv_2mortal(newSVnv(theta)));
        PUTBACK;
        return;
    }
}

Add these lines to Geometry/t/Geometry.t to test our new xsubs.

($r, $theta) = Geometry::r2p_list(3, 4);
print "$r, $theta\n";

($r, $theta) = Geometry::r2p_open(3, 4);
print "$r, $theta\n";

When we run

.../development>make test

we get

1..1
ok 1
5
5, 0.927295218001612
5, 0.927295218001612
5, 0.927295218001612

Notes

sv_newmortal: The word mortal refers to how Perl manages short-lived values. Data objects in Perl are reclaimed by reference counting. A value that exists only briefly (for example, a return value on the stack) has no owner that will release it, so it is created as mortal. A mortal scalar still has a reference count, but Perl records it on a list of temporaries and decrements its count later, typically at the end of the statement in which it was created. If nothing else has taken a reference by then, the scalar is freed. See perlguts for details. The difficulty of determining when a mortal is no longer needed is a source of continuing maintenance problems in the Perl interpreter.

Notes on these adapted articles

These pages are an adaptation of articles written in 2000 by Steven W. McDougall. My goal in modifying these articles is to simplify and update them. I hope you find these adapted versions of the articles useful. You can find the original articles at the link at the bottom of this page. The major changes in this update are:

h2xs is not used;
XSLoader is used in place of DynaLoader;
It is assumed that the reader understands the basic concepts of C and Perl programming.

This adaptation is a work in progress and many of the links on these pages may not work.

XS Mechanics by Steven W. McDougall is licensed under a Creative Commons Attribution 3.0 Unported License.

For comments, questions, and corrections, please email Ben Bullock (benkasminbullock@gmail.com).