Perl and XS: Concepts

- Introduction
- Concepts
- Example 1: Geometry
- The typemap
- Example 2: Math::Ackermann
- Example 3: Set::Bit
Concepts
To call C from Perl, control and data pass from Perl to C.
Data representation
C Data representation
C generally uses the native formats of the processor for data. A character is one byte, and an integer is typically a 32-bit number. Complex data types are aggregates of simple data types. For example, int x[2]; is eight bytes in memory: four bytes for x[0], followed by four for x[1]. Likewise, struct S { int a; char b[4]; } is four bytes for a, followed by four bytes for b.
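The layout described above can be checked with sizeof and offsetof. This is a minimal sketch; the helper names (element_stride, array_size, offset_of_b, check_layout) are made up for illustration, and the byte counts in the text assume a 4-byte int, which is common but not guaranteed by C.

```c
#include <assert.h>
#include <stddef.h>

struct S { int a; char b[4]; };

/* Distance in bytes between x[0] and x[1]: array elements are contiguous. */
size_t element_stride(void) {
    int x[2] = {0, 0};
    return (size_t)((char *)&x[1] - (char *)&x[0]);
}

/* Total size in bytes of a two-element int array. */
size_t array_size(void) {
    int x[2];
    return sizeof x;
}

/* Byte offset of member b inside struct S: b starts right after a. */
size_t offset_of_b(void) {
    return offsetof(struct S, b);
}

/* Checks written in terms of sizeof(int), so they do not hard-code 4. */
void check_layout(void) {
    assert(element_stride() == sizeof(int));
    assert(array_size() == 2 * sizeof(int));
    assert(offset_of_b() == sizeof(int));
}
```

On a platform where int is four bytes, array_size() returns 8 and offset_of_b() returns 4, matching the figures in the text.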
Perl Data representation
Inside Perl, data objects are C structures. For example, a scalar looks like
typedef enum {
    IOK = 0x01,    /* has valid integer value */
    POK = 0x02,    /* has valid string value  */
} Flags;

struct Scalar {
    int   refcnt;  /* how many references to us  */
    Flags flags;   /* what we are                */
    char *pv;      /* pointer to malloc'd string */
    int   cur;     /* length of pv as a C string */
    int   len;     /* allocated size of pv       */
    int   iv;      /* integer value              */
};
The Scalar struct allows Perl to manage the type information for each scalar. For example, when Perl executes
my $x = 42;
it allocates a Scalar struct, sets refcnt to 1, sets iv to 42, sets the IOK flag, and clears the POK flag.
If we later write
print "$x";
the interpreter allocates space for pv, calls sprintf(pv, "%d", iv) to convert iv to a string, and sets the POK flag.
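The two steps just described (creating an integer scalar, then converting it to a string on demand) might look like this in C. It is a sketch built on the simplified Scalar struct from this page, not the interpreter's real code; scalar_new_iv and scalar_string are hypothetical names, and the 32-byte buffer is an assumption chosen to be large enough for any int.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

typedef enum { IOK = 0x01, POK = 0x02 } Flags;

struct Scalar {
    int   refcnt;
    Flags flags;
    char *pv;
    int   cur;
    int   len;
    int   iv;
};

/* Model of: my $x = 42;  (hypothetical helper) */
struct Scalar *scalar_new_iv(int value) {
    struct Scalar *sv = calloc(1, sizeof *sv);  /* zeroed: pv NULL, no flags */
    assert(sv != NULL);
    sv->refcnt = 1;
    sv->iv     = value;
    sv->flags  = IOK;                           /* IOK set, POK clear */
    return sv;
}

/* Model of using the scalar as a string: pv is filled in on first use. */
const char *scalar_string(struct Scalar *sv) {
    if (!(sv->flags & POK)) {
        sv->len = 32;                           /* assumed buffer size */
        sv->pv  = malloc((size_t)sv->len);
        assert(sv->pv != NULL);
        sv->cur = snprintf(sv->pv, (size_t)sv->len, "%d", sv->iv);
        sv->flags |= POK;                       /* string value now valid */
    }
    return sv->pv;
}
```

After the first call to scalar_string, both IOK and POK are set, so later uses as either an integer or a string need no further conversion.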
When a reference to a scalar is created, the interpreter increments refcnt, and when a reference to the scalar goes away, the interpreter decrements refcnt. When the last reference to the scalar (including $x) goes away, refcnt reduces to zero, and the interpreter frees the Scalar.
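Reference counting as described above can be sketched in a few lines of C. The struct is trimmed to the fields that matter here, and scalar_new, scalar_inc, scalar_dec, and the live_scalars counter are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdlib.h>

struct Scalar {
    int   refcnt;  /* how many references to us */
    char *pv;      /* string buffer, may be NULL */
    int   iv;
};

static int live_scalars = 0;   /* bookkeeping for the demo only */

/* Allocate a scalar; the variable that names it counts as one reference. */
struct Scalar *scalar_new(int value) {
    struct Scalar *sv = calloc(1, sizeof *sv);
    assert(sv != NULL);
    sv->refcnt = 1;
    sv->iv     = value;
    live_scalars++;
    return sv;
}

/* A new reference to the scalar was created. */
void scalar_inc(struct Scalar *sv) {
    sv->refcnt++;
}

/* A reference went away; free the scalar when none remain. */
void scalar_dec(struct Scalar *sv) {
    assert(sv->refcnt > 0);
    if (--sv->refcnt == 0) {
        free(sv->pv);          /* free(NULL) is a no-op */
        free(sv);
        live_scalars--;
    }
}

int scalars_alive(void) {
    return live_scalars;
}
```

Creating a scalar, taking one extra reference, and then dropping both references leaves scalars_alive() at zero; the storage is released on the second scalar_dec, not the first.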
In Perl, unlike C, the programmer does not need to specify the data's type. Perl knows the type of data and converts as necessary. The programmer does not need to manage storage allocation. Perl knows the size and location of data, and allocates and frees as necessary.
Program Execution
Running a C program involves two steps. First the compiler translates the source code into machine code. Then the CPU executes the machine code. Running a Perl program divides into two similar steps. First the interpreter translates the source code into a syntax tree. Then the interpreter executes the syntax tree.
The Syntax Tree
The Perl interpreter translates Perl source code into a "syntax tree". The nodes of the tree are operations such as + and =, called "opcodes". The children of a node represent its operands, such as numbers to be added. After the interpreter builds the syntax tree, it executes the program by "walking" the nodes of the tree in "postfix" order. "Postfix" means walking the children of a node before the node itself.
Walking a node typically yields a value, so we also speak of "evaluating" a node. The interpreter keeps these values on a stack. To evaluate a node, the interpreter takes its operands off the stack, carries out the operation, and puts the result back onto the stack. Suppose we have the Perl statement
$x = $y + 3;
This is parsed into a syntax tree
    =
   / \
 $x   +
     / \
   $y   3
The nodes are evaluated in the order
Step | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|
Node | $x | $y | 3 | + | = |
Stack (top first) | \$x | 42 \$x | 3 42 \$x | 45 \$x | (empty) |
Here is what happens in each step.
- Since the = node will assign to $x, the interpreter pushes a reference to $x, not its value.
- The value of $y turns out to be 42.
- 3 evaluates to 3.
- The + node pops the values 3 and 42, adds them, and pushes the sum onto the stack.
- The = pops the value 45 and the reference to $x, and carries out the assignment. Now the stack is empty.
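The five steps above can be reproduced with a small postfix tree walker. This is a toy model that follows the walk as this page describes it, not the interpreter's actual opcode loop; the types (Op, Slot), the opcode names, and run_example are all invented for the sketch, and values are limited to int.

```c
#include <assert.h>
#include <stddef.h>

enum OpType { OP_CONST, OP_VAR, OP_VARREF, OP_ADD, OP_ASSIGN };

struct Op {
    enum OpType type;
    int         value;        /* OP_CONST: the literal             */
    int        *var;          /* OP_VAR, OP_VARREF: variable store */
    struct Op  *left, *right; /* operands, NULL for leaves         */
};

/* A stack slot holds either a value or a reference to a variable. */
struct Slot { int value; int *ref; };

static struct Slot stack[16];
static int sp = 0;

static void push(struct Slot s) { assert(sp < 16); stack[sp++] = s; }
static struct Slot pop(void)    { assert(sp > 0);  return stack[--sp]; }

int stack_depth(void) { return sp; }

/* Postfix walk: children first, then the node itself. */
void walk(const struct Op *op) {
    struct Slot s = {0, NULL}, a, b;
    if (op->left)  walk(op->left);
    if (op->right) walk(op->right);
    switch (op->type) {
    case OP_CONST:  s.value = op->value; push(s); break;
    case OP_VAR:    s.value = *op->var;  push(s); break;
    case OP_VARREF: s.ref   = op->var;   push(s); break;  /* \$x, not its value */
    case OP_ADD:    b = pop(); a = pop();
                    s.value = a.value + b.value; push(s); break;
    case OP_ASSIGN: b = pop(); a = pop();                 /* value, then reference */
                    *a.ref = b.value; break;
    }
}

/* Build and walk the tree for: $x = $y + 3;  Returns the new $x. */
int run_example(int y_value) {
    int x = 0, y = y_value;
    struct Op x_ref  = { OP_VARREF, 0, &x,   NULL,   NULL   };
    struct Op y_val  = { OP_VAR,    0, &y,   NULL,   NULL   };
    struct Op three  = { OP_CONST,  3, NULL, NULL,   NULL   };
    struct Op add    = { OP_ADD,    0, NULL, &y_val, &three };
    struct Op assign = { OP_ASSIGN, 0, NULL, &x_ref, &add   };
    walk(&assign);
    return x;
}
```

With $y set to 42, run_example returns 45 and the stack is empty afterwards, as in the last column of the table.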
Subroutine calls
Perl
The Perl interpreter creates a separate syntax tree for each subroutine in the program. Each syntax tree is managed by a code reference. A code reference is the thing that you get when you write
my $coderef = sub { ... }
or
sub foo { ... }
my $coderef = \&foo;
in Perl. Internally, a code reference is represented by a C struct. This struct holds a pointer to the root of the syntax tree for the subroutine; we call it the root pointer.
A subroutine call in the program source is represented in the syntax tree by a fragment that looks like this:
           entersub
              |
    +----+--...---+
    |    |        |
  arg1 arg2 ... argN
The entersub opcode transfers control to the called subroutine. Its children evaluate the arguments of the call.
To execute a subroutine call, the interpreter first walks each child node, and pushes the result onto the Perl stack. When all the arguments are on the stack, the interpreter walks the entersub node.
An entersub opcode holds a pointer to a code reference. When the interpreter walks an entersub, it follows this pointer to the code reference, and then follows the root pointer in the code reference to the syntax tree for the subroutine. Then it executes the subroutine.
The things on the Perl stack are not the C structs that represent the arguments of the subroutine, but pointers to the structs. In other words, Perl passes parameters by reference, unlike C.
If the subroutine returns any values, it pushes pointers to them onto the stack, in the same locations where its parameters were. After the subroutine returns, the caller retrieves the return values from the stack.
eXternal Subroutines
Besides the root pointer, a code reference has a C function pointer: a field that contains the address of the entry point of a compiled C subroutine. We'll call this field the xsub pointer, and the C subroutine it points to the xsub.
When the interpreter executes entersub, it first checks the xsub pointer in the code reference. If the xsub pointer is null, it follows the root pointer to the syntax tree for the subroutine and walks it.
If the xsub pointer is not null, the interpreter ignores the root pointer. Instead, it gets the address of the xsub from the xsub pointer, and calls the xsub, and control passes from Perl to C.
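The choice entersub makes between the two pointers can be sketched as follows. The CodeRef struct, the walk_tree stand-in, and hello_xsub are all hypothetical; the real code reference struct carries much more state, and a real xsub has a different signature.

```c
#include <assert.h>
#include <stddef.h>

struct Op;                        /* syntax tree node, left opaque here */

struct CodeRef {
    struct Op *root;              /* root pointer: the sub's syntax tree */
    void     (*xsub)(void);       /* xsub pointer: NULL for a Perl sub   */
};

enum Path { WALKED_TREE, CALLED_XSUB };

static int tree_walks = 0;
static int xsub_calls = 0;

/* Stand-in for walking a Perl subroutine's syntax tree. */
static void walk_tree(struct Op *root) {
    (void)root;
    tree_walks++;
}

/* A compiled C subroutine, used as the xsub in this sketch. */
void hello_xsub(void) {
    xsub_calls++;
}

/* Model of entersub: prefer the xsub pointer, else walk the tree. */
enum Path entersub(const struct CodeRef *cv) {
    assert(cv != NULL);
    if (cv->xsub != NULL) {
        cv->xsub();               /* control passes from Perl to C */
        return CALLED_XSUB;       /* root pointer is ignored       */
    }
    walk_tree(cv->root);
    return WALKED_TREE;
}
```

A CodeRef with a null xsub field takes the tree-walking path; setting xsub to hello_xsub makes the same call site run compiled C instead, with no change to the caller.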
Loading, Linking, and Installation
For a C subroutine to become an xsub, the subroutine has to be loaded into memory, and the interpreter has to set the xsub pointer in a code reference to the entry point of the subroutine.
The xsub pointer is set as follows. The Perl C API includes a routine, shown here in simplified form (the real prototype also takes the name of the source file):
newXS (char *name, void (*fp)())
Given the name of a Perl subroutine in name, and the address of the entry point of a C subroutine in fp, newXS installs fp as the xsub pointer in the code reference for name. Once this happens, Perl code that calls name() will invoke the C subroutine.
The name can be in any package. To install a subroutine called new() in the package Align::NW, we pass the string "Align::NW::new" for name.
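To make the effect of newXS concrete, here is a toy registry that maps fully qualified names to code references. It is not the Perl API: toy_newXS, find_sub, call_sub, and the fixed-size symbol_table are invented for this sketch, and the xsub signature is simplified to take no arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*xsub_t)(void);

struct CodeRef {
    const char *name;   /* fully qualified, e.g. "Align::NW::new" */
    xsub_t      xsub;   /* xsub pointer */
};

#define MAX_SUBS 16
static struct CodeRef symbol_table[MAX_SUBS];
static int n_subs = 0;

/* Find the code reference for a name, or NULL if there is none. */
struct CodeRef *find_sub(const char *name) {
    for (int i = 0; i < n_subs; i++)
        if (strcmp(symbol_table[i].name, name) == 0)
            return &symbol_table[i];
    return NULL;
}

/* Toy counterpart of newXS: install fp as the xsub pointer for name. */
void toy_newXS(const char *name, xsub_t fp) {
    struct CodeRef *cv = find_sub(name);
    if (cv == NULL) {
        assert(n_subs < MAX_SUBS);
        cv = &symbol_table[n_subs++];
        cv->name = name;          /* caller must keep the string alive */
    }
    cv->xsub = fp;
}

/* Toy counterpart of a Perl call to name(). Returns 0 if not installed. */
int call_sub(const char *name) {
    struct CodeRef *cv = find_sub(name);
    if (cv == NULL || cv->xsub == NULL)
        return 0;
    cv->xsub();
    return 1;
}

static int objects_created = 0;

/* The C subroutine we want Perl to reach as Align::NW::new. */
void align_nw_new(void) {
    objects_created++;
}
```

Before toy_newXS("Align::NW::new", align_nw_new) runs, call_sub("Align::NW::new") finds nothing to call; afterwards the same call reaches align_nw_new.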
There are two ways to link C subroutines in a library to an executable, static and dynamic linking. In static linking, the Perl interpreter is linked with the library when it is compiled, creating a modified Perl executable including the C subroutines. In dynamic linking, a Perl program can "load" a library while running and look up the subroutine entry point in the library's symbol table.
Dynamic linking is done by a Perl module called XSLoader. The Perl side of an XS module contains
package My::Module;
our $VERSION = 0.01;
use XSLoader;
XSLoader::load 'My::Module', $VERSION;
When the module loads, it calls XSLoader::load. This locates the library, loads it, finds the entry points, and calls newXS.
Parameter Passing
The Perl interpreter puts things on the Perl stack, but C expects to find things on the processor stack. So the xsub has to convert between Perl and C data representations. Typically, the xsub uses facilities in the Perl C API to get parameters from the Perl stack and convert them to C data values. To return a value, the xsub creates a Perl data object and leaves a pointer to it on the Perl stack.
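The conversion work an xsub performs can be sketched with a toy Perl stack. Everything here is a stand-in: the trimmed Scalar struct, perl_stack, new_scalar, xsub_add, and call_add are hypothetical names, and a real xsub would use the Perl C API rather than touching a stack array directly.

```c
#include <assert.h>
#include <stdlib.h>

struct Scalar { int refcnt; int iv; };    /* trimmed for the sketch */

static struct Scalar *perl_stack[16];     /* holds pointers, not structs */
static int sp = 0;

int stack_depth(void) { return sp; }

/* The plain C routine we want to expose; it knows nothing about Perl. */
int c_add(int a, int b) { return a + b; }

struct Scalar *new_scalar(int value) {
    struct Scalar *sv = malloc(sizeof *sv);
    assert(sv != NULL);
    sv->refcnt = 1;
    sv->iv     = value;
    return sv;
}

/* Toy xsub: Perl stack -> C values -> C call -> Perl data object. */
void xsub_add(int nargs) {
    assert(nargs == 2 && sp >= 2);
    int b = perl_stack[--sp]->iv;         /* convert arguments to C ints */
    int a = perl_stack[--sp]->iv;
    int result = c_add(a, b);             /* ordinary C call */
    perl_stack[sp++] = new_scalar(result);/* leave a pointer to the result */
}

/* Toy caller: push the arguments, invoke the xsub, fetch the result. */
int call_add(struct Scalar *a, struct Scalar *b) {
    assert(sp + 2 <= 16);
    perl_stack[sp++] = a;
    perl_stack[sp++] = b;
    xsub_add(2);
    struct Scalar *ret = perl_stack[--sp];
    int value = ret->iv;
    free(ret);
    return value;
}
```

Note that the result lands in the stack slot where the first argument was, which matches the convention described earlier for return values.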
XS is a macro language which allows us to declare C routines, and specify how Perl data types correspond to C data types. The xsubpp compiler reads XS code and outputs C.

Notes on these adapted articles
These pages are an adaptation of articles written in 2000 by Steven W. McDougall. My goal in modifying these articles is to simplify and update them. I hope you find these adapted versions of the articles useful. You can find the original articles at the link at the bottom of this page. The major changes in this update are:
- h2xs is not used;
- XSLoader is used in place of DynaLoader;
- It is assumed that the reader understands the basic concepts of C and Perl programming.
This adaptation is a work in progress and many of the links on these pages may not work.
XS Mechanics by Steven W. McDougall is licensed under a Creative Commons Attribution 3.0 Unported License.