Perl and XS: Example 3: Set::Bit

- Introduction
- Concepts
- Example 1: Geometry
- The typemap
- Example 2: Math::Ackermann
- Example 3: Set::Bit
Set::Bit
represents a set of integers in the
range 0..n
. It allocates a bit for each integer. If the
bit is one, then the integer is in the set; if the bit is zero then
the integer is not in the set.
Set operations like unions and intersections consist of setting and clearing bits to add or remove integers from a set.
Set::Bit
objects are represented using
typedef struct { /* The range of the set is 0..n_bits - 1 */ int n_bits; /* The number of bytes used for storage. */ int n_chars; /* The bytes used for storage. */ unsigned char * chars; } vector;
To create a Set::Bit
object,
vector * new (int n_bits) { vector * p = malloc (sizeof (vector)); if (! p) { croak ("Out of memory"); } p->n_bits = n_bits; /* We use one char to store the bits. The C standard promises that one byte contains at least eight bits. */ p->n_chars = (n_bits + 8 - 1) / 8; p->chars = calloc (p->n_chars, sizeof (unsigned char)); if (! p->chars) { croak ("Out of memory"); } return p; }
new
allocates a vector
, computes the number
of bytes required, and allocates an array.
To add an integer to a set,
/* Set bit "n" in "p". */ void insert (vector *p, int n) { int q; int r; if (n < 0 || n >= p->n_bits) { croak ("Bit out of range"); } q = n / 8; r = n % 8; p->chars[q] |= 1 << r; }
DESTROY()
frees the storage.
void DESTROY (vector *p) { free (p->chars); free (p); }
These functions and declarations are gathered in
files vector.c
and vector.h
An xsub, A()
, was used in
example 2, but
the Math::Ackermann
object was an ordinary Perl array.
The C subroutines in vector.c
will become xsubs in
Set::Bit
. We want the C-language vector
struct to be the Set::Bit
object.
In Makefile.PL
, we include a line
'OBJECT' => 'Bit.o vector.o'in the argument list of
WriteMakefile()
.
Make a file Bit.xs
, and add
a PROTOTYPES
directive:
#include "EXTERN.h" #include "perl.h" #include "XSUB.h" #include "ppport.h" #include "vector.h" MODULE = Set::Bit PACKAGE = Set::Bit PROTOTYPES: ENABLE
PROTOTYPES: ENABLE
tells XS to create Perl prototypes for the
functions. (Perl prototypes are a way of ensuring that functions are
called in a specified way. They are described in
perlsub in the Perl documentation.)
new
If the XS routine for new
is
vector * new(nBits) int nBits
then
$ perl Makefile.PL $ makeresults in
Error: 'vector *' not in typemap in Bit.xs, line 12
Make
invokes xsubpp
, which uses Perl's
default type map to translate from C to Perl. The error message occurs
because Perl's default type map does not contain an entry
for vector *
. However, the default typemap supplies
an XS type called T_PTROBJ
. To use T_PTROBJ
,
create a typemap file
vector * T_PTROBJThis tells
xsubpp
to use the code fragments
for T_PTROBJ
to translate between
a vector *
and a Perl scalar.
T_PTROBJ
is for making pointers to C structures into Perl
objects. Run make
and we get a clean compile. But
before we try to run our code, let's look at Bit.c
and
find out what xsubpp
has done.
sv_setref_pv
Here is the glue routine emitted
by xsubpp
to connect new()
to Perl
XS(XS_Set__Bit_new) { dXSARGS; if (items != 1) croak("Usage: Set::Bit::new(nBits)"); { int nBits = (int)SvIV(ST(0)); vector *RETVAL; RETVAL = new(nBits); ST(0) = sv_newmortal(); sv_setref_pv(ST(0), "vectorPtr", (void*)RETVAL); } XSRETURN(1); }
The glue routine extracts nBits
from the Perl stack, and
declares RETVAL
as a vector *
. Then it
calls new()
, and assigns the return value
to RETVAL
. Next, it creates a new "mortal" scalar
at ST(0)
on the Perl stack.
Now the glue routine calls sv_setref_pv
to
return RETVAL
to Perl.
sv_setref_pv(ST(0), "vectorPtr", (void*)RETVAL);
Perlguts documents
sv_setref_pv
like this:
SV* sv_setref_pv(SV* rv, char* classname, PV iv);Copies the pointer value (the address, not the string!) into an SV whose reference is rv. SV is blessed if classname is non-null.
SV stands for Scalar Value; an SV is the internal Perl data structure
that represents a scalar. When we call sv_setref_pv()
, it
creates a new SV, sets the value of the SV to RETVAL
,
blesses the SV into the vectorPtr
package, and makes the
mortal at the first position on the stack, ST(0)
, a
reference to the SV. The mortal at ST(0)
is the return
value of the new()
method.
The package problem
This is very close to what we want. The return value
from new()
is a reference to a blessed scalar that holds
a pointer to our C-language vector
struct. However, the
scalar is blessed into the vectorPtr
package, not
the Set::Bit
package.
This is a bit odd. Perl methods that function as constructors customarily return references to objects that are blessed into the method's own package. The
MODULE = Set::Bit PACKAGE = Set::Bit
line in Bit.xs
causes all our xsubs to be installed in the
Set::Bit
package. But Set::Bit::new()
returns a
reference to an object that is blessed into the vectorPtr
package. This means, for example, that we could write
$set = Set::Bit::new(100);
and get a reference to a valid Perl object, but we couldn't write
$set->insert(42);
because $$set
is blessed into the vectorPtr
package, and
the vectorPtr
package has no insert()
method. The
insert()
method is in the Set::Bit
package, along with
the new()
method.
There are various ways around this problem. We could leave
new()
in the Set::Bit
package, and add another
PACKAGE
directive to install the rest of our xsubs in the
vectorPtr
package:
MODULE = Set::Bit PACKAGE = vectorPtr
Or we could rename the whole module vectorPtr
.
But these solutions are ugly and awkward. Module names shouldn't be
dictated by the names of our C structs. Module names should be free
parameters in our design, chosen to organize our libraries and to make
our code read naturally. We very much want to get our object blessed
into the Set::Bit
package.
$ntype
As shown above, the code that blesses our object is
sv_setref_pv(ST(0), "vectorPtr", (void*)RETVAL);
To control the package that our objects are blessed into, we need to
understand where the string "vectorPtr"
comes from. If we look
in the OUTPUT section of the default typemap, we find this entry
T_PTROBJ sv_setref_pv($arg, \"${ntype}\", (void*)$var);
So xsubpp
emits code that blesses the object into the package
named by $ntype
. As discussed last month, $ntype
is the
type of the C variable that is being converted. In this case, the C
variable is RETVAL
, and its type is vector *
.
vector *
is an example of a C type name that is not a valid
Perl package name. To accommodate this, xsubpp
silently
substitutes the string "Ptr"
for the *
before emitting
$ntype
as a package name. This is a general mechanism; for
example, the C type name vector * *
would be converted to the
Perl package name vectorPtrPtr
.
RETVAL
RETVAL
is a C variable that xsubpp
declares to hold the
return value of our C subroutine; therefore, its type must match the
return type of our C subroutine. xsubpp
declares RETVAL
with the type that we specify for our XS routine in
Bit.xs
. Thus, the XS code
vector * new(nBits) int nBits
in Bit.xs
leads to the C-language declaration
vector * RETVAL;
in Bit.c
.
This means that the return type of our XS routine determines the
package that our object is blessed into. In order to bless our object
into the Set::Bit
package, we must write our XS routine like
this
Set::Bit new(nBits) int nBits
It seems that this cannot stand, because Set::Bit
is not
the return type of the C-language new()
routine,
and Set::Bit
is not a valid C type name, but we can get
around both problems.
xsubpp
handles the second problem for us. It squashes colons in
$ntype
to underscores whenever it emits $ntype
as the
type of a C variable. So the XS code shown above leads to this
declaration for RETVAL
in Bit.c
Set__Bit RETVAL;
which is valid C code.
To handle the first problem, we simply add a typedef
to Bit.xs
, like this
typedef vector *Set__Bit;
xsubpp
passes this typedef
through to Bit.c
unchanged. Now the declaration of RETVAL
is valid and correct.
xsubpp
doesn't squash colons when it emits $ntype
as a Perl package name, so the call to sv_setref_pv
will be
sv_setref_pv(ST(0), "Set::Bit", (void*)RETVAL);
This will bless our object into the Set::Bit
package, as we desire.
typemap revisited
After making these changes,Bit.xs
looks like this
#include "EXTERN.h" #include "perl.h" #include "XSUB.h" #include "vector.h" typedef vector *Set__Bit; MODULE = Set::Bit PACKAGE = Set::Bit PROTOTYPES: ENABLE Set::Bit new(nBits) int nBits
If we now run make
, we get
Error: 'Set::Bit' not in typemap in Bit.xs, line 14
The problem is that xsubpp
is trying to find the return type of our XS routine in the typemap. We made an entry in the local typemap for vector *
, but then we changed the return type of our XS routine from vector *
to Set::Bit
. We need to change the entry in the local typemap accordingly
Set::Bit T_PTROBJ
Method call syntax
With this change, we can runmake
and get a clean compile. Now, we try to create a Set::Bit
object. We add this line to test.pl
my $set = new Set::Bit 100;
and run make test
The output is
1..1 ok 1 Usage: Set::Bit::new(nBits) at test.pl line 21.
The ok 1
means that the module loaded successfully. The usage
message means that we called the new()
method with the wrong
number of arguments.
With a little thought, we can identify the problem. We invoked new()
with method call syntax. In Perl, a method call
new Set::Bit 100
ultimately becomes a subroutine call with the package name passed as the first argument
Set::Bit::new("Set::Bit", 100)
To accommodate this, we need to change our XS routine to take two arguments: a string and an integer
Set::Bit new(package, nBits) char *package int nBits
If we run xsubpp
on this XS routine, it will emit C code to call new()
with both arguments
RETVAL = new(package, nBits);
We could change our C routine to take both arguments, and then ignore the package name. However, I try to avoid polluting my interfaces with implementation artifacts. Instead, we add a CODE:
directive to the XS routine, and write the call to new()
ourselves.
CODE: RETVAL = new(nBits);
When we use the CODE:
directive, we also need an OUTPUT:
directive to tell xsubpp
what the return value is
OUTPUT: RETVAL
Our XS routine is now
Set::Bit new(package, nBits) char *package int nBits CODE: RETVAL = new(nBits); OUTPUT: RETVAL
When we run make test
,
we get a clean compile, and the output is
1..1 ok 1
showing that our call to new()
succeeded.
The Set::Bit
Object
Let's step back for a moment and review our situation. The glue
routine for new()
calls sv_setref_pv()
, and
sv_setref_pv()
creates a Perl scalar. This scalar is the Set::Bit
object. Here are three crucial facts regarding it
- it has an integer value which is the machine address of a
vector
struct - it is blessed into the
Set::Bit
package Set::Bit::new()
returns a reference to it
Here is a picture that illustrates this
$set = new Set::Bit 100; reference Name: $set RV: ----------->scalar Package: Set::Bit IV: ----------------->vector { }
RV
stands for Reference Value. IV
stands for Integer Value.
Because we have a reference to the Set::Bit
object, we can make method calls on it. Whenever we make a method call, the reference is passed as the first argument to the method.
Methods may be written in either Perl or C. A method can dereference its first argument to get the value of the object. The value of a Set::Bit
object is the machine address of a vector
struct.
There isn't much that a Perl method can do with the address of a vector
struct, except maybe print it.
sub Set::Bit::print_address { my $set = shift; printf "%p", $$set; }
A C method can pass the address of a vector
struct to the
subroutines defined in vector.c
.
insert
Theinsert
method inserts an integer into a set
$set = new Set::Bit 100; $set->insert(42);
Here is the XS code to install insert()
in the Set::Bit
package
void insert(pVector, n) Set::Bit pVector int n
and here is the glue routine that xsubpp
emits to call insert()
XS(XS_Set__Bit_insert) { dXSARGS; if (items != 2) croak("Usage: Set::Bit::insert(pVector, n)"); { Set__Bit pVector; int n = (int)SvIV(ST(1)); if (sv_derived_from(ST(0), "Set::Bit")) { IV tmp = SvIV((SV*)SvRV(ST(0))); pVector = (Set__Bit) tmp; } else croak("pVector is not of type Set::Bit"); insert(pVector, n); } XSRETURN_EMPTY; }
Let's look at the C code to see how we get our vector *
back from Perl.
The XS routine declares pVector
to be of type Set::Bit
. As before, xsubpp
squashes the colons to underscores for us, emitting the C declaration
Set__Bit pVector;
and the typedef in Bit.xs
makes Set__Bit
an alias for vector *
.
The first argument to insert()
is a reference to a Set::Bit
object. This argument appears on the Perl stack at ST(0)
.
sv_derived_from()
verifies that ST(0)
is indeed a reference to a Set::Bit
object. SvRV()
dereferences ST(0)
, returning the scalar that was created earlier by sv_setref_pv()
. SvIV()
extracts the value of that scalar as an Integer Value (IV). This value is the vector *
that was stored there by sv_setref_pv()
.
The glue routine stores the IV in tmp
, and then assigns it to pVector
on the next line, with a typecast to quiet the compiler. Finally, it calls the C-language insert()
routine, passing it the vector *
in pVector
.
Instance methods
The XS code for the remaining instance methods is similar. Here is member()
int member(pVector, n) Set::Bit pVector int n
and intersect()
Set::Bit intersect(pA, pB) Set::Bit pA Set::Bit pB
In each case, the first argument is a reference to the object on which the method was invoked. intersect()
creates and returns a new Set::Bit
object; the mechanics of this are the same as those described above for new()
.
union
The XS code for union()
is a bit different. We can't have
a C subroutine named union
, because union
is
a keyword in C. The C subroutine is called Union
, but so
that the Perl interface will have a consistent naming scheme, the XS uses
Set::Bit union(pA, pB) Set::Bit pA Set::Bit pB CODE: RETVAL = Union(pA, pB); OUTPUT: RETVAL
The XS routine is called union
, with a CODE:
directive so that we can write the C call to Union
.
DESTROY
When the last reference to a Set::Bit
object goes away,
Perl deletes it. The object itself is a Perl scalar: Perl allocated
for it in sv_setref_pv()
, and Perl frees that memory.
The Set::Bit
object holds a pointer to
a vector
struct. Perl doesn't know anything about
the vector
struct. We allocated it, and we are
responsible for freeing it. We do this in the DESTROY
method.
Perl calls DESTROY
after the last reference to
the Set::Bit
object goes away, and before it deletes
the Set::Bit
object. Aside from the fact that Perl
calls DESTROY()
automatically, it is an ordinary
method. Here is the XS code for DESTROY()
.
void DESTROY(pVector) Set::Bit pVector
As with the other instance methods, the glue routine extracts the vector *
from the Perl stack and passes it to the C-language DESTROY()
routine, which frees the storage.
Perl methods
A Perl method can't operate on a vector *
. However,
accessor methods written in C can get instance data from an object,
and then operate on that data in Perl.
member()
is an accessor method: it returns true iff its argument is an element of the set. With the member()
method, we can write other methods in Perl. For example, the elements()
method returns a list of the elements of a set
sub elements { my $set = shift; grep { $set->member($_) } 0..$set->top }
and the print()
method returns a string containing a print representation of a set
sub print { my $set = shift; join ", ", $set->elements }
Given a suitable set of accessor methods, the choice of whether to implement other methods in C or in Perl can be made on the basis of performance and convenience.
A Perl object
Earlier, I said that I wanted the Set::Bit
object to be
the C-language vector
struct, rather than a Perl data
object. It didn't work out that way. The Set::Bit
object
is indeed a Perl data object: it is the scalar created
by sv_setref_pv()
.
The Set::Bit
object gives the essential features of a
C-language object. Data is represented in C, we can write methods in
C, and methods written in C access instance data through
a vector *
, passed as the first argument. At the
same time, the Set::Bit
object gives us the flexibility
to write methods in Perl.

- Introduction
- Concepts
- Example 1: Geometry
- The typemap
- Example 2: Math::Ackermann
- Example 3: Set::Bit
Notes on these adapted articles
These pages are an adaptation of articles written in 2000 by Steven W. McDougall. My goal in modifying these articles is to simplify and update them. I hope you find these adapted versions of the articles useful. You can find the original articles at the link at the bottom of this page. The major changes in this update are:
- h2xs is not used;
- XSLoader is used in place of DynaLoader;
- It is assumed that the reader understands the basic concepts of C and Perl programming.
This adaptation is a work in progress and many of the links on these pages may not work.
XS Mechanics by Steven W. McDougall is licensed under a Creative Commons Attribution 3.0 Unported License.