Perl and XS: Example 3: Set::Bit
- Introduction
- Concepts
- Example 1: Geometry
- The typemap
- Example 2: Math::Ackermann
- Example 3: Set::Bit
Set::Bit represents a set of integers in the
range 0..n. It allocates a bit for each integer. If the
bit is one, then the integer is in the set; if the bit is zero then
the integer is not in the set.
Set operations like unions and intersections consist of setting and clearing bits to add or remove integers from a set.
Set::Bit objects are represented using
typedef struct { /* The range of the set is 0..n_bits - 1 */ int n_bits; /* The number of bytes used for storage. */ int n_chars; /* The bytes used for storage. */ unsigned char * chars; } vector;
To create a Set::Bit object,
vector * new (int n_bits) { vector * p = malloc (sizeof (vector)); if (! p) { croak ("Out of memory"); } p->n_bits = n_bits; /* We use one char to store the bits. The C standard promises that one byte contains at least eight bits. */ p->n_chars = (n_bits + 8 - 1) / 8; p->chars = calloc (p->n_chars, sizeof (unsigned char)); if (! p->chars) { croak ("Out of memory"); } return p; }
new allocates a vector, computes the number
of bytes required, and allocates an array.
To add an integer to a set,
/* Set bit "n" in "p". */ void insert (vector *p, int n) { int q; int r; if (n < 0 || n >= p->n_bits) { croak ("Bit out of range"); } q = n / 8; r = n % 8; p->chars[q] |= 1 << r; }
DESTROY() frees the storage.
void DESTROY (vector *p) { free (p->chars); free (p); }
These functions and declarations are gathered in
files vector.c and vector.h
An xsub, A(), was used in
example 2, but
the Math::Ackermann object was an ordinary Perl array.
The C subroutines in vector.c will become xsubs in
Set::Bit. We want the C-language vector
struct to be the Set::Bit object.
In Makefile.PL, we include a line
'OBJECT' => 'Bit.o vector.o'in the argument list of
WriteMakefile().
Make a file Bit.xs, and add
a PROTOTYPES directive:
#include "EXTERN.h" #include "perl.h" #include "XSUB.h" #include "ppport.h" #include "vector.h" MODULE = Set::Bit PACKAGE = Set::Bit PROTOTYPES: ENABLE
PROTOTYPES: ENABLE tells XS to create Perl prototypes for the
functions. (Perl prototypes are a way of ensuring that functions are
called in a specified way. They are described in
perlsub in the Perl documentation.)
new
If the XS routine for new is
vector * new(nBits) int nBits
then
$ perl Makefile.PL $ makeresults in
Error: 'vector *' not in typemap in Bit.xs, line 12
Make invokes xsubpp, which uses Perl's
default type map to translate from C to Perl. The error message occurs
because Perl's default type map does not contain an entry
for vector *. However, the default typemap supplies
an XS type called T_PTROBJ. To use T_PTROBJ,
create a typemap file
vector * T_PTROBJThis tells
xsubpp to use the code fragments
for T_PTROBJ to translate between
a vector * and a Perl scalar.
T_PTROBJ is for making pointers to C structures into Perl
objects. Run make and we get a clean compile. But
before we try to run our code, let's look at Bit.c and
find out what xsubpp has done.
sv_setref_pv
Here is the glue routine emitted
by xsubpp to connect new() to Perl
XS(XS_Set__Bit_new)
{
dXSARGS;
if (items != 1)
croak("Usage: Set::Bit::new(nBits)");
{
int nBits = (int)SvIV(ST(0));
vector *RETVAL;
RETVAL = new(nBits);
ST(0) = sv_newmortal();
sv_setref_pv(ST(0), "vectorPtr", (void*)RETVAL);
}
XSRETURN(1);
}
The glue routine extracts nBits from the Perl stack, and
declares RETVAL as a vector *. Then it
calls new(), and assigns the return value
to RETVAL. Next, it creates a new "mortal" scalar
at ST(0) on the Perl stack.
Now the glue routine calls sv_setref_pv to
return RETVAL to Perl.
sv_setref_pv(ST(0), "vectorPtr", (void*)RETVAL);
Perlguts documents
sv_setref_pv like this:
SV* sv_setref_pv(SV* rv, char* classname, PV iv);Copies the pointer value (the address, not the string!) into an SV whose reference is rv. SV is blessed if classname is non-null.
SV stands for Scalar Value; an SV is the internal Perl data structure
that represents a scalar. When we call sv_setref_pv(), it
creates a new SV, sets the value of the SV to RETVAL,
blesses the SV into the vectorPtr package, and makes the
mortal at the first position on the stack, ST(0), a
reference to the SV. The mortal at ST(0) is the return
value of the new() method.
The package problem
This is very close to what we want. The return value
from new() is a reference to a blessed scalar that holds
a pointer to our C-language vector struct. However, the
scalar is blessed into the vectorPtr package, not
the Set::Bit package.
This is a bit odd. Perl methods that function as constructors customarily return references to objects that are blessed into the method's own package. The
MODULE = Set::Bit PACKAGE = Set::Bit
line in Bit.xs causes all our xsubs to be installed in the
Set::Bit package. But Set::Bit::new() returns a
reference to an object that is blessed into the vectorPtr
package. This means, for example, that we could write
$set = Set::Bit::new(100);
and get a reference to a valid Perl object, but we couldn't write
$set->insert(42);
because $$set is blessed into the vectorPtr package, and
the vectorPtr package has no insert() method. The
insert() method is in the Set::Bit package, along with
the new() method.
There are various ways around this problem. We could leave
new() in the Set::Bit package, and add another
PACKAGE directive to install the rest of our xsubs in the
vectorPtr package:
MODULE = Set::Bit PACKAGE = vectorPtr
Or we could rename the whole module vectorPtr.
But these solutions are ugly and awkward. Module names shouldn't be
dictated by the names of our C structs. Module names should be free
parameters in our design, chosen to organize our libraries and to make
our code read naturally. We very much want to get our object blessed
into the Set::Bit package.
$ntype
As shown above, the code that blesses our object is
sv_setref_pv(ST(0), "vectorPtr", (void*)RETVAL);
To control the package that our objects are blessed into, we need to
understand where the string "vectorPtr" comes from. If we look
in the OUTPUT section of the default typemap, we find this entry
T_PTROBJ
sv_setref_pv($arg, \"${ntype}\", (void*)$var);
So xsubpp emits code that blesses the object into the package
named by $ntype. As discussed last month, $ntype is the
type of the C variable that is being converted. In this case, the C
variable is RETVAL, and its type is vector *.
vector * is an example of a C type name that is not a valid
Perl package name. To accommodate this, xsubpp silently
substitutes the string "Ptr" for the * before emitting
$ntype as a package name. This is a general mechanism; for
example, the C type name vector * * would be converted to the
Perl package name vectorPtrPtr.
RETVAL
RETVAL is a C variable that xsubpp declares to hold the
return value of our C subroutine; therefore, its type must match the
return type of our C subroutine. xsubpp declares RETVAL
with the type that we specify for our XS routine in
Bit.xs. Thus, the XS code
vector * new(nBits) int nBits
in Bit.xs leads to the C-language declaration
vector * RETVAL;
in Bit.c.
This means that the return type of our XS routine determines the
package that our object is blessed into. In order to bless our object
into the Set::Bit package, we must write our XS routine like
this
Set::Bit new(nBits) int nBits
It seems that this cannot stand, because Set::Bit is not
the return type of the C-language new() routine,
and Set::Bit is not a valid C type name, but we can get
around both problems.
xsubpp handles the second problem for us. It squashes colons in
$ntype to underscores whenever it emits $ntype as the
type of a C variable. So the XS code shown above leads to this
declaration for RETVAL in Bit.c
Set__Bit RETVAL;
which is valid C code.
To handle the first problem, we simply add a typedef to Bit.xs, like this
typedef vector *Set__Bit;
xsubpp passes this typedef through to Bit.c unchanged. Now the declaration of RETVAL is valid and correct.
xsubpp doesn't squash colons when it emits $ntype as a Perl package name, so the call to sv_setref_pv will be
sv_setref_pv(ST(0), "Set::Bit", (void*)RETVAL);
This will bless our object into the Set::Bit package, as we desire.
typemap revisited
After making these changes,Bit.xs looks like this
#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"
#include "vector.h"
typedef vector *Set__Bit;
MODULE = Set::Bit PACKAGE = Set::Bit
PROTOTYPES: ENABLE
Set::Bit
new(nBits)
int nBits
If we now run make, we get
Error: 'Set::Bit' not in typemap in Bit.xs, line 14
The problem is that xsubpp is trying to find the return type of our XS routine in the typemap. We made an entry in the local typemap for vector *, but then we changed the return type of our XS routine from vector * to Set::Bit. We need to change the entry in the local typemap accordingly
Set::Bit T_PTROBJ
Method call syntax
With this change, we can runmake
and get a clean compile. Now, we try to create a Set::Bit
object. We add this line to test.pl
my $set = new Set::Bit 100;
and run make test
The output is
1..1 ok 1 Usage: Set::Bit::new(nBits) at test.pl line 21.
The ok 1 means that the module loaded successfully. The usage
message means that we called the new() method with the wrong
number of arguments.
With a little thought, we can identify the problem. We invoked new() with method call syntax. In Perl, a method call
new Set::Bit 100
ultimately becomes a subroutine call with the package name passed as the first argument
Set::Bit::new("Set::Bit", 100)
To accommodate this, we need to change our XS routine to take two arguments: a string and an integer
Set::Bit
new(package, nBits)
char *package
int nBits
If we run xsubpp on this XS routine, it will emit C code to call new() with both arguments
RETVAL = new(package, nBits);
We could change our C routine to take both arguments, and then ignore the package name. However, I try to avoid polluting my interfaces with implementation artifacts. Instead, we add a CODE: directive to the XS routine, and write the call to new() ourselves.
CODE: RETVAL = new(nBits);
When we use the CODE: directive, we also need an OUTPUT: directive to tell xsubpp what the return value is
OUTPUT: RETVAL
Our XS routine is now
Set::Bit
new(package, nBits)
char *package
int nBits
CODE:
RETVAL = new(nBits);
OUTPUT:
RETVAL
When we run make test,
we get a clean compile, and the output is
1..1 ok 1
showing that our call to new() succeeded.
The Set::Bit Object
Let's step back for a moment and review our situation. The glue
routine for new() calls sv_setref_pv(), and
sv_setref_pv() creates a Perl scalar. This scalar is the Set::Bit object. Here are three crucial facts regarding it
- it has an integer value which is the machine address of a
vectorstruct - it is blessed into the
Set::Bitpackage Set::Bit::new()returns a reference to it
Here is a picture that illustrates this
$set = new Set::Bit 100;
reference
Name: $set
RV: ----------->scalar
Package: Set::Bit
IV: ----------------->vector
{
}
RV stands for Reference Value. IV stands for Integer Value.
Because we have a reference to the Set::Bit object, we can make method calls on it. Whenever we make a method call, the reference is passed as the first argument to the method.
Methods may be written in either Perl or C. A method can dereference its first argument to get the value of the object. The value of a Set::Bit object is the machine address of a vector struct.
There isn't much that a Perl method can do with the address of a vector struct, except maybe print it.
sub Set::Bit::print_address
{
my $set = shift;
printf "%p", $$set;
}
A C method can pass the address of a vector struct to the
subroutines defined in vector.c.
insert
Theinsert method inserts an integer into a set
$set = new Set::Bit 100; $set->insert(42);
Here is the XS code to install insert() in the Set::Bit package
void
insert(pVector, n)
Set::Bit pVector
int n
and here is the glue routine that xsubpp emits to call insert()
XS(XS_Set__Bit_insert)
{
dXSARGS;
if (items != 2)
croak("Usage: Set::Bit::insert(pVector, n)");
{
Set__Bit pVector;
int n = (int)SvIV(ST(1));
if (sv_derived_from(ST(0), "Set::Bit")) {
IV tmp = SvIV((SV*)SvRV(ST(0)));
pVector = (Set__Bit) tmp;
}
else
croak("pVector is not of type Set::Bit");
insert(pVector, n);
}
XSRETURN_EMPTY;
}
Let's look at the C code to see how we get our vector * back from Perl.
The XS routine declares pVector to be of type Set::Bit. As before, xsubpp squashes the colons to underscores for us, emitting the C declaration
Set__Bit pVector;
and the typedef in Bit.xs makes Set__Bit an alias for vector *.
The first argument to insert() is a reference to a Set::Bit object. This argument appears on the Perl stack at ST(0).
sv_derived_from() verifies that ST(0) is indeed a reference to a Set::Bit object. SvRV() dereferences ST(0), returning the scalar that was created earlier by sv_setref_pv(). SvIV() extracts the value of that scalar as an Integer Value (IV). This value is the vector * that was stored there by sv_setref_pv().
The glue routine stores the IV in tmp, and then assigns it to pVector on the next line, with a typecast to quiet the compiler. Finally, it calls the C-language insert() routine, passing it the vector * in pVector.
Instance methods
The XS code for the remaining instance methods is similar. Here is member()
int
member(pVector, n)
Set::Bit pVector
int n
and intersect()
Set::Bit
intersect(pA, pB)
Set::Bit pA
Set::Bit pB
In each case, the first argument is a reference to the object on which the method was invoked. intersect() creates and returns a new Set::Bit object; the mechanics of this are the same as those described above for new().
union
The XS code for union() is a bit different. We can't have
a C subroutine named union, because union is
a keyword in C. The C subroutine is called Union, but so
that the Perl interface will have a consistent naming scheme, the XS uses
Set::Bit
union(pA, pB)
Set::Bit pA
Set::Bit pB
CODE:
RETVAL = Union(pA, pB);
OUTPUT:
RETVAL
The XS routine is called union, with a CODE:
directive so that we can write the C call to Union.
DESTROY
When the last reference to a Set::Bit object goes away,
Perl deletes it. The object itself is a Perl scalar: Perl allocated
for it in sv_setref_pv(), and Perl frees that memory.
The Set::Bit object holds a pointer to
a vector struct. Perl doesn't know anything about
the vector struct. We allocated it, and we are
responsible for freeing it. We do this in the DESTROY
method.
Perl calls DESTROY after the last reference to
the Set::Bit object goes away, and before it deletes
the Set::Bit object. Aside from the fact that Perl
calls DESTROY() automatically, it is an ordinary
method. Here is the XS code for DESTROY().
void
DESTROY(pVector)
Set::Bit pVector
As with the other instance methods, the glue routine extracts the vector * from the Perl stack and passes it to the C-language DESTROY() routine, which frees the storage.
Perl methods
A Perl method can't operate on a vector *. However,
accessor methods written in C can get instance data from an object,
and then operate on that data in Perl.
member() is an accessor method: it returns true iff its argument is an element of the set. With the member() method, we can write other methods in Perl. For example, the elements() method returns a list of the elements of a set
sub elements
{
my $set = shift;
grep { $set->member($_) } 0..$set->top
}
and the print() method returns a string containing a print representation of a set
sub print
{
my $set = shift;
join ", ", $set->elements
}
Given a suitable set of accessor methods, the choice of whether to implement other methods in C or in Perl can be made on the basis of performance and convenience.
A Perl object
Earlier, I said that I wanted the Set::Bit object to be
the C-language vector struct, rather than a Perl data
object. It didn't work out that way. The Set::Bit object
is indeed a Perl data object: it is the scalar created
by sv_setref_pv().
The Set::Bit object gives the essential features of a
C-language object. Data is represented in C, we can write methods in
C, and methods written in C access instance data through
a vector *, passed as the first argument. At the
same time, the Set::Bit object gives us the flexibility
to write methods in Perl.
- Introduction
- Concepts
- Example 1: Geometry
- The typemap
- Example 2: Math::Ackermann
- Example 3: Set::Bit
Notes on these adapted articles
These pages are an adaptation of articles written in 2000 by Steven W. McDougall. My goal in modifying these articles is to simplify and update them. I hope you find these adapted versions of the articles useful. You can find the original articles at the link at the bottom of this page. The major changes in this update are:
- h2xs is not used;
- XSLoader is used in place of DynaLoader;
- It is assumed that the reader understands the basic concepts of C and Perl programming.
This adaptation is a work in progress and many of the links on these pages may not work.

XS Mechanics by Steven W. McDougall is licensed under a Creative Commons Attribution 3.0 Unported License.