Perl and XS: Example 3: Set::Bit

Set::Bit represents a set of integers in the range 0..n. It allocates a bit for each integer. If the bit is one, then the integer is in the set; if the bit is zero then the integer is not in the set.

Set operations like unions and intersections consist of setting and clearing bits to add or remove integers from a set.

Set::Bit objects are represented using

typedef struct
{
    /* The range of the set is 0..n_bits - 1 */
    int n_bits;
    /* The number of bytes used for storage. */
    int n_chars;
    /* The bytes used for storage. */
    unsigned char * chars;
}
vector;

(download)

To create a Set::Bit object,

vector * new (int n_bits)
{
    vector * p = malloc (sizeof (vector));
    if (! p) {
	croak ("Out of memory");
    }
    p->n_bits = n_bits;

    /* We use one char to store the bits. The C standard promises that
       one byte contains at least eight bits. */

    p->n_chars = (n_bits + 8 - 1) / 8;

    p->chars = calloc (p->n_chars, sizeof (unsigned char));

    if (! p->chars) {
	croak ("Out of memory");
    }
    return p;
}

(download)

new allocates a vector, computes the number of bytes required, and allocates an array.

To add an integer to a set,

/* Set bit "n" in "p". */

void insert (vector *p, int n)
{
    int q;
    int r;

    if (n < 0 || n >= p->n_bits) {
	croak ("Bit out of range");
    }

    q = n / 8;
    r = n % 8;

    p->chars[q] |= 1 << r;
}

(download)

DESTROY() frees the storage.

void DESTROY (vector *p)
{
    free (p->chars);
    free (p);
}

(download)

These functions and declarations are gathered in files vector.c and vector.h

An xsub, A(), was used in example 2, but the Math::Ackermann object was an ordinary Perl array.

The C subroutines in vector.c will become xsubs in Set::Bit. We want the C-language vector struct to be the Set::Bit object.

In Makefile.PL, we include a line

'OBJECT'    => 'Bit.o vector.o'

in the argument list of WriteMakefile().

Make a file Bit.xs, and add a PROTOTYPES directive:

#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"
#include "ppport.h"
#include "vector.h"

MODULE = Set::Bit               PACKAGE = Set::Bit

PROTOTYPES: ENABLE

PROTOTYPES: ENABLE tells XS to create Perl prototypes for the functions. (Perl prototypes are a way of ensuring that functions are called in a specified way. They are described in perlsub in the Perl documentation.)

new

If the XS routine for new is

vector *
new(nBits)
	int nBits

then

$ perl Makefile.PL
$ make

results in

Error: 'vector *' not in typemap in Bit.xs, line 12

Make invokes xsubpp, which uses Perl's default type map to translate from C to Perl. The error message occurs because Perl's default type map does not contain an entry for vector *. However, the default typemap supplies an XS type called T_PTROBJ. To use T_PTROBJ, create a typemap file

vector *	T_PTROBJ

This tells xsubpp to use the code fragments for T_PTROBJ to translate between a vector * and a Perl scalar.

T_PTROBJ is for making pointers to C structures into Perl objects. Run make and we get a clean compile. But before we try to run our code, let's look at Bit.c and find out what xsubpp has done.

sv_setref_pv

Here is the glue routine emitted by xsubpp to connect new() to Perl

XS(XS_Set__Bit_new)
{
    dXSARGS;
    if (items != 1)
        croak("Usage: Set::Bit::new(nBits)");
    {
        int     nBits = (int)SvIV(ST(0));
        vector *RETVAL;

        RETVAL = new(nBits);
        ST(0) = sv_newmortal();
        sv_setref_pv(ST(0), "vectorPtr", (void*)RETVAL);
    }
    XSRETURN(1);
}

The glue routine extracts nBits from the Perl stack, and declares RETVAL as a vector *. Then it calls new(), and assigns the return value to RETVAL. Next, it creates a new "mortal" scalar at ST(0) on the Perl stack.

Now the glue routine calls sv_setref_pv to return RETVAL to Perl.

sv_setref_pv(ST(0), "vectorPtr", (void*)RETVAL);

Perlguts documents sv_setref_pv like this:

SV* sv_setref_pv(SV* rv, char* classname, PV iv);
Copies the pointer value (the address, not the string!) into an SV whose reference is rv. SV is blessed if classname is non-null.

SV stands for Scalar Value; an SV is the internal Perl data structure that represents a scalar. When we call sv_setref_pv(), it creates a new SV, sets the value of the SV to RETVAL, blesses the SV into the vectorPtr package, and makes the mortal at the first position on the stack, ST(0), a reference to the SV. The mortal at ST(0) is the return value of the new() method.

The package problem

This is very close to what we want. The return value from new() is a reference to a blessed scalar that holds a pointer to our C-language vector struct. However, the scalar is blessed into the vectorPtr package, not the Set::Bit package.

This is a bit odd. Perl methods that function as constructors customarily return references to objects that are blessed into the method's own package. The

MODULE = Set::Bit               PACKAGE = Set::Bit

line in Bit.xs causes all our xsubs to be installed in the Set::Bit package. But Set::Bit::new() returns a reference to an object that is blessed into the vectorPtr package. This means, for example, that we could write

$set = Set::Bit::new(100);

and get a reference to a valid Perl object, but we couldn't write

$set->insert(42);

because $$set is blessed into the vectorPtr package, and the vectorPtr package has no insert() method. The insert() method is in the Set::Bit package, along with the new() method.

There are various ways around this problem. We could leave new() in the Set::Bit package, and add another PACKAGE directive to install the rest of our xsubs in the vectorPtr package:

MODULE = Set::Bit               PACKAGE = vectorPtr

Or we could rename the whole module vectorPtr.

But these solutions are ugly and awkward. Module names shouldn't be dictated by the names of our C structs. Module names should be free parameters in our design, chosen to organize our libraries and to make our code read naturally. We very much want to get our object blessed into the Set::Bit package.

$ntype

As shown above, the code that blesses our object is

sv_setref_pv(ST(0), "vectorPtr", (void*)RETVAL);

To control the package that our objects are blessed into, we need to understand where the string "vectorPtr" comes from. If we look in the OUTPUT section of the default typemap, we find this entry

T_PTROBJ
        sv_setref_pv($arg, \"${ntype}\", (void*)$var);

So xsubpp emits code that blesses the object into the package named by $ntype. As discussed last month, $ntype is the type of the C variable that is being converted. In this case, the C variable is RETVAL, and its type is vector *.

vector * is an example of a C type name that is not a valid Perl package name. To accommodate this, xsubpp silently substitutes the string "Ptr" for the * before emitting $ntype as a package name. This is a general mechanism; for example, the C type name vector * * would be converted to the Perl package name vectorPtrPtr.

RETVAL

RETVAL is a C variable that xsubpp declares to hold the return value of our C subroutine; therefore, its type must match the return type of our C subroutine. xsubpp declares RETVAL with the type that we specify for our XS routine in Bit.xs. Thus, the XS code

vector *
new(nBits)
	int nBits

in Bit.xs leads to the C-language declaration

vector *        RETVAL;

in Bit.c.

This means that the return type of our XS routine determines the package that our object is blessed into. In order to bless our object into the Set::Bit package, we must write our XS routine like this

Set::Bit
new(nBits)
	int nBits

It seems that this cannot stand, because Set::Bit is not the return type of the C-language new() routine, and Set::Bit is not a valid C type name, but we can get around both problems.

xsubpp handles the second problem for us. It squashes colons in $ntype to underscores whenever it emits $ntype as the type of a C variable. So the XS code shown above leads to this declaration for RETVAL in Bit.c

Set__Bit        RETVAL;

which is valid C code.

To handle the first problem, we simply add a typedef to Bit.xs, like this

typedef vector *Set__Bit;

xsubpp passes this typedef through to Bit.c unchanged. Now the declaration of RETVAL is valid and correct.

xsubpp doesn't squash colons when it emits $ntype as a Perl package name, so the call to sv_setref_pv will be

sv_setref_pv(ST(0), "Set::Bit", (void*)RETVAL);

This will bless our object into the Set::Bit package, as we desire.

typemap revisited

After making these changes, Bit.xs looks like this

#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"
#include "vector.h"

typedef vector *Set__Bit;

MODULE = Set::Bit               PACKAGE = Set::Bit

PROTOTYPES: ENABLE

Set::Bit
new(nBits)
        int nBits

If we now run make, we get

Error: 'Set::Bit' not in typemap in Bit.xs, line 14

The problem is that xsubpp is trying to find the return type of our XS routine in the typemap. We made an entry in the local typemap for vector *, but then we changed the return type of our XS routine from vector * to Set::Bit. We need to change the entry in the local typemap accordingly

Set::Bit        T_PTROBJ

Method call syntax

With this change, we can run make

and get a clean compile. Now, we try to create a Set::Bit object. We add this line to test.pl

my $set = new Set::Bit 100;

and run make test

The output is

1..1
ok 1
Usage: Set::Bit::new(nBits) at test.pl line 21.

The ok 1 means that the module loaded successfully. The usage message means that we called the new() method with the wrong number of arguments.

With a little thought, we can identify the problem. We invoked new() with method call syntax. In Perl, a method call

new Set::Bit 100

ultimately becomes a subroutine call with the package name passed as the first argument

Set::Bit::new("Set::Bit", 100)

To accommodate this, we need to change our XS routine to take two arguments: a string and an integer

Set::Bit
new(package, nBits)
        char *package
        int   nBits

If we run xsubpp on this XS routine, it will emit C code to call new() with both arguments

RETVAL = new(package, nBits);

We could change our C routine to take both arguments, and then ignore the package name. However, I try to avoid polluting my interfaces with implementation artifacts. Instead, we add a CODE: directive to the XS routine, and write the call to new() ourselves.

CODE:
RETVAL = new(nBits);

When we use the CODE: directive, we also need an OUTPUT: directive to tell xsubpp what the return value is

OUTPUT:
RETVAL

Our XS routine is now

Set::Bit
new(package, nBits)
        char *package
        int   nBits
        CODE:
        RETVAL = new(nBits);
        OUTPUT:
        RETVAL

When we run make test,

we get a clean compile, and the output is

1..1
ok 1

showing that our call to new() succeeded.

The `Set::Bit` Object

Let's step back for a moment and review our situation. The glue routine for new() calls sv_setref_pv(), and sv_setref_pv() creates a Perl scalar. This scalar is the Set::Bit object. Here are three crucial facts regarding it

it has an integer value which is the machine address of a vector struct
it is blessed into the Set::Bit package
Set::Bit::new() returns a reference to it

Here is a picture that illustrates this

	$set = new Set::Bit 100;

	reference
	Name: $set
	RV: ----------->scalar
		     	Package: Set::Bit
		     	IV: ----------------->vector
			 		      {
					      }

RV stands for Reference Value. IV stands for Integer Value.

Because we have a reference to the Set::Bit object, we can make method calls on it. Whenever we make a method call, the reference is passed as the first argument to the method.

Methods may be written in either Perl or C. A method can dereference its first argument to get the value of the object. The value of a Set::Bit object is the machine address of a vector struct.

There isn't much that a Perl method can do with the address of a vector struct, except maybe print it.

sub Set::Bit::print_address
{
    my $set = shift;
    printf "%p", $$set;
}

A C method can pass the address of a vector struct to the subroutines defined in vector.c.

insert

The insert method inserts an integer into a set

$set = new Set::Bit 100;
$set->insert(42);

Here is the XS code to install insert() in the Set::Bit package

void
insert(pVector, n)
        Set::Bit pVector
        int      n

and here is the glue routine that xsubpp emits to call insert()

XS(XS_Set__Bit_insert)
{
    dXSARGS;
    if (items != 2)
        croak("Usage: Set::Bit::insert(pVector, n)");
    {
        Set__Bit        pVector;
        int     n = (int)SvIV(ST(1));

        if (sv_derived_from(ST(0), "Set::Bit")) {
            IV tmp = SvIV((SV*)SvRV(ST(0)));
            pVector = (Set__Bit) tmp;
        }
        else
            croak("pVector is not of type Set::Bit");

        insert(pVector, n);
    }
    XSRETURN_EMPTY;
}

Let's look at the C code to see how we get our vector * back from Perl.

The XS routine declares pVector to be of type Set::Bit. As before, xsubpp squashes the colons to underscores for us, emitting the C declaration

Set__Bit        pVector;

and the typedef in Bit.xs makes Set__Bit an alias for vector *.

The first argument to insert() is a reference to a Set::Bit object. This argument appears on the Perl stack at ST(0).

sv_derived_from() verifies that ST(0) is indeed a reference to a Set::Bit object. SvRV() dereferences ST(0), returning the scalar that was created earlier by sv_setref_pv(). SvIV() extracts the value of that scalar as an Integer Value (IV). This value is the vector * that was stored there by sv_setref_pv().

The glue routine stores the IV in tmp, and then assigns it to pVector on the next line, with a typecast to quiet the compiler. Finally, it calls the C-language insert() routine, passing it the vector * in pVector.

Instance methods

The XS code for the remaining instance methods is similar. Here is member()

int
member(pVector, n)
        Set::Bit pVector
        int      n

and intersect()

Set::Bit
intersect(pA, pB)
        Set::Bit pA
        Set::Bit pB

In each case, the first argument is a reference to the object on which the method was invoked. intersect() creates and returns a new Set::Bit object; the mechanics of this are the same as those described above for new().

union

The XS code for union() is a bit different. We can't have a C subroutine named union, because union is a keyword in C. The C subroutine is called Union, but so that the Perl interface will have a consistent naming scheme, the XS uses

Set::Bit
union(pA, pB)
        Set::Bit pA
        Set::Bit pB
        CODE:
        RETVAL = Union(pA, pB);
        OUTPUT:
        RETVAL

The XS routine is called union, with a CODE: directive so that we can write the C call to Union.

DESTROY

When the last reference to a Set::Bit object goes away, Perl deletes it. The object itself is a Perl scalar: Perl allocated for it in sv_setref_pv(), and Perl frees that memory.

The Set::Bit object holds a pointer to a vector struct. Perl doesn't know anything about the vector struct. We allocated it, and we are responsible for freeing it. We do this in the DESTROY method.

Perl calls DESTROY after the last reference to the Set::Bit object goes away, and before it deletes the Set::Bit object. Aside from the fact that Perl calls DESTROY() automatically, it is an ordinary method. Here is the XS code for DESTROY().

void
DESTROY(pVector)
        Set::Bit pVector

As with the other instance methods, the glue routine extracts the vector * from the Perl stack and passes it to the C-language DESTROY() routine, which frees the storage.

Perl methods

A Perl method can't operate on a vector *. However, accessor methods written in C can get instance data from an object, and then operate on that data in Perl.

member() is an accessor method: it returns true iff its argument is an element of the set. With the member() method, we can write other methods in Perl. For example, the elements() method returns a list of the elements of a set

sub elements
{
    my $set = shift;
    grep { $set->member($_) } 0..$set->top
}

and the print() method returns a string containing a print representation of a set

sub print
{
    my $set = shift;
    join ", ", $set->elements
}

Given a suitable set of accessor methods, the choice of whether to implement other methods in C or in Perl can be made on the basis of performance and convenience.

A Perl object

Earlier, I said that I wanted the Set::Bit object to be the C-language vector struct, rather than a Perl data object. It didn't work out that way. The Set::Bit object is indeed a Perl data object: it is the scalar created by sv_setref_pv().

The Set::Bit object gives the essential features of a C-language object. Data is represented in C, we can write methods in C, and methods written in C access instance data through a vector *, passed as the first argument. At the same time, the Set::Bit object gives us the flexibility to write methods in Perl.

Notes on these adapted articles

These pages are an adaptation of articles written in 2000 by Steven W. McDougall. My goal in modifying these articles is to simplify and update them. I hope you find these adapted versions of the articles useful. You can find the original articles at the link at the bottom of this page. The major changes in this update are:

h2xs is not used;
XSLoader is used in place of DynaLoader;
It is assumed that the reader understands the basic concepts of C and Perl programming.

This adaptation is a work in progress and many of the links on these pages may not work.

XS Mechanics by Steven W. McDougall is licensed under a Creative Commons Attribution 3.0 Unported License.

For comments, questions, and corrections, please email Ben Bullock (benkasminbullock@gmail.com). / Privacy / Disclaimer

Perl and XS: Example 3: Set::Bit

new

sv_setref_pv

The package problem

$ntype

RETVAL

typemap revisited

Method call syntax

The Set::Bit Object

insert

Instance methods

union

DESTROY

Perl methods

A Perl object

Notes on these adapted articles

The `Set::Bit` Object