Perl and XS: Example 3: Set::Bit 1. Introduction
2. Concepts
3. Example 1: Geometry
4. The typemap
5. Example 2: Math::Ackermann
6. Example 3: Set::Bit

`Set::Bit` represents a set of integers in the range `0..n`. It allocates a bit for each integer. If the bit is one, then the integer is in the set; if the bit is zero then the integer is not in the set.

Set operations like unions and intersections consist of setting and clearing bits to add or remove integers from a set.

`Set::Bit` objects are represented using

```typedef struct
{
/* The range of the set is 0..n_bits - 1 */
int n_bits;
/* The number of bytes used for storage. */
int n_chars;
/* The bytes used for storage. */
unsigned char * chars;
}
vector;```

To create a `Set::Bit` object,

```vector * new (int n_bits)
{
vector * p = malloc (sizeof (vector));
if (! p) {
croak ("Out of memory");
}
p->n_bits = n_bits;

/* We use one char to store the bits. The C standard promises that
one byte contains at least eight bits. */

p->n_chars = (n_bits + 8 - 1) / 8;

p->chars = calloc (p->n_chars, sizeof (unsigned char));

if (! p->chars) {
croak ("Out of memory");
}
return p;
}```

`new` allocates a `vector`, computes the number of bytes required, and allocates an array.

To add an integer to a set,

```/* Set bit "n" in "p". */

void insert (vector *p, int n)
{
int q;
int r;

if (n < 0 || n >= p->n_bits) {
croak ("Bit out of range");
}

q = n / 8;
r = n % 8;

p->chars[q] |= 1 << r;
}```

`DESTROY()` frees the storage.

```void DESTROY (vector *p)
{
free (p->chars);
free (p);
}```

These functions and declarations are gathered in files `vector.c` and `vector.h`

An xsub, `A()`, was used in example 2, but the `Math::Ackermann` object was an ordinary Perl array.

The C subroutines in `vector.c` will become xsubs in `Set::Bit`. We want the C-language `vector` struct to be the `Set::Bit` object.

In `Makefile.PL`, we include a line

```'OBJECT'    => 'Bit.o vector.o'
```
in the argument list of `WriteMakefile()`.

Make a file `Bit.xs`, and add a `PROTOTYPES` directive:

```#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"
#include "ppport.h"
#include "vector.h"

MODULE = Set::Bit               PACKAGE = Set::Bit

PROTOTYPES: ENABLE
```
`PROTOTYPES: ENABLE` tells XS to create Perl prototypes for the functions. (Perl prototypes are a way of ensuring that functions are called in a specified way. They are described in perlsub in the Perl documentation.)

new

If the XS routine for `new` is

```vector *
new(nBits)
int nBits
```

then

```\$ perl Makefile.PL
\$ make
```
results in

```Error: 'vector *' not in typemap in Bit.xs, line 12
```

`Make` invokes `xsubpp`, which uses Perl's default type map to translate from C to Perl. The error message occurs because Perl's default type map does not contain an entry for `vector *`. However, the default typemap supplies an XS type called `T_PTROBJ`. To use `T_PTROBJ`, create a typemap file

```vector *	T_PTROBJ
```
This tells `xsubpp` to use the code fragments for `T_PTROBJ` to translate between a `vector *` and a Perl scalar.

`T_PTROBJ` is for making pointers to C structures into Perl objects. Run `make` and we get a clean compile. But before we try to run our code, let's look at `Bit.c` and find out what `xsubpp` has done.

sv_setref_pv

Here is the glue routine emitted by `xsubpp` to connect `new()` to Perl

```XS(XS_Set__Bit_new)
{
dXSARGS;
if (items != 1)
croak("Usage: Set::Bit::new(nBits)");
{
int     nBits = (int)SvIV(ST(0));
vector *RETVAL;

RETVAL = new(nBits);
ST(0) = sv_newmortal();
sv_setref_pv(ST(0), "vectorPtr", (void*)RETVAL);
}
XSRETURN(1);
}
```

The glue routine extracts `nBits` from the Perl stack, and declares `RETVAL` as a `vector *`. Then it calls `new()`, and assigns the return value to `RETVAL`. Next, it creates a new "mortal" scalar at `ST(0)` on the Perl stack.

Now the glue routine calls `sv_setref_pv` to return `RETVAL` to Perl.

```sv_setref_pv(ST(0), "vectorPtr", (void*)RETVAL);
```

Perlguts documents `sv_setref_pv` like this:

```SV* sv_setref_pv(SV* rv, char* classname, PV iv);
```
Copies the pointer value (the address, not the string!) into an SV whose reference is rv. SV is blessed if classname is non-null.

SV stands for Scalar Value; an SV is the internal Perl data structure that represents a scalar. When we call `sv_setref_pv()`, it creates a new SV, sets the value of the SV to `RETVAL`, blesses the SV into the `vectorPtr` package, and makes the mortal at the first position on the stack, `ST(0)`, a reference to the SV. The mortal at `ST(0)` is the return value of the `new()` method.

The package problem

This is very close to what we want. The return value from `new()` is a reference to a blessed scalar that holds a pointer to our C-language `vector` struct. However, the scalar is blessed into the `vectorPtr` package, not the `Set::Bit` package.

This is a bit odd. Perl methods that function as constructors customarily return references to objects that are blessed into the method's own package. The

```MODULE = Set::Bit               PACKAGE = Set::Bit
```

line in `Bit.xs` causes all our xsubs to be installed in the `Set::Bit` package. But `Set::Bit::new()` returns a reference to an object that is blessed into the `vectorPtr` package. This means, for example, that we could write

```\$set = Set::Bit::new(100);
```

and get a reference to a valid Perl object, but we couldn't write

```\$set->insert(42);
```

because `\$\$set` is blessed into the `vectorPtr` package, and the `vectorPtr` package has no `insert()` method. The `insert()` method is in the `Set::Bit` package, along with the `new()` method.

There are various ways around this problem. We could leave `new()` in the `Set::Bit` package, and add another `PACKAGE` directive to install the rest of our xsubs in the `vectorPtr` package:

```MODULE = Set::Bit               PACKAGE = vectorPtr
```

Or we could rename the whole module `vectorPtr`.

But these solutions are ugly and awkward. Module names shouldn't be dictated by the names of our C structs. Module names should be free parameters in our design, chosen to organize our libraries and to make our code read naturally. We very much want to get our object blessed into the `Set::Bit` package.

\$ntype

As shown above, the code that blesses our object is

```sv_setref_pv(ST(0), "vectorPtr", (void*)RETVAL);
```

To control the package that our objects are blessed into, we need to understand where the string `"vectorPtr"` comes from. If we look in the OUTPUT section of the default typemap, we find this entry

```T_PTROBJ
sv_setref_pv(\$arg, \"\${ntype}\", (void*)\$var);
```

So `xsubpp` emits code that blesses the object into the package named by `\$ntype`. As discussed last month, `\$ntype` is the type of the C variable that is being converted. In this case, the C variable is `RETVAL`, and its type is `vector *`.

`vector *` is an example of a C type name that is not a valid Perl package name. To accommodate this, `xsubpp` silently substitutes the string `"Ptr"` for the `*` before emitting `\$ntype` as a package name. This is a general mechanism; for example, the C type name `vector * *` would be converted to the Perl package name `vectorPtrPtr`.

RETVAL

`RETVAL` is a C variable that `xsubpp` declares to hold the return value of our C subroutine; therefore, its type must match the return type of our C subroutine. `xsubpp` declares `RETVAL` with the type that we specify for our XS routine in `Bit.xs`. Thus, the XS code

```vector *
new(nBits)
int nBits
```

in `Bit.xs` leads to the C-language declaration

```vector *        RETVAL;
```

in `Bit.c`.

This means that the return type of our XS routine determines the package that our object is blessed into. In order to bless our object into the `Set::Bit` package, we must write our XS routine like this

```Set::Bit
new(nBits)
int nBits
```

It seems that this cannot stand, because `Set::Bit` is not the return type of the C-language `new()` routine, and `Set::Bit` is not a valid C type name, but we can get around both problems.

`xsubpp` handles the second problem for us. It squashes colons in `\$ntype` to underscores whenever it emits `\$ntype` as the type of a C variable. So the XS code shown above leads to this declaration for `RETVAL` in `Bit.c`

```Set__Bit        RETVAL;
```

which is valid C code.

To handle the first problem, we simply add a `typedef` to `Bit.xs`, like this

```typedef vector *Set__Bit;
```

`xsubpp` passes this `typedef` through to `Bit.c` unchanged. Now the declaration of `RETVAL` is valid and correct.

`xsubpp` doesn't squash colons when it emits `\$ntype` as a Perl package name, so the call to `sv_setref_pv` will be

```sv_setref_pv(ST(0), "Set::Bit", (void*)RETVAL);
```

This will bless our object into the `Set::Bit` package, as we desire.

typemap revisited

After making these changes, `Bit.xs` looks like this

```#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"
#include "vector.h"

typedef vector *Set__Bit;

MODULE = Set::Bit               PACKAGE = Set::Bit

PROTOTYPES: ENABLE

Set::Bit
new(nBits)
int nBits
```

If we now run `make`, we get

```Error: 'Set::Bit' not in typemap in Bit.xs, line 14
```

The problem is that `xsubpp` is trying to find the return type of our XS routine in the typemap. We made an entry in the local typemap for `vector *`, but then we changed the return type of our XS routine from `vector *` to `Set::Bit`. We need to change the entry in the local typemap accordingly

```Set::Bit        T_PTROBJ
```

Method call syntax

With this change, we can run `make`

and get a clean compile. Now, we try to create a `Set::Bit` object. We add this line to `test.pl`

```my \$set = new Set::Bit 100;
```

and run `make test`

The output is

```1..1
ok 1
Usage: Set::Bit::new(nBits) at test.pl line 21.
```

The `ok 1` means that the module loaded successfully. The usage message means that we called the `new()` method with the wrong number of arguments.

With a little thought, we can identify the problem. We invoked `new()` with method call syntax. In Perl, a method call

```new Set::Bit 100
```

ultimately becomes a subroutine call with the package name passed as the first argument

```Set::Bit::new("Set::Bit", 100)
```

To accommodate this, we need to change our XS routine to take two arguments: a string and an integer

```Set::Bit
new(package, nBits)
char *package
int   nBits
```

If we run `xsubpp` on this XS routine, it will emit C code to call `new()` with both arguments

```RETVAL = new(package, nBits);
```

We could change our C routine to take both arguments, and then ignore the package name. However, I try to avoid polluting my interfaces with implementation artifacts. Instead, we add a `CODE:` directive to the XS routine, and write the call to `new()` ourselves.

```CODE:
RETVAL = new(nBits);
```

When we use the `CODE:` directive, we also need an `OUTPUT:` directive to tell `xsubpp` what the return value is

```OUTPUT:
RETVAL
```

Our XS routine is now

```Set::Bit
new(package, nBits)
char *package
int   nBits
CODE:
RETVAL = new(nBits);
OUTPUT:
RETVAL
```

When we run `make test`,

we get a clean compile, and the output is

```1..1
ok 1
```

showing that our call to `new()` succeeded.

The `Set::Bit` Object

Let's step back for a moment and review our situation. The glue routine for `new()` calls `sv_setref_pv()`, and `sv_setref_pv()` creates a Perl scalar. This scalar is the `Set::Bit` object. Here are three crucial facts regarding it
• it has an integer value which is the machine address of a `vector` struct
• it is blessed into the `Set::Bit` package
• `Set::Bit::new()` returns a reference to it

Here is a picture that illustrates this

```	\$set = new Set::Bit 100;

reference
Name: \$set
RV: ----------->scalar
Package: Set::Bit
IV: ----------------->vector
{
}
```

`RV` stands for Reference Value. `IV` stands for Integer Value.

Because we have a reference to the `Set::Bit` object, we can make method calls on it. Whenever we make a method call, the reference is passed as the first argument to the method.

Methods may be written in either Perl or C. A method can dereference its first argument to get the value of the object. The value of a `Set::Bit` object is the machine address of a `vector` struct.

There isn't much that a Perl method can do with the address of a `vector` struct, except maybe print it.

```sub Set::Bit::print_address
{
my \$set = shift;
printf "%p", \$\$set;
}
```

A C method can pass the address of a `vector` struct to the subroutines defined in `vector.c`.

insert

The `insert` method inserts an integer into a set

```\$set = new Set::Bit 100;
\$set->insert(42);
```

Here is the XS code to install `insert()` in the `Set::Bit` package

```void
insert(pVector, n)
Set::Bit pVector
int      n
```

and here is the glue routine that `xsubpp` emits to call `insert()`

```XS(XS_Set__Bit_insert)
{
dXSARGS;
if (items != 2)
croak("Usage: Set::Bit::insert(pVector, n)");
{
Set__Bit        pVector;
int     n = (int)SvIV(ST(1));

if (sv_derived_from(ST(0), "Set::Bit")) {
IV tmp = SvIV((SV*)SvRV(ST(0)));
pVector = (Set__Bit) tmp;
}
else
croak("pVector is not of type Set::Bit");

insert(pVector, n);
}
XSRETURN_EMPTY;
}
```

Let's look at the C code to see how we get our `vector *` back from Perl.

The XS routine declares `pVector` to be of type `Set::Bit`. As before, `xsubpp` squashes the colons to underscores for us, emitting the C declaration

```Set__Bit        pVector;
```

and the typedef in `Bit.xs` makes `Set__Bit` an alias for `vector *`.

The first argument to `insert()` is a reference to a `Set::Bit` object. This argument appears on the Perl stack at `ST(0)`.

`sv_derived_from()` verifies that `ST(0)` is indeed a reference to a `Set::Bit` object. `SvRV()` dereferences `ST(0)`, returning the scalar that was created earlier by `sv_setref_pv()`. `SvIV()` extracts the value of that scalar as an Integer Value (IV). This value is the `vector *` that was stored there by `sv_setref_pv()`.

The glue routine stores the IV in `tmp`, and then assigns it to `pVector` on the next line, with a typecast to quiet the compiler. Finally, it calls the C-language `insert()` routine, passing it the `vector *` in `pVector`.

Instance methods

The XS code for the remaining instance methods is similar. Here is `member()`

```int
member(pVector, n)
Set::Bit pVector
int      n
```

and `intersect()`

```Set::Bit
intersect(pA, pB)
Set::Bit pA
Set::Bit pB
```

In each case, the first argument is a reference to the object on which the method was invoked. `intersect()` creates and returns a new `Set::Bit` object; the mechanics of this are the same as those described above for `new()`.

union

The XS code for `union()` is a bit different. We can't have a C subroutine named `union`, because `union` is a keyword in C. The C subroutine is called `Union`, but so that the Perl interface will have a consistent naming scheme, the XS uses

```Set::Bit
union(pA, pB)
Set::Bit pA
Set::Bit pB
CODE:
RETVAL = Union(pA, pB);
OUTPUT:
RETVAL
```

The XS routine is called `union`, with a `CODE:` directive so that we can write the C call to `Union`.

DESTROY

When the last reference to a `Set::Bit` object goes away, Perl deletes it. The object itself is a Perl scalar: Perl allocated for it in `sv_setref_pv()`, and Perl frees that memory.

The `Set::Bit` object holds a pointer to a `vector` struct. Perl doesn't know anything about the `vector` struct. We allocated it, and we are responsible for freeing it. We do this in the `DESTROY` method.

Perl calls `DESTROY` after the last reference to the `Set::Bit` object goes away, and before it deletes the `Set::Bit` object. Aside from the fact that Perl calls `DESTROY()` automatically, it is an ordinary method. Here is the XS code for `DESTROY()`.

```void
DESTROY(pVector)
Set::Bit pVector
```

As with the other instance methods, the glue routine extracts the `vector *` from the Perl stack and passes it to the C-language `DESTROY()` routine, which frees the storage.

Perl methods

A Perl method can't operate on a `vector *`. However, accessor methods written in C can get instance data from an object, and then operate on that data in Perl.

`member()` is an accessor method: it returns true iff its argument is an element of the set. With the `member()` method, we can write other methods in Perl. For example, the `elements()` method returns a list of the elements of a set

```sub elements
{
my \$set = shift;
grep { \$set->member(\$_) } 0..\$set->top
}
```

and the `print()` method returns a string containing a print representation of a set

```sub print
{
my \$set = shift;
join ", ", \$set->elements
}
```

Given a suitable set of accessor methods, the choice of whether to implement other methods in C or in Perl can be made on the basis of performance and convenience.

A Perl object

Earlier, I said that I wanted the `Set::Bit` object to be the C-language `vector` struct, rather than a Perl data object. It didn't work out that way. The `Set::Bit` object is indeed a Perl data object: it is the scalar created by `sv_setref_pv()`.

The `Set::Bit` object gives the essential features of a C-language object. Data is represented in C, we can write methods in C, and methods written in C access instance data through a `vector *`, passed as the first argument. At the same time, the `Set::Bit` object gives us the flexibility to write methods in Perl. 1. Introduction
2. Concepts
3. Example 1: Geometry
4. The typemap
5. Example 2: Math::Ackermann
6. Example 3: Set::Bit