Perl and XS: The typemap

This article discusses how xsubpp converts data between Perl and C. When the Perl interpreter calls a subroutine, it pushes a list of scalars onto the Perl stack. An xsub must get the scalars from the stack and convert them to C form before calling the C routine. When the C routine has finished, the xsub must convert the C data back to Perl scalars and put those scalars on the stack. xsubpp generates the C code that performs these conversions.

Conversion between Perl and C data types is handled with macros and routines in the Perl C API, but the necessary operations vary, depending on the C data types and the direction of the conversion. Consider:

C data type	input (Perl to C)	output (C to Perl)
`int n`	`n = (int ) SvIV(ST(0))`	`sv_setiv( ST(0), (IV )n )`
`double x`	`x = (double) SvNV(ST(0))`	`sv_setnv( ST(0), (double)x )`
`char *psz`	`psz = (char *) SvPV(ST(0),na)`	`sv_setpv((SV*)ST(0), psz)`

We could imagine a big switch statement inside xsubpp to select the right code fragment for each C data type, but this would be clumsy and inflexible. It would be better to put the code fragments in a table, like the one shown above.

If we start writing such a table, we quickly discover that the mapping between Perl and C data types is not one-to-one. As a strongly typed language, C distinguishes more data types than Perl does. For example, these seven C integer types are all converted with essentially the same code fragment; the only variation is the typecast used to quiet the C compiler.

C data type	input (Perl to C)	output (C to Perl)
`int n`	`n = (int )SvIV(ST(0))`	`sv_setiv(ST(0), (IV)n)`
`unsigned n`	`n = (unsigned )SvIV(ST(0))`	`sv_setiv(ST(0), (IV)n)`
`unsigned int n`	`n = (unsigned int )SvIV(ST(0))`	`sv_setiv(ST(0), (IV)n)`
`long n`	`n = (long )SvIV(ST(0))`	`sv_setiv(ST(0), (IV)n)`
`unsigned long n`	`n = (unsigned long )SvIV(ST(0))`	`sv_setiv(ST(0), (IV)n)`
`short n`	`n = (short )SvIV(ST(0))`	`sv_setiv(ST(0), (IV)n)`
`unsigned short n`	`n = (unsigned short)SvIV(ST(0))`	`sv_setiv(ST(0), (IV)n)`

In view of this, xsubpp uses a two-level mapping. First, it maps C data types to XS types, like this:

C data type	XS type
`int`	`T_IV`
`unsigned`	`T_IV`
`char`	`T_CHAR`
`char *`	`T_PV`

Then it maps the XS types to code fragments, in two tables: one for input (Perl to C)

XS type	input code fragment
`T_IV`	`$var = ($ntype)SvIV($arg)`
`T_CHAR`	`$var = (char)*SvPV($arg,na)`
`T_PV`	`$var = ($ntype)SvPV($arg,na)`

and one for output (C to Perl)

XS type	output code fragment
`T_IV`	`sv_setiv ($arg, (IV)$var);`
`T_CHAR`	`sv_setpvn($arg, (char *)&$var, 1);`
`T_PV`	`sv_setpv ((SV*)$arg, $var);`

These tables constitute the typemap.
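The two-level lookup described above can be sketched in plain C. This is only an illustration of the idea, not how xsubpp is implemented (xsubpp is a Perl program). The table rows are copied from the tables above; the struct names and the helpers `xs_type_for`, `input_fragment_for`, and `output_fragment_for` are hypothetical names chosen for this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Level 1: C data type -> XS type (rows from the TYPEMAP table above). */
struct type_entry { const char *ctype; const char *xstype; };
static const struct type_entry type_table[] = {
    { "int",      "T_IV"   },
    { "unsigned", "T_IV"   },
    { "char",     "T_CHAR" },
    { "char *",   "T_PV"   },
};

/* Level 2: XS type -> input and output code fragments. */
struct frag_entry { const char *xstype; const char *input; const char *output; };
static const struct frag_entry frag_table[] = {
    { "T_IV",   "$var = ($ntype)SvIV($arg)",    "sv_setiv($arg, (IV)$var);" },
    { "T_CHAR", "$var = (char)*SvPV($arg,na)",  "sv_setpvn($arg, (char *)&$var, 1);" },
    { "T_PV",   "$var = ($ntype)SvPV($arg,na)", "sv_setpv((SV*)$arg, $var);" },
};

/* Hypothetical helper: XS type for a C type, or NULL if the type is unmapped. */
const char *xs_type_for(const char *ctype)
{
    for (size_t i = 0; i < sizeof type_table / sizeof type_table[0]; i++)
        if (strcmp(type_table[i].ctype, ctype) == 0)
            return type_table[i].xstype;
    return NULL;
}

/* Hypothetical helper: fragment row for a C type, found via both levels. */
static const struct frag_entry *frag_row_for(const char *ctype)
{
    const char *xs = xs_type_for(ctype);
    if (xs == NULL)
        return NULL;
    for (size_t i = 0; i < sizeof frag_table / sizeof frag_table[0]; i++)
        if (strcmp(frag_table[i].xstype, xs) == 0)
            return &frag_table[i];
    return NULL;
}

/* Input (Perl to C) fragment for a C type, or NULL if none is known. */
const char *input_fragment_for(const char *ctype)
{
    const struct frag_entry *row = frag_row_for(ctype);
    return row ? row->input : NULL;
}

/* Output (C to Perl) fragment for a C type, or NULL if none is known. */
const char *output_fragment_for(const char *ctype)
{
    const struct frag_entry *row = frag_row_for(ctype);
    return row ? row->output : NULL;
}
```

In this sketch, `input_fragment_for("int")` and `input_fragment_for("unsigned")` return the same fragment, because both C types map to `T_IV`. That is the point of the intermediate XS type: many C types can share one pair of fragments.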

The XS types are meaningful only to xsubpp, and appear only in the typemap. They do not appear in Perl code, XS code, or C code.

`$var`, `$ntype`, and `$arg`

The code fragments in the typemap are not pure C code: they contain Perl variables in their text. The variables are

$var: The name of a C variable
$ntype: The type of $var
$arg: Code to access a Perl scalar

xsubpp is a Perl program. When it needs to convert an argument from Perl to C, it sets $var, $ntype, and $arg, obtains the appropriate code fragment from the typemap, and evals the fragment to replace the Perl variables with their values.

For example, consider this XS routine:

int
max(a, b)
	int a
	int b

To generate code to convert the first parameter from Perl to C, xsubpp sets the Perl variables like this:

variable	value
`$var`	`a`
`$ntype`	`int`
`$arg`	`ST(0)`

Then, it evals the fragment

$var = ($ntype)SvIV($arg)

to yield the C code:

a = (int)SvIV(ST(0))
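The substitution step can be imitated with a small C routine. xsubpp itself relies on Perl's eval for this, so the code below is a stand-in written under that assumption, not a port of xsubpp. The struct `subst` and the function `expand_fragment` are hypothetical names introduced for this sketch.

```c
#include <stddef.h>
#include <string.h>

/* The three values xsubpp sets for one parameter. */
struct subst {
    const char *var;    /* $var:   name of the C variable       */
    const char *ntype;  /* $ntype: C type of that variable      */
    const char *arg;    /* $arg:   code to access a Perl scalar */
};

/*
 * Hypothetical helper: copy `frag` into `out`, replacing each occurrence of
 * $var, $ntype, and $arg with its value. Returns 0 on success, or -1 if the
 * result (including the terminating NUL) does not fit in `outsize` bytes.
 * Simplification: it matches name prefixes only, so a longer name such as
 * $variable would be treated as $var followed by "iable".
 */
int expand_fragment(const char *frag, const struct subst *s,
                    char *out, size_t outsize)
{
    const char *names[]  = { "$ntype", "$var", "$arg" };
    const char *values[] = { s->ntype, s->var, s->arg };
    size_t used = 0;

    if (outsize == 0)
        return -1;

    while (*frag != '\0') {
        const char *piece = frag;  /* default: copy one ordinary character */
        size_t piece_len = 1;
        size_t skip = 1;

        for (size_t i = 0; i < 3; i++) {
            size_t n = strlen(names[i]);
            if (strncmp(frag, names[i], n) == 0) {
                piece = values[i];          /* copy the value instead */
                piece_len = strlen(values[i]);
                skip = n;
                break;
            }
        }

        if (used + piece_len >= outsize)    /* keep room for the NUL */
            return -1;
        memcpy(out + used, piece, piece_len);
        used += piece_len;
        frag += skip;
    }
    out[used] = '\0';
    return 0;
}
```

Called with `var` set to `a`, `ntype` set to `int`, and `arg` set to `ST(0)`, the routine turns the `T_IV` input fragment into the same C statement shown above.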

It is important to understand how these variables work, because sometimes you have to arrange for them to have the right values in order to make xsubpp do what you want. The next article in this series contains an example in the XS code for Align::NW.

Typemap files

The three tables that constitute the typemap are referred to as TYPEMAP, INPUT, and OUTPUT, respectively. All three tables may be stored in a single file, with each table headed by its own name. Here is an example to illustrate the file format:

# A typemap file

TYPEMAP
int			T_IV
SV *			T_SV

INPUT
T_SV
	$var = $arg
T_IV
	$var = ($ntype)SvIV($arg)

OUTPUT
T_SV
	$arg = $var;
T_IV
	sv_setiv($arg, (IV)$var);

The first TYPEMAP header may be omitted, because a typemap file starts in the TYPEMAP section by default.

Files containing typemaps are conventionally named typemap. Xsubpp can read and aggregate multiple typemap files to construct the typemap. Entries in later files override entries in earlier files.
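The override rule can be pictured as a merge in which a later entry for the same C type replaces the earlier one. The C sketch below models only the TYPEMAP table and is not taken from xsubpp. The names `typemap_set`, `typemap_merge`, `typemap_get`, and the fixed capacity `MAX_ENTRIES` are assumptions made for this illustration.

```c
#include <stddef.h>
#include <string.h>

#define MAX_ENTRIES 64  /* arbitrary capacity chosen for this sketch */

struct type_entry { const char *ctype; const char *xstype; };

struct typemap {
    struct type_entry entries[MAX_ENTRIES];
    size_t count;
};

/*
 * Add one mapping. If the C type is already present, the new XS type
 * replaces the old one, which models "later entries override earlier ones".
 * Returns 0 on success, or -1 if the table is full.
 */
int typemap_set(struct typemap *tm, const char *ctype, const char *xstype)
{
    for (size_t i = 0; i < tm->count; i++) {
        if (strcmp(tm->entries[i].ctype, ctype) == 0) {
            tm->entries[i].xstype = xstype;
            return 0;
        }
    }
    if (tm->count == MAX_ENTRIES)
        return -1;
    tm->entries[tm->count].ctype = ctype;
    tm->entries[tm->count].xstype = xstype;
    tm->count++;
    return 0;
}

/* Merge all entries of one typemap file, in order, into the aggregate. */
int typemap_merge(struct typemap *tm, const struct type_entry *file, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (typemap_set(tm, file[i].ctype, file[i].xstype) != 0)
            return -1;
    return 0;
}

/* Look up the XS type for a C type, or NULL if it is not mapped. */
const char *typemap_get(const struct typemap *tm, const char *ctype)
{
    for (size_t i = 0; i < tm->count; i++)
        if (strcmp(tm->entries[i].ctype, ctype) == 0)
            return tm->entries[i].xstype;
    return NULL;
}
```

Merging a default file first and a local file second leaves the local file's entry in place for any C type that both files map, while entries that appear only in the default file are kept unchanged.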

Perl supplies a default typemap in:

/usr/local/lib/perl5/version/ExtUtils/typemap

XS modules may provide a local typemap file in the module directory. If the module declares structs or other C data types, it can map them to XS types in a TYPEMAP section. Local typemaps rarely need INPUT or OUTPUT sections; the default typemap almost always contains appropriate code fragments.

Notes on these adapted articles

These pages are an adaptation of articles written in 2000 by Steven W. McDougall. My goal in modifying these articles is to simplify and update them. I hope you find these adapted versions of the articles useful. You can find the original articles at the link at the bottom of this page. The major changes in this update are:

h2xs is not used;
XSLoader is used in place of DynaLoader;
It is assumed that the reader understands the basic concepts of C and Perl programming.

This adaptation is a work in progress and many of the links on these pages may not work.

XS Mechanics by Steven W. McDougall is licensed under a Creative Commons Attribution 3.0 Unported License.

For comments, questions, and corrections, please email Ben Bullock (benkasminbullock@gmail.com).