Using Unicode with Perl XS

Table of Unicode-related XS functions

UTF8SKIP (utf8.h): Given buf, a pointer to the first byte of a UTF-8 encoded character, UTF8SKIP (buf) works out from that first byte how many bytes the whole encoded character takes up. It does this by looking the byte up in a table called PL_utf8skip, which is also defined in utf8.h.
SvUTF8 (sv.h): Checks whether the flag SVf_UTF8 is set in an SV (scalar value) structure, that is, whether Perl treats the string's bytes as UTF-8.
is_utf8_string (utf8.c): A true/false function, called as is_utf8_string (text, length), which checks whether the first length bytes of text are correctly encoded UTF-8.
SvUTF8_on (sv.h): This C macro turns on the SVf_UTF8 flag in an SV, which tells Perl that the string is UTF-8 encoded. It does not check whether the SV actually contains UTF-8, so before using it you should call is_utf8_string to check that the bytes really are valid UTF-8.
SvUTF8_off (sv.h): The opposite of SvUTF8_on. This C macro turns off the SVf_UTF8 flag, so Perl treats the string as a sequence of single-byte characters. Like SvUTF8_on, it neither checks nor changes the bytes themselves.
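To make the UTF8SKIP entry concrete, here is a standalone C sketch of the same lookup-table idea. It does not use Perl's headers: my_utf8skip, my_utf8skip_init, MY_UTF8SKIP and my_count_chars are hypothetical names. The table follows RFC 3629 UTF-8, whereas Perl's own PL_utf8skip gives larger lengths for the lead bytes 0xF8 to 0xFF because Perl supports an extended encoding. In real XS code you would simply write UTF8SKIP (p).

```c
#include <stddef.h>

/* Hypothetical stand-in for Perl's PL_utf8skip: entry b is the number of
   bytes in a UTF-8 character whose first byte is b (RFC 3629 rules). */
static unsigned char my_utf8skip[256];

/* Fill the table once; Perl ships its table pre-built in utf8.h. */
static void my_utf8skip_init(void)
{
    for (int b = 0; b < 256; b++) {
        if (b < 0xC0)
            my_utf8skip[b] = 1; /* ASCII, or a stray continuation byte */
        else if (b < 0xE0)
            my_utf8skip[b] = 2; /* 110xxxxx lead byte */
        else if (b < 0xF0)
            my_utf8skip[b] = 3; /* 1110xxxx lead byte */
        else if (b < 0xF8)
            my_utf8skip[b] = 4; /* 11110xxx lead byte */
        else
            my_utf8skip[b] = 1; /* not a lead byte in RFC 3629 UTF-8 */
    }
}

/* Like UTF8SKIP: the length is decided by the first byte alone. */
#define MY_UTF8SKIP(p) (my_utf8skip[*(const unsigned char *)(p)])

/* Count characters by hopping from lead byte to lead byte.
   Assumes the len bytes at buf are valid UTF-8. */
static size_t my_count_chars(const char *buf, size_t len)
{
    size_t i = 0;
    size_t n = 0;
    while (i < len) {
        i += MY_UTF8SKIP(buf + i);
        n++;
    }
    return n;
}
```

Call my_utf8skip_init once before use; after that, my_count_chars ("caf\xc3\xa9", 5) counts four characters. Like UTF8SKIP, the loop trusts its input, so untrusted bytes should be validated first.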
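The is_utf8_string entry can be illustrated with a standalone validator. my_is_utf8_string is a hypothetical name, and the sketch is deliberately stricter than Perl's function: it follows RFC 3629 and rejects overlong encodings, UTF-16 surrogates and code points above U+10FFFF. Recent perlapi documentation describes is_utf8_string as accepting Perl's extended UTF-8, which allows some of those, and documents is_strict_utf8_string for the stricter check.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical strict validator in the spirit of is_utf8_string (s, len).
   Returns true only if the len bytes at s are RFC 3629 UTF-8. */
static bool my_is_utf8_string(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t n;          /* number of continuation bytes expected */
        unsigned long cp;  /* code point being decoded */
        unsigned long min; /* smallest code point allowed for this length */

        if (c < 0x80) {
            i++;           /* ASCII byte */
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            n = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            n = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            n = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return false;  /* continuation byte or invalid lead byte */
        }

        if (len - i <= n)
            return false;  /* sequence cut off by the end of the buffer */
        for (size_t k = 1; k <= n; k++) {
            unsigned char cc = s[i + k];
            if ((cc & 0xC0) != 0x80)
                return false;  /* expected a 10xxxxxx continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min)
            return false;      /* overlong encoding */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return false;      /* UTF-16 surrogate */
        if (cp > 0x10FFFF)
            return false;      /* beyond the Unicode range */
        i += n + 1;
    }
    return true;
}
```

Decoding the code point, rather than only checking byte patterns, is what lets the function catch overlong forms such as the two-byte sequence C0 AF for "/".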
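The sv.h entries all act on a single bit in the SV's flags. The following C model, with hypothetical names such as struct mock_sv and MOCK_SvUTF8_on that are not part of the Perl API, shows what that means in practice: switching the flag changes how the bytes are interpreted, here measured by a length function that mimics Perl's length, but never changes the bytes themselves. In XS you would use SvUTF8 (sv), SvUTF8_on (sv) and SvUTF8_off (sv) directly.

```c
#include <stddef.h>

/* Hypothetical model of an SV: a byte buffer plus a flags word.
   MOCK_SVf_UTF8 plays the role of Perl's SVf_UTF8. */
#define MOCK_SVf_UTF8 0x1u

struct mock_sv {
    const char *pv; /* the string's bytes, like SvPVX */
    size_t cur;     /* byte length, like SvCUR */
    unsigned flags;
};

/* Each macro only tests, sets or clears one bit; the bytes are untouched. */
#define MOCK_SvUTF8(sv)     ((sv)->flags & MOCK_SVf_UTF8)
#define MOCK_SvUTF8_on(sv)  ((sv)->flags |= MOCK_SVf_UTF8)
#define MOCK_SvUTF8_off(sv) ((sv)->flags &= ~MOCK_SVf_UTF8)

/* Roughly what Perl's length() reports: characters when the flag is on,
   bytes when it is off. Characters are counted by skipping continuation
   bytes (10xxxxxx), which is only meaningful if the bytes are UTF-8. */
static size_t mock_length(const struct mock_sv *sv)
{
    if (!MOCK_SvUTF8(sv))
        return sv->cur;
    size_t n = 0;
    for (size_t i = 0; i < sv->cur; i++)
        if (((unsigned char)sv->pv[i] & 0xC0) != 0x80)
            n++;
    return n;
}
```

This is also why setting the flag on bytes that are not valid UTF-8 is risky: the character count, and any later decoding, would be computed from malformed data.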

Web links

Copyright © Ben Bullock 2009-2023. All rights reserved. For comments, questions, and corrections, please email Ben Bullock or use the discussion group at Google Groups.