Detect Shift-JIS bytes in a stream of bytes
This is a C routine which can be used to detect Shift-JIS bytes in a stream of bytes, or as a converter from Shift-JIS to JIS. I made this routine in order to find embedded Shift-JIS inside a Macromedia .dxr file on a CD-ROM.
It should be called as follows:
    const unsigned char * buffer;  /* must point at (at least) two readable bytes */
    int jis_first;
    int jis_second;
    if (shift_jis_to_jis (buffer, & jis_first, & jis_second)) {
        /* Do something */
    }
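If it succeeds, it returns 1 and sets jis_first and jis_second to the first and second bytes of the corresponding JIS code. Each byte is in the range 0x21 to 0x7E. If it fails, it returns 0, and the values written to jis_first and jis_second should not be used.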
    #include "shift-jis-to-jis.h"

    //#define DEBUG

    #ifdef DEBUG
    #include <stdio.h>   /* printf, used only for debugging output */
    #endif

    int shift_jis_to_jis (const unsigned char * may_be_shift_jis,
                          int * jis_first_ptr, int * jis_second_ptr)
    {
        int status = 0;
        /* The caller must supply at least two readable bytes. */
        unsigned char first = may_be_shift_jis[0];
        unsigned char second = may_be_shift_jis[1];
        int jis_first = 0;
        int jis_second = 0;

    #ifdef DEBUG
        printf (":%02X%02X\n", first, second);
    #endif
        /* Check first byte is valid Shift-JIS. Lead bytes 0x85 and 0x86
           are not accepted: they map to JIS rows 9 to 12, which JIS X 0208
           leaves unassigned. */
        if ((first >= 0x81 && first <= 0x84) || (first >= 0x87 && first <= 0x9f)) {
            /* Lead bytes 0x81-0x9F cover JIS rows 0x21-0x5E, two rows each. */
            jis_first = 2 * (first - 0x70) - 1;
    #ifdef DEBUG
            printf ("First choice: hex value is %X\n", jis_first);
    #endif
            /* 0x7F is never a Shift-JIS trail byte. */
            if (second >= 0x40 && second <= 0x9e && second != 0x7f) {
                /* Lower trail byte: odd JIS row. Skipping 0x7F means bytes
                   from 0x80 upwards need one more subtracted. */
                jis_second = second - 31;
                if (jis_second > 95) {
                    jis_second -= 1;
                }
    #ifdef DEBUG
                printf ("Second is lower: hex value %X\n", jis_second);
    #endif
                status = 1;
            }
            else if (second >= 0x9f && second <= 0xfc) {
                /* Upper trail byte: even JIS row. */
                jis_second = second - 126;
                jis_first += 1;
                status = 1;
            }
            else {
    #ifdef DEBUG
                printf ("Second byte not OK\n");
    #endif
            }
        }
        else if (first >= 0xe0 && first <= 0xef) {
    #ifdef DEBUG
            printf ("Second choice\n");
    #endif
            /* Lead bytes 0xE0-0xEF continue from JIS row 0x5F. */
            jis_first = 2 * (first - 0xb0) - 1;
            if (second >= 0x40 && second <= 0x9e && second != 0x7f) {
    #ifdef DEBUG
                printf ("Second is lower\n");
    #endif
                jis_second = second - 31;
                if (jis_second > 95) {
                    jis_second -= 1;
                }
                status = 1;
            }
            else if (second >= 0x9f && second <= 0xfc) {
                jis_second = second - 126;
                jis_first += 1;
                status = 1;
            }
        }
        else {
    #ifdef DEBUG
            printf ("Fail on first byte\n");
    #endif
        }
        * jis_first_ptr = jis_first;
        * jis_second_ptr = jis_second;
        return status;
    }
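The arithmetic in the routine can be written out as follows. Here s1 and s2 are the Shift-JIS bytes and j1 and j2 are the JIS bytes. This only restates what the code does; the routine also rejects lead bytes 0x85 and 0x86 and trail byte 0x7F. As a worked example, the kanji 亜 is 0x88 0x9F in Shift-JIS and comes out as 0x30 0x21 in JIS.

```latex
j_1 =
\begin{cases}
2\,(s_1 - \mathtt{0x70}) - 1, & \mathtt{0x81} \le s_1 \le \mathtt{0x9F} \\
2\,(s_1 - \mathtt{0xB0}) - 1, & \mathtt{0xE0} \le s_1 \le \mathtt{0xEF}
\end{cases}

(j_1, j_2) \leftarrow
\begin{cases}
(j_1,\; s_2 - \mathtt{0x1F}), & \mathtt{0x40} \le s_2 \le \mathtt{0x7E} \\
(j_1,\; s_2 - \mathtt{0x20}), & \mathtt{0x80} \le s_2 \le \mathtt{0x9E} \\
(j_1 + 1,\; s_2 - \mathtt{0x7E}), & \mathtt{0x9F} \le s_2 \le \mathtt{0xFC}
\end{cases}

% Worked example: s_1 = 0x88, s_2 = 0x9F
j_1 = 2\,(\mathtt{0x88} - \mathtt{0x70}) - 1 = 2 \cdot \mathtt{0x18} - 1 = \mathtt{0x2F}
\quad\Rightarrow\quad
(j_1, j_2) = (\mathtt{0x2F} + 1,\; \mathtt{0x9F} - \mathtt{0x7E}) = (\mathtt{0x30},\; \mathtt{0x21})
```

Each lead byte therefore covers two consecutive JIS rows. The trail byte selects between them: the lower half of the trail range selects the odd row and the upper half selects the even row.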
Disclaimer: I made this to find out approximately where the bytes representing Shift-JIS were in a binary file. The conversion into JIS bytes seems to work correctly, but I do not guarantee that it works in every case. Also, it will accept some byte pairs as valid Shift-JIS which actually aren't. It only checks byte ranges, so a pair whose JIS code falls on an unassigned row or cell still passes. In order to get a perfect validator, you would also need to check the result against the list of JIS code points that are actually assigned. This subroutine is part of a larger program which scans a binary file looking for consecutive Shift-JIS bytes. I am putting this on the web because I couldn't find an equivalent simple C example of the conversion.
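The larger scanning program isn't shown here. What follows is a minimal sketch of such a scan, assuming that a "run" means back-to-back two-byte characters that pass the same byte-range checks as shift_jis_to_jis, with 0x7F excluded as a trail byte. The names find_shift_jis_run, is_sjis_lead and is_sjis_trail and the min_pairs threshold are hypothetical and do not come from the original program.

```c
#include <stddef.h>

/* Hypothetical helper: the lead-byte ranges shift_jis_to_jis accepts. */
static int is_sjis_lead (unsigned char c)
{
    return (c >= 0x81 && c <= 0x84) || (c >= 0x87 && c <= 0x9f)
        || (c >= 0xe0 && c <= 0xef);
}

/* Hypothetical helper: trail bytes 0x40-0xFC, except 0x7F. */
static int is_sjis_trail (unsigned char c)
{
    return c >= 0x40 && c <= 0xfc && c != 0x7f;
}

/* Hypothetical scanner. Looks for the first run of at least min_pairs
   back-to-back two-byte Shift-JIS characters in buf[0 .. len-1].
   Returns 1 and sets *run_start (byte offset) and *run_pairs (number of
   pairs) if one is found; otherwise returns 0 and sets *run_pairs to 0. */
int find_shift_jis_run (const unsigned char * buf, size_t len, size_t min_pairs,
                        size_t * run_start, size_t * run_pairs)
{
    size_t i = 0;

    if (min_pairs == 0) {
        min_pairs = 1;   /* an empty run is never reported */
    }
    while (i + 1 < len) {
        size_t start = i;
        size_t pairs = 0;

        /* Extend the run while each byte pair passes the range checks. */
        while (i + 1 < len && is_sjis_lead (buf[i]) && is_sjis_trail (buf[i + 1])) {
            pairs++;
            i += 2;
        }
        if (pairs >= min_pairs) {
            * run_start = start;
            * run_pairs = pairs;
            return 1;
        }
        /* Too short: try again one byte after where this attempt began. */
        i = start + 1;
    }
    * run_pairs = 0;
    return 0;
}
```

Requiring several pairs in a row (for example, min_pairs = 3) reduces false hits. Arbitrary binary data often contains isolated byte pairs that happen to fall within the Shift-JIS ranges. Each pair in a reported run can then be passed to shift_jis_to_jis for conversion.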