Detect Shift-JIS bytes in a stream of bytes

This is a C routine which can be used to detect Shift-JIS bytes in a stream of bytes, or as a converter from Shift-JIS to JIS. I wrote it to find Shift-JIS text embedded in a Macromedia .dxr file on a CD-ROM.

It should be called as follows:

const unsigned char * buffer; /* Must point to at least two bytes. */
int jis_first;
int jis_second;
if (shift_jis_to_jis (buffer, & jis_first, & jis_second)) {
    /* jis_first and jis_second now hold the two JIS bytes. */
}

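If it succeeds, it returns 1 and sets jis_first and jis_second to the first and second bytes of the JIS code. If it fails, it returns 0, and the values of jis_first and jis_second should be ignored. The routine always reads two bytes, so buffer must point to at least two readable bytes.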
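As a concrete check of the return values, here is a minimal sketch that can be tried with a few known Shift-JIS pairs. So that the block compiles on its own, it contains a condensed restatement of the conversion arithmetic. The names sjis_pair_to_jis and jis_code are hypothetical and are not part of the routine on this page.

```c
/* Hypothetical condensed form of the conversion, so that this example
   is self-contained. Returns 1 on success, 0 on failure. */
int
sjis_pair_to_jis (const unsigned char * p, int * jis_first, int * jis_second)
{
    int first = p[0];
    int second = p[1];
    int row;
    if ((first >= 0x81 && first <= 0x84) || (first >= 0x87 && first <= 0x9f)) {
        row = 2 * (first - 0x70) - 1;
    }
    else if (first >= 0xe0 && first <= 0xef) {
        row = 2 * (first - 0xb0) - 1;
    }
    else {
        return 0;
    }
    if (second >= 0x40 && second <= 0x9e && second != 0x7f) {
        /* Lower half of the second-byte range: odd JIS row. */
        * jis_second = second - 31 - (second > 0x7f ? 1 : 0);
    }
    else if (second >= 0x9f && second <= 0xfc) {
        /* Upper half: the following, even JIS row. */
        * jis_second = second - 126;
        row += 1;
    }
    else {
        return 0;
    }
    * jis_first = row;
    return 1;
}

/* Hypothetical helper: pack both JIS bytes into one number, or return
   -1 if the pair is not accepted. */
int
jis_code (unsigned char first, unsigned char second)
{
    unsigned char pair[2];
    int jis_first;
    int jis_second;
    pair[0] = first;
    pair[1] = second;
    if (! sjis_pair_to_jis (pair, & jis_first, & jis_second)) {
        return -1;
    }
    return (jis_first << 8) | jis_second;
}
```

With these definitions, jis_code (0x82, 0xa0) gives 0x2422, the JIS code for あ, and jis_code (0x88, 0x9f) gives 0x3021, the JIS code for 亜. The sketch rejects a second byte of 0x7f, which never occurs in Shift-JIS.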

#include <stdio.h>
#include "shift-jis-to-jis.h"

//#define DEBUG

int
shift_jis_to_jis (const unsigned char * may_be_shift_jis,
                  int * jis_first_ptr, int * jis_second_ptr)
{
    int status = 0;
    unsigned char first = may_be_shift_jis[0];
    unsigned char second = may_be_shift_jis[1];
    int jis_first = 0;
    int jis_second = 0;
#ifdef DEBUG
    printf (":%02X%02X\n", first, second);
#endif
    /* Check that the first byte is a valid Shift-JIS lead byte. Each
       lead byte covers two JIS rows; start with the odd-numbered one. */
    if ((first >= 0x81 && first <= 0x84) ||
        (first >= 0x87 && first <= 0x9f)) {
        jis_first = 2 * (first - 0x70) - 1;
#ifdef DEBUG
        printf ("First choice: hex value is %X\n", jis_first);
#endif
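        if (second >= 0x40 && second <= 0x9e && second != 0x7f) {
            /* Second bytes 0x40-0x7e map to 0x21-0x5f, and 0x80-0x9e
               map to 0x60-0x7e. The byte 0x7f is never valid. */
            jis_second = second - 31;
            if (jis_second > 95) {
                jis_second -= 1;
            }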
#ifdef DEBUG
            printf ("Second is lower: hex value %X\n", jis_second);
#endif
            status = 1;
        }
        else if (second >= 0x9f && second <= 0xfc) {
            jis_second = second - 126;
            jis_first += 1;
            status = 1;
        }
        else {
#ifdef DEBUG
            printf ("Second byte not OK\n");
#endif
        }
    }
    else if (first >= 0xe0 && first <= 0xef) {
#ifdef DEBUG
        printf ("Second choice\n");
#endif
        jis_first = 2 * (first - 0xb0) - 1;
        if (second >= 0x40 && second <= 0x9e && second != 0x7f) {
#ifdef DEBUG
            printf ("Second is lower\n");
#endif
            jis_second = second - 31;
            if (jis_second > 95) {
                jis_second -= 1;
            }
            status = 1;
        }
        else if (second >= 0x9f && second <= 0xfc) {
            jis_second = second - 126;
            jis_first += 1;
            status = 1;
        }
    }
    else {
#ifdef DEBUG
        printf ("Fail on first byte\n");
#endif
    }
    * jis_first_ptr = jis_first;
    * jis_second_ptr = jis_second;
    return status;
}

Disclaimer: I wrote this to find out approximately where the bytes representing Shift-JIS were in a binary file. The conversion into JIS bytes seems to work correctly, but I do not guarantee that it works in every case. It also accepts several byte pairs which are not actually valid Shift-JIS. A perfect validator would also need to know in detail which JIS code points are assigned. This subroutine is part of a larger program which scans a binary file looking for consecutive Shift-JIS bytes. I am putting it on the web because I couldn't find an equivalent simple C example of the conversion.
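The scanning program itself is not shown on this page, so the following is only a sketch of how such a scan might look, under assumptions which do not come from the original program. The names looks_like_sjis_pair and find_sjis_run are hypothetical, and the pair test checks only byte ranges (excluding the never-valid second byte 0x7f), not whether the JIS code point is actually assigned.

```c
#include <stddef.h>

/* Hypothetical range check: does p[0], p[1] look like a two-byte
   Shift-JIS character? Unassigned code points are not detected. */
int
looks_like_sjis_pair (const unsigned char * p)
{
    unsigned char first = p[0];
    unsigned char second = p[1];
    int first_ok = (first >= 0x81 && first <= 0x84)
        || (first >= 0x87 && first <= 0x9f)
        || (first >= 0xe0 && first <= 0xef);
    int second_ok = second >= 0x40 && second <= 0xfc && second != 0x7f;
    return first_ok && second_ok;
}

/* Hypothetical scanner: return the offset of the first run of at least
   min_pairs consecutive Shift-JIS-looking pairs in buf, or -1 if there
   is none. The length of the run, in pairs, is stored in *run_pairs. */
long
find_sjis_run (const unsigned char * buf, size_t len, size_t min_pairs,
               size_t * run_pairs)
{
    size_t start = 0;
    while (start + 1 < len) {
        size_t pos = start;
        size_t pairs = 0;
        /* Extend the run two bytes at a time while pairs keep matching. */
        while (pos + 1 < len && looks_like_sjis_pair (buf + pos)) {
            pairs++;
            pos += 2;
        }
        if (pairs > 0 && pairs >= min_pairs) {
            * run_pairs = pairs;
            return (long) start;
        }
        /* No long enough run starts here; try the next byte. */
        start++;
    }
    * run_pairs = 0;
    return -1;
}
```

One or two matching pairs can easily occur by coincidence in binary data, so min_pairs would probably need to be 3 or more. That threshold is a guess, not a tested value.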


Copyright © Ben Bullock 2009-2023. All rights reserved. For comments, questions, and corrections, please email Ben Bullock (benkasminbullock@gmail.com) or use the discussion group at Google Groups.