Regular expressions to match C grammar

This page presents Perl regular expressions for matching various lexical elements of C: comments, preprocessor directives, string literals, and operators.

The following Perl regular expression matches traditional-style C comments, either on a single line, like /* this */, or spanning several lines, like

/*
this
*/

our $trad_comment_re = qr!
    /\*
    (?:
        # Match "not an asterisk".
        [^*]
    |
        # Match one or more asterisks followed
        # by anything except an asterisk or a
        # slash.
        \*+[^*/]
    )*
    # Match one or more asterisks followed by a
    # slash.
    \*+/
!x;
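As a minimal sketch of how this expression might be used, the following script collects and then strips the comments from a small piece of C. The input text and the variable names `$c`, `@comments`, and `$stripped` are invented for illustration, and the pattern is repeated (without its comments) so that the script runs on its own. It assumes that no comment delimiters appear inside string literals, which this simple substitution does not check for.

```perl
use strict;
use warnings;

# Same pattern as above, repeated so that this example is self-contained.
my $trad_comment_re = qr!
    /\*
    (?:
        [^*]
    |
        \*+[^*/]
    )*
    \*+/
!x;

# Hypothetical input: a fragment of C containing two comments.
my $c = <<'EOF';
int x; /* one */
/*
 * two **
 */
int y;
EOF

# Collect every comment found in the input.
my @comments = ($c =~ /($trad_comment_re)/g);

# Make a copy of the input with each comment replaced by one space.
(my $stripped = $c) =~ s/$trad_comment_re/ /g;

print scalar (@comments), " comments found\n";
print $stripped;
```

Replacing each comment with a space rather than with nothing keeps tokens on either side of a comment from running together.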

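Matching C++-style comments is easier. Note that the expression includes the terminating newline, so a comment on the last line of a file with no trailing newline is not matched: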

our $cxx_comment_re = qr!//.*\n!;
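The sketch below applies this expression to a made-up string whose last line has no trailing newline. The second pattern, `$cxx_no_newline_re`, is a hypothetical variant that is not part of the original code; it is included only to show one way of matching a comment without consuming the newline.

```perl
use strict;
use warnings;

# The expression from above.
my $cxx_comment_re = qr!//.*\n!;

# Hypothetical variant (an assumption, not from the original page):
# match up to, but not including, the end of the line.
my $cxx_no_newline_re = qr!//[^\n]*!;

# Invented input; the final comment is not followed by a newline.
my $c = "int x; // first\nint y;\n// last, no newline";

my @with_newline = ($c =~ /($cxx_comment_re)/g);
my @any = ($c =~ /($cxx_no_newline_re)/g);

print scalar (@with_newline), " matched by the original pattern\n";
print scalar (@any), " matched by the variant\n";
```

Neither pattern looks at context, so a `//` inside a string literal would also be matched.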

The following regular expression matches a C preprocessor directive, including lines continued with a backslash and any traditional-style comments within the directive:

our $cpp_re = qr/^\h*
                 \#
                 (?:
                     $trad_comment_re
                 |
                     [^\\\n]
                 |
                     \\[^\n]
                 |
                     \\\n
                 )+\n
/mx;
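To illustrate, here is a small sketch that extracts the preprocessor directives from a piece of C, including one continued over two lines with a backslash. The input and the names `$c` and `@directives` are invented for this example, and the comment pattern is written in a compact form so that the script is self-contained.

```perl
use strict;
use warnings;

# Compact form of the traditional comment pattern shown earlier.
my $trad_comment_re = qr!/\*(?:[^*]|\*+[^*/])*\*+/!;

# The preprocessor pattern from above.
my $cpp_re = qr/^\h*
                 \#
                 (?:
                     $trad_comment_re
                 |
                     [^\\\n]
                 |
                     \\[^\n]
                 |
                     \\\n
                 )+\n
/mx;

# Hypothetical input with three directives, one of them continued
# onto a second line.
my $c = <<'EOF';
#include <stdio.h>
#define MAX(a, b) \
    ((a) > (b) ? (a) : (b))
int main (void) { return 0; }
  # define DEBUG 1
EOF

my @directives = ($c =~ /($cpp_re)/g);

for my $d (@directives) {
    print "Directive: $d";
}
```

Because of the `m` flag, `^` matches at the start of every line, so each match begins with any horizontal whitespace before the `#` and ends with the newline that terminates the directive.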

The following regular expressions match a single C string, like "this", and compound C strings, like "this" "one":

our $single_string_re = qr/
    (?:
        "
        (?:[^\\"]+|\\[^"]|\\")*
        "
    )
/x;

our $string_re = qr/$single_string_re(?:\s*$single_string_re)*/;
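As a rough sketch, the following script uses `$string_re` to pull the string literals out of one line of C that contains escaped quotes and two adjacent strings. The input line and the names `$c` and `@strings` are made up for illustration.

```perl
use strict;
use warnings;

# The two patterns from above.
my $single_string_re = qr/
    (?:
        "
        (?:[^\\"]+|\\[^"]|\\")*
        "
    )
/x;
my $string_re = qr/$single_string_re(?:\s*$single_string_re)*/;

# Hypothetical input: a compound string with escaped quotes, then a
# separate single string.
my $c = <<'EOF';
printf ("He said \"hi\"\n" " twice", "x");
EOF

my @strings = ($c =~ /($string_re)/g);

for my $s (@strings) {
    print "String: $s\n";
}
```

The two adjacent literals are returned as one match because `$string_re` allows whitespace between consecutive strings, while the comma separates them from the final `"x"`.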

The following regular expressions match one-character C operators and all C operators respectively.

our $one_char_op_re = qr/(?:\%|\&|\+|\-|\=|\/|\||\.|\*|\:|>|<|\!|\?|~|\^)/;

our $operator_re = qr/
    (?:
        # Operators with two characters
        \|\||&&|<<|>>|--|\+\+|->|==
    |
        # Operators with one or two characters
        # followed by an equals sign.
        (?:<<|>>|\+|-|\*|\/|%|&|\||\^)
        =
    |
        $one_char_op_re
    )
/x;
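For illustration, the sketch below lists the operators found in a short C statement. The statement and the names `$c` and `@ops` are invented for this example, and both patterns are repeated so that the script runs by itself.

```perl
use strict;
use warnings;

# The two patterns from above.
my $one_char_op_re = qr/(?:\%|\&|\+|\-|\=|\/|\||\.|\*|\:|>|<|\!|\?|~|\^)/;

my $operator_re = qr/
    (?:
        # Operators with two characters
        \|\||&&|<<|>>|--|\+\+|->|==
    |
        # Operators with one or two characters
        # followed by an equals sign.
        (?:<<|>>|\+|-|\*|\/|%|&|\||\^)
        =
    |
        $one_char_op_re
    )
/x;

# Hypothetical input: a C statement using several operators.
my $c = 'a += b->c && d++ ? e : ~f;';

my @ops = ($c =~ /($operator_re)/g);

print join (' ', @ops), "\n";
```

The multi-character alternatives come before `$one_char_op_re` in the pattern, so `+=` and `->` are each found as a single operator rather than as two one-character operators.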

All of these regular expressions are supplied in the Perl CPAN module C::Tokenize.


Copyright © Ben Bullock 2009-2023. All rights reserved. For comments, questions, and corrections, please email Ben Bullock (benkasminbullock@gmail.com) or use the discussion group at Google Groups.