Regular expressions to match C grammar

This page presents Perl regular expressions for matching parts of C syntax: comments, preprocessor instructions, strings, and operators.

The following Perl regular expression matches traditional-style C comments, like /* this */:

our $trad_comment_re = qr!
                             /\*
                             (?:
                                 [^*]        # Match "not an asterisk"
                             |
                                 \*+[^*/]    # Match multiple asterisks followed
                                             # by anything except an asterisk or a
                                             # slash.
                             )*
                             \*+/            # Match multiple asterisks followed by a
                                             # slash.
                         !x;
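The script below is a minimal sketch of using such a pattern. It writes the pattern that the comments above describe, /\*(?:[^*]|\*+[^*/])*\*+/, on a single line without the /x modifier. This condensed form is an assumption, not a copy from C::Tokenize. The script lists the comments in a small piece of C, including one that spans several lines, and then strips them out.

```perl
use strict;
use warnings;

# "/*", then any mix of non-asterisks or runs of asterisks not followed
# by "/", then one or more asterisks and the closing "/".
my $trad_comment_re = qr!/\*(?:[^*]|\*+[^*/])*\*+/!;

my $c_source = "int x; /* one */ int y; /**\n * two\n **/ int z;\n";

# Collect every comment; "[^*]" also matches newlines, so the second
# comment is found even though it covers three lines.
my @comments = $c_source =~ /($trad_comment_re)/g;

# Replace each comment with a single space.
(my $stripped = $c_source) =~ s/$trad_comment_re/ /g;

print scalar(@comments), " comments\n";
print $stripped;
```

Replacing each comment with a space rather than nothing keeps tokens on either side of a comment, as in int/**/x, from running together.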


Matching the C++-style comments is easier:

our $cxx_comment_re = qr!//.*\n!;
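A short sketch of how this pattern behaves: "." does not match a newline unless /s is given, so each match runs from "//" to the end of its line and includes the newline itself. The variable names other than $cxx_comment_re are only for illustration.

```perl
use strict;
use warnings;

# C++-style comment regex, copied from the line above.
my $cxx_comment_re = qr!//.*\n!;

my $code = "a = 1; // set a\nb = 2; // set b\nc = 3; // no newline";

# Each match includes the trailing newline. The last comment is not
# followed by "\n", so this pattern does not match it.
my @comments = $code =~ /($cxx_comment_re)/g;

# Put the newline back when stripping, so the line count is unchanged.
(my $clean = $code) =~ s/$cxx_comment_re/\n/g;

print $clean;
```

Because the pattern requires "\n", a // comment on the last line of a file with no final newline is left alone. The pattern also knows nothing about string literals, so "//" inside a string such as "http://..." would be taken as a comment unless strings are matched first.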


The following regular expression matches a C preprocessor instruction:

our $cpp_re = qr/^\h*
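For illustration, here is a hypothetical pattern, $cpp_line_re, for a preprocessor line. It is not the $cpp_re from C::Tokenize, although it starts the same way, with ^\h*. It is a sketch under these assumptions: a directive starts with optional horizontal whitespace and "#" at the beginning of a line, and a backslash at the end of a line continues the directive onto the next line. It does not treat comments inside a directive specially.

```perl
use strict;
use warnings;

# Hypothetical preprocessor-line pattern (not the C::Tokenize one).
# /m lets "^" match at the start of each line; /s lets "." match a
# newline, so a backslash-newline pair is consumed as a continuation.
my $cpp_line_re = qr/
    ^ \h* \#          # start of line, optional indent, then a hash sign
    (?: [^\\\n]       # any character except backslash or newline
      | \\ .          # backslash plus the next character
    )*
    \n                # the newline that ends the directive
/mxs;

my $src = "#include <stdio.h>\n"
        . "  #define MAX(a, b) \\\n      ((a) > (b) ? (a) : (b))\n"
        . "int main (void) { return 0; }\n";

# Both directives are found; the #define keeps its continuation line.
my @directives = $src =~ /($cpp_line_re)/g;

print "[$_]\n" for @directives;
```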


The following regular expressions match a single C string, like "this", and compound C strings, like "this" "one":

our $single_string_re = qr/


our $string_re = qr/$single_string_re(?:\s*$single_string_re)*/;
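The body of $single_string_re is not shown here, so the sketch below supplies a hypothetical one: a double quote, then any run of characters other than a backslash, a double quote, or a newline, or a backslash followed by any character, then the closing quote. The $string_re line is the one given above, unchanged.

```perl
use strict;
use warnings;

# Hypothetical single-string pattern; "\\." covers escapes like \" and \n.
my $single_string_re = qr/"(?:[^\\"\n]|\\.)*"/;

# Compound strings: literals separated only by whitespace (as above).
my $string_re = qr/$single_string_re(?:\s*$single_string_re)*/;

my $code = q{puts ("this" "one"); s = "say \"hi\"\n";};

# "this" "one" comes out as one compound string; the escaped quotes
# inside the second literal do not end it early.
my @strings = $code =~ /($string_re)/g;

print "$_\n" for @strings;
```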


The following regular expressions match one-character C operators and all C operators, respectively.

our $one_char_op_re = qr/(?:\%|\&|\+|\-|\=|\/|\||\.|\*|\:|>|<|\!|\?|~|\^)/;
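Used on its own, $one_char_op_re splits multi-character operators into their individual characters. The short script below (the expression and @ops are only for illustration) shows this.

```perl
use strict;
use warnings;

# One-character operator regex, copied from the line above.
my $one_char_op_re = qr/(?:\%|\&|\+|\-|\=|\/|\||\.|\*|\:|>|<|\!|\?|~|\^)/;

# "+=" and "->" come out as two separate one-character matches each.
my @ops = "x += a->b * c" =~ /($one_char_op_re)/g;

print "@ops\n";
```

Because += and -> are broken up here, matching every C operator needs a pattern that tries the longer operators before the one-character ones.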


our $operator_re = qr/
        # Operators with two characters
        # Operators with one or two characters
        # followed by an equals sign.
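As a hypothetical sketch (not the C::Tokenize $operator_re), an operator pattern can follow the same grouping as the comments above: operators with two characters, operators followed by an equals sign, and single characters. Perl tries alternatives from left to right, so longer operators are listed before their prefixes, for example <<= before << and << before <. The one-character class holds the same characters as $one_char_op_re.

```perl
use strict;
use warnings;

# Hypothetical operator pattern; longest alternatives come first.
my $operator_re = qr/
    (?: <<= | >>=                             # three characters
      | \|\| | && | << | >> | -- | \+\+ | ->  # two characters
      | [-+*\/%&|^!=<>]=                      # one character then "="
      | [-+*\/%&|^!=<>.?:~]                   # one character
    )
/x;

my @ops = "a <<= b->c != d && e++ ? f : g" =~ /($operator_re)/g;

print "@ops\n";
```

Like $one_char_op_re, this sketch leaves out punctuation such as commas, parentheses, and brackets.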


All of these regular expressions are supplied in the Perl CPAN module C::Tokenize.

Copyright © Ben Bullock 2009-2023. All rights reserved. For comments, questions, and corrections, please email Ben Bullock or use the discussion group at Google Groups.