This page presents Perl regular expressions for matching parts of C source code: comments, preprocessor instructions, strings, and operators.
The following Perl regular expression matches traditional-style C
comments like /* this */, including comments that span several lines:
our $trad_comment_re = qr!
    /\*
    (?:
        # Match "not an asterisk"
        [^*]
    |
        # Match multiple asterisks followed
        # by anything except an asterisk or a
        # slash.
        \*+[^*/]
    )*
    # Match multiple asterisks followed by a
    # slash.
    \*+/
!x;
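As a rough usage sketch, the pattern can be used either to collect comments or to strip them. The sample string and the variable names `$c`, `@comments` and `$stripped` are invented for illustration, and the pattern is copied into a lexical variable so the script stands alone.

```perl
use strict;
use warnings;

# Same pattern as above, with its comments shortened.
my $trad_comment_re = qr!
    /\*
    (?:
        [^*]        # not an asterisk
    |
        \*+[^*/]    # asterisks, then neither an asterisk nor a slash
    )*
    \*+/            # asterisks, then the closing slash
!x;

# Hypothetical sample input with a one-line and a multi-line comment.
my $c = "int x; /* first */ int y; /* second,\n   ** multi-line **/ int z;";

# Collect every comment.
my @comments = $c =~ /($trad_comment_re)/g;
print scalar(@comments), " comments found\n";

# Remove the comments, leaving only the code.
(my $stripped = $c) =~ s/$trad_comment_re//g;
print "$stripped\n";
```

Note that a pattern like this knows nothing about string literals, so a `/*` inside a C string would also be treated as the start of a comment.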
Matching C++-style comments is easier:
our $cxx_comment_re = qr!//.*\n!;
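A short sketch of how this pattern behaves, using an invented sample string. It also shows one consequence of the trailing `\n` in the pattern: a comment on a final line with no newline is not matched.

```perl
use strict;
use warnings;

my $cxx_comment_re = qr!//.*\n!;

# Hypothetical sample input.
my $c = "int a; // counter\nint b;\n// whole line\n";

# Each match includes the newline that ends the comment.
my @found = $c =~ /($cxx_comment_re)/g;
print "Found: $_" for @found;

# The pattern requires the newline, so a comment on an unterminated
# final line does not match.
my $unterminated = "int c; // last";
print "no match without newline\n" if $unterminated !~ $cxx_comment_re;
```

If input may lack a final newline, appending one before matching is one simple workaround.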
The following regular expression matches a C preprocessor instruction:
our $cpp_re = qr/^\h*\#(?: $trad_comment_re | [^\\\n] | \\[^\n] | \\\n )+\n /mx;
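The sketch below applies the pattern to a small, made-up C fragment to show that a backslash-continued `#define` is picked up as a single match and that leading horizontal whitespace before `#` is allowed. The comment pattern is repeated in compact form so the script is self-contained.

```perl
use strict;
use warnings;
use 5.010;    # \h (horizontal whitespace) needs Perl 5.10 or later

my $trad_comment_re = qr!/\*(?:[^*]|\*+[^*/])*\*+/!;
my $cpp_re = qr/^\h*\#(?: $trad_comment_re | [^\\\n] | \\[^\n] | \\\n )+\n /mx;

# Hypothetical sample input. The single-quoted here-document keeps
# the backslash at the end of the #define line literal.
my $c = <<'EOF';
#include <stdio.h>
#define MAX(a,b) \
    ((a) > (b) ? (a) : (b))
int main (void) { return 0; }
  # pragma once
EOF

# Three instructions are found; the "int main" line is skipped.
my @directives = $c =~ /($cpp_re)/g;
print "--\n$_" for @directives;
```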
The following regular expressions match a single C string, like "this", and compound C strings, like "this" "one":
our $single_string_re = qr/ (?: " (?:[^\\"]+|\\[^"]|\\")* " ) /x;
our $string_re = qr/$single_string_re(?:\s*$single_string_re)*/;
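To illustrate the difference between the two patterns, the following sketch matches a compound string containing escaped quotes and then splits it into its single strings. The C fragment and the names `$compound` and `@parts` are made up for the example.

```perl
use strict;
use warnings;

my $single_string_re = qr/ (?: " (?:[^\\"]+|\\[^"]|\\")* " ) /x;
my $string_re = qr/$single_string_re(?:\s*$single_string_re)*/;

# Hypothetical sample input; the single-quoted here-document keeps
# the C escapes exactly as they would appear in a source file.
my $c = <<'EOF';
printf ("say \"hi\"\n" " and more", x);
EOF

# The compound pattern takes both adjacent strings in one match.
my ($compound) = $c =~ /($string_re)/;
print "$compound\n";

# The single-string pattern splits the compound string again.
my @parts = $compound =~ /($single_string_re)/g;
print scalar(@parts), " single strings\n";
```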
The following regular expressions match one-character C operators and all C operators respectively.
our $one_char_op_re = qr/(?:\%|\&|\+|\-|\=|\/|\||\.|\*|\:|>|<|\!|\?|~|\^)/;
our $operator_re = qr/
    (?:
        # Operators with two characters
        \|\||&&|<<|>>|--|\+\+|->
    |
        # Operators with one or two characters
        # followed by an equals sign.
        (?:<<|>>|\+|-|\*|\/|%|&|\||\^)
        =
    |
        $one_char_op_re
    )
/x;
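A small sketch of pulling operators out of a C expression with a global match. The expressions are invented for illustration. The second call shows a consequence of Perl trying alternatives from left to right: with the alternatives in the order given, `<<=` comes out as `<<` followed by `=`, because the two-character alternative wins before the assignment form is tried. Similarly, `==` and `!=` have no alternative of their own here and are matched one character at a time.

```perl
use strict;
use warnings;

my $one_char_op_re = qr/(?:\%|\&|\+|\-|\=|\/|\||\.|\*|\:|>|<|\!|\?|~|\^)/;
my $operator_re = qr/
    (?:
        # Operators with two characters
        \|\||&&|<<|>>|--|\+\+|->
    |
        # Operators with one or two characters
        # followed by an equals sign.
        (?:<<|>>|\+|-|\*|\/|%|&|\||\^)
        =
    |
        $one_char_op_re
    )
/x;

# Hypothetical expression; prints "+= -> ++ || & ~".
my @ops = 'x += y->count++ || z & ~mask' =~ /($operator_re)/g;
print join(' ', @ops), "\n";

# The earlier "<<" alternative matches first; prints "<< =".
my @shift_assign = 'a <<= 2' =~ /($operator_re)/g;
print join(' ', @shift_assign), "\n";
```

If whole-token matches for `<<=` and `>>=` matter, one option is to move the alternative that ends in `=` ahead of the two-character alternative in a local copy of the pattern.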
All of these regular expressions are supplied in the CPAN module C::Tokenize.
A separate page contains regular expressions for C words and reserved words.