
Regex for PCRE

Description:
Regex for PCRE is a C++ wrapper around the PCRE library written by Philip Hazel (ph10@cam.ac.uk).  It greatly reduces the syntactic complexity of PCRE and tries to very closely simulate the behaviour of perl.  Both substitution (s/foo/bar/) and split (split /\s+\, $string, -1) are implemented, along with global matching.
License Information:

Regex for PCRE is distributed under the same licensing as PCRE.  The general idea is that you either have to distribute it under the GPL, LGPL, or give proper credit to the use and source of this software package.  See the included LICENSE file for a full description.

Copyright Notice:

Original PCRE wrapper written by:
(C) Hugo Etchegoyen
    hetchego@hasar.com
    Buenos Aires, Argentina.

Current wrapper written by:
(C) Zachary Hansen
    xaxxon@slackworks.com
    

Basic API Instructions:

The basic API into PCRE for Regex is very straight forward.  To create a Regex object:

Regex MyRegex ( "([a-z]+)", "" );

To set modifiers on the regex, such as case insensitivity, put the following characters into the second string.  These characters correspond to the modifier characters following a regex in Perl.

i	Case insensitivity.  (PCRE_CASELESS)
m	Multiline.  [^$] match begin and end of line not just end of string (PCRE_MULTILINE)
s	Dot all.  Period matches any character including newline (PCRE_DOTALL)
x	Extended regex.  Ignores whitespace in the regex that isn't backslashed or in a character class.  Makes # a comment character just like it is in perl (PCRE_EXTENDED)
U	Ungreedy.  Default behaviour for * and + becomes ungreedy and ? postfix changes them individually to greedy (PCRE_UNGREEDY)
g	Global.  Regex performs matches from the end of the previous match each time it is run (No PCRE equivalent)

The following declaration creates a Regex object with global matching, case-insensitivity, line-based anchor characters, and periods matching all characters including newlines.  

Regex MyRegex ( "([a-z]+), "gims" );
std::string MyString = "abc def";

Once a Regex object has been created, it may be used.  The most basic operation that can be done with a Regex is a match.

MyRegex.match ( MyString );

This match finds one backreference, "abc".  To access a backreference, use array subscript notation.  The number inside the brackets corresponds to the perl backreference number.  Subscript 0 returns the entire matched string.

The following prints out "abc\n" to standard out.

std::cout << MyRegex [ 1 ] << std::endl;

Because the "g" flag was used in the creation of MyRegex, if you call the match() method again, it
will continue at the point the previous match left off.  The following will print "def\n".

MyRegex.match ( MyString );
std::cout << MyRegex [ 1 ] << std::endl;

Substitution is another operation that can be done on a Regex object.

The following code doubles all letters, both capital and lower-case.  Without the 'g' modifier, it would only substitute the first letter.  Without the 'i' modifier, it would only substitute lower-case letter.  After running this code, MySubString contains "aaBBcc123".  Note that you can use backreferences inside the replacement string.

Regex MySubRegex ( "[a=z]", "gi" );
std::string MySubString ( "aBc123" );

MySubRegex.sub ( MySubString, "$1$1" );

Splitting is the third main operation that can be done with a Regex object.  

The following code will print out each of the following on seperate lines: "abc", "def", "ghi".

Regex MySplitRegex ( "|" );
std::string MySplitString ( "abc|def|ghi" );

int NumSplits = MySplitRegex.split ( MySplitString );
for ( int i = 1; i <= NumSplits; i++ ) {
    std::cout << MySplitRegex [ i ] << std::endl;
}


Full Documentation:

Full documentation will go here eventually, but 90% of the functionality is described previously in the "Basic API Documentation" section.

Contact Information:

Regex for PCRE URL: http://xaxxon.slackworks.com/Regex/index.html
Author contact info:
    email: xaxxon@slackworks.com