©Copyright 1996 Rogue Wave Software
#include <rw/re.h>
RWCRExpr re(".*\\.doc"); // Matches filename with suffix ".doc"
Class RWCRExpr represents an extended regular expression such as those found in lex and awk. The constructor "compiles" the expression into a form that can be used more efficiently. The compiled expression can then be used for string searches using class RWCString. Regular expressions can be of arbitrary size, limited only by available memory. The extended regular expression features found here are a subset of those found in the POSIX.2 standard (ANSI/IEEE Std 1003.2, ISO/IEC 9945-2).
Note: RWCRExpr is available only if your compiler supports exception handling and the C++ Standard Library.
The regular expression (RE) is constructed as follows:
The following rules determine one-character REs that match a single character:
Any character that is not a special character (to be defined) matches itself.
A backslash (\) followed by any special character matches the literal character itself; that is, this "escapes" the special character.
The "special characters" are:
+ * ? . [ ] ^ $ ( ) { } | \
The period (.) matches any character. E.g., ".umpty" matches either "Humpty" or "Dumpty".
A set of characters enclosed in brackets ([ ]) is a one-character RE that matches any of the characters in that set. E.g., "[akm]" matches either an "a", "k", or "m". A range of characters can be indicated with a dash. E.g., "[a-z]" matches any lower-case letter. However, if the first character of the set is the caret (^), then the RE matches any character except those in the set. It does not match the empty string. E.g., "[^akm]" matches any character except "a", "k", or "m". The caret loses its special meaning if it is not the first character of the set.
The following rules can be used to build a multicharacter RE:
Parentheses (( )) group parts of regular expressions together into subexpressions that can be treated as a single unit. For example, (ha)+ matches one or more repetitions of "ha".
A one-character RE followed by an asterisk (*) matches zero or more occurrences of the RE. Hence, [a-z]* matches zero or more lower-case characters.
A one-character RE followed by a plus (+) matches one or more occurrences of the RE. Hence, [a-z]+ matches one or more lower-case characters.
A question mark (?) marks the preceding RE as optional: it can occur zero times or once in the string, but no more. E.g., xy?z matches either xyz or xz.
The concatenation of REs is a RE that matches the corresponding concatenation of strings. E.g., [A-Z][a-z]* matches any capitalized word.
The OR character ( | ) allows a choice between two regular expressions. For example, jell(y|ies) matches either "jelly" or "jellies".
Braces ({ }) are reserved for future use.
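The one-character and multicharacter rules above can be tried out without Tools.h++ installed. The following is only a sketch under an assumption: it uses the standard library's std::regex with the POSIX extended grammar as a stand-in for RWCRExpr, and the helper name matchesWhole is hypothetical. RWCRExpr implements a subset of POSIX.2, so behavior may differ for features outside the rules listed here.

```cpp
#include <regex>
#include <string>

// Hypothetical helper, not part of Tools.h++: returns true only if the
// whole of `text` matches `pattern`. std::regex with the POSIX extended
// grammar stands in for RWCRExpr here.
bool matchesWhole(const std::string& pattern, const std::string& text) {
    const std::regex re(pattern, std::regex::extended);
    return std::regex_match(text, re);
}
```

With this helper, matchesWhole("xy?z", "xz") and matchesWhole("jell(y|ies)", "jellies") both return true, while matchesWhole("[^akm]", "") returns false because a bracket set never matches the empty string.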
All or part of the regular expression can be "anchored" to either the beginning or end of the string being searched:
If the caret (^) is at the beginning of the (sub)expression, then the matched string must be at the beginning of the string being searched.
If the dollar sign ($) is at the end of the (sub)expression, then the matched string must be at the end of the string being searched.
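The effect of the two anchors can be illustrated with a substring search. As before, this is a sketch that assumes std::regex with the POSIX extended grammar as a stand-in for RWCRExpr; foundIn is a hypothetical helper name.

```cpp
#include <regex>
#include <string>

// Hypothetical helper, not part of Tools.h++: returns true if any part of
// `text` matches `pattern`. std::regex (POSIX extended grammar) stands in
// for RWCRExpr.
bool foundIn(const std::string& pattern, const std::string& text) {
    const std::regex re(pattern, std::regex::extended);
    return std::regex_search(text, re);
}
```

For the string "Hark! Hark! the lark", foundIn("^Hark", ...) and foundIn("lark$", ...) succeed, whereas foundIn("^lark", ...) and foundIn("Hark$", ...) fail because the anchored text is not at the required end of the string.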
None
#include <rw/re.h>
#include <rw/cstring.h>
#include <rw/rstream.h>

int main() {
    RWCString aString("Hark! Hark! the lark");

    // A regular expression matching any lowercase word,
    // or the end of a word, starting with "l":
    RWCRExpr re("l[a-z]*");

    cout << aString(re) << endl;   // Prints "lark"
    return 0;
}
RWCRExpr(const char* pat);
RWCRExpr(const RWCString& pat);
Construct a regular expression from the pattern given by pat. The status of the results can be found by using member function status().
RWCRExpr(const RWCRExpr& r);
Copy constructor. Uses value semantics -- self will be a copy of r.
RWCRExpr();
Default constructor. You must assign a pattern to the regular expression before you use it.
~RWCRExpr();
Destructor. Releases any allocated memory.
RWCRExpr& operator=(const RWCRExpr& r);
Recompiles self to the pattern found in r.
RWCRExpr& operator=(const char* pat);
RWCRExpr& operator=(const RWCString& pat);
Recompiles self to the pattern given by pat. The status of the results can be found by using member function status().
size_t index(const RWCString& str, size_t* len = NULL, size_t start=0) const;
Returns the index of the first instance in the string str that matches the regular expression compiled in self, or RW_NPOS if there is no such match. The search starts at index start. The length of the matching pattern is returned in the variable pointed to by len. If an invalid regular expression is used for the search, an exception of type RWInternalErr will be thrown. Note that this member function is relatively clumsy to use -- class RWCString offers a better interface to regular expression searches.
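The calling convention of index() (a position as the return value, the match length through a pointer, and a start offset) lends itself to a loop that visits every match. The sketch below mimics that convention with the standard library; indexOf and allMatches are hypothetical names, std::string::npos plays the role of RW_NPOS, and std::regex is assumed as a stand-in for the compiled RWCRExpr. It does not reproduce the RWInternalErr exception described above.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical stand-in for RWCRExpr::index(): returns the position of the
// first match at or after `start`, or std::string::npos if there is none.
// The length of the match is stored through `len` when it is non-null.
std::size_t indexOf(const std::regex& re, const std::string& str,
                    std::size_t* len = nullptr, std::size_t start = 0) {
    if (start > str.size()) return std::string::npos;
    // When searching from the middle of the string, tell the matcher that
    // earlier characters exist so that '^' does not match at `start`.
    const auto flags = start > 0 ? std::regex_constants::match_prev_avail
                                 : std::regex_constants::match_default;
    std::smatch m;
    if (!std::regex_search(str.begin() + start, str.end(), m, re, flags))
        return std::string::npos;
    if (len) *len = static_cast<std::size_t>(m.length(0));
    return start + static_cast<std::size_t>(m.position(0));
}

// Collects every non-overlapping match by calling indexOf() repeatedly,
// restarting just past the previous match each time.
std::vector<std::string> allMatches(const std::regex& re,
                                    const std::string& str) {
    std::vector<std::string> out;
    std::size_t len = 0;
    std::size_t pos = indexOf(re, str, &len, 0);
    while (pos != std::string::npos) {
        out.push_back(str.substr(pos, len));
        // Advance by at least one character so an empty match cannot loop.
        pos = indexOf(re, str, &len, pos + (len ? len : 1));
    }
    return out;
}
```

The bookkeeping needed here (tracking pos and len by hand) is the kind of overhead the note above refers to when it recommends the RWCString interface instead.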
statusType status() const;
Returns the status of the regular expression:
statusType | Meaning
RWCRExpr::OK | No errors
RWCRExpr::NOT_SUPPORTED | POSIX.2 feature not yet supported
RWCRExpr::NO_MATCH | Tried to find a match but failed
RWCRExpr::BAD_PATTERN | Pattern was illegal
RWCRExpr::BAD_COLLATING_ELEMENT | Invalid collating element referenced
RWCRExpr::BAD_CHAR_CLASS_TYPE | Invalid character class type referenced
RWCRExpr::TRAILING_BACKSLASH | Trailing \ in pattern
RWCRExpr::UNMATCHED_BRACKET | [] imbalance
RWCRExpr::UNMATCHED_PARENTHESIS | () imbalance
RWCRExpr::UNMATCHED_BRACE | {} imbalance
RWCRExpr::BAD_BRACE | Content of {} invalid
RWCRExpr::BAD_CHAR_RANGE | Invalid endpoint in [a-z] expression
RWCRExpr::OUT_OF_MEMORY | Out of memory
RWCRExpr::BAD_REPEAT | ?, * or + not preceded by valid regular expression
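With RWCRExpr, a caller would compare status() against RWCRExpr::OK after constructing or assigning a pattern. As a loose analogy only, the sketch below shows the same "check before use" step with std::regex, which reports a bad pattern by throwing std::regex_error instead of storing a status code. The helper name compiles is hypothetical, and the error text it returns is implementation-defined.

```cpp
#include <regex>
#include <string>

// Hypothetical helper, not part of Tools.h++: tries to compile `pattern`
// under the POSIX extended grammar. Returns true on success; on failure
// returns false and, if `why` is non-null, stores the library's message.
bool compiles(const std::string& pattern, std::string* why = nullptr) {
    try {
        const std::regex re(pattern, std::regex::extended);
        (void)re;  // only the success of construction matters here
        return true;
    } catch (const std::regex_error& e) {
        if (why) *why = e.what();
        return false;
    }
}
```

A pattern such as "(ha" or "[akm" fails to compile here, which corresponds roughly to the UNMATCHED_PARENTHESIS and UNMATCHED_BRACKET rows of the table.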