©Copyright 1996 Rogue Wave Software
#include <rw/cstring.h>
RWCString a;
Class RWCString offers very powerful and convenient facilities for manipulating strings that are just as efficient as the familiar standard C <string.h> functions.
Although the class is primarily intended to be used to handle single-byte character sets (SBCS; such as ASCII or ISO Latin-1), with care it can be used to handle multibyte character sets (MBCS). There are two things that must be kept in mind when working with MBCS:
Because characters can be more than one byte long, the number of bytes in a string can, in general, be greater than the number of characters in the string. Use function RWCString::length() to get the number of bytes in a string, function RWCString::mbLength() to get the number of characters. Note that the latter is much slower because it must determine the number of bytes in every character. Hence, if the string is known to be nothing but SBCS, then RWCString::length() is much to be preferred.
One or more bytes of a multibyte character can be zero. Hence, an MBCS string cannot be counted on to be null-terminated. In practice, it is a rare MBCS that uses embedded nulls. Nevertheless, you should be aware of this and program defensively. In any case, class RWCString can handle embedded nulls.
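The difference between a byte count and a character count can be sketched in standard C++ without the library. The helper below is hypothetical (the names countCharacters and kBadCharacter are not part of Tools.h++); it only illustrates why a character count has to examine every character, by asking std::mblen() for the width of each one under the current locale.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Returned when an invalid multibyte sequence is found (hypothetical constant).
const std::size_t kBadCharacter = static_cast<std::size_t>(-1);

// Hypothetical helper, not part of the library: counts the characters in the
// first n bytes of buf by asking std::mblen() for the width of each one.
std::size_t countCharacters(const char* buf, std::size_t n) {
    assert(buf != nullptr);
    std::mblen(nullptr, 0);                   // reset any shift state
    std::size_t count = 0;
    std::size_t i = 0;
    while (i < n) {
        int width = std::mblen(buf + i, n - i);
        if (width < 0) return kBadCharacter;  // malformed or truncated character
        // A width of 0 means a null byte, which still occupies one byte.
        i += (width == 0) ? 1 : static_cast<std::size_t>(width);
        ++count;
    }
    return count;
}
```

In the default "C" locale every byte is one character, so the result equals the byte count; after a call such as std::setlocale(LC_CTYPE, "") the two can differ.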
Parameters of type "const char*" must not be passed a value of zero. This is detected in the debug version of the library.
The class is implemented using a technique called copy on write. With this technique, the copy constructor and assignment operators still reference the old object and hence are very fast. An actual copy is made only when a "write" is performed, that is if the object is about to be changed. The net result is excellent performance, but with easy-to-understand copy semantics.
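The copy-on-write idea can be illustrated with a minimal standard C++ sketch. The class CowString below is hypothetical and is not how RWCString is implemented; it assumes a std::shared_ptr to a std::string as the shared buffer and ignores thread safety.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical class for illustration only; not how RWCString is implemented.
class CowString {
public:
    explicit CowString(const std::string& s)
        : buf_(std::make_shared<std::string>(s)) {}

    // The compiler-generated copy constructor and assignment operator copy
    // only the shared pointer, so both objects reference the same buffer.

    const std::string& str() const { return *buf_; }

    bool sharesBufferWith(const CowString& other) const {
        return buf_ == other.buf_;
    }

    // A "write": make a private copy first if the buffer is shared.
    void setByte(std::size_t i, char c) {
        assert(i < buf_->size());
        if (buf_.use_count() > 1) {
            buf_ = std::make_shared<std::string>(*buf_);
        }
        (*buf_)[i] = c;
    }

private:
    std::shared_ptr<std::string> buf_;
};
```

Copying a CowString is cheap because no bytes move; the cost of the real copy is deferred to the first call of setByte() on a shared buffer.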
A separate class RWCSubString supports substring extraction and modification operations.
Persistence: Simple
#include <rw/cstring.h>
#include <rw/re.h>
#include <rw/rstream.h>

int main(){
  RWCString a("There is no joy in Beantown.");
  cout << a << endl << "becomes...." << endl;
  RWCRExpr re("[A-Z][a-z]*town");  // Any capitalized "town"
  a.replace(re, "Redmond");
  cout << a << endl;
  return 0;
}
Program output:
There is no joy in Beantown.
becomes....
There is no joy in Redmond.
enum RWCString::caseCompare { exact, ignoreCase }
Used to specify whether comparisons, searches, and hashing functions should use case sensitive (exact) or case-insensitive (ignoreCase) semantics.
enum RWCString::scopeType { one, all }
Used to specify whether a regular expression replace operation replaces only the first substring matched by the regular expression (one) or all substrings matched by the regular expression (all).
RWCString();
Creates a string of length zero (the null string).
RWCString(const char* cs);
Conversion from the null-terminated character string cs. The created string will copy the data pointed to by cs, up to the first terminating null. This function is incompatible with cs strings with embedded nulls. This function may be incompatible with cs MBCS strings.
RWCString(const char* cs, size_t N);
Constructs a string from the character string cs. The created string will copy the data pointed to by cs. Exactly N bytes are copied, including any embedded nulls. Hence, the buffer pointed to by cs must be at least N bytes long.
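The difference between the two const char* constructors can be illustrated with std::string, which offers the same pair of conversions. The helper names copyUntilNull and copyExactly are hypothetical; the assert calls stand in for the debug-library check on zero pointers mentioned earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Analogue of RWCString(const char* cs): copying stops at the first null byte.
std::string copyUntilNull(const char* cs) {
    assert(cs != nullptr);   // a zero pointer is a programming error
    return std::string(cs);
}

// Analogue of RWCString(const char* cs, size_t N): exactly n bytes are copied,
// including any embedded nulls, so cs must point to at least n bytes.
std::string copyExactly(const char* cs, std::size_t n) {
    assert(cs != nullptr);
    return std::string(cs, n);
}
```

Given the five bytes 'a', 'b', '\0', 'c', 'd', the first helper yields a two-byte string and the second a five-byte string.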
RWCString(RWSize_T ic);
Creates a string of length zero (the null string). The string's capacity (that is, the size it can grow to without resizing) is given by the parameter ic. We recommend creating an RWSize_T value from a numerical constant to pass into this constructor. While RWSize_T knows how to convert size_t's to itself, conforming compilers will choose the conversion to char instead.
RWCString(const RWCString& str);
Copy constructor. The created string will copy str's data.
RWCString(const RWCSubString& ss);
Conversion from sub-string. The created string will copy the substring represented by ss.
RWCString(char c);
Constructs a string containing the single character c.
RWCString(char c, size_t N);
Constructs a string containing the character c repeated N times.
operator const char*() const;
Access to the RWCString's data as a null terminated string. This data is owned by the RWCString and may not be deleted or changed. If the RWCString object itself changes or goes out of scope, the pointer value previously returned may (will!) become invalid. While the string is null-terminated, note that its length is still given by the member function length(). That is, it may contain embedded nulls.
RWCString& operator=(const char* cs);
Assignment operator. Copies the null-terminated character string pointed to by cs into self. Returns a reference to self. This function is incompatible with cs strings with embedded nulls. This function may be incompatible with cs MBCS strings.
RWCString& operator=(const RWCString& str);
Assignment operator. The string will copy str's data. Returns a reference to self.
RWCString& operator+=(const char* cs);
Append the null-terminated character string pointed to by cs to self. Returns a reference to self. This function is incompatible with cs strings with embedded nulls. This function may be incompatible with cs MBCS strings.
RWCString& operator+=(const RWCString& str);
Append the string str to self. Returns a reference to self.
char& operator[](size_t i); char operator[](size_t i) const;
Return the ith byte. The first variant can be used as an lvalue. The index i must be between 0 and the length of the string less one. Bounds checking is performed -- if the index is out of range then an exception of type RWBoundsErr will occur.
char& operator()(size_t i); char operator()(size_t i) const;
Return the ith byte. The first variant can be used as an lvalue. The index i must be between 0 and the length of the string less one. Bounds checking is performed if the pre-processor macro RWBOUNDS_CHECK has been defined before including <rw/cstring.h>. In this case, if the index is out of range, then an exception of type RWBoundsErr will occur.
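The trade-off between the always-checked operator[] and the optionally checked operator() can be sketched in standard C++. Both helpers are hypothetical. In this sketch the standard assert macro stands in for RWBOUNDS_CHECK, and std::out_of_range stands in for RWBoundsErr.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Always checked, in the spirit of operator[]: a bad index throws.
char& checkedByte(std::string& s, std::size_t i) {
    if (i >= s.size()) throw std::out_of_range("byte index out of range");
    return s[i];
}

// Checked only when asked for, in the spirit of operator(). Here the standard
// assert macro stands in for RWBOUNDS_CHECK: the test disappears when NDEBUG
// is defined. Unlike the library, a failed assert aborts instead of throwing.
char& fastByte(std::string& s, std::size_t i) {
    assert(i < s.size());
    return s[i];
}
```

Both helpers return a reference, so either can appear on the left of an assignment, as with the non-const variants described above.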
RWCSubString operator()(size_t start, size_t len); const RWCSubString operator()(size_t start, size_t len) const;
Substring operator. Returns an RWCSubString of self with length len, starting at index start. The first variant can be used as an lvalue. The sum of start plus len must be less than or equal to the string length. If the library was built using the RWDEBUG flag, and start and len are out of range, then an exception of type RWBoundsErr will occur.
RWCSubString operator()(const RWCRExpr& re, size_t start=0); const RWCSubString operator()(const RWCRExpr& re, size_t start=0) const; RWCSubString operator()(const RWCRegexp& re, size_t start=0); const RWCSubString operator()(const RWCRegexp& re, size_t start=0) const;
Returns the first substring starting after index start that matches the regular expression re. If there is no such substring, then the null substring is returned. The first variant can be used as an lvalue.
Note that if you wish to use operator()(const RWCRExpr&...) you must instead use match(const RWCRExpr&...) described below. The reason for this is that we are presently retaining RWCRegexp, but operator()(const RWCRExpr&...) and operator()(const RWCRegexp&...) are ambiguous in the case of RWCString::operator()("string"). In addition, operator()(const char*) and operator()(size_t) are ambiguous in the case of RWCString::operator()(0). This function may be incompatible with strings with embedded nulls. This function is incompatible with MBCS strings.
RWCString& append(const char* cs);
Append a copy of the null-terminated character string pointed to by cs to self. Returns a reference to self. This function is incompatible with cs strings with embedded nulls. This function may be incompatible with cs MBCS strings.
RWCString& append(const char* cs, size_t N);
Append a copy of the character string cs to self. Exactly N bytes are copied, including any embedded nulls. Hence, the buffer pointed to by cs must be at least N bytes long. Returns a reference to self.
RWCString& append(char c, size_t N);
Append N copies of the character c to self. Returns a reference to self.
RWCString& append(const RWCString& cstr);
Append a copy of the string cstr to self. Returns a reference to self.
RWCString& append(const RWCString& cstr, size_t N);
Append the first N bytes or the length of cstr (whichever is less) of cstr to self. Returns a reference to self.
size_t binaryStoreSize() const;
Returns the number of bytes necessary to store the object using the global function:
RWFile& operator<<(RWFile&, const RWCString&);
size_t capacity() const;
Return the current capacity of self. This is the number of bytes the string can hold without resizing.
size_t capacity(size_t capac);
Hint to the implementation to change the capacity of self to capac. Returns the actual capacity.
int collate(const char* str) const; int collate(const RWCString& str) const;
Returns an int less than, greater than, or equal to zero, according to the result of calling the standard C library function ::strcoll() on self and the argument str. This supports locale-dependent collation. Provided only on platforms that provide ::strcoll(). This function is incompatible with strings with embedded nulls.
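What a strcoll()-based comparison looks like can be shown with the standard C library alone. The helper collateSign is hypothetical; it normalizes the result to -1, 0, or 1 so that it is easy to inspect.

```cpp
#include <cassert>
#include <cstring>

// Sign of a locale-dependent comparison: -1, 0, or 1.
int collateSign(const char* a, const char* b) {
    assert(a != nullptr && b != nullptr);
    int r = std::strcoll(a, b);   // stops at the first null byte in each string
    return (r > 0) - (r < 0);
}
```

Because strcoll() takes null-terminated arguments, anything after an embedded null is ignored, which is the reason for the incompatibility noted above. The ordering follows the LC_COLLATE category of the current locale; a program that never calls std::setlocale() runs in the "C" locale, where the order is plain byte order.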
int compareTo(const char* str, caseCompare = RWCString::exact) const; int compareTo(const RWCString& str, caseCompare = RWCString::exact) const;
Returns an int less than, greater than, or equal to zero, according to the result of calling the standard C library function memcmp() on self and the argument str. Case sensitivity is according to the caseCompare argument, and may be RWCString::exact or RWCString::ignoreCase. If caseCompare is RWCString::exact, then this function works for all string types. Otherwise, this function is incompatible with MBCS strings. This function is incompatible with const char* strings with embedded nulls. This function may be incompatible with const char* MBCS strings.
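The restrictions listed for compareTo() follow from how a byte-wise comparison works. The sketch below is a hypothetical stand-alone function (compareSketch and the enum CaseCompare are illustrative names), not the library's implementation; it assumes case folding is done one byte at a time with std::tolower().

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstring>
#include <string>

enum CaseCompare { exact, ignoreCase };   // named after RWCString::caseCompare

// Hypothetical helper: compares self with the null-terminated string str,
// byte by byte, returning a value less than, equal to, or greater than zero.
int compareSketch(const std::string& self, const char* str,
                  CaseCompare cmp = exact) {
    assert(str != nullptr);
    // strlen() stops at the first null, so embedded nulls in str are lost.
    const std::size_t otherLen = std::strlen(str);
    const std::size_t n = self.size() < otherLen ? self.size() : otherLen;
    for (std::size_t i = 0; i < n; ++i) {
        unsigned char a = static_cast<unsigned char>(self[i]);
        unsigned char b = static_cast<unsigned char>(str[i]);
        if (cmp == ignoreCase) {
            // Folding single bytes can corrupt multibyte characters.
            a = static_cast<unsigned char>(std::tolower(a));
            b = static_cast<unsigned char>(std::tolower(b));
        }
        if (a != b) return a < b ? -1 : 1;
    }
    if (self.size() == otherLen) return 0;
    return self.size() < otherLen ? -1 : 1;
}
```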
RWBoolean contains(const char* str, caseCompare = RWCString::exact) const; RWBoolean contains(const RWCString& cs, caseCompare = RWCString::exact) const;
Pattern matching. Returns TRUE if str occurs in self. Case sensitivity is according to the caseCompare argument, and may be RWCString::exact or RWCString::ignoreCase. If caseCompare is RWCString::exact, then this function works for all string types. Otherwise, this function is incompatible with MBCS strings. This function is incompatible with const char* strings with embedded nulls. This function may be incompatible with const char* MBCS strings.
const char* data() const;
Access to the RWCString's data as a null terminated string. This datum is owned by the RWCString and may not be deleted or changed. If the RWCString object itself changes or goes out of scope, the pointer value previously returned will become invalid. While the string is null terminated, note that its length is still given by the member function length(). That is, it may contain embedded nulls.
size_t first(char c) const;
Returns the index of the first occurrence of the character c in self. Returns RW_NPOS if there is no such character or if there is an embedded null prior to finding c. This function is incompatible with strings with embedded nulls. This function is incompatible with MBCS strings.
size_t first(char c, size_t) const;
Returns the index of the first occurrence of the character c in self. Continues to search past embedded nulls. Returns RW_NPOS if there is no such character. This function is incompatible with MBCS strings.
size_t first(const char* str) const;
Returns the index of the first occurrence in self of any character in str. Returns RW_NPOS if there is no match or if there is an embedded null prior to finding any character from str. This function is incompatible with strings with embedded nulls. This function may be incompatible with MBCS strings.
size_t first(const char* str, size_t N) const;
Returns the index of the first occurrence in self of any character in str. Exactly N bytes in str are checked, including any embedded nulls, so str must point to a buffer containing at least N bytes. Returns RW_NPOS if there is no match.
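The "any character in str" search with an explicit byte count corresponds closely to std::string::find_first_of(). The helper firstOfAny below is hypothetical, and std::string::npos stands in for RW_NPOS.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: index of the first byte of self that equals any of the
// n bytes in set, or std::string::npos (standing in for RW_NPOS) if none does.
std::size_t firstOfAny(const std::string& self, const char* set, std::size_t n) {
    assert(set != nullptr);
    // n is explicit, so null bytes in set take part in the search.
    return self.find_first_of(set, 0, n);
}
```

Passing the length separately is what allows a null byte to be one of the characters searched for.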
unsigned hash(caseCompare = RWCString::exact) const;
Returns a suitable hash value. If caseCompare is RWCString::ignoreCase then this function will be incompatible with MBCS strings.
size_t index(const char* pat,size_t i=0, caseCompare = RWCString::exact) const; size_t index(const RWCString& pat,size_t i=0, caseCompare = RWCString::exact) const;
Pattern matching. Starting with index i, searches for the first occurrence of pat in self and returns the index of the start of the match. Returns RW_NPOS if there is no such pattern. Case sensitivity is according to the caseCompare argument; it defaults to RWCString::exact. If caseCompare is RWCString::exact, then this function works for all string types. Otherwise, this function is incompatible with MBCS strings.
size_t index(const char* pat, size_t patlen,size_t i, caseCompare cmp) const; size_t index(const RWCString& pat, size_t patlen,size_t i, caseCompare cmp) const;
Pattern matching. Starting with index i, searches for the first occurrence of the first patlen bytes from pat in self and returns the index of the start of the match. Returns RW_NPOS if there is no such pattern. Case sensitivity is according to the caseCompare argument. If caseCompare is RWCString::exact, then this function works for all string types. Otherwise, this function is incompatible with MBCS strings.
size_t index(const RWCRExpr& re, size_t i=0) const; size_t index(const RWCRegexp& re, size_t i=0) const;
Regular expression matching. Returns the index greater than or equal to i of the start of the first pattern that matches the regular expression re. Returns RW_NPOS if there is no such pattern. This function is incompatible with MBCS strings.
size_t index(const RWCRExpr& re,size_t* ext,size_t i=0) const; size_t index(const RWCRegexp& re,size_t* ext,size_t i=0) const;
Regular expression matching. Returns the index greater than or equal to i of the start of the first pattern that matches the regular expression re. Returns RW_NPOS if there is no such pattern. The length of the matching pattern is returned in the variable pointed to by ext. This function is incompatible with strings with embedded nulls. This function may be incompatible with MBCS strings.
RWCString& insert(size_t pos, const char* cs);
Insert a copy of the null-terminated string cs into self at byte position pos, thus expanding the string. Returns a reference to self. This function is incompatible with cs strings with embedded nulls. This function may be incompatible with cs MBCS strings.
RWCString& insert(size_t pos, const char* cs, size_t N);
Insert a copy of the first N bytes of cs into self at byte position pos, thus expanding the string. Exactly N bytes are copied, including any embedded nulls. Hence, the buffer pointed to by cs must be at least N bytes long. Returns a reference to self.
RWCString& insert(size_t pos, const RWCString& str);
Insert a copy of the string str into self at byte position pos. Returns a reference to self.
RWCString& insert(size_t pos, const RWCString& str, size_t N);
Insert a copy of the first N bytes or the length of str (whichever is less) of str into self at byte position pos. Returns a reference to self.
RWBoolean isAscii() const;
Returns TRUE if self contains no bytes with the high bit set.
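The high-bit test can be written in a few lines of standard C++. The helper allSevenBit is a hypothetical stand-in that shows the check on a raw buffer.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: true if none of the n bytes has its high bit set.
bool allSevenBit(const char* buf, std::size_t n) {
    assert(buf != nullptr);
    for (std::size_t i = 0; i < n; ++i) {
        if (static_cast<unsigned char>(buf[i]) & 0x80) return false;
    }
    return true;
}
```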
RWBoolean isNull() const;
Returns TRUE if this is a zero-length string (i.e., the null string).
size_t last(char c) const;
Returns the index of the last occurrence in the string of the character c. Returns RW_NPOS if there is no such character or if there is an embedded null to the right of c in self. This function is incompatible with strings with embedded nulls. This function may be incompatible with MBCS strings.
size_t last(char c, size_t N) const;
Returns the index of the last occurrence in the string of the character c. Continues to search past embedded nulls. Returns RW_NPOS if there is no such character. This function is incompatible with MBCS strings.
size_t length() const;
Return the number of bytes in self. Note that if self contains multibyte characters, then this will not be the number of characters.
RWCSubString match(const RWCRExpr& re, size_t start=0); const RWCSubString match(const RWCRExpr& re, size_t start=0) const;
Returns the first substring starting after index start that matches the regular expression re. If there is no such substring, then the null substring is returned. The first variant can be used as an lvalue. Note that this is used in place of operator()(const RWCRegexp&...) if you want to use extended regular expressions.
size_t mbLength() const;
Return the number of multibyte characters in self, according to the Standard C function ::mblen(). Returns RW_NPOS if a bad character is encountered. Note that, in general, mbLength() <= length(). Provided only on platforms that provide ::mblen().
RWCString& prepend(const char* cs);
Prepend a copy of the null-terminated character string pointed to by cs to self. Returns a reference to self. This function is incompatible with cs strings with embedded nulls. This function may be incompatible with cs MBCS strings.
RWCString& prepend(const char* cs, size_t N);
Prepend a copy of the character string cs to self. Exactly N bytes are copied, including any embedded nulls. Hence, the buffer pointed to by cs must be at least N bytes long. Returns a reference to self.
RWCString& prepend(char c, size_t N);
Prepend N copies of character c to self. Returns a reference to self.
RWCString& prepend(const RWCString& str);
Prepends a copy of the string str to self. Returns a reference to self.
RWCString& prepend(const RWCString& cstr, size_t N);
Prepend the first N bytes or the length of cstr (whichever is less) of cstr to self. Returns a reference to self.
istream& readFile(istream& s);
Reads characters from the input stream s, replacing the previous contents of self, until EOF is reached. Null characters are treated the same as other characters.
istream& readLine(istream& s, RWBoolean skipWhite = TRUE);
Reads characters from the input stream s, replacing the previous contents of self, until a newline (or an EOF) is encountered. The newline is removed from the input stream but is not stored. Null characters are treated the same as other characters. If the skipWhite argument is TRUE, then whitespace is skipped (using the iostream library manipulator ws) before saving characters.
istream& readString(istream& s);
Reads characters from the input stream s, replacing the previous contents of self, until an EOF or null terminator is encountered. If the number of bytes remaining in the stream is large, you should resize the RWCString to approximately the number of bytes to be read prior to using this method. See "Implementation Details" in the User's Guide for more information. This function is incompatible with strings with embedded nulls. This function may be incompatible with MBCS strings.
istream& readToDelim(istream& s, char delim='\n');
Reads characters from the input stream s, replacing the previous contents of self, until an EOF or the delimiting character delim is encountered. The delimiter is removed from the input stream but is not stored. Null characters are treated the same as other characters. If delim is '\0' then this function is incompatible with strings with embedded nulls. If delim is '\0' then this function may be incompatible with MBCS strings.
istream& readToken(istream& s);
Whitespace is skipped before saving characters. Characters are then read from the input stream s, replacing previous contents of self, until trailing whitespace or an EOF is encountered. The whitespace is left on the input stream. Null characters are treated the same as other characters. Whitespace is identified by the standard C library function isspace(). This function is incompatible with MBCS strings.
RWCString& remove(size_t pos);
Removes the bytes from the byte position pos, which must be no greater than length(), to the end of string. Returns a reference to self.
RWCString& remove(size_t pos, size_t N);
Removes N bytes or to the end of string (whichever comes first) starting at the byte position pos, which must be no greater than length(). Returns a reference to self.
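The "N bytes or to the end of string, whichever comes first" rule matches the clamping behavior of std::string::erase(). The helper removeBytes is hypothetical and only illustrates that rule.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper mirroring remove(pos, N): erase n bytes starting at pos,
// or everything up to the end of the string, whichever comes first.
std::string& removeBytes(std::string& s, std::size_t pos, std::size_t n) {
    assert(pos <= s.size());   // documented precondition on pos
    return s.erase(pos, n);    // erase() clamps n to the bytes available
}
```

Asking for more bytes than remain is therefore harmless, while a starting position past the end is a caller error.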
RWCString& replace(size_t pos, size_t N, const char* cs);
Replaces N bytes or to the end of string (whichever comes first) starting at byte position pos, which must be no greater than length(), with a copy of the null-terminated string cs. Returns a reference to self. This function is incompatible with cs strings with embedded nulls. This function may be incompatible with cs MBCS strings.
RWCString& replace(size_t pos, size_t N1,const char* cs, size_t N2);
Replaces N1 bytes or to the end of string (whichever comes first) starting at byte position pos, which must be no greater than length(), with a copy of the string cs. Exactly N2 bytes are copied, including any embedded nulls. Hence, the buffer pointed to by cs must be at least N2 bytes long. Returns a reference to self.
RWCString& replace(size_t pos, size_t N, const RWCString& str);
Replaces N bytes or to the end of string (whichever comes first) starting at byte position pos, which must be no greater than length(), with a copy of the string str. Returns a reference to self.
RWCString& replace(size_t pos, size_t N1,const RWCString& str, size_t N2);
Replaces N1 bytes or to the end of string (whichever comes first) starting at position pos, which must be no greater than length(), with a copy of the first N2 bytes, or the length of str (whichever is less), from str. Returns a reference to self.
replace(const RWCRExpr& pattern, const char* replacement, scopeType scope=one); replace(const RWCRExpr& pattern, const RWCString& replacement,scopeType scope=one);
Replaces the substring matched by pattern with the replacement string. pattern is the new extended regular expression. scope is one of {one, all} and controls whether all matches of pattern are replaced with replacement or just the first match is replaced. replacement is the replacement pattern for the string. Here's an example:
RWCString s("hahahohoheehee");
s.replace(RWCRExpr("(ho)+"),"HAR"); // s == "hahaHARheehee"
This function is incompatible with const char* replacement strings with embedded nulls. This function may be incompatible with const char* replacement MBCS strings.
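The effect of the scope argument can be tried out with std::regex. This is only an analogy: the helper replaceMatches and the enum ScopeType are hypothetical, and std::regex (ECMAScript syntax by default) is a different engine from RWCRExpr, so patterns are not guaranteed to behave identically.

```cpp
#include <cassert>
#include <regex>
#include <string>

enum ScopeType { one, all };   // named after RWCString::scopeType

// Hypothetical helper built on std::regex, which is not the same engine or
// syntax as RWCRExpr. Returns a modified copy instead of changing text.
std::string replaceMatches(const std::string& text, const char* pattern,
                           const char* replacement, ScopeType scope = one) {
    assert(pattern != nullptr && replacement != nullptr);
    const std::regex re(pattern);
    const std::regex_constants::match_flag_type flags =
        (scope == one) ? std::regex_constants::format_first_only
                       : std::regex_constants::format_default;
    return std::regex_replace(text, re, replacement, flags);
}
```

With scope one only the leftmost match is rewritten; with scope all every non-overlapping match is. Note that std::regex_replace() treats '$' in the replacement as a format escape, which is a property of the standard library and not a statement about RWCString::replace().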
void resize(size_t n);
Changes the length of self to n bytes, adding blanks or truncating as necessary.
RWCSubString strip(stripType s = RWCString::trailing, char c = ' '); const RWCSubString strip(stripType s = RWCString::trailing, char c = ' ') const;
Returns a substring of self where the character c has been stripped off the beginning, end, or both ends of the string. The first variant can be used as an lvalue. The enum stripType can take values:
stripType | Meaning |
leading | Remove characters at beginning |
trailing | Remove characters at end |
both | Remove characters at both ends |
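The three stripType values can be illustrated with a small stand-alone function. The helper stripCopy is hypothetical; unlike strip(), which returns an RWCSubString referring to self, this sketch returns a copy.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

enum StripType { leading, trailing, both };   // named after RWCString::stripType

// Hypothetical helper: returns a copy of s with runs of c removed from the
// chosen end(s). The library returns an RWCSubString instead of a copy.
std::string stripCopy(const std::string& s, StripType type = trailing,
                      char c = ' ') {
    std::size_t begin = 0;
    std::size_t end = s.size();
    if (type == leading || type == both) {
        while (begin < end && s[begin] == c) ++begin;
    }
    if (type == trailing || type == both) {
        while (end > begin && s[end - 1] == c) --end;
    }
    assert(begin <= end);   // the two scans never cross
    return s.substr(begin, end - begin);
}
```

The defaults mirror those of strip(): trailing blanks are removed unless another end or character is requested.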
RWCSubString subString(const char* cs, size_t start=0, caseCompare = RWCString::exact); const RWCSubString subString(const char* cs, size_t start=0, caseCompare = RWCString::exact) const;
Returns a substring representing the first occurrence of the null-terminated string pointed to by "cs". The first variant can be used as an lvalue. Case sensitivity is according to the caseCompare argument; it defaults to RWCString::exact. If caseCompare is RWCString::ignoreCase then this function is incompatible with MBCS strings. This function is incompatible with cs strings with embedded nulls. This function may be incompatible with cs MBCS strings.
void toLower();
Changes all upper-case letters in self to lower-case, using the standard C library facilities declared in <ctype.h>. This function is incompatible with MBCS strings.
void toUpper();
Changes all lower-case letters in self to upper-case, using the standard C library facilities declared in <ctype.h>. This function is incompatible with MBCS strings.
static unsigned hash(const RWCString& str);
Returns the hash value of str as returned by str.hash(RWCString::exact).
static size_t initialCapacity(size_t ic = 15);
Sets the minimum initial capacity of an RWCString, and returns the old value. The initial setting is 15 bytes. Larger values will use more memory, but result in fewer resizes when concatenating or reading strings. Smaller values will waste less memory, but result in more resizes.
static size_t maxWaste(size_t mw = 15);
Sets the maximum amount of unused space allowed in a string should it shrink, and returns the old value. The initial setting is 15 bytes. If more than mw bytes are wasted, then excess space will be reclaimed.
static size_t resizeIncrement(size_t ri = 16);
Sets the resize increment when more memory is needed to grow a string. Returns the old value. The initial setting is 16 bytes.
RWBoolean operator==(const RWCString&, const char* ); RWBoolean operator==(const char*, const RWCString&); RWBoolean operator==(const RWCString&, const RWCString&); RWBoolean operator!=(const RWCString&, const char* ); RWBoolean operator!=(const char*, const RWCString&); RWBoolean operator!=(const RWCString&, const RWCString&);
Logical equality and inequality. Case sensitivity is exact. This function is incompatible with const char* strings with embedded nulls. This function may be incompatible with const char* MBCS strings.
RWBoolean operator< (const RWCString&, const char* ); RWBoolean operator< (const char*, const RWCString&); RWBoolean operator< (const RWCString&, const RWCString&); RWBoolean operator> (const RWCString&, const char* ); RWBoolean operator> (const char*, const RWCString&); RWBoolean operator> (const RWCString&, const RWCString&); RWBoolean operator<=(const RWCString&, const char* ); RWBoolean operator<=(const char*, const RWCString&); RWBoolean operator<=(const RWCString&, const RWCString&); RWBoolean operator>=(const RWCString&, const char* ); RWBoolean operator>=(const char*, const RWCString&); RWBoolean operator>=(const RWCString&, const RWCString&);
Comparisons are done lexicographically, byte by byte. Case sensitivity is exact. Use member collate() or strxfrm() for locale sensitivity. This function is incompatible with const char* strings with embedded nulls. This function may be incompatible with const char* MBCS strings.
RWCString operator+(const RWCString&, const RWCString&); RWCString operator+(const char*, const RWCString&); RWCString operator+(const RWCString&, const char* );
Concatenation operators. This function is incompatible with const char* strings with embedded nulls. This function may be incompatible with const char* MBCS strings.
ostream& operator<<(ostream& s, const RWCString&);
Output an RWCString on ostream s.
istream& operator>>(istream& s, RWCString& str);
Calls str.readToken(s). That is, a token is read from the input stream s. This function is incompatible with MBCS strings.
RWvostream& operator<<(RWvostream&, const RWCString& str); RWFile& operator<<(RWFile&, const RWCString& str);
Saves string str to a virtual stream or RWFile, respectively.
RWvistream& operator>>(RWvistream&, RWCString& str); RWFile& operator>>(RWFile&, RWCString& str);
Restores a string into str from a virtual stream or RWFile, respectively, replacing the previous contents of str.
RWCString strXForm(const RWCString&);
Returns the result of applying ::strxfrm() to the argument string, to allow quicker collation than RWCString::collate(). Provided only on platforms that provide ::strxfrm(). This function is incompatible with strings with embedded nulls.
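Why a transformed string allows quicker collation can be seen from the standard C function itself: the transformation is done once per string, after which plain byte comparisons give the locale's order. The helper transformForCollation is a hypothetical sketch built directly on std::strxfrm().

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: returns the strxfrm() image of s. Two images compared
// byte by byte order the same way strcoll() orders the original strings.
std::string transformForCollation(const std::string& s) {
    // With a zero size, strxfrm() only reports the length it needs.
    const std::size_t n = std::strxfrm(nullptr, s.c_str(), 0);
    std::vector<char> buf(n + 1);
    const std::size_t written = std::strxfrm(buf.data(), s.c_str(), buf.size());
    assert(written == n);
    (void)written;   // avoids an unused-variable warning when NDEBUG is defined
    return std::string(buf.data(), n);
}
```

When the same strings are compared many times, for example while sorting, comparing the stored images avoids repeating the locale work on every comparison.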
RWCString toLower(const RWCString& str);
Returns a version of str where all upper-case characters have been replaced with lower-case characters. Uses the standard C library function tolower(). This function is incompatible with MBCS strings.
RWCString toUpper(const RWCString& str);
Returns a version of str where all lower-case characters have been replaced with upper-case characters. Uses the standard C library function toupper(). This function is incompatible with MBCS strings.
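The MBCS restriction on the case-conversion functions comes from converting one byte at a time. The helpers lowerCopy and upperCopy below are hypothetical sketches of that approach using std::tolower() and std::toupper().

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helpers: convert each byte on its own with <cctype>, which is
// why this approach cannot be trusted with multibyte characters.
std::string lowerCopy(const std::string& str) {
    std::string out(str);
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = static_cast<char>(
            std::tolower(static_cast<unsigned char>(out[i])));
    }
    assert(out.size() == str.size());   // byte count never changes
    return out;
}

std::string upperCopy(const std::string& str) {
    std::string out(str);
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = static_cast<char>(
            std::toupper(static_cast<unsigned char>(out[i])));
    }
    assert(out.size() == str.size());
    return out;
}
```

The cast to unsigned char matters: passing a negative char value to the <cctype> functions is undefined behavior.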