
©Copyright 1996 Rogue Wave Software

Localizing Alphabets with RWCString and RWWString

Localizing alphabets begins with being able to represent them. As mentioned in Chapter 2 (Eight-bit Clean), Tools.h++ code is "8-bit clean" to accommodate extended character sets. The entire English alphabet fits in 7 bits, leaving the eighth bit free for letters with umlauts, cedillas, and other diacritical marks, and for other special characters. And because even 8 bits are often not enough to represent all the character glyphs of various languages, Tools.h++ also supports two kinds of extensions: multibyte and wide-character encodings.

Multibyte encodings use a sequence of one or more bytes to represent a single character. (Typically the ASCII characters are still one byte long.) These encodings are compact, but inconvenient for indexing and substring operations, because the nth byte of a string is not necessarily the start of its nth character. Wide-character encodings, in contrast, place each character in a single value of the integral type wchar_t (typically 16 or 32 bits wide) and represent a string as an array of wchar_t. Usually a string encoded in one form can be translated into the other.

The two efficient Tools.h++ string types, RWCString and RWWString, were discussed in Chapter 3. RWCString represents strings of 8-bit chars, with some support for multibyte strings. RWWString represents strings of wchar_t. Both provide access to the Standard C Library's support for local collation conventions through the member function collate() and the global function strXForm(). In addition, the library provides conversions between wide and multibyte representations. The wide- and multibyte-character encodings used are those of the host system.

But handling alphabets can be even more complex. For example, is a character upper case, lower case, or neither? In a sorted list, where do names that begin with accented letters go? What about Cyrillic names? How are wide-character strings represented on byte streams? Standards bodies and corporate labs are addressing these issues, but the results are not yet portable. For the time being, Tools.h++ strives to make the best use of what they provide.
