Part of this library is header-only, while the rest requires linking against a library.
The library provides the following header-only components:
The following features are available by linking against the library:
This library defines the concepts of Converter and Segmenter, which are mechanisms to arbitrarily convert or segment ranges of data, expressed as pairs of iterators. The Converter and Segmenter framework allows these operations to be performed either eagerly or lazily.
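To make the eager/lazy distinction concrete, here is a minimal sketch in plain C++. It is not this library's interface: `utf16_encode_step`, `convert_eagerly` and `lazy_converter` are hypothetical names introduced only for illustration. The sketch assumes a converter can be modelled as a single step that consumes input from an iterator pair, writes to an output iterator, and returns the advanced positions.

```cpp
#include <cassert>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical one-step converter (not the library's interface): reads one
// code point from [first, last) and writes its UTF-16 code units to out.
struct utf16_encode_step {
    template <class In, class Out>
    std::pair<In, Out> operator()(In first, In last, Out out) const {
        assert(first != last);
        char32_t cp = *first;
        ++first;
        // Precondition: cp is a Unicode scalar value (no surrogates).
        assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
        if (cp < 0x10000) {
            *out++ = static_cast<char16_t>(cp);
        } else {
            cp -= 0x10000;
            *out++ = static_cast<char16_t>(0xD800 + (cp >> 10));   // high surrogate
            *out++ = static_cast<char16_t>(0xDC00 + (cp & 0x3FF)); // low surrogate
        }
        return std::make_pair(first, out);
    }
};

// Eager evaluation: apply the step until the whole input range is consumed.
template <class Step, class In, class Out>
Out convert_eagerly(Step step, In first, In last, Out out) {
    while (first != last) {
        std::pair<In, Out> r = step(first, last, out);
        first = r.first;
        out = r.second;
    }
    return out;
}

// Lazy evaluation: nothing is converted until next() is called, and each
// call performs exactly one step.
template <class Step, class In>
class lazy_converter {
public:
    lazy_converter(Step step, In first, In last)
        : step_(step), first_(first), last_(last) {}

    bool done() const { return first_ == last_; }

    // Converts one input element and returns the code units it produced.
    std::vector<char16_t> next() {
        std::vector<char16_t> chunk;
        first_ = step_(first_, last_, std::back_inserter(chunk)).first;
        return chunk;
    }

private:
    Step step_;
    In first_;
    In last_;
};
```

The same step object drives both styles: the eager loop pays the whole conversion cost up front, while the lazy wrapper does work only for the elements a caller actually asks for.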
Caution: The organization of headers may change in the future in order to improve compile times.
Main headers
boost/cuchar.hpp
Primitive types for UTF code units.
boost/unicode/utf.hpp
Conversion between UTF encodings.
boost/unicode/static_utf.hpp
Compile-time conversion between UTF encodings.
boost/unicode/graphemes.hpp
Functions to iterate and identify graphemes.
boost/unicode/compose.hpp
Functions to compose, decompose and normalize Unicode strings.
boost/unicode/cat.hpp
Functions to concatenate normalized strings while maintaining a normalized form.
boost/unicode/search.hpp
Utility to adapt Boost.StringAlgo finders to discard matches that lie on certain boundaries.
boost/unicode/ucd/properties.hpp
Access to the properties associated with a code point in the Unicode Character Database.
As stated in Introduction to Unicode, several Unicode algorithms require a large database of information which, as of preview 4 of this library, is 500 KB on x86 when stripped. Note that at the current stage of development the database does not contain everything one might need to deal with Unicode text, so it may grow in the future.
Features that can avoid a dependency on that database do so; for example, it is not required for UTF conversions, which are purely header-only.
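The reason UTF conversions need no database is that they are pure bit arithmetic on code points. The following sketch illustrates this with a standalone UTF-8 encoder written in plain C++. `append_utf8` is a hypothetical helper for illustration, not a function from boost/unicode/utf.hpp.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of the library): appends the UTF-8 encoding
// of one Unicode scalar value to out. No lookup table is involved.
inline void append_utf8(std::string& out, char32_t cp) {
    // Precondition: cp is a Unicode scalar value (no surrogates).
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x80) {
        // One byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // Two bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}
```

Algorithms such as normalization or grapheme segmentation, by contrast, depend on per-code-point properties and therefore cannot be reduced to arithmetic like this.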
The Unicode Character Database can be generated using a parser present in the source distribution of this library to analyze the data provided by Unicode.org.
Note, however, that the parser itself needs to be updated to be made aware of new property values; otherwise those properties will fall back to their default value and the parser will issue a warning.
The UCD is fully backward compatible, and unknown property values returned by the linked library will automatically be converted to the default value for that property. This is consistent with how new values are introduced in the standard.
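One way to picture this fallback is sketched below, under the assumption that a property value crosses the library boundary as a plain integer. The enumeration `sample_property`, the constant `known_value_count` and the function `from_database` are hypothetical names for illustration only, not the library's actual types or mechanism.

```cpp
namespace sketch {

// Hypothetical property values known to the headers the client code was
// compiled against. A newer database may store values beyond these.
enum class sample_property : unsigned {
    default_value = 0,
    value_a = 1,
    value_b = 2
};

// Number of values the client headers know about (assumption: values are
// numbered contiguously from zero).
constexpr unsigned known_value_count = 3;

// Maps the raw integer stored in the database to a value the client
// understands; anything unknown collapses to the property's default.
constexpr sample_property from_database(unsigned raw) {
    return raw < known_value_count
        ? static_cast<sample_property>(raw)
        : sample_property::default_value;
}

} // namespace sketch
```

With a scheme of this kind, client code built against older headers keeps working when linked with a newer database: a raw value of 7, for instance, would be reported as `default_value` rather than as an out-of-range enumerator.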
Future versions of this library may provide alternate implementations of this database as a thin layer over a database provided by another library or environment to prevent duplication of data. All this should be entirely binary compatible, and using one database or another should just be a drop-in replacement of a shared object.