Wow, that really is an obnoxious standard.

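The choice of data size depends on whether or not you want to support CJK Extension B. Those characters start at U+20000, outside the range a single 2-byte value can represent, so you need either a 4-byte type or 2-byte units with surrogate pairs. See http://en.wikipedia.org/wiki/CJK_Unified_Ideographs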
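
To make the size question concrete, here is a rough sketch of how a code point maps onto 2-byte units. It is my own illustration, not code from any of the libraries listed further down, and the function name to_utf16 is made up. It assumes the input is a valid Unicode scalar value.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: encode one Unicode code point as UTF-16 code units.
// Assumption: cp is a valid scalar value (at most 0x10FFFF, not a surrogate).
std::vector<std::uint16_t> to_utf16(std::uint32_t cp) {
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));

    // Anything in the Basic Multilingual Plane fits in one 16-bit unit.
    if (cp < 0x10000) {
        return { static_cast<std::uint16_t>(cp) };
    }

    // Everything above (including CJK Extension B) needs a surrogate pair:
    // subtract 0x10000, then split the remaining 20 bits into two halves.
    cp -= 0x10000;
    return {
        static_cast<std::uint16_t>(0xD800 + (cp >> 10)),   // high surrogate
        static_cast<std::uint16_t>(0xDC00 + (cp & 0x3FF))  // low surrogate
    };
}
```

An ordinary ideograph such as U+4E2D comes back as one unit, while U+20000 (the first Extension B character) comes back as two. So a library that treats every 2-byte value as a whole character will miscount or split Extension B text.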

You might want to check around for some C++ libraries that are already written to handle all this stuff.

ustring
http://sourceforge.net/projects/ustring/
Supports Unicode 3.0 with 2-byte entities, so it does not support CJK Extension B.

Unicode Enabled Products
http://unicode.org/onlinedat/products.html
Has a nice selection of links to Unicode libraries.

Unicode-enabling Microsoft C/C++ Source Code
http://www.i18nguy.com/unicode/c-unicode.html
MS-specific, but contains a lot of useful information anyway.
The i18nguy site is a good internationalization resource in general.


Well, that's about all I know. Hope this helps.