UTF-8: Bits, Bytes, and Benefits - http://research.swtch.com/2010...
Also: "utf-8b is a mapping from byte streams to unicode codepoint streams that provides an exceptionally clean handling of garbage (i.e., non-utf-8) bytes (i.e., bytes that are not part of a utf-8 encoding) in the input stream. They are mapped to 256 different, guaranteed undefined, unicode codepoints." From http://hyperreal.org/~est... which links to utf-8b codecs for C and Python. (A big thank you to Brian Behlendorf for continuing to host the site.) - Ruchira S. Datta
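For anyone who wants to see the idea in action without installing the linked codecs: Python 3's built-in `surrogateescape` error handler (PEP 383) works along similar lines, though it is not the codec from the page above. As far as I know it only escapes bytes 0x80-0xFF, mapping each one to a lone surrogate in U+DC80..U+DCFF. A minimal sketch, with a made-up sample byte string:

```python
# Sample input (hypothetical): valid UTF-8 for "café", then two stray bytes.
raw = b"caf\xc3\xa9 \xff\xfe ok"

# Decode without raising: each undecodable byte 0xNN becomes the lone
# surrogate code point U+DCNN instead of triggering UnicodeDecodeError.
text = raw.decode("utf-8", errors="surrogateescape")

# Collect the escaped code points so we can see where the garbage went.
escaped = [hex(ord(ch)) for ch in text if 0xDC80 <= ord(ch) <= 0xDCFF]

# Encoding with the same handler turns the surrogates back into the
# original bytes, so the round trip is lossless.
round_trip = text.encode("utf-8", errors="surrogateescape")

print(escaped)
print(round_trip == raw)
```

The point of the design is that the garbage bytes survive a decode/encode round trip untouched, so a program can pass through filenames or streams it cannot fully interpret.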
UTF-8 is a good example of a successful standard. I don't really ever hear people blasting it. It solves a lot of problems, has great characteristics, and just plain works. - Hayes Haugen