UTF-8: Bits, Bytes, and Benefits - http://research.swtch.com/2010...
Mar 6, 2010
Louis Gray, Chatsohbet, imabonehead, Amit Patel, Bruce Lewis, OCoG of FF, Jimminy, Ruchira S. Datta, Lysender, gburd, Benjamin Golub, Tudor Bosman, Vic Ted, hailsematary, and AJ Batac liked this
Also: "utf-8b is a mapping from byte streams to unicode codepoint streams that provides an exceptionally clean handling of garbage (i.e., non-utf-8) bytes (i.e., bytes that are not part of a utf-8 encoding) in the input stream. They are mapped to 256 different, guaranteed undefined, unicode codepoints." From http://hyperreal.org/~est... which links to utf-8b codecs for C and Python. (A big thank you to Brian Behlendorf for continuing to host the site.)
- Ruchira S. Datta
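For readers who want to see the idea in action: Python 3's built-in "surrogateescape" error handler (PEP 383) works along closely related lines, mapping each undecodable byte in the range 0x80-0xFF to a lone surrogate in U+DC80-U+DCFF so the original bytes can be recovered exactly. The snippet below is only a minimal sketch of that behavior; it is not the utf-8b codec linked from the page above, and the sample bytes are made up for illustration.

```python
# Sketch: garbage bytes survive a decode/encode round trip.
# Uses Python's standard "surrogateescape" handler, assumed here as a
# stand-in for utf-8b; the input bytes are a hypothetical example.

# Valid UTF-8 ("café ") followed by two bytes that can never appear in UTF-8.
raw = b"caf\xc3\xa9 \xff\xfe ok"

# Decode: valid sequences become normal characters, each invalid byte
# becomes one lone surrogate code point (0xDC00 + byte value).
text = raw.decode("utf-8", errors="surrogateescape")

# Collect the code points that stand in for the garbage bytes.
escaped = [hex(ord(ch)) for ch in text if 0xDC80 <= ord(ch) <= 0xDCFF]

# Encode with the same handler to get the original byte stream back.
roundtrip = text.encode("utf-8", errors="surrogateescape")

print(escaped)
print(roundtrip == raw)
```

Strings decoded this way contain lone surrogates, so encoding them with the default strict handler raises UnicodeEncodeError; the same handler has to be used on the way out.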
UTF-8 is a good example of a successful standard. I don't really ever hear people blasting it. It solves a lot of problems, has great characteristics, and just plain works.
- Hayes Haugen