Encode and decode text using UTF-9.
On April 1st 2005, IEEE released the RFC4042 “UTF-9 and UTF-18 Efficient Transformation Formats of Unicode” :
The current representation formats for Unicode (UTF-7, UTF-8, UTF-16) are not storage and computation efficient on platforms that utilize the 9 bit nonet as a natural storage unit instead of the 8 bit octet.
Since there are not so many architecture that use 9 bit nonets as natural storage units and the release date was on April Fools’ Day, the beautiful UTF-9 was forgotten and no python implementation is available.
This python module is here to fill this gap! ;)
There are only two functions:
- utf9encode(string): takes a string and returns a utf9-encoded version.
- utf9decode(data): takes utf9-encoded data and returns the corresponding string.
>>> import utf9 >>> encoded = utf9.utf9encode(u'ႹЄLᒪo, 🌍ǃ') >>> print repr(encoded) 'p\xe0\xb7-\x0c!1\xc3\x92\xd5\x1b\xc5\x82\x07n\x83x\xed\xdecX\xf80' >>> print utf9.utf9decode(encoded) ႹЄLᒪo, 🌍ǃ