Fast MUTF-8 encoder & decoder
mutf-8
This package contains simple pure-python as well as C encoders and decoders for the MUTF-8 character encoding. In most cases, you can also parse the even-rarer CESU-8.
These days, you'll most likely encounter MUTF-8 when working on files or protocols related to the JVM. Strings in a Java .class file are encoded using MUTF-8, as are strings passed by the JNI and strings exported by the object serializer.
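For readers new to the format, the two ways MUTF-8 differs from standard UTF-8 can be sketched in a few lines: U+0000 is written as the two-byte sequence `C0 80`, and characters above U+FFFF are written as a UTF-16 surrogate pair, each half encoded as three bytes. The function name `encode_mutf8_sketch` is hypothetical and exists only for illustration; it is not this package's implementation.

```python
def encode_mutf8_sketch(text: str) -> bytes:
    """Illustrative MUTF-8 encoder (not the mutf8 package's own code)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp == 0:
            # NUL uses the two-byte form, so the output never contains 0x00.
            out += b"\xc0\x80"
        elif cp < 0x10000:
            # Everything else in the BMP matches UTF-8; "surrogatepass"
            # lets lone surrogates through instead of raising.
            out += ch.encode("utf-8", "surrogatepass")
        else:
            # Supplementary characters: split into a UTF-16 surrogate pair,
            # then encode each surrogate as a three-byte sequence.
            cp -= 0x10000
            high = 0xD800 | (cp >> 10)
            low = 0xDC00 | (cp & 0x3FF)
            for unit in (high, low):
                out += bytes((
                    0xE0 | (unit >> 12),
                    0x80 | ((unit >> 6) & 0x3F),
                    0x80 | (unit & 0x3F),
                ))
    return bytes(out)


print(encode_mutf8_sketch("A\x00").hex())        # 41c080
print(encode_mutf8_sketch("\U00020000").hex())   # eda180edb080
```

The second call produces six bytes where standard UTF-8 would produce four (`F0 A0 80 80`), which is why a plain UTF-8 decoder cannot be used on JVM strings that contain supplementary characters.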
This library was extracted from Lawu, a Python library for working with JVM class files.
🎉 Installation
Install the package from PyPI:
pip install mutf8
Binary wheels are available for the following:
Platform | py3.6 | py3.7 | py3.8 | py3.9 |
---|---|---|---|---|
OS X (x86_64) | y | y | y | y |
Windows (x86_64) | y | y | y | y |
Linux (x86_64) | y | y | y | y |
If no binary wheel is available for your platform, the installer will attempt to build the C extension from source with any C99 compiler. If the build fails, the package falls back to the pure-python version.
Usage
Encoding and decoding is simple:
from mutf8 import encode_modified_utf8, decode_modified_utf8
# Variable names chosen to avoid shadowing the built-in bytes type.
text = decode_modified_utf8(byte_like_object)
data = encode_modified_utf8(text)
This module does not register itself globally as a codec, since importing should be side-effect-free.
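If you do want `str.encode` and `bytes.decode` to accept a MUTF-8 codec name, you can opt in from your own code. The following is a minimal sketch using only the standard `codecs` module; the helper name `register_codec` and the codec name `mutf_8` are assumptions made for illustration and are not part of this package.

```python
import codecs


def register_codec(encode_fn, decode_fn, name="mutf_8"):
    """Register a pair of plain encode/decode callables as a named codec."""

    def _encode(text, errors="strict"):
        # The codec protocol expects (output, number of input items consumed).
        # The `errors` argument is accepted but ignored in this sketch.
        return encode_fn(text), len(text)

    def _decode(data, errors="strict"):
        data = bytes(data)
        return decode_fn(data), len(data)

    info = codecs.CodecInfo(encode=_encode, decode=_decode, name=name)

    def _search(requested):
        # Lookup names arrive lower-cased; separator handling differs
        # between Python versions, so normalise it here.
        normalized = requested.replace("-", "_").replace(" ", "_")
        return info if normalized == name else None

    codecs.register(_search)
    return _search


# With the package installed, opting in would look like:
#
#   from mutf8 import encode_modified_utf8, decode_modified_utf8
#   register_codec(encode_modified_utf8, decode_modified_utf8)
#   "text".encode("mutf-8")
```

Keeping the registration in application code preserves the side-effect-free import while still allowing the codec-style call syntax where it is convenient.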
📈 Benchmarks
The C extension is significantly faster than the pure-python version, often by 20x to 40x.
MUTF-8 Decoding
Name | Min (μs) | Max (μs) | StdDev | Ops |
---|---|---|---|---|
cmutf8-decode_modified_utf8 | 0.00009 | 0.00080 | 0.00000 | 9957678.56358 |
pymutf8-decode_modified_utf8 | 0.00190 | 0.06040 | 0.00000 | 450455.96019 |
MUTF-8 Encoding
Name | Min (μs) | Max (μs) | StdDev | Ops |
---|---|---|---|---|
cmutf8-encode_modified_utf8 | 0.00008 | 0.00151 | 0.00000 | 11897361.05101 |
pymutf8-encode_modified_utf8 | 0.00180 | 0.16650 | 0.00000 | 474390.98091 |
C Extension
The C extension is optional. If a binary package is not available, or a C compiler is not present, the pure-python version will be used instead. If you want to ensure you're using the C version, import it directly:
from mutf8.cmutf8 import decode_modified_utf8
decode_modified_utf8(b'\xED\xA1\x80\xED\xB0\x80')
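If you would rather detect which implementation is in use than fail on import, a guarded import is one option. This is a sketch only: the helper name `load_decoder` and its return labels are hypothetical, and it relies solely on the two import paths shown in this README.

```python
def load_decoder():
    """Return (decoder, label), preferring the C extension when present."""
    try:
        # Importing the C module directly fails if it was never built.
        from mutf8.cmutf8 import decode_modified_utf8
        return decode_modified_utf8, "c-extension"
    except ImportError:
        pass
    try:
        # The top-level import uses whichever implementation the package chose.
        from mutf8 import decode_modified_utf8
        return decode_modified_utf8, "package-default"
    except ImportError:
        # The package is not installed at all.
        return None, "unavailable"


decoder, label = load_decoder()
print("MUTF-8 decoder:", label)
```

Logging the label at startup makes it easy to spot a deployment that silently fell back to the slower implementation.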