Fast MUTF-8 encoder & decoder
mutf-8
This package contains simple pure-python as well as C encoders and decoders for the MUTF-8 character encoding. In most cases, you can also parse the even-rarer CESU-8.
These days, you'll most likely encounter MUTF-8 when working on files or protocols related to the JVM. Strings in a Java .class file are encoded using MUTF-8, as are strings passed through the JNI and strings exported by the object serializer.
This library was extracted from Lawu, a Python library for working with JVM class files.
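To make the difference from standard UTF-8 concrete, here is a minimal pure-Python sketch of the MUTF-8 encoding rules. It is not this package's implementation, and the name `sketch_encode_mutf8` is made up for illustration. It shows the two deviations from UTF-8: U+0000 is written as the two bytes `C0 80`, and code points above U+FFFF are written as a UTF-16 surrogate pair, with each surrogate encoded as three bytes.

```python
def _three_bytes(unit: int) -> bytes:
    """Encode a 16-bit value using the three-byte UTF-8 layout."""
    return bytes((
        0xE0 | (unit >> 12),
        0x80 | ((unit >> 6) & 0x3F),
        0x80 | (unit & 0x3F),
    ))


def sketch_encode_mutf8(text: str) -> bytes:
    """Illustrative MUTF-8 encoder (hypothetical helper, not the library's code)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp == 0:
            # NUL uses the overlong two-byte form, so the output never contains 0x00.
            out += b"\xC0\x80"
        elif cp < 0x80:
            out.append(cp)
        elif cp < 0x800:
            out.append(0xC0 | (cp >> 6))
            out.append(0x80 | (cp & 0x3F))
        elif cp < 0x10000:
            out += _three_bytes(cp)
        else:
            # Supplementary code point: split into a UTF-16 surrogate pair,
            # then encode each surrogate as its own three-byte sequence.
            cp -= 0x10000
            out += _three_bytes(0xD800 | (cp >> 10))
            out += _three_bytes(0xDC00 | (cp & 0x3FF))
    return bytes(out)


print(sketch_encode_mutf8("A\x00").hex())        # 41c080
print(sketch_encode_mutf8("\U00020000").hex())   # eda180edb080
```

Standard UTF-8 would encode U+20000 in four bytes; the six-byte form above is what sets MUTF-8 and CESU-8 apart.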
🎉 Installation
Install the package from PyPI:
pip install mutf8
Binary wheels are available for the following:
Platform | py3.5 | py3.6 | py3.7 | py3.8 | py3.9 |
---|---|---|---|---|---|
OS X (x86_64) | y | y | y | y | y |
Windows (x86_64) | y | y | y | y | y |
Linux (x86_64) | y | y | y | y | y |
If no binary wheel is available for your platform, the installer attempts to build the C extension from source with any C99 compiler. If the build fails, the package falls back to the pure-Python version.
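After installing, you may want to know which implementation you ended up with. The following is a small sketch, assuming only the `mutf8` package name and the `mutf8.cmutf8` submodule mentioned in this README; the helper name `detect_mutf8_backend` is hypothetical.

```python
import importlib


def detect_mutf8_backend() -> str:
    """Report which mutf8 implementation is importable (hypothetical helper)."""
    try:
        # The C extension lives in mutf8.cmutf8; it is missing if the build failed.
        importlib.import_module("mutf8.cmutf8")
        return "c extension"
    except ImportError:
        pass
    try:
        # The package imports but the C module does not: pure-Python fallback.
        importlib.import_module("mutf8")
        return "pure python"
    except ImportError:
        return "not installed"


print(detect_mutf8_backend())
```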
Usage
Encoding and decoding is simple:
from mutf8 import encode_modified_utf8, decode_modified_utf8

# byte_like_object is a placeholder for your MUTF-8 input.
text = decode_modified_utf8(byte_like_object)
# Encode the str back into MUTF-8 bytes.
data = encode_modified_utf8(text)
This module does not register itself globally as a codec, since importing should be side-effect-free.
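If you do want `str.encode` and `bytes.decode` to work with a codec name, you can register one yourself in your own application. The sketch below uses only the standard-library `codecs` module; the helper `make_search_function` and the codec name `"mutf8"` are assumptions for illustration, not something this package provides.

```python
import codecs


def make_search_function(name, encode, decode):
    """Build a codecs search function around two plain callables.

    `encode` maps str -> bytes and `decode` maps bytes -> str. The wrappers
    ignore the `errors` argument, which is a simplification of this sketch.
    """
    def search(requested):
        if requested != name:
            return None
        return codecs.CodecInfo(
            name=name,
            # Codec callables must return (result, length_consumed).
            encode=lambda s, errors="strict": (encode(s), len(s)),
            # bytes.decode() may hand over a memoryview, so copy to bytes first.
            decode=lambda b, errors="strict": (decode(bytes(b)), len(b)),
        )
    return search


# Opt-in registration in application code (requires mutf8 to be installed):
#
#   from mutf8 import encode_modified_utf8, decode_modified_utf8
#   codecs.register(
#       make_search_function("mutf8", encode_modified_utf8, decode_modified_utf8)
#   )
#   "A\x00".encode("mutf8")
```

Keeping registration in the application rather than the library preserves the side-effect-free import described above.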
📈 Benchmarks
The C extension is significantly faster than the pure-Python version, often by 20x to 40x.
MUTF-8 Decoding
Name | Min (ms) | Max (ms) | StdDev | Ops/s |
---|---|---|---|---|
cmutf8-decode_modified_utf8 | 0.00009 | 0.00080 | 0.00000 | 9957678.56358 |
pymutf8-decode_modified_utf8 | 0.00190 | 0.06040 | 0.00000 | 450455.96019 |
MUTF-8 Encoding
Name | Min (ms) | Max (ms) | StdDev | Ops/s |
---|---|---|---|---|
cmutf8-encode_modified_utf8 | 0.00008 | 0.00151 | 0.00000 | 11897361.05101 |
pymutf8-encode_modified_utf8 | 0.00180 | 0.16650 | 0.00000 | 474390.98091 |
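To get a rough comparison on your own machine, a small `timeit` helper is enough. This is a sketch only and not the harness that produced the tables above; the helper name `ops_per_second` and the sample input are assumptions.

```python
import timeit


def ops_per_second(func, arg, repeat=5, number=10_000):
    """Return the best observed call rate of func(arg), in calls per second."""
    # Take the fastest of several runs to reduce noise from other processes.
    best = min(timeit.repeat(lambda: func(arg), repeat=repeat, number=number))
    return number / best


# Stand-in measurement so the sketch runs without mutf8 installed.
print(f"{ops_per_second(len, b'hello'):.0f} ops/s")

# With mutf8 installed, compare the two implementations like this:
#
#   from mutf8 import decode_modified_utf8 as default_decode
#   from mutf8.cmutf8 import decode_modified_utf8 as c_decode
#   sample = b"\xED\xA1\x80\xED\xB0\x80"
#   print(ops_per_second(c_decode, sample), ops_per_second(default_decode, sample))
```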
C Extension
The C extension is optional. If a binary package is not available, or a C compiler is not present, the pure-python version will be used instead. If you want to ensure you're using the C version, import it directly:
from mutf8.cmutf8 import decode_modified_utf8
decode_modified_utf8(b'\xED\xA1\x80\xED\xB0\x80')
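The six bytes in that call are a surrogate pair, which is how MUTF-8 represents a code point above U+FFFF. The following worked sketch decodes them by hand to show the value the input stands for under the MUTF-8 rules; the helper names are hypothetical and this is not the extension's code.

```python
def decode_three_byte_unit(chunk: bytes) -> int:
    """Decode one three-byte sequence (1110xxxx 10xxxxxx 10xxxxxx) to a 16-bit value."""
    b0, b1, b2 = chunk
    return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)


def combine_surrogates(high: int, low: int) -> int:
    """Merge a UTF-16 high/low surrogate pair into a single code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


data = b"\xED\xA1\x80\xED\xB0\x80"
high = decode_three_byte_unit(data[:3])   # 0xD840, a high surrogate
low = decode_three_byte_unit(data[3:])    # 0xDC00, a low surrogate
code_point = combine_surrogates(high, low)

print(hex(high), hex(low), hex(code_point))  # 0xd840 0xdc00 0x20000
```

So the input encodes the single character U+20000.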