Fast MUTF-8 encoder & decoder
mutf-8
This package contains a simple pure-Python implementation as well as C encoders and decoders for the MUTF-8 character encoding. In most cases, you can also use it to parse the even rarer CESU-8.
These days, you'll most likely encounter MUTF-8 when working on files or protocols related to the JVM. Strings in a Java .class file are encoded using MUTF-8, as are strings passed by the JNI and strings exported by the object serializer.
This library was extracted from Lawu, a Python library for working with JVM class files.
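To make the differences from standard UTF-8 concrete, here is a minimal pure-Python sketch of the two rules that set MUTF-8 apart: U+0000 is written as the two bytes 0xC0 0x80, and characters above U+FFFF are split into a UTF-16 surrogate pair with each half encoded as three bytes. The names `encode_mutf8_sketch` and `_three_bytes` are hypothetical, and this is not the package's implementation; it only illustrates the format.

```python
def _three_bytes(unit: int) -> bytes:
    """Encode a 16-bit value using the 3-byte UTF-8 bit pattern."""
    return bytes((
        0xE0 | (unit >> 12),
        0x80 | ((unit >> 6) & 0x3F),
        0x80 | (unit & 0x3F),
    ))


def encode_mutf8_sketch(text: str) -> bytes:
    """Illustrative MUTF-8 encoder (hypothetical helper, not the library's code)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp == 0:
            # NUL is never written as a raw 0x00 byte.
            out += b"\xC0\x80"
        elif cp < 0x80:
            out.append(cp)
        elif cp < 0x800:
            out += bytes((0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)))
        elif cp < 0x10000:
            out += _three_bytes(cp)
        else:
            # Supplementary characters become a surrogate pair,
            # and each surrogate is encoded as three bytes.
            cp -= 0x10000
            out += _three_bytes(0xD800 | (cp >> 10))
            out += _three_bytes(0xDC00 | (cp & 0x3FF))
    return bytes(out)
```

For text that contains neither NUL nor supplementary characters, the output of this sketch is identical to ordinary UTF-8.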
🎉 Installation
Install the package from PyPI:
pip install mutf8
Binary wheels are available for the following:
Platform | py3.6 | py3.7 | py3.8 | py3.9 |
---|---|---|---|---|
OS X (x86_64) | y | y | y | y |
Windows (x86_64) | y | y | y | y |
Linux (x86_64) | y | y | y | y |
If a binary wheel is not available for your platform, the installer will attempt to build the C extension from source with any C99 compiler. If the build fails, the package falls back to the pure-Python version.
Usage
Encoding and decoding are simple:
from mutf8 import encode_modified_utf8, decode_modified_utf8
text = decode_modified_utf8(byte_like_object)
data = encode_modified_utf8(text)
This module does not register itself globally as a codec, since importing should be side-effect-free.
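If you do want codec-style access in your own application, you can opt in explicitly with the standard library's `codecs` module. The following is a minimal sketch, not something this package provides: `register_codec` is a hypothetical helper that accepts any pair of encode and decode callables, and the `errors` argument is accepted but ignored.

```python
import codecs


def register_codec(name, encode, decode):
    """Register `encode`/`decode` under `name` (hypothetical helper).

    `name` should be lowercase without hyphens or spaces, since codec
    lookups normalise the requested name before searching.
    `encode` maps str -> bytes and `decode` maps bytes -> str.
    """
    def _encode(text, errors="strict"):
        # The codec protocol expects (output, number of input items consumed).
        return encode(text), len(text)

    def _decode(data, errors="strict"):
        # bytes.decode() may hand the codec a memoryview, so normalise first.
        raw = bytes(data)
        return decode(raw), len(raw)

    info = codecs.CodecInfo(encode=_encode, decode=_decode, name=name)

    def _search(requested):
        return info if requested == name else None

    codecs.register(_search)
```

With the package installed, a call such as `register_codec("mutf8", encode_modified_utf8, decode_modified_utf8)` would then allow `data.decode("mutf8")`. The codec name is an arbitrary choice for this example.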
📈 Benchmarks
The C extension is significantly faster than the pure-Python version, often by 20x to 40x.
MUTF-8 Decoding
Name | Min (μs) | Max (μs) | StdDev | Ops |
---|---|---|---|---|
cmutf8-decode_modified_utf8 | 0.00009 | 0.00080 | 0.00000 | 9957678.56358 |
pymutf8-decode_modified_utf8 | 0.00190 | 0.06040 | 0.00000 | 450455.96019 |
MUTF-8 Encoding
Name | Min (μs) | Max (μs) | StdDev | Ops |
---|---|---|---|---|
cmutf8-encode_modified_utf8 | 0.00008 | 0.00151 | 0.00000 | 11897361.05101 |
pymutf8-encode_modified_utf8 | 0.00180 | 0.16650 | 0.00000 | 474390.98091 |
C Extension
The C extension is optional. If a binary package is not available, or a C compiler is not present, the pure-python version will be used instead. If you want to ensure you're using the C version, import it directly:
from mutf8.cmutf8 import decode_modified_utf8
decode_modified_utf8(b'\xED\xA1\x80\xED\xB0\x80')
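The six bytes in the example above are a UTF-16 surrogate pair (U+D840, U+DC00) with each half encoded as three bytes, which together represent U+20000. As a cross-check that does not rely on this package, the standard library can recover the same character with the `surrogatepass` error handler. This sketch only handles well-formed input and is not how the package decodes.

```python
raw = b"\xED\xA1\x80\xED\xB0\x80"

# Step 1: decode each 3-byte group into a lone surrogate code point.
halves = raw.decode("utf-8", "surrogatepass")

# Step 2: round-trip through UTF-16 so the two halves are joined
# into the single supplementary character they represent.
text = halves.encode("utf-16-le", "surrogatepass").decode("utf-16-le")

print(f"U+{ord(text):05X}")
```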