Fast MUTF-8 encoder & decoder
mutf-8
This package contains simple pure-Python and C encoders and decoders for the MUTF-8 character encoding. In most cases, it can also parse the even rarer CESU-8.
These days, you'll most likely encounter MUTF-8 when working on files or protocols related to the JVM. Strings in a Java .class file are encoded using MUTF-8, as are strings passed through the JNI and strings exported by the object serializer.
This library was extracted from Lawu, a Python library for working with JVM class files.
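For readers unfamiliar with the format, MUTF-8 differs from standard UTF-8 in two ways: U+0000 is written as the two bytes C0 80, and characters above U+FFFF are written as a UTF-16 surrogate pair with each surrogate encoded in three bytes. The following is a minimal sketch of those two rules using only the standard library. It is not this package's implementation, and `sketch_encode_mutf8` is a hypothetical name chosen for illustration.

```python
def sketch_encode_mutf8(text: str) -> bytes:
    """Illustrative MUTF-8 encoder (hypothetical helper, not part of mutf8)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp == 0:
            # Rule 1: NUL becomes the overlong two-byte form C0 80,
            # so the encoded data never contains a zero byte.
            out += b"\xc0\x80"
        elif cp < 0x10000:
            # BMP characters are encoded exactly as in UTF-8.
            # "surrogatepass" lets lone surrogates through unchanged.
            out += ch.encode("utf-8", "surrogatepass")
        else:
            # Rule 2: split into a UTF-16 surrogate pair, then encode
            # each surrogate as its own three-byte sequence.
            cp -= 0x10000
            high = 0xD800 | (cp >> 10)
            low = 0xDC00 | (cp & 0x3FF)
            for s in (high, low):
                out += bytes((
                    0xE0 | (s >> 12),
                    0x80 | ((s >> 6) & 0x3F),
                    0x80 | (s & 0x3F),
                ))
    return bytes(out)


if __name__ == "__main__":
    print(sketch_encode_mutf8("\x00").hex())        # NUL
    print(sketch_encode_mutf8("\U00020000").hex())  # a supplementary character
```

Standard UTF-8 would encode U+20000 in four bytes; the sketch produces six, which is why a plain UTF-8 decoder cannot be used on such data directly.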
Usage
Install the package from PyPI:
pip install mutf8
Encoding and decoding is simple:
from mutf8 import encode_modified_utf8, decode_modified_utf8
text = decode_modified_utf8(byte_like_object)
data = encode_modified_utf8(text)
This module does not register itself globally as a codec, since importing should be side-effect-free.
📈 Benchmarks
The C extension is significantly faster than the pure-Python version, often by 20x to 40x.
MUTF-8 Decoding
Name | Min (μs) | Max (μs) | StdDev | Ops |
---|---|---|---|---|
cmutf8-decode_modified_utf8 | 0.00009 | 0.00080 | 0.00000 | 9957678.56358 |
pymutf8-decode_modified_utf8 | 0.00190 | 0.06040 | 0.00000 | 450455.96019 |
MUTF-8 Encoding
Name | Min (μs) | Max (μs) | StdDev | Ops |
---|---|---|---|---|
cmutf8-encode_modified_utf8 | 0.00008 | 0.00151 | 0.00000 | 11897361.05101 |
pymutf8-encode_modified_utf8 | 0.00180 | 0.16650 | 0.00000 | 474390.98091 |
C Extension
The C extension is optional. If a binary package is not available, or a C compiler is not present, the pure-Python version will be used instead. If you want to ensure you're using the C version, import it directly:
from mutf8.cmutf8 import decode_modified_utf8
decode_modified_utf8(b'\xED\xA1\x80\xED\xB0\x80')
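The six bytes in that call are a surrogate pair (U+D840, U+DC00) with each half encoded in three bytes. As a rough cross-check that does not depend on this package, the standard library can recover the intended character by decoding with the `surrogatepass` error handler and then round-tripping through UTF-16 to join the pair. This is only a sketch for this one input: it does not handle the C0 80 form of NUL, which Python's UTF-8 decoder rejects.

```python
raw = b"\xED\xA1\x80\xED\xB0\x80"

# Step 1: decode as UTF-8 while allowing surrogate code points through.
# The result is a two-character string holding the two lone surrogates.
surrogates = raw.decode("utf-8", "surrogatepass")

# Step 2: round-trip through UTF-16 so the pair is merged into the
# single supplementary character it represents.
text = surrogates.encode("utf-16", "surrogatepass").decode("utf-16")

print(len(surrogates), len(text))
print(f"U+{ord(text):04X}")
```

If the C extension is installed, `decode_modified_utf8` should return the same single character for this input.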