Fast MUTF-8 encoder & decoder
mutf-8
This package contains simple pure-Python as well as C encoders and decoders for the MUTF-8 character encoding. In most cases, it can also parse the even rarer CESU-8.
These days, you'll most likely encounter MUTF-8 when working on files or
protocols related to the JVM. Strings in a Java .class file are encoded using
MUTF-8, as are strings passed through the JNI and strings exported by the
object serializer.
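For readers who have not met the encoding before, the sketch below shows the two places where MUTF-8 differs from standard UTF-8: NUL is written as the two bytes `C0 80`, and characters above U+FFFF are written as a UTF-16 surrogate pair with each half encoded in three bytes. It is an illustrative, unoptimized example and not this package's implementation; the names `encode_mutf8_sketch` and `_three_bytes` are made up for the illustration.

```python
def _three_bytes(unit: int) -> bytes:
    """Encode one 16-bit value (U+0800..U+FFFF, surrogates included) as 3 bytes."""
    return bytes((
        0xE0 | (unit >> 12),
        0x80 | ((unit >> 6) & 0x3F),
        0x80 | (unit & 0x3F),
    ))


def encode_mutf8_sketch(text: str) -> bytes:
    """Hypothetical helper: encode `text` as MUTF-8, one code point at a time."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp == 0:
            # Difference 1: NUL uses the overlong two-byte form, so the
            # encoded data never contains a raw 0x00 byte.
            out += b"\xc0\x80"
        elif cp < 0x80:
            out.append(cp)
        elif cp < 0x800:
            out.append(0xC0 | (cp >> 6))
            out.append(0x80 | (cp & 0x3F))
        elif cp < 0x10000:
            out += _three_bytes(cp)
        else:
            # Difference 2: supplementary characters become a surrogate
            # pair, and each surrogate is encoded as three bytes.
            cp -= 0x10000
            out += _three_bytes(0xD800 | (cp >> 10))
            out += _three_bytes(0xDC00 | (cp & 0x3FF))
    return bytes(out)
```

For everything in between (U+0001 to U+FFFF), the output is byte-for-byte identical to UTF-8.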
This library was extracted from Lawu, a Python library for working with JVM class files.
🎉 Installation
Install the package from PyPI:
pip install mutf8
Binary wheels are available for the following:
Platform | py3.5 | py3.6 | py3.7 | py3.8 | py3.9 |
---|---|---|---|---|---|
OS X (x86_64) | y | y | y | y | y |
Windows (x86_64) | y | y | y | y | y |
Linux (x86_64) | y | y | y | y | y |
If no binary wheel is available for your platform, pip will attempt to build the C extension from source with any C99 compiler. If the build fails, the package falls back to the pure-Python version.
Usage
Encoding and decoding is simple:
from mutf8 import encode_modified_utf8, decode_modified_utf8
text = decode_modified_utf8(byte_like_object)
data = encode_modified_utf8(text)
This module does not register itself globally as a codec, since importing should be side-effect-free.
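If you do want `str.encode` and `bytes.decode` to understand the encoding by name, you can opt in yourself with the standard library's `codecs.register`. The following is a minimal sketch, not something this package ships: the helper `make_search_function` and the codec name `"mutf8"` are assumptions made for the example, and the `errors` argument is accepted but ignored.

```python
import codecs


def make_search_function(encode, decode, name="mutf8"):
    """Hypothetical helper: wrap str->bytes and bytes->str callables as a codec."""

    def codec_encode(text, errors="strict"):
        # The codec protocol expects (output, number of input items consumed).
        # `errors` is ignored in this sketch.
        return encode(text), len(text)

    def codec_decode(data, errors="strict"):
        data = bytes(data)  # accept any bytes-like object
        return decode(data), len(data)

    info = codecs.CodecInfo(encode=codec_encode, decode=codec_decode, name=name)

    def search(requested):
        # Python passes the requested encoding name already lower-cased.
        return info if requested == name else None

    return search
```

With the package installed, calling `codecs.register(make_search_function(encode_modified_utf8, decode_modified_utf8))` once at startup should make `"text".encode("mutf8")` and `data.decode("mutf8")` available for the rest of the process.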
📈 Benchmarks
The C extension is significantly faster, often 20x to 40x.
MUTF-8 Decoding
Name | Min (μs) | Max (μs) | StdDev | Ops |
---|---|---|---|---|
cmutf8-decode_modified_utf8 | 0.00009 | 0.00080 | 0.00000 | 9957678.56358 |
pymutf8-decode_modified_utf8 | 0.00190 | 0.06040 | 0.00000 | 450455.96019 |
MUTF-8 Encoding
Name | Min (μs) | Max (μs) | StdDev | Ops |
---|---|---|---|---|
cmutf8-encode_modified_utf8 | 0.00008 | 0.00151 | 0.00000 | 11897361.05101 |
pymutf8-encode_modified_utf8 | 0.00180 | 0.16650 | 0.00000 | 474390.98091 |
C Extension
The C extension is optional. If no binary wheel is available and the extension cannot be built from source, the pure-Python version is used instead. To make sure you're using the C version, import it directly:
from mutf8.cmutf8 import decode_modified_utf8
decode_modified_utf8(b'\xED\xA1\x80\xED\xB0\x80')
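The six bytes in that call are a surrogate pair, which is how MUTF-8 stores characters above U+FFFF. The sketch below decodes them by hand using only the standard library, to show which character a MUTF-8 decoder would be expected to produce; `decode_three_byte_unit` is a hypothetical helper and not part of this package.

```python
def decode_three_byte_unit(chunk: bytes) -> int:
    """Hypothetical helper: decode one 3-byte sequence into a 16-bit value."""
    b0, b1, b2 = chunk
    return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)


sample = b"\xED\xA1\x80\xED\xB0\x80"

high = decode_three_byte_unit(sample[:3])  # high surrogate, 0xD840
low = decode_three_byte_unit(sample[3:])   # low surrogate, 0xDC00

# Combine the surrogate pair into a single supplementary code point.
code_point = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
character = chr(code_point)  # U+20000
```

A strict UTF-8 decoder rejects these bytes, because surrogates are not valid in UTF-8.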