Fast MUTF-8 encoder & decoder
mutf-8
This package contains simple pure-python as well as C encoders and decoders for the MUTF-8 character encoding. In most cases, you can also parse the even-rarer CESU-8.
These days, you'll most likely encounter MUTF-8 when working on files or
protocols related to the JVM. Strings in a Java .class file are encoded using
MUTF-8, as are strings passed through the JNI and strings exported by the
object serializer.
This library was extracted from Lawu, a Python library for working with JVM class files.
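For readers unfamiliar with the encoding, MUTF-8 differs from standard UTF-8 in two ways: U+0000 is written as the two bytes C0 80, and code points above U+FFFF are written as a UTF-16 surrogate pair with each half encoded as three bytes. The following is a minimal illustrative sketch using only the standard library; `sketch_encode_mutf8` is a hypothetical name and this is not the package's own implementation.

```python
def sketch_encode_mutf8(text: str) -> bytes:
    """Illustrative MUTF-8 encoder (hypothetical helper, not the package API)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp == 0:
            # NUL is stored as an overlong two-byte sequence.
            out += b"\xc0\x80"
        elif cp > 0xFFFF:
            # Split into a UTF-16 surrogate pair, then encode each half
            # as its own three-byte sequence.
            cp -= 0x10000
            high = 0xD800 | (cp >> 10)
            low = 0xDC00 | (cp & 0x3FF)
            out += chr(high).encode("utf-8", "surrogatepass")
            out += chr(low).encode("utf-8", "surrogatepass")
        else:
            # Everything else matches ordinary UTF-8.
            out += ch.encode("utf-8", "surrogatepass")
    return bytes(out)
```

Under these rules, U+20000 would encode to the six bytes ED A1 80 ED B0 80.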
🎉 Installation
Install the package from PyPI:
pip install mutf8
Binary wheels are available for the following:
Platform | py3.5 | py3.6 | py3.7 | py3.8 |
---|---|---|---|---|
OS X (x86_64) | y | y | y | y |
Windows (x86_64) | y | y | y | y |
Linux (x86_64) | y | y | y | y |
If no binary wheel is available for your platform, the installation will attempt to build the C extension from source with any C99 compiler. If the build fails, the package falls back to a pure-Python version.
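If you want to check at runtime which implementation is importable, a generic "first module that imports" helper is one way to do it. This is only a sketch: `load_first_available` is a hypothetical helper, and it does not describe how the package performs its own fallback internally.

```python
import importlib


def load_first_available(module_names):
    """Return the first module in module_names that imports, else None.

    Hypothetical helper for illustration only.
    """
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Not installed or not built; try the next candidate.
            continue
    return None
```

For example, `load_first_available(["mutf8.cmutf8", "mutf8"])` would return the C extension module when it was built, and the top-level package otherwise.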
Usage
Encoding and decoding are simple:
from mutf8 import encode_modified_utf8, decode_modified_utf8
text = decode_modified_utf8(byte_like_object)
data = encode_modified_utf8(text)
This module does not register itself globally as a codec, since importing should be side-effect-free.
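If you do want `str.encode` and `bytes.decode` to understand a MUTF-8 codec name in your own application, you can register one yourself with the standard library's `codecs.register`. The sketch below assumes you pass in the encode and decode callables; `make_codec_search` and the codec name are hypothetical choices for illustration.

```python
import codecs


def make_codec_search(name, encode_fn, decode_fn):
    """Build a codecs search function for the given name (hypothetical helper)."""

    def encode(text, errors="strict"):
        # Codec encoders return (encoded bytes, number of characters consumed).
        return encode_fn(text), len(text)

    def decode(data, errors="strict"):
        data = bytes(data)
        # Codec decoders return (decoded text, number of bytes consumed).
        return decode_fn(data), len(data)

    def search(requested):
        if requested == name:
            return codecs.CodecInfo(encode=encode, decode=decode, name=name)
        return None

    return search
```

Calling `codecs.register(make_codec_search("mutf8", encode_modified_utf8, decode_modified_utf8))` once at application start-up would then allow `data.decode("mutf8")`. The name should be lowercase and free of hyphens and spaces, since `codecs.lookup` normalizes names before calling search functions.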
📈 Benchmarks
The C extension is significantly faster than the pure-Python version, often by 20x to 40x.
MUTF-8 Decoding
Name | Min (μs) | Max (μs) | StdDev | Ops |
---|---|---|---|---|
cmutf8-decode_modified_utf8 | 0.00009 | 0.00080 | 0.00000 | 9957678.56358 |
pymutf8-decode_modified_utf8 | 0.00190 | 0.06040 | 0.00000 | 450455.96019 |
MUTF-8 Encoding
Name | Min (μs) | Max (μs) | StdDev | Ops |
---|---|---|---|---|
cmutf8-encode_modified_utf8 | 0.00008 | 0.00151 | 0.00000 | 11897361.05101 |
pymutf8-encode_modified_utf8 | 0.00180 | 0.16650 | 0.00000 | 474390.98091 |
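To get a rough comparison on your own machine, a small `timeit` harness is enough. This sketch is not the benchmark setup used for the tables above; `ops_per_second` is a hypothetical helper and the numbers it produces will vary with hardware and input.

```python
import timeit


def ops_per_second(fn, arg, number=10000):
    """Call fn(arg) `number` times and return the measured calls per second."""
    seconds = timeit.timeit(lambda: fn(arg), number=number)
    return number / seconds
```

You could call it once with the C `decode_modified_utf8` and once with the pure-Python one, using the same input bytes, and compare the two results.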
C Extension
The C extension is optional. If a binary package is not available, or a C compiler is not present, the pure-python version will be used instead. If you want to ensure you're using the C version, import it directly:
from mutf8.cmutf8 import decode_modified_utf8
decode_modified_utf8(b'\xED\xA1\x80\xED\xB0\x80')
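The six bytes in that call are a surrogate pair, with each half stored as a three-byte sequence. As a rough cross-check that needs no extension at all, the same decoding can be sketched with the standard library; `sketch_decode_mutf8` is a hypothetical name, it assumes well-formed input, and it is not the package's implementation.

```python
def sketch_decode_mutf8(data: bytes) -> str:
    """Illustrative MUTF-8 decoder for well-formed input (hypothetical helper)."""
    # C0 80 is the MUTF-8 spelling of NUL; C0 is not valid anywhere else.
    data = bytes(data).replace(b"\xc0\x80", b"\x00")
    # "surrogatepass" lets the UTF-8 decoder accept three-byte surrogate halves.
    halves = data.decode("utf-8", "surrogatepass")
    # A round trip through UTF-16 joins each high/low pair into one code point.
    # An unpaired surrogate raises UnicodeDecodeError at this step.
    return halves.encode("utf-16", "surrogatepass").decode("utf-16")
```

With this sketch, the bytes from the example above decode to the single code point U+20000.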