No project description provided
Project description
chardetng-py
Simple python binding for rust charsetng library
Usage
The only function exposed by the library is decode
. The only argument is a bytes
object.
Returns a tuple of (decoded_text, detected encoding, had_errors)
.
>>> import chardetng_py
>>> chardetng_py.decode(b'Jakby r\xeaka Boga')
('Jakby rêka Boga', 'windows-1254', False)
Performance
In my basic testing, chardetng is about 40-80 times faster than chardet
or charset_normalizer
. (That's 4,000% - 8,000%). See bench.py
for some example benchmarking code.
Accuracy
I chardetng and charset_normalizer agree about 70% of the time with an ML dataset I'm working on. The 30% of the time when they disagree, chardetng's encoding is almost always the correct one. Blog post by Henri Sivonen, Mozilla Foundation
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for chardetng_py-0.1.1-cp311-cp311-manylinux_2_34_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 804d1ecac980c7d671dc7392a3ec4c9f176da82333456ca65f79137110cf4036 |
|
MD5 | 749593f46f13a478bc518e0853224b4b |
|
BLAKE2b-256 | d2da0ecaa998ed454e47b9ebbaff984fd82820b2d2bfe77d2b391384d81465de |