Skip to main content

No project description provided

Project description

chardetng-py

Simple python binding for rust charsetng library

Usage

The only function exposed by the library is decode. The only argument is a bytes object.

Returns a tuple of (decoded_text, detected encoding, had_errors).

>>> import chardetng_py
>>> chardetng_py.decode(b'Jakby r\xeaka Boga')
('Jakby rêka Boga', 'windows-1254', False)

Performance

In my basic testing, chardetng is about 40-80 times faster than chardet or charset_normalizer. (That's 4,000% - 8,000%). See bench.py for some example benchmarking code.

Accuracy

I chardetng and charset_normalizer agree about 70% of the time with an ML dataset I'm working on. The 30% of the time when they disagree, chardetng's encoding is almost always the correct one. Blog post by Henri Sivonen, Mozilla Foundation

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

chardetng_py-0.1.1.tar.gz (6.4 kB view hashes)

Uploaded Source

Built Distribution

chardetng_py-0.1.1-cp311-cp311-manylinux_2_34_x86_64.whl (359.1 kB view hashes)

Uploaded CPython 3.11 manylinux: glibc 2.34+ x86-64

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page