cChardet is a high-speed universal character encoding detector.
cChardet
cChardet is a high-speed universal character encoding detector, implemented as a binding to uchardet.
Supported Languages/Encodings
International (Unicode)
UTF-8
UTF-16BE / UTF-16LE
UTF-32BE / UTF-32LE / X-ISO-10646-UCS-4-34121 / X-ISO-10646-UCS-4-21431
Arabic
ISO-8859-6
WINDOWS-1256
Bulgarian
ISO-8859-5
WINDOWS-1251
Chinese
ISO-2022-CN
BIG5
EUC-TW
GB18030
HZ-GB-2312
Croatian
ISO-8859-2
ISO-8859-13
ISO-8859-16
Windows-1250
IBM852
MAC-CENTRALEUROPE
Czech
Windows-1250
ISO-8859-2
IBM852
MAC-CENTRALEUROPE
Danish
ISO-8859-1
ISO-8859-15
WINDOWS-1252
English
ASCII
Esperanto
ISO-8859-3
Estonian
ISO-8859-4
ISO-8859-13
Windows-1252
Windows-1257
Finnish
ISO-8859-1
ISO-8859-4
ISO-8859-9
ISO-8859-13
ISO-8859-15
WINDOWS-1252
French
ISO-8859-1
ISO-8859-15
WINDOWS-1252
German
ISO-8859-1
WINDOWS-1252
Greek
ISO-8859-7
WINDOWS-1253
Hebrew
ISO-8859-8
WINDOWS-1255
Hungarian
ISO-8859-2
WINDOWS-1250
Irish Gaelic
ISO-8859-1
ISO-8859-9
ISO-8859-15
WINDOWS-1252
Italian
ISO-8859-1
ISO-8859-3
ISO-8859-9
ISO-8859-15
WINDOWS-1252
Japanese
ISO-2022-JP
SHIFT_JIS
EUC-JP
Korean
ISO-2022-KR
EUC-KR / UHC
Lithuanian
ISO-8859-4
ISO-8859-10
ISO-8859-13
Latvian
ISO-8859-4
ISO-8859-10
ISO-8859-13
Maltese
ISO-8859-3
Polish
ISO-8859-2
ISO-8859-13
ISO-8859-16
Windows-1250
IBM852
MAC-CENTRALEUROPE
Portuguese
ISO-8859-1
ISO-8859-9
ISO-8859-15
WINDOWS-1252
Romanian
ISO-8859-2
ISO-8859-16
Windows-1250
IBM852
Russian
ISO-8859-5
KOI8-R
WINDOWS-1251
MAC-CYRILLIC
IBM866
IBM855
Slovak
Windows-1250
ISO-8859-2
IBM852
MAC-CENTRALEUROPE
Slovene
ISO-8859-2
ISO-8859-16
Windows-1250
IBM852
M
Example
# -*- coding: utf-8 -*-
import cchardet as chardet
with open(r"src/tests/samples/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt", "rb") as f:
    msg = f.read()
result = chardet.detect(msg)
print(result)
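The result printed above is a dictionary with `encoding` and `confidence` keys, as in chardet. A minimal sketch of using such a result to decode the bytes might look like the following. The helper name `decode_bytes` and its `fallback` parameter are illustrative assumptions, not part of cChardet; the detector is passed in as a callable so the same code works with any chardet-style `detect()`.

```python
from typing import Callable, Optional


def decode_bytes(data: bytes,
                 detect: Callable[[bytes], dict],
                 fallback: str = "utf-8") -> str:
    """Decode data using the encoding reported by a chardet-style detect().

    decode_bytes and fallback are hypothetical names used for illustration.
    """
    result = detect(data)
    encoding: Optional[str] = result.get("encoding")
    # A detector may report None when it cannot decide on an encoding.
    if not encoding:
        encoding = fallback
    try:
        return data.decode(encoding)
    except (LookupError, UnicodeDecodeError):
        # The reported name may be unknown to Python, or the guess may be
        # wrong; fall back and replace undecodable bytes.
        return data.decode(fallback, errors="replace")


if __name__ == "__main__":
    try:
        import cchardet
    except ImportError:
        cchardet = None
    if cchardet is not None:
        sample = "千夜一夜物語".encode("shift_jis")
        print(decode_bytes(sample, cchardet.detect))
```

Short inputs give a detector little evidence to work with, so checking `confidence` before trusting the guess may be worthwhile in practice.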
Benchmark
$ cd src/
$ pip install chardet
$ python tests/bench.py
Results
CPU: Intel(R) Core(TM) i5-4690 CPU @ 3.50GHz
RAM: DDR3 1600 MHz, 16 GB
Platform: Ubuntu 16.04 amd64
Python 3.6.1
Library | Request (call/s) |
---|---|
chardet v3.0.2 | 0.35 |
cchardet v2.0.1 | 1467.77 |
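For readers who want to reproduce a calls-per-second figure on their own data, a rough timing harness could be sketched as below. This is not the project's `tests/bench.py`, whose contents are not shown here; the function name `calls_per_second`, the sample text, and the repeat count are all assumptions made for illustration.

```python
import importlib
import time
from typing import Callable


def calls_per_second(detect: Callable[[bytes], dict],
                     data: bytes,
                     repeat: int = 50) -> float:
    """Return how many detect(data) calls complete per second.

    calls_per_second is a hypothetical helper, not part of cChardet.
    """
    start = time.perf_counter()
    for _ in range(repeat):
        detect(data)
    elapsed = time.perf_counter() - start
    # Guard against a zero interval on very coarse clocks.
    return repeat / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Illustrative Shift_JIS sample; real benchmarks should use real files.
    sample = ("これは文字コード判定のためのサンプルです。" * 20).encode("shift_jis")
    for name in ("chardet", "cchardet"):
        try:
            module = importlib.import_module(name)
        except ImportError:
            print(f"{name}: not installed, skipped")
            continue
        rate = calls_per_second(module.detect, sample)
        print(f"{name}: {rate:.2f} call/s")
```

Absolute numbers depend heavily on input size and hardware, so only the ratio between the two libraries on the same input is meaningful.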
LICENSE
See COPYING file.
Contact
Platform
Supported
Windows i686, x86_64
Linux i686, x86_64
macOS x86_64
Not supported
CHANGES
2.1.7 (2020-10-27)
support Python 3.9
drop support for Python 3.5
2.1.6 (2020-03-17)
drop support for Python 2.7
support GitHub Actions
update dev-dependencies
2.1.5 (2019-09-27)
update language models (uchardet)
add iso8859-2 test but disabled it
support Python 3.8
drop support for Python 3.4
2.1.4 (2018-09-27)
disable LTO because it degraded performance
2.1.3 (2018-09-26)
support Python 3.7
2.1.2 (2018-09-26)
enable LTO for wheel builds
update Cython
2.1.1 (2017-07-01)
fix differing results with different chunk sizes
fix assignments to nsSMState in nsCodingStateMachine that resulted in unspecified behavior
include COPYING in package
2.1.0 (2017-05-15)
2.0.1 (2017-04-25)
2.0.0 (2017-04-06)
Improve tests
2.0a4 (2017-04-05)
Update uchardet repo (Fix buffer overflow)
2.0a3 (2017-03-29)
Implement UniversalDetector (like chardet)
2.0a2 (2017-03-28)
Update uchardet repo (Fix memory leak)
2.0a1 (2017-03-28)
Replace uchardet-enhanced with uchardet
Remove Detector class
1.1.3 (2017-02-26)
Support AArch64
1.1.2 (2017-01-08)
Support Python 3.6
1.1.1 (2016-11-05)
Use len() function (9e61cb9e96b138b0d18e5f9e013e144202ae4067)
Remove detect function in _cchardet.pyx (25b581294fc0ae8f686ac9972c8549666766f695)
Support manylinux1 wheel
1.1.0 (2016-10-17)
Add Detector class
Improve unit tests
Download files
Built Distributions
Hashes for cchardet-2.1.7-cp39-cp39-win_amd64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 24974b3e40fee9e7557bb352be625c39ec6f50bc2053f44a3d1191db70b51675 |
MD5 | fe14c82276843d48cc6189b9bf499733 |
BLAKE2b-256 | 3bbf14cea23ee6f5ccfc2238235c7a47f88145490e9c58708dc0ca505ad512c6 |
Hashes for cchardet-2.1.7-cp39-cp39-win32.whl
Algorithm | Hash digest |
---|---|
SHA256 | 2309ff8fc652b0fc3c0cff5dbb172530c7abb92fe9ba2417c9c0bcf688463c1c |
MD5 | 7b12756b7c13370044547e1d69e09d84 |
BLAKE2b-256 | 9d38dcc25b61c506274e1bb5086834da813638ea047d0fdbbf3eafc9e0ea41f2 |
Hashes for cchardet-2.1.7-cp39-cp39-manylinux2010_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | c96aee9ebd1147400e608a3eff97c44f49811f8904e5a43069d55603ac4d8c97 |
MD5 | 5b121ded28f97aff7a4dc71879cc9f7a |
BLAKE2b-256 | bed33f9c005bead891d320ea3e796e5ed76776d2ac0671530188984bb632559b |
Hashes for cchardet-2.1.7-cp39-cp39-manylinux2010_i686.whl
Algorithm | Hash digest |
---|---|
SHA256 | 80e6faae75ecb9be04a7b258dc4750d459529debb6b8dee024745b7b5a949a34 |
MD5 | 97287df5b8f68856385b082130631ecf |
BLAKE2b-256 | f1456ea939fb941b4699561d24462ac5cff7f6ed1f10162299de83a6d9e11287 |
Hashes for cchardet-2.1.7-cp39-cp39-manylinux1_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | fdac1e4366d0579fff056d1280b8dc6348be964fda8ebb627c0269e097ab37fa |
MD5 | c1ab0148e3b07dd934514d1cedd755fa |
BLAKE2b-256 | 19093ab7094e7c4bc9fd9830c8d1c0c15013b7cd9ba13c04a75e8fea08036f8a |
Hashes for cchardet-2.1.7-cp39-cp39-manylinux1_i686.whl
Algorithm | Hash digest |
---|---|
SHA256 | bd7f262f41fd9caf5a5f09207a55861a67af6ad5c66612043ed0f81c58cdf376 |
MD5 | dc01007b0b1faf171571739ff7485f47 |
BLAKE2b-256 | 5ffdd31308d96a2e3a5aae521a9b0aa12d9e5a282e5b3a15ee083d43137a0e0a |
Hashes for cchardet-2.1.7-cp39-cp39-macosx_10_9_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 48ba829badef61441e08805cfa474ccd2774be2ff44b34898f5854168c596d4d |
MD5 | 5ef242897305f52e2c57da38f5d64676 |
BLAKE2b-256 | 04937fad4f4711b0d4eee4e917bbd3cd269fad682d42ccb1e43cc3512aa8af4b |
Hashes for cchardet-2.1.7-cp38-cp38-win_amd64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 273699c4e5cd75377776501b72a7b291a988c6eec259c29505094553ee505597 |
MD5 | ab36d23c05e4956f181f62e5a6b1a9a1 |
BLAKE2b-256 | 21eb23024490b86c040248fa9eb92156d115288b8f8d194c0590d5550b96782f |
Hashes for cchardet-2.1.7-cp38-cp38-win32.whl
Algorithm | Hash digest |
---|---|
SHA256 | 0b859069bbb9d27c78a2c9eb997e6f4b738db2d7039a03f8792b4058d61d1109 |
MD5 | 49369d60edbcdea606bc7fbedf0e3298 |
BLAKE2b-256 | 8891b17e4d000037d10f26a0b04a904ebd727f16993857e01f37bc49fef179ab |
Hashes for cchardet-2.1.7-cp38-cp38-manylinux2010_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | f16517f3697569822c6d09671217fdeab61dfebc7acb5068634d6b0728b86c0b |
MD5 | 2ebfc2f339a7953e7f4c4559c65376db |
BLAKE2b-256 | bb5fa822d40fec63f9e3caa52cbb61db7502dd904c878344035b52f1d3dc714a |
Hashes for cchardet-2.1.7-cp38-cp38-manylinux2010_i686.whl
Algorithm | Hash digest |
---|---|
SHA256 | 45456c59ec349b29628a3c6bfb86d818ec3a6fbb7eb72de4ff3bd4713681c0e3 |
MD5 | 7fc54d63fbfcf40c939b40e6e26f85d0 |
BLAKE2b-256 | a1bcbbb07486ef8da914f15f7bb1f3e1eaadd3b88a70e4decd64e184364173c5 |
Hashes for cchardet-2.1.7-cp38-cp38-manylinux1_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 90086e5645f8a1801350f4cc6cb5d5bf12d3fa943811bb08667744ec1ecc9ccd |
MD5 | e8c3a56427c4e2e42fbef8bd95b7d08e |
BLAKE2b-256 | c8e311ead63869139948f61b922a1539f4439554358e6b4d304ccf2f1c836004 |
Hashes for cchardet-2.1.7-cp38-cp38-manylinux1_i686.whl
Algorithm | Hash digest |
---|---|
SHA256 | 27a9ba87c9f99e0618e1d3081189b1217a7d110e5c5597b0b7b7c3fedd1c340a |
MD5 | 5492936b5051a84ddab13c86d7955f52 |
BLAKE2b-256 | 789da9bd2cfc2d362a40c64c279c31dc5b92c65d5129c9265b589b90df5090d0 |
Hashes for cchardet-2.1.7-cp38-cp38-macosx_10_9_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | b59ddc615883835e03c26f81d5fc3671fab2d32035c87f50862de0da7d7db535 |
MD5 | 7d9633c9980c6f5e4aa3055beb09211d |
BLAKE2b-256 | 308656083d1621aba5f5ff7c06b831d88bdacea8a2bbc2198d34c0b26f05ca62 |
Hashes for cchardet-2.1.7-cp37-cp37m-win_amd64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 54d0b26fd0cd4099f08fb9c167600f3e83619abefeaa68ad823cc8ac1f7bcc0c |
MD5 | c08e2ecaa7bf45ec2f8e01d7cf0dbe21 |
BLAKE2b-256 | 1be6ecd8bb8440ad5c8b7cdb4d0c3fb1e7e653fc9b49c6feca4fce81bbc744a2 |
Hashes for cchardet-2.1.7-cp37-cp37m-win32.whl
Algorithm | Hash digest |
---|---|
SHA256 | 50ad671e8d6c886496db62c3bd68b8d55060688c655873aa4ce25ca6105409a1 |
MD5 | 205acfc203814b06a55241f29878ba49 |
BLAKE2b-256 | ea67b6ba47a3e34940557c2c6ad5337ecaa68781168401be1c818bcf74f8e3d7 |
Hashes for cchardet-2.1.7-cp37-cp37m-manylinux2010_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | ec3eb5a9c475208cf52423524dcaf713c394393e18902e861f983c38eeb77f18 |
MD5 | cffeced1256b56d33c5550367eae6645 |
BLAKE2b-256 | 8072a4fba7559978de00cf44081c548c5d294bf00ac7dcda2db405d2baa8c67a |
Hashes for cchardet-2.1.7-cp37-cp37m-manylinux2010_i686.whl
Algorithm | Hash digest |
---|---|
SHA256 | b154effa12886e9c18555dfc41a110f601f08d69a71809c8d908be4b1ab7314f |
MD5 | 6a437f1825b41fb3863e95d436dd1317 |
BLAKE2b-256 | b4537a295565b7599db90a21a99a7c01c5f931a2d55ecd541209514f3222d37d |
Hashes for cchardet-2.1.7-cp37-cp37m-manylinux1_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | a39526c1c526843965cec589a6f6b7c2ab07e3e56dc09a7f77a2be6a6afa4636 |
MD5 | 8a3b8eafe9e70e56c2c034347f228ffa |
BLAKE2b-256 | d509da7c6f30cb053f77d79058994064d76b1b789c25caaa9cb20fce9f300370 |
Hashes for cchardet-2.1.7-cp37-cp37m-manylinux1_i686.whl
Algorithm | Hash digest |
---|---|
SHA256 | 70eeae8aaf61192e9b247cf28969faef00578becd2602526ecd8ae7600d25e0e |
MD5 | 125acb26ca4982c3a8b3b1650ff357f9 |
BLAKE2b-256 | f1340c90d57d3fcbe7a05ee8810c003edba710ce0ac809d2cc0460b5f15de76d |
Hashes for cchardet-2.1.7-cp37-cp37m-macosx_10_9_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 302aa443ae2526755d412c9631136bdcd1374acd08e34f527447f06f3c2ddb98 |
MD5 | ae319abafbc7af0be6a3a154a79ab29c |
BLAKE2b-256 | 0cf1c45f4ecb68d741596b570f8d585a7a36b9d39cb15fb5b7066751765325a9 |
Hashes for cchardet-2.1.7-cp36-cp36m-win_amd64.whl
Algorithm | Hash digest |
---|---|
SHA256 | f86e0566cb61dc4397297696a4a1b30f6391b50bc52b4f073507a48466b6255a |
MD5 | 0426e6a8e80b2e264bc9fef77d0ad18f |
BLAKE2b-256 | 90df66ed9f7b330133bc412975760a67c2f1beaf19058fd58eba573b30f79790 |
Hashes for cchardet-2.1.7-cp36-cp36m-win32.whl
Algorithm | Hash digest |
---|---|
SHA256 | eee4f5403dc3a37a1ca9ab87db32b48dc7e190ef84601068f45397144427cc5e |
MD5 | b1183e668f62bc9a0b3ef1e2b0ae6410 |
BLAKE2b-256 | 0c4226ef0b6ed6c37ec08b89495b0c999ece88f72aba765603131b31712b6ed3 |
Hashes for cchardet-2.1.7-cp36-cp36m-manylinux2010_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 54341e7e1ba9dc0add4c9d23b48d3a94e2733065c13920e85895f944596f6150 |
MD5 | aba5dd216be844686157690f34964db3 |
BLAKE2b-256 | a0e5a0b9edd8664ea3b0d3270c451ebbf86655ed9fc4c3e4c45b9afae9c2e382 |
Hashes for cchardet-2.1.7-cp36-cp36m-manylinux2010_i686.whl
Algorithm | Hash digest |
---|---|
SHA256 | 228d2533987c450f39acf7548f474dd6814c446e9d6bd228e8f1d9a2d210f10b |
MD5 | fa59d462740e0e699bc98ceafaa6f603 |
BLAKE2b-256 | 7671ffe383995aba6ab7b67e72bf65837ab3f57990964f346dc923c07692859e |
Hashes for cchardet-2.1.7-cp36-cp36m-manylinux1_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 6b6397d8a32b976a333bdae060febd39ad5479817fabf489e5596a588ad05133 |
MD5 | da329d6f0ef9779e1fe902b21553867a |
BLAKE2b-256 | b4c641a74560ab45f9cbc602dee51e5e3fad2f487805f7e0e5087999b69745d3 |
Hashes for cchardet-2.1.7-cp36-cp36m-manylinux1_i686.whl
Algorithm | Hash digest |
---|---|
SHA256 | 5a25f9577e9bebe1a085eec2d6fdd72b7a9dd680811bba652ea6090fb2ff472f |
MD5 | b6c6ecb0cf15577f1b9cfe9707bff1cd |
BLAKE2b-256 | 03f21585c895df465fe183edbb85a6c98f62b4df70f05a47ce772ba25f89a9ce |
Hashes for cchardet-2.1.7-cp36-cp36m-macosx_10_9_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | c6f70139aaf47ffb94d89db603af849b82efdf756f187cdd3e566e30976c519f |
MD5 | 0d79f829399f1d207bc1846d7016bfd8 |
BLAKE2b-256 | a8a65967f5c4095e6952781863422674d6c23eb40737480d3939be264bf1ae20 |