cChardet is a high-speed universal character encoding detector.
cChardet
cChardet is a high-speed universal character encoding detector, implemented as a binding to uchardet.
Supported Languages/Encodings
International (Unicode)
UTF-8
UTF-16BE / UTF-16LE
UTF-32BE / UTF-32LE / X-ISO-10646-UCS-4-34121 / X-ISO-10646-UCS-4-21431
Arabic
ISO-8859-6
WINDOWS-1256
Bulgarian
ISO-8859-5
WINDOWS-1251
Chinese
ISO-2022-CN
BIG5
EUC-TW
GB18030
HZ-GB-2312
Croatian
ISO-8859-2
ISO-8859-13
ISO-8859-16
Windows-1250
IBM852
MAC-CENTRALEUROPE
Czech
Windows-1250
ISO-8859-2
IBM852
MAC-CENTRALEUROPE
Danish
ISO-8859-1
ISO-8859-15
WINDOWS-1252
English
ASCII
Esperanto
ISO-8859-3
Estonian
ISO-8859-4
ISO-8859-13
Windows-1252
Windows-1257
Finnish
ISO-8859-1
ISO-8859-4
ISO-8859-9
ISO-8859-13
ISO-8859-15
WINDOWS-1252
French
ISO-8859-1
ISO-8859-15
WINDOWS-1252
German
ISO-8859-1
WINDOWS-1252
Greek
ISO-8859-7
WINDOWS-1253
Hebrew
ISO-8859-8
WINDOWS-1255
Hungarian
ISO-8859-2
WINDOWS-1250
Irish Gaelic
ISO-8859-1
ISO-8859-9
ISO-8859-15
WINDOWS-1252
Italian
ISO-8859-1
ISO-8859-3
ISO-8859-9
ISO-8859-15
WINDOWS-1252
Japanese
ISO-2022-JP
SHIFT_JIS
EUC-JP
Korean
ISO-2022-KR
EUC-KR / UHC
Lithuanian
ISO-8859-4
ISO-8859-10
ISO-8859-13
Latvian
ISO-8859-4
ISO-8859-10
ISO-8859-13
Maltese
ISO-8859-3
Polish
ISO-8859-2
ISO-8859-13
ISO-8859-16
Windows-1250
IBM852
MAC-CENTRALEUROPE
Portuguese
ISO-8859-1
ISO-8859-9
ISO-8859-15
WINDOWS-1252
Romanian
ISO-8859-2
ISO-8859-16
Windows-1250
IBM852
Russian
ISO-8859-5
KOI8-R
WINDOWS-1251
MAC-CYRILLIC
IBM866
IBM855
Slovak
Windows-1250
ISO-8859-2
IBM852
MAC-CENTRALEUROPE
Slovene
ISO-8859-2
ISO-8859-16
Windows-1250
IBM852
M
Example
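# -*- coding: utf-8 -*-
import cchardet as chardet

# Read the sample as raw bytes; detect() works on bytes, not on decoded text.
with open(r"src/tests/samples/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt", "rb") as f:
    msg = f.read()

# detect() returns a dict holding the guessed encoding and a confidence score.
result = chardet.detect(msg)
print(result)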
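A common next step is to decode the bytes with the reported encoding. The following is a minimal sketch rather than part of cChardet: `decode_bytes` and its `detect` and `fallback` parameters are hypothetical names. It assumes that `detect()` returns a dict with an `encoding` key, which may be `None` or a name that Python's codec registry does not recognise.

```python
try:
    import cchardet
except ImportError:  # keeps the sketch importable where cchardet is absent
    cchardet = None


def decode_bytes(data, detect=None, fallback="utf-8"):
    """Decode bytes using the encoding reported by a chardet-style detect().

    Hypothetical helper: `detect` defaults to cchardet.detect when available.
    """
    if detect is None:
        if cchardet is None:
            raise RuntimeError("cchardet is not installed; pass detect= explicitly")
        detect = cchardet.detect

    # Assumption: the result dict has an "encoding" key that may be None.
    encoding = detect(data).get("encoding") or fallback
    try:
        return data.decode(encoding)
    except (LookupError, UnicodeDecodeError):
        # Unknown codec name or a wrong guess: decode leniently instead.
        return data.decode(fallback, errors="replace")
```

Passing `detect` explicitly makes it easy to swap in chardet or a stub during tests.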
Benchmark
$ cd src/
$ pip install chardet
$ python tests/bench.py
Results
CPU: Intel(R) Core(TM) i5-4690 CPU @ 3.50GHz
RAM: DDR3 1600 MHz, 16 GB
Platform: Ubuntu 16.04 amd64
Python 2.7.13
Package | Request (call/s) |
---|---|
chardet v3.0.2 | 0.36 |
cchardet v2.0.1 | 1396.42 |
Python 3.6.1
Package | Request (call/s) |
---|---|
chardet v3.0.2 | 0.35 |
cchardet v2.0.1 | 1467.77 |
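The contents of `tests/bench.py` are not shown here, so the following is only a rough sketch of how a calls-per-second figure like those above could be measured. `calls_per_second` and its `repeat` parameter are hypothetical names, and `time.perf_counter` requires Python 3.

```python
import time


def calls_per_second(detect, data, repeat=100):
    """Return how many detect(data) calls complete per second.

    Hypothetical helper; not the project's actual benchmark script.
    """
    start = time.perf_counter()
    for _ in range(repeat):
        detect(data)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse clocks.
    return repeat / elapsed if elapsed > 0 else float("inf")
```

Calling it once with `cchardet.detect` and once with `chardet.detect` on the same bytes would give two comparable rates, although the numbers depend heavily on the input file and the machine.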
LICENSE
See COPYING file.
Contact
CHANGES
2.1.4 (2018-09-26)
Disable LTO because it degraded performance
2.1.3 (2018-09-26)
Support Python 3.7
2.1.2 (2018-09-26)
Enable LTO for wheel builds
Update Cython
2.1.1 (2017-07-01)
Fix different results being returned for different chunk sizes
Fix assignments to nsSMState in nsCodingStateMachine that resulted in unspecified behavior
Include COPYING in package
2.1.0 (2017-05-15)
2.0.1 (2017-04-25)
2.0.0 (2017-04-06)
Improve tests
2.0a4 (2017-04-05)
Update uchardet repo (Fix buffer overflow)
2.0a3 (2017-03-29)
Implement UniversalDetector (like chardet)
2.0a2 (2017-03-28)
Update uchardet repo (Fix memory leak)
2.0a1 (2017-03-28)
Replace uchardet-enhanced with uchardet
Remove Detector class
1.1.3 (2017-02-26)
Support AArch64
1.1.2 (2017-01-08)
Support Python 3.6
1.1.1 (2016-11-05)
Use len() function (9e61cb9e96b138b0d18e5f9e013e144202ae4067)
Remove detect function in _cchardet.pyx (25b581294fc0ae8f686ac9972c8549666766f695)
Support manylinux1 wheel
1.1.0 (2016-10-17)
Add Detector class
Improve unit tests
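The 2.0a3 entry above mentions a UniversalDetector modelled on chardet's. As a hedged illustration, the sketch below feeds data in chunks to any detector that exposes chardet's `feed()`, `done`, `close()` and `result` interface. It assumes `cchardet.UniversalDetector` follows that interface; `detect_incrementally` is a hypothetical helper name.

```python
def detect_incrementally(chunks, detector):
    """Feed byte chunks to a chardet-style detector and return its result.

    Assumption: `detector` offers feed(), done, close() and result,
    as chardet's UniversalDetector does.
    """
    for chunk in chunks:
        detector.feed(chunk)
        if detector.done:  # the detector has already reached a verdict
            break
    detector.close()  # finalise detection before reading the result
    return detector.result
```

With a file opened in binary mode as `f`, the chunks could be produced by `iter(lambda: f.read(4096), b"")` and the detector created with `cchardet.UniversalDetector()`, so large files need not be read into memory at once.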