cChardet
cChardet is a high-speed universal character encoding detector. It is a Python binding to uchardet.
Supported Languages/Encodings
International (Unicode)
UTF-8
UTF-16BE / UTF-16LE
UTF-32BE / UTF-32LE / X-ISO-10646-UCS-4-34121 / X-ISO-10646-UCS-4-21431
Arabic
ISO-8859-6
WINDOWS-1256
Bulgarian
ISO-8859-5
WINDOWS-1251
Chinese
ISO-2022-CN
BIG5
EUC-TW
GB18030
HZ-GB-2312
Croatian
ISO-8859-2
ISO-8859-13
ISO-8859-16
Windows-1250
IBM852
MAC-CENTRALEUROPE
Czech
Windows-1250
ISO-8859-2
IBM852
MAC-CENTRALEUROPE
Danish
ISO-8859-1
ISO-8859-15
WINDOWS-1252
English
ASCII
Esperanto
ISO-8859-3
Estonian
ISO-8859-4
ISO-8859-13
Windows-1252
Windows-1257
Finnish
ISO-8859-1
ISO-8859-4
ISO-8859-9
ISO-8859-13
ISO-8859-15
WINDOWS-1252
French
ISO-8859-1
ISO-8859-15
WINDOWS-1252
German
ISO-8859-1
WINDOWS-1252
Greek
ISO-8859-7
WINDOWS-1253
Hebrew
ISO-8859-8
WINDOWS-1255
Hungarian
ISO-8859-2
WINDOWS-1250
Irish Gaelic
ISO-8859-1
ISO-8859-9
ISO-8859-15
WINDOWS-1252
Italian
ISO-8859-1
ISO-8859-3
ISO-8859-9
ISO-8859-15
WINDOWS-1252
Japanese
ISO-2022-JP
SHIFT_JIS
EUC-JP
Korean
ISO-2022-KR
EUC-KR / UHC
Lithuanian
ISO-8859-4
ISO-8859-10
ISO-8859-13
Latvian
ISO-8859-4
ISO-8859-10
ISO-8859-13
Maltese
ISO-8859-3
Polish
ISO-8859-2
ISO-8859-13
ISO-8859-16
Windows-1250
IBM852
MAC-CENTRALEUROPE
Portuguese
ISO-8859-1
ISO-8859-9
ISO-8859-15
WINDOWS-1252
Romanian
ISO-8859-2
ISO-8859-16
Windows-1250
IBM852
Russian
ISO-8859-5
KOI8-R
WINDOWS-1251
MAC-CYRILLIC
IBM866
IBM855
Slovak
Windows-1250
ISO-8859-2
IBM852
MAC-CENTRALEUROPE
Slovene
ISO-8859-2
ISO-8859-16
Windows-1250
IBM852
M
Example
# -*- coding: utf-8 -*-
import cchardet as chardet

# Read the sample as raw bytes; detect() works on bytes, not str.
with open(r"src/tests/samples/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt", "rb") as f:
    msg = f.read()
    result = chardet.detect(msg)
    print(result)
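The result of `detect` can then be used to decode the bytes. The following is a minimal sketch, assuming the returned dictionary carries an `encoding` key that may be `None` when nothing is recognised (as in chardet). The helper `decode_bytes` and its `detect` and `fallback` parameters are hypothetical names introduced for illustration; they are not part of cChardet.

```python
try:
    import cchardet  # the detector described in this README
except ImportError:  # lets the sketch load even where cchardet is absent
    cchardet = None


def decode_bytes(data, detect=None, fallback="utf-8"):
    """Decode raw bytes using the encoding reported by a detect() callable.

    Hypothetical helper: `detect` defaults to cchardet.detect, but any
    callable returning a dict with an "encoding" key can be passed in.
    """
    if detect is None:
        if cchardet is None:
            raise RuntimeError("cchardet is not installed")
        detect = cchardet.detect
    result = detect(data)
    # Assumption: "encoding" may be None when detection fails.
    encoding = result.get("encoding") or fallback
    # errors="replace" avoids an exception if the guess is slightly off.
    return data.decode(encoding, errors="replace")
```

With cChardet installed, `decode_bytes(msg)` would decode the sample read above; passing a different `detect` callable makes it easy to compare detectors on the same input.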
Benchmark
$ cd src/
$ pip install chardet
$ python tests/bench.py
Results
CPU: Intel(R) Core(TM) i5-4690 CPU @ 3.50GHz
RAM: DDR3 1600MHz 16GB
Platform: Ubuntu 16.04 amd64
Python 3.6.1
Library | Request (call/s) |
---|---|
chardet v3.0.2 | 0.35 |
cchardet v2.0.1 | 1467.77 |
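A calls-per-second figure like the ones above can be approximated with a simple timing loop. This is only a sketch of the idea, not the project's `tests/bench.py`; the function name `calls_per_second` and the `repeat` parameter are assumptions made for illustration.

```python
import time


def calls_per_second(detect, data, repeat=100):
    """Return roughly how many detect(data) calls complete per second."""
    start = time.perf_counter()
    for _ in range(repeat):
        detect(data)
    elapsed = time.perf_counter() - start
    # Guard against a zero interval on very fast calls or coarse clocks.
    return repeat / elapsed if elapsed > 0 else float("inf")
```

Passing `cchardet.detect` and `chardet.detect` with the same byte string gives two comparable rates; absolute numbers depend heavily on the machine and the input.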
LICENSE
See COPYING file.
Contact
Platform
Supported
Windows i686, x86_64
Linux i686, x86_64
macOS x86_64
Not Supported
CHANGES
2.1.6 (2020-03-17)
drop support for Python 2.7
support GitHub Actions
update dev-dependencies
2.1.5 (2019-09-27)
update language models (uchardet)
add ISO-8859-2 test (currently disabled)
support Python 3.8
drop support for Python 3.4
2.1.4 (2018-09-27)
disable LTO because it degraded performance
2.1.3 (2018-09-26)
support Python 3.7
2.1.2 (2018-09-26)
enable LTO for wheel builds
update Cython
2.1.1 (2017-07-01)
fix different results being returned for different chunk sizes
fix assignments to nsSMState in nsCodingStateMachine that resulted in unspecified behavior
include COPYING in package
2.1.0 (2017-05-15)
2.0.1 (2017-04-25)
2.0.0 (2017-04-06)
Improve tests
2.0a4 (2017-04-05)
Update uchardet repo (Fix buffer overflow)
2.0a3 (2017-03-29)
Implement UniversalDetector (like chardet)
2.0a2 (2017-03-28)
Update uchardet repo (Fix memory leak)
2.0a1 (2017-03-28)
Replace uchardet-enhanced with uchardet
Remove Detector class
1.1.3 (2017-02-26)
Support AArch64
1.1.2 (2017-01-08)
Support Python 3.6
1.1.1 (2016-11-05)
Use len() function (9e61cb9e96b138b0d18e5f9e013e144202ae4067)
Remove detect function in _cchardet.pyx (25b581294fc0ae8f686ac9972c8549666766f695)
Support manylinux1 wheel
1.1.0 (2016-10-17)
Add Detector class
Improve unit tests
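The 2.0a3 entry above mentions a UniversalDetector modelled on chardet's. Below is a minimal sketch of incremental detection, assuming the detector exposes the same `feed()`, `close()`, `done`, and `result` members as chardet's class; `detect_stream` is a hypothetical helper, not part of cChardet.

```python
def detect_stream(chunks, detector):
    """Feed byte chunks to a detector until it is confident, then return its result.

    `detector` is assumed to follow chardet's UniversalDetector interface:
    feed(bytes), close(), a boolean `done`, and a `result` dictionary.
    """
    for chunk in chunks:
        detector.feed(chunk)
        if detector.done:
            # The detector has seen enough; skip the remaining chunks.
            break
    detector.close()
    return detector.result
```

Under that assumption, a file could be processed with `detect_stream(iter(lambda: f.read(4096), b""), cchardet.UniversalDetector())`, which avoids reading the whole file into memory first.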
Built Distributions
Hashes for cchardet-2.1.6-cp38-cp38-win_amd64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | f245f045054e8d6dab2a0e366d3c74f3a47fb7dec2595ae2035b234b1a829c7a |
MD5 | ed313cecc0b95dfa06acd534160b654e |
BLAKE2b-256 | 8d40aa084aaa0ad155e4b1ba853d8afda249413ad1003ba90266e271c0a2b52c |
Hashes for cchardet-2.1.6-cp38-cp38-win32.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 54d2653520237ebbd2928f2c0f2eb7c616ee2b5194d73d945060cd54a7846b64 |
MD5 | d8dcec90c47ef2151831275076ebdefe |
BLAKE2b-256 | 1a89f4ce731b50c86fabcfe68d30600faf362957957e3dfb3f1702040f4a5d39 |
Hashes for cchardet-2.1.6-cp38-cp38-manylinux2010_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | af284494ea6c40f9613b4d939abe585eb9290cb92037eab66122c93190fcb338 |
MD5 | a8d019dbbf5fb2810a40b699b25117b2 |
BLAKE2b-256 | 88f30db5b64fecac9d77302604eb8404807755e8882d3d31bbf33d037861e642 |
Hashes for cchardet-2.1.6-cp38-cp38-manylinux2010_i686.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 5e38cfad9d3ca0f571c4352e9ca0f5ab718508f492a37d3236ae70810140e250 |
MD5 | 60fcc977e29c0adba5542f50ed6941ba |
BLAKE2b-256 | 62176bde0136d8f06a3f18721af620b630a4e62505f82e296740b290d47af031 |
Hashes for cchardet-2.1.6-cp38-cp38-manylinux1_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 8f7ade2578b2326a0a554c03f60c8d079331220179a592e83e143c9556b7f5b2 |
MD5 | 9eab249862b31a484c3dd115e565345c |
BLAKE2b-256 | 2f65f7e2433978c7d44c649008789e8e045d2ac9ac7f22816722553c2e4de131 |
Hashes for cchardet-2.1.6-cp38-cp38-manylinux1_i686.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 40c199f9c0569ac479fae7c4e12d2e16fc1e8237836b928474fdd228b8d11477 |
MD5 | 2c7355b93597f595780f2222e3fa8633 |
BLAKE2b-256 | 67c0681500e76346e67067ba533bb2a904c9a773e23f6f159ea80841a61814b6 |
Hashes for cchardet-2.1.6-cp38-cp38-macosx_10_9_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 68409e00d75ff13dd7a192ec49559f5527ee8959a51a9f4dd7b168df972b4d44 |
MD5 | 00fac53c45a030e2ceaa4e196058c907 |
BLAKE2b-256 | 6d90f8f2e68777e0e3f7099300aed5d5ed7e33b661450b49c89c008693f202b2 |
Hashes for cchardet-2.1.6-cp37-cp37m-win_amd64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | e27771798c8ad50df1375e762d59369354af94eb8ac21eca5bfd1eeef589f545 |
MD5 | 03b7266161282572478f731f0934afc5 |
BLAKE2b-256 | dddb34efa68df3e3f041b099039f778d5fb280f33767ae1891860d15b3ddedfd |
Hashes for cchardet-2.1.6-cp37-cp37m-win32.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 8b1d02c99f6444c63336a76638741eaf4ac4005b454e3b8252a40074bf0d84a1 |
MD5 | 37da99b356f96bc2211a0ec4f24bcef4 |
BLAKE2b-256 | 4f1121ab46b2a7ef9131b2d01c23d0f29a3cbff2bf138de7308fdb0b6b04ea47 |
Hashes for cchardet-2.1.6-cp37-cp37m-manylinux2010_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 79b0e113144c2ef0050bc9fe647c7657c5298f3012ecd8937d930b24ddd61404 |
MD5 | 1ff94196099ebc4f4fe6903e898dd030 |
BLAKE2b-256 | 2ffaf0921c515df6d63d7e6fd9b5128c514d993317dbb63db6bbc6123c0d2c2a |
Hashes for cchardet-2.1.6-cp37-cp37m-manylinux2010_i686.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | acc96b4a8f756af289fa90ffa67ddef57401d99131e51e71872e3609483941ce |
MD5 | 9e561cd7c5768a200fa910d3811cda79 |
BLAKE2b-256 | d9eb8abd6bb51464ad076318e3883bab90c8a6bb007cc806d65fb0f1c651e70c |
Hashes for cchardet-2.1.6-cp37-cp37m-manylinux1_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 0f6e4e464e332da776b9c1a34e4e83b6301d38c2724efc93848c46ade66d02bb |
MD5 | 1acb42603148a56df45179bfeb2cf3e1 |
BLAKE2b-256 | d730c18386a3061561a6549647f8cf6b2d88508318aff440dd57b414500535ae |
Hashes for cchardet-2.1.6-cp37-cp37m-manylinux1_i686.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 7bba1cbb4358dc9a2d2da00f4b38b159a5483d2f3b1d698a7c2cae518f955170 |
MD5 | 3dbbd18d531d601d87af58eb883e84de |
BLAKE2b-256 | d75ad97df6875e47145cd2d7703bfc0fe85cd2df02b00a26c66d9edc250cd270 |
Hashes for cchardet-2.1.6-cp37-cp37m-macosx_10_9_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 4096759825a130cb27a58ddf6d58e10abdd0127d29fbf53fde26df7ad879737b |
MD5 | cdeeb7888204094862f2eb0b38339f45 |
BLAKE2b-256 | 71026bd3384e783d624506c31665d7c94aa5720dba58c8c4c393a54217546165 |
Hashes for cchardet-2.1.6-cp36-cp36m-win_amd64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 2a958fb093f69ee5f16be7a1aee5122e07aff4350fa4dc9b953b87c34468e605 |
MD5 | 16aa6f1ee5790b67cb139d679d1216d7 |
BLAKE2b-256 | 48925f3a19ba7b46ca931c49cb13e88aae007f09fed16cc36ba764548ca2d75c |
Hashes for cchardet-2.1.6-cp36-cp36m-win32.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 217a7008bd399bdb61f6a0a2570acc5c3a9f96140e0a0d089b9e748c4d4e4c4e |
MD5 | 40ed0bf37ab1c69998b66f80c8817e0a |
BLAKE2b-256 | 40b521d755aad0de246b3c8a9fc291c74d958cac513626190a620725d1d9dca0 |
Hashes for cchardet-2.1.6-cp36-cp36m-manylinux2010_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | f5c94994d876d8709847c3a92643309d716f43716580a2e5831262366a9ee8b6 |
MD5 | 6bb1b818c40cabb2fc2fda9467f8d0cf |
BLAKE2b-256 | 1ec57e1a0d7b4afd83d6f8de794fce82820ec4c5136c6d52e14000822681a842 |
Hashes for cchardet-2.1.6-cp36-cp36m-manylinux2010_i686.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | cf134e1cfb0c53f08abb1ab9158a7e7f859c3ddb451d5fe535a2cc5f2958a688 |
MD5 | 1ca021bf0edb6466a2af5d5bb0b6204b |
BLAKE2b-256 | 2be305cec33acc0ea660e78190b2be9d6a27135bbbf76ec0aace0506b7408180 |
Hashes for cchardet-2.1.6-cp36-cp36m-manylinux1_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 27b0f23088873d1dd36d2c8a2e45c9167e312e1aac7e4baeb47f7428a2669638 |
MD5 | 68bf0336061e40f6f043c8b3a80457aa |
BLAKE2b-256 | d3b9c82e376881f3894869541fea2d5b79ec8e89517aebad0750d0e800ac5b4b |
Hashes for cchardet-2.1.6-cp36-cp36m-manylinux1_i686.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 7a2d98df461d3f36b403fdd8d7890c823ed05bd98eb074412ed56fbfedb94751 |
MD5 | eab57b9aafeea0652833def716c055d3 |
BLAKE2b-256 | a8c62ab1e8bdd96756f577fa4d05fcb80e17c0e6e5692d6273c895247147807d |
Hashes for cchardet-2.1.6-cp36-cp36m-macosx_10_9_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 4486f6e5bdf06f0081d13832f2a061d9e90597eb02093fda9d37e3985e3b2ef2 |
MD5 | 177a40f8abbd12a73a285e11d66a871a |
BLAKE2b-256 | b82c1db8cad6dad7ffeaa3d7621c4c1e2a8636d05572302bab502f65da49e879 |
Hashes for cchardet-2.1.6-cp35-cp35m-win_amd64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 84d2ce838cf3c2fe7f0517941702d42f7e598e5173632ec47a113cd521669b98 |
MD5 | 0983097d9ae176706375fbe63774df93 |
BLAKE2b-256 | 2d519e46bc1c6862d7e2a48ff693dff8fe25397bd2047ae76441a28e8b1d73d7 |
Hashes for cchardet-2.1.6-cp35-cp35m-win32.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | dff9480d9b6260f59ad10e1cec5be13905be5da88a4a2bd5a5bd4d49c49c4a05 |
MD5 | 77192534a2a289f580e6ac690fd74c04 |
BLAKE2b-256 | 2a084ba0cadec157a489dc8ab3e3edd1c3b21d0932afb48410043cdc373527c6 |
Hashes for cchardet-2.1.6-cp35-cp35m-manylinux2010_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 2c05b66b12f9ab0493c5ffb666036fd8c9004a9cc9d5a9264dc24738b50ab8c3 |
MD5 | 6adac2ecfc79731170789278b42d6ee6 |
BLAKE2b-256 | 35d58e8839c702e607399ec5b995b4d1706efab611799bbd9c99238fd056fd1a |
Hashes for cchardet-2.1.6-cp35-cp35m-manylinux2010_i686.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | ccb9f6f06265382028468b47e726f2d42539256fb498d1b0e473c39037b42b8a |
MD5 | 38e023088588fef2ed81c748a38c093a |
BLAKE2b-256 | 3036ee7a68b06d5fe9f9ca5e143b60b1e0258ddeeb8f5ca40b7e0d503b61e62f |
Hashes for cchardet-2.1.6-cp35-cp35m-manylinux1_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 953fe382304b19f5aa8fc2da4b092a3bb58a477d33af4def4b81abdce4c9288c |
MD5 | 49e235755cf813710cc16a41a39d8e7f |
BLAKE2b-256 | 773f6736cad2dee2da08a91688a82023f4bfd24d065f075f9b5e6d4e6932445d |
Hashes for cchardet-2.1.6-cp35-cp35m-manylinux1_i686.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | fd16f57ce42a72397cd9fe38977fc809eb02172731cb354572f28a6d8e4cf322 |
MD5 | e99e4fc59934fa24d59a3e71a234b32d |
BLAKE2b-256 | c248d8280369be5dcba1ad6703380c807f3cc329cf1ddda6c121362dd69b0e3d |
Hashes for cchardet-2.1.6-cp35-cp35m-macosx_10_9_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 2aa1b008965c703ad6597361b0f6d427c8971fe94a2c99ec3724c228ae50d6a6 |
MD5 | 534e6e594c16d8c10754431e53d09a12 |
BLAKE2b-256 | c7106f4e5fdfb5cb01eb59521478af4d1e7bbe6a2c6a3c00222a6ba0b27c57f1 |
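The SHA256 digests listed above can be compared against a downloaded wheel before installing it. The following is a minimal sketch using only the standard library; `sha256_of_file` and `matches_published_hash` are hypothetical helper names, and the file path is whatever location the wheel was saved to.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes.
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

A mismatch means the file differs from the one the digest was published for, so it should not be installed.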