cChardet is a high-speed universal character encoding detector.
cChardet
cChardet is a high-speed universal character encoding detector, implemented as a binding to uchardet.
Supported Languages/Encodings
International (Unicode)
UTF-8
UTF-16BE / UTF-16LE
UTF-32BE / UTF-32LE / X-ISO-10646-UCS-4-34121 / X-ISO-10646-UCS-4-21431
Arabic
ISO-8859-6
WINDOWS-1256
Bulgarian
ISO-8859-5
WINDOWS-1251
Chinese
ISO-2022-CN
BIG5
EUC-TW
GB18030
HZ-GB-2312
Croatian
ISO-8859-2
ISO-8859-13
ISO-8859-16
Windows-1250
IBM852
MAC-CENTRALEUROPE
Czech
Windows-1250
ISO-8859-2
IBM852
MAC-CENTRALEUROPE
Danish
ISO-8859-1
ISO-8859-15
WINDOWS-1252
English
ASCII
Esperanto
ISO-8859-3
Estonian
ISO-8859-4
ISO-8859-13
Windows-1252
Windows-1257
Finnish
ISO-8859-1
ISO-8859-4
ISO-8859-9
ISO-8859-13
ISO-8859-15
WINDOWS-1252
French
ISO-8859-1
ISO-8859-15
WINDOWS-1252
German
ISO-8859-1
WINDOWS-1252
Greek
ISO-8859-7
WINDOWS-1253
Hebrew
ISO-8859-8
WINDOWS-1255
Hungarian
ISO-8859-2
WINDOWS-1250
Irish Gaelic
ISO-8859-1
ISO-8859-9
ISO-8859-15
WINDOWS-1252
Italian
ISO-8859-1
ISO-8859-3
ISO-8859-9
ISO-8859-15
WINDOWS-1252
Japanese
ISO-2022-JP
SHIFT_JIS
EUC-JP
Korean
ISO-2022-KR
EUC-KR / UHC
Lithuanian
ISO-8859-4
ISO-8859-10
ISO-8859-13
Latvian
ISO-8859-4
ISO-8859-10
ISO-8859-13
Maltese
ISO-8859-3
Polish
ISO-8859-2
ISO-8859-13
ISO-8859-16
Windows-1250
IBM852
MAC-CENTRALEUROPE
Portuguese
ISO-8859-1
ISO-8859-9
ISO-8859-15
WINDOWS-1252
Romanian
ISO-8859-2
ISO-8859-16
Windows-1250
IBM852
Russian
ISO-8859-5
KOI8-R
WINDOWS-1251
MAC-CYRILLIC
IBM866
IBM855
Slovak
Windows-1250
ISO-8859-2
IBM852
MAC-CENTRALEUROPE
Slovene
ISO-8859-2
ISO-8859-16
Windows-1250
IBM852
M
Example
# -*- coding: utf-8 -*-
import cchardet as chardet

# Read the sample as raw bytes; detection works on bytes, not decoded text.
with open(r"src/tests/samples/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt", "rb") as f:
    msg = f.read()
    result = chardet.detect(msg)
    print(result)
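The result printed above still has to be turned into text. The following is a minimal sketch of one way to do that, assuming the detection result is a dict with "encoding" and "confidence" keys (as chardet-style detectors return). The helper name `decode_bytes` and the parameters `min_confidence` and `fallback` are hypothetical and not part of cChardet.

```python
def decode_bytes(data, detect, min_confidence=0.5, fallback="utf-8"):
    """Decode `data` using the encoding reported by `detect`.

    `detect` is any callable that takes bytes and returns a dict with
    "encoding" and "confidence" keys (for example cchardet.detect).
    Returns a (text, encoding_used) tuple.
    """
    result = detect(data)
    encoding = result.get("encoding")
    confidence = result.get("confidence") or 0.0

    # No guess, or a weak one: use the caller's fallback encoding instead.
    if encoding is None or confidence < min_confidence:
        encoding = fallback

    try:
        return data.decode(encoding, errors="replace"), encoding
    except LookupError:
        # The detector reported a name for which Python has no codec.
        return data.decode(fallback, errors="replace"), fallback
```

With cChardet installed, this could be called as `decode_bytes(msg, chardet.detect)`. Passing the detector in as an argument keeps the helper usable with any compatible detector.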
Benchmark
$ cd src/
$ pip install chardet
$ python tests/bench.py
Results
CPU: Intel(R) Core(TM) i5-4690 CPU @ 3.50GHz
RAM: DDR3 1600 MHz, 16 GB
Platform: Ubuntu 16.04 amd64
Python 2.7.13
Library | Request (call/s) |
---|---|
chardet v3.0.2 | 0.36 |
cchardet v2.0.1 | 1396.42 |
Python 3.6.1
Library | Request (call/s) |
---|---|
chardet v3.0.2 | 0.35 |
cchardet v2.0.1 | 1467.77 |
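For readers who want a rough calls-per-second figure on their own data, the measurement can be sketched as below. This is not the project's tests/bench.py; the function name `calls_per_second` and the `repeat` parameter are assumptions made for illustration.

```python
import time


def calls_per_second(detect, data, repeat=100):
    """Return how many detect(data) calls complete per second.

    `detect` is any callable that accepts bytes, such as cchardet.detect
    or chardet.detect.
    """
    start = time.perf_counter()
    for _ in range(repeat):
        detect(data)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse timers.
    return repeat / elapsed if elapsed > 0 else float("inf")
```

Running the same function once with `chardet.detect` and once with `cchardet.detect` on identical input gives two comparable rates. The absolute numbers depend on the hardware and on the input file.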
LICENSE
See COPYING file.
CHANGES
2.1.1 (2017-07-01)
Fix differing results with different chunk sizes
Fix assignments to nsSMState in nsCodingStateMachine that resulted in unspecified behavior
Include COPYING in package
2.1.0 (2017-05-15)
2.0.1 (2017-04-25)
2.0.0 (2017-04-06)
Improve tests
2.0a4 (2017-04-05)
Update uchardet repo (Fix buffer overflow)
2.0a3 (2017-03-29)
Implement UniversalDetector (like chardet)
2.0a2 (2017-03-28)
Update uchardet repo (Fix memory leak)
2.0a1 (2017-03-28)
Replace uchardet-enhanced with uchardet
Remove Detector class
1.1.3 (2017-02-26)
Support AArch64
1.1.2 (2017-01-08)
Support Python 3.6
1.1.1 (2016-11-05)
Use len() function (9e61cb9e96b138b0d18e5f9e013e144202ae4067)
Remove detect function in _cchardet.pyx (25b581294fc0ae8f686ac9972c8549666766f695)
Support manylinux1 wheel
1.1.0 (2016-10-17)
Add Detector class
Improve unit tests
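The 2.0a3 entry above mentions a UniversalDetector modelled on chardet's. Below is a minimal sketch of incremental detection, assuming the detector object exposes chardet's `feed()`, `close()`, `done`, and `result` interface. The helper names `detect_incrementally` and `iter_chunks` are hypothetical.

```python
def iter_chunks(data, size=4096):
    """Yield consecutive slices of `data`, each at most `size` bytes long."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


def detect_incrementally(chunks, detector):
    """Feed byte chunks to a chardet-style detector and return its result.

    Stops feeding as soon as the detector reports that it is done, then
    closes the detector so that it finalizes its result.
    """
    for chunk in chunks:
        detector.feed(chunk)
        if detector.done:
            break
    detector.close()
    return detector.result
```

With cChardet installed, `detect_incrementally(iter_chunks(msg), cchardet.UniversalDetector())` would be one way to use it. For large files, the chunks could instead come from repeated `f.read(size)` calls so the whole file is never held in memory.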
Download files
Built Distributions
Hashes for cchardet-2.1.1-cp36-cp36m-win_amd64.whl
Algorithm | Hash digest |
---|---|
SHA256 | d6c8eb90a9aa77f94e040a75d563f65849ab3b0c8f675b27928a91583648f8f8 |
MD5 | 0a67623b6a5f06193fb24c4516c70d46 |
BLAKE2b-256 | ad33216ad3ba6f7982be3f9895bc9059c6b4a769e9522b3656c8ad87e7a49fdf |
Hashes for cchardet-2.1.1-cp36-cp36m-win32.whl
Algorithm | Hash digest |
---|---|
SHA256 | eb8ee148e9fc13101e0e19ac98552d24b82731fcfddc915eed216c13ebbebec0 |
MD5 | 5281e786e102048d9b58b3f2d6c045ce |
BLAKE2b-256 | 2488240c5f53980f74ca58116f08a24be731965083b580320489b8bd4e3e1d9f |
Hashes for cchardet-2.1.1-cp36-cp36m-manylinux1_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 07dace80abce108d42a82be5a598797c0c07575741d81e698819bd42d367cdde |
MD5 | a05634277d3a0b8fc73f34f23f9a1f5a |
BLAKE2b-256 | f90a330740ba16f34599173fe7567baf4d847f31772bafd99f74c08e608701f6 |
Hashes for cchardet-2.1.1-cp36-cp36m-manylinux1_i686.whl
Algorithm | Hash digest |
---|---|
SHA256 | 6dfc76b71f66e002a99efa68efe4366143e8845b54cf5623eb05b5fa8fb030d6 |
MD5 | 888a8912d8ef9f68efd37b9c6ddc8171 |
BLAKE2b-256 | e75b68c6fe9bc81d16e0d4b742b36225a2973713316d6bfeccf407f8640a9e3c |
Hashes for cchardet-2.1.1-cp35-cp35m-win_amd64.whl
Algorithm | Hash digest |
---|---|
SHA256 | e1c3addf0c7408f76b98bd5f55f3abe844716d47dd6ab0d32eea8caa11a8fa41 |
MD5 | 1050bdde4efb3efb11d4e8ef594b6b3a |
BLAKE2b-256 | 4917915c4d7eff8c6d611f9b7fa72dd6809a435c6794e34884380e0fd98bcca1 |
Hashes for cchardet-2.1.1-cp35-cp35m-win32.whl
Algorithm | Hash digest |
---|---|
SHA256 | e32c4a420c6f7c6ea8d8a1fe36c60c70316a4ca1779dba2e00044b61d8ee2017 |
MD5 | 3b5e0fda7b75d5328b72221690917851 |
BLAKE2b-256 | 1f0778cdac5a666b991273aa57f3d2afe64c5c6ae36a5f11003e014efcd7f399 |
Hashes for cchardet-2.1.1-cp35-cp35m-manylinux1_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | d12b3f1913068975f9b9431f3cdc44488786523cc6d5467ffcb5bd43d3210157 |
MD5 | 0bea04f9dda8b7c724fc4d016f6bb040 |
BLAKE2b-256 | 54f65819fdc63c74fd2c28b08498768310215431f9276af51bb8a75ea934875a |
Hashes for cchardet-2.1.1-cp35-cp35m-manylinux1_i686.whl
Algorithm | Hash digest |
---|---|
SHA256 | 36d58862c158de32ace6497e7bafc7f85049b35a3abbd65118baffbe2a1ec1e5 |
MD5 | 4bd3e07f14c81b7be3e8ce60f7973bc9 |
BLAKE2b-256 | 583a6cc4aba0d3197b61287f566ca700c3eee34ea0e0e8cdfee5a24c84dcfce9 |
Hashes for cchardet-2.1.1-cp34-cp34m-win_amd64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 6e001eb2ff93c4c31a9952cf01c71f5f95c758314032094df5cf086168678b23 |
MD5 | f14cad635cc8db3c54bc880c3ede894d |
BLAKE2b-256 | 1472a623ecf82a368a5b5202fb840fa915e8afd37e923ed33135367c5fc8e22f |
Hashes for cchardet-2.1.1-cp34-cp34m-win32.whl
Algorithm | Hash digest |
---|---|
SHA256 | 3f70d1c41f0694d1411b47868fdb7c3147fd1bf09c22e6565a765eedfb888989 |
MD5 | 0bccac6fbf8c5b3399a3a2e2a5f987bc |
BLAKE2b-256 | a7c87a23d95e7fbe783c2664b2eb94f0a04c9d3a925e71ac8e6d94bc9c42dc81 |
Hashes for cchardet-2.1.1-cp34-cp34m-manylinux1_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 69311e20183056b45313475cc05c3e968faa2b14a466a6b0c23780645a462afe |
MD5 | 04ca204c697a01d37b1166d8c58420ce |
BLAKE2b-256 | a86e993de1f94421ae69bfbe5a4e011d94fc93e9d1fb766e1deff6f428084608 |
Hashes for cchardet-2.1.1-cp34-cp34m-manylinux1_i686.whl
Algorithm | Hash digest |
---|---|
SHA256 | feda07443d732d86c9821671a898107b96ceb00462f405ec1dc08a353a9ddab0 |
MD5 | f29428bfbbde1f4e7a07210a9947e0a2 |
BLAKE2b-256 | ca4d06a3c2618164753deec2b749d1a812b6fb8748fb75bab8210359b5c3e90b |
Hashes for cchardet-2.1.1-cp27-cp27mu-manylinux1_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 823a981ba75fe8c12a0a0259eb80ec3a657273559f6d7445ba6fe2d2b061c8f9 |
MD5 | 1de1507f4c066cd94b65f18ec8cb245a |
BLAKE2b-256 | feb4e71bd76e37ad9fab2a0b89acd2fadc19d59b0a391db310b9c30b4d7c7983 |
Hashes for cchardet-2.1.1-cp27-cp27mu-manylinux1_i686.whl
Algorithm | Hash digest |
---|---|
SHA256 | b94a65d3a8cc900058e6aaedc0dde9c99ffe436d8670d156784d7b561b874cf5 |
MD5 | f6e7d33dc1afc272e2cea9730188a7f5 |
BLAKE2b-256 | f2b89b6d5f165cd62efe34938e0ab4d7d17ef6bffe8fe0e15feb19f1cbe2723f |
Hashes for cchardet-2.1.1-cp27-cp27m-win_amd64.whl
Algorithm | Hash digest |
---|---|
SHA256 | f4e3d0d9a0113cdfbc2fafa995674c1c49ed4166543b454945ca44d6e2148935 |
MD5 | ac01ce5b8ab8b44cffb412730c322cc7 |
BLAKE2b-256 | 642a0e3796f3af0924e157b29d0224533db65cd032ec96aa1dafb545433f1861 |
Hashes for cchardet-2.1.1-cp27-cp27m-win32.whl
Algorithm | Hash digest |
---|---|
SHA256 | 7187a01130b838cea449904f3aa5c0bee0609fcc0f5f667f4ce08ea99d102ddc |
MD5 | 2da112509c7b0bb96b43727d7e1e6fc1 |
BLAKE2b-256 | d0719841419b316232e39ff9bd7d4b49295c0431e6789635cefb0a381c3d5f28 |
Hashes for cchardet-2.1.1-cp27-cp27m-manylinux1_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | a62b29c8c5a41f5ae95f620746d6db03b86fb259340fd991c9a608aabc60a275 |
MD5 | ad334adb1d988e687b682eadd7664b42 |
BLAKE2b-256 | 4021e682c89b09c8e08ad00d73ddfb716ee570859dcaf7fe907255dba289413f |
Hashes for cchardet-2.1.1-cp27-cp27m-manylinux1_i686.whl
Algorithm | Hash digest |
---|---|
SHA256 | e47d90a8484cc425ca4c13a204901e24e2d0b3e206deef7cf391c10639d33d6b |
MD5 | 1822c6f86310c4667bda01b03b0c69b5 |
BLAKE2b-256 | 4e6bcaea582ded84e0c86119adb405ad0ef05e08f6d98db58c7287362f97f850 |
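A downloaded wheel can be checked against the SHA256 digest listed for it. The sketch below uses only Python's standard `hashlib` module. The function names `sha256_of_file` and `matches_published_hash` are hypothetical, and the wheel path is whatever location the file was saved to.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in blocks so large files are never loaded fully into memory.
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_published_hash(wheel_path, digest)` should return True when `digest` is the SHA256 row listed for that exact file name, and False if the file was corrupted or altered.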