cChardet is a high-speed universal character encoding detector.
cChardet
Work In Progress Branch
cChardet is a high-speed universal character encoding detector. It is a binding to uchardet.
Supported Languages/Encodings
International (Unicode)
UTF-8
UTF-16BE / UTF-16LE
UTF-32BE / UTF-32LE / X-ISO-10646-UCS-4-34121 / X-ISO-10646-UCS-4-21431
Arabic
ISO-8859-6
WINDOWS-1256
Bulgarian
ISO-8859-5
WINDOWS-1251
Chinese
ISO-2022-CN
BIG5
EUC-TW
GB18030
HZ-GB-2312
Croatian
ISO-8859-2
ISO-8859-13
ISO-8859-16
Windows-1250
IBM852
MAC-CENTRALEUROPE
Czech
Windows-1250
ISO-8859-2
IBM852
MAC-CENTRALEUROPE
Danish
ISO-8859-1
ISO-8859-15
WINDOWS-1252
English
ASCII
Esperanto
ISO-8859-3
Estonian
ISO-8859-4
ISO-8859-13
Windows-1252
Windows-1257
Finnish
ISO-8859-1
ISO-8859-4
ISO-8859-9
ISO-8859-13
ISO-8859-15
WINDOWS-1252
French
ISO-8859-1
ISO-8859-15
WINDOWS-1252
German
ISO-8859-1
WINDOWS-1252
Greek
ISO-8859-7
WINDOWS-1253
Hebrew
ISO-8859-8
WINDOWS-1255
Hungarian
ISO-8859-2
WINDOWS-1250
Irish Gaelic
ISO-8859-1
ISO-8859-9
ISO-8859-15
WINDOWS-1252
Italian
ISO-8859-1
ISO-8859-3
ISO-8859-9
ISO-8859-15
WINDOWS-1252
Japanese
ISO-2022-JP
SHIFT_JIS
EUC-JP
Korean
ISO-2022-KR
EUC-KR / UHC
Lithuanian
ISO-8859-4
ISO-8859-10
ISO-8859-13
Latvian
ISO-8859-4
ISO-8859-10
ISO-8859-13
Maltese
ISO-8859-3
Polish
ISO-8859-2
ISO-8859-13
ISO-8859-16
Windows-1250
IBM852
MAC-CENTRALEUROPE
Portuguese
ISO-8859-1
ISO-8859-9
ISO-8859-15
WINDOWS-1252
Romanian
ISO-8859-2
ISO-8859-16
Windows-1250
IBM852
Russian
ISO-8859-5
KOI8-R
WINDOWS-1251
MAC-CYRILLIC
IBM866
IBM855
Slovak
Windows-1250
ISO-8859-2
IBM852
MAC-CENTRALEUROPE
Slovene
ISO-8859-2
ISO-8859-16
Windows-1250
IBM852
M
Example
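# -*- coding: utf-8 -*-
import cchardet as chardet

# Read the sample as raw bytes; detect() works on bytes, not decoded text.
with open(r"src/tests/samples/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt", "rb") as f:
    msg = f.read()
# detect() returns a dict holding the guessed encoding and a confidence score.
result = chardet.detect(msg)
print(result)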
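Once an encoding has been detected, the usual next step is to decode the bytes. The sketch below is one way to do that, not part of cChardet itself: `decode_detected` is a hypothetical helper that takes the dict returned by `detect()` and falls back to a default codec. It assumes the result can report an encoding of `None` when detection fails, and that some names in the list above (for example `X-ISO-10646-UCS-4-34121`) have no matching Python codec.

```python
import codecs


def decode_detected(data, result, fallback="utf-8"):
    """Decode bytes using a detect()-style result dict (hypothetical helper)."""
    encoding = result.get("encoding")
    try:
        # codecs.lookup raises LookupError for names Python has no codec for.
        codec = codecs.lookup(encoding) if encoding else codecs.lookup(fallback)
    except LookupError:
        codec = codecs.lookup(fallback)
    # errors="replace" keeps decoding going if the guess was slightly off.
    return data.decode(codec.name, errors="replace")


# Usage with the binding (assumes cchardet is installed):
#   import cchardet
#   text = decode_detected(msg, cchardet.detect(msg))
```

Replacing undecodable bytes is a deliberate choice here, since a detector only offers a guess; strict decoding may be preferable where silent data loss is unacceptable.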
Benchmark
$ cd src/
$ pip install chardet
$ python tests/bench.py
Results
CPU: Intel(R) Core(TM) i5-4690 CPU @ 3.50GHz
RAM: DDR3 1600MHz 16GB
Platform: Ubuntu 16.04 amd64
Python 2.7.12
Library | Request (call/s) |
---|---|
chardet | 0.26 |
cchardet | 1341.81 |
Python 3.6.0
Library | Request (call/s) |
---|---|
chardet | 0.26 |
cchardet | 1472.43 |
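The call/s figures above come from tests/bench.py, whose contents are not shown here. As a rough illustration of how such a rate could be measured, the sketch below times repeated calls to any detect-style function. `calls_per_second` and its `repeat` parameter are hypothetical names, and this is not claimed to be how the project's benchmark works.

```python
import time


def calls_per_second(detect, payload, repeat=100):
    """Return how many detect(payload) calls complete per second (a sketch)."""
    start = time.perf_counter()
    for _ in range(repeat):
        detect(payload)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse timers.
    return repeat / elapsed if elapsed > 0 else float("inf")


# Usage (assumes both packages are installed and msg holds the sample bytes):
#   import chardet, cchardet
#   print(calls_per_second(chardet.detect, msg, repeat=5))
#   print(calls_per_second(cchardet.detect, msg))
```

A small `repeat` is sensible for the slower library, since each call on a large sample can take several seconds.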
LICENSE
See COPYING file.
Contact
CHANGES
2.0a2 (2017-03-28)
Update uchardet repo (Fix memory leak)
2.0a1 (2017-03-28)
Replace uchardet-enhanced with uchardet
Remove Detector class
1.1.3 (2017-02-26)
Support AArch64
1.1.2 (2017-01-08)
Support Python 3.6
1.1.1 (2016-11-05)
Use len() function (9e61cb9e96b138b0d18e5f9e013e144202ae4067)
Remove detect function in _cchardet.pyx (25b581294fc0ae8f686ac9972c8549666766f695)
Support manylinux1 wheel
1.1.0 (2016-10-17)
Add Detector class
Improve unit tests