cChardet is a high-speed universal character encoding detector.
cChardet
Work In Progress Branch
cChardet is a high-speed universal character encoding detector, implemented as a binding to uchardet.
Supported Languages/Encodings
International (Unicode)
UTF-8
UTF-16BE / UTF-16LE
UTF-32BE / UTF-32LE / X-ISO-10646-UCS-4-34121 / X-ISO-10646-UCS-4-21431
Arabic
ISO-8859-6
WINDOWS-1256
Bulgarian
ISO-8859-5
WINDOWS-1251
Chinese
ISO-2022-CN
BIG5
EUC-TW
GB18030
HZ-GB-2312
Croatian
ISO-8859-2
ISO-8859-13
ISO-8859-16
Windows-1250
IBM852
MAC-CENTRALEUROPE
Czech
Windows-1250
ISO-8859-2
IBM852
MAC-CENTRALEUROPE
Danish
ISO-8859-1
ISO-8859-15
WINDOWS-1252
English
ASCII
Esperanto
ISO-8859-3
Estonian
ISO-8859-4
ISO-8859-13
Windows-1252
Windows-1257
Finnish
ISO-8859-1
ISO-8859-4
ISO-8859-9
ISO-8859-13
ISO-8859-15
WINDOWS-1252
French
ISO-8859-1
ISO-8859-15
WINDOWS-1252
German
ISO-8859-1
WINDOWS-1252
Greek
ISO-8859-7
WINDOWS-1253
Hebrew
ISO-8859-8
WINDOWS-1255
Hungarian
ISO-8859-2
WINDOWS-1250
Irish Gaelic
ISO-8859-1
ISO-8859-9
ISO-8859-15
WINDOWS-1252
Italian
ISO-8859-1
ISO-8859-3
ISO-8859-9
ISO-8859-15
WINDOWS-1252
Japanese
ISO-2022-JP
SHIFT_JIS
EUC-JP
Korean
ISO-2022-KR
EUC-KR / UHC
Lithuanian
ISO-8859-4
ISO-8859-10
ISO-8859-13
Latvian
ISO-8859-4
ISO-8859-10
ISO-8859-13
Maltese
ISO-8859-3
Polish
ISO-8859-2
ISO-8859-13
ISO-8859-16
Windows-1250
IBM852
MAC-CENTRALEUROPE
Portuguese
ISO-8859-1
ISO-8859-9
ISO-8859-15
WINDOWS-1252
Romanian
ISO-8859-2
ISO-8859-16
Windows-1250
IBM852
Russian
ISO-8859-5
KOI8-R
WINDOWS-1251
MAC-CYRILLIC
IBM866
IBM855
Slovak
Windows-1250
ISO-8859-2
IBM852
MAC-CENTRALEUROPE
Slovene
ISO-8859-2
ISO-8859-16
Windows-1250
IBM852
M
Example
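# -*- coding: utf-8 -*-
import cchardet as chardet

# Read the sample file as raw bytes; detect() works on bytes, not on decoded text.
with open(r"src/tests/samples/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt", "rb") as f:
    msg = f.read()

# detect() returns a dict describing the guessed encoding.
result = chardet.detect(msg)
print(result)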
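The result printed above is usually only a first step: the detected encoding is then used to decode the bytes. The following is a minimal sketch of that step, not part of cChardet itself. The helper name decode_with_detection and its fallback parameter are hypothetical. It assumes the detect function returns a dict with an "encoding" key (as chardet and cchardet do) whose value may be None, and it takes the detect function as an argument so that either library can be plugged in.

```python
def decode_with_detection(data, detect, fallback="utf-8"):
    """Decode bytes using the encoding reported by a chardet-style detect().

    `detect` is any callable that takes bytes and returns a dict with an
    "encoding" key, for example cchardet.detect (assumption: the value
    may be None when nothing could be detected).
    """
    result = detect(data)
    encoding = result.get("encoding") or fallback
    try:
        return data.decode(encoding)
    except (LookupError, UnicodeDecodeError):
        # LookupError: the detector reported a name Python has no codec for.
        # UnicodeDecodeError: the guess was wrong for these bytes.
        return data.decode(fallback, errors="replace")
```

With cChardet installed, the sample from the example could be decoded with decode_with_detection(msg, chardet.detect). Falling back to UTF-8 with replacement characters is a design choice made for this sketch; stricter callers may prefer to raise instead.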
Benchmark
$ cd src/
$ pip install chardet
$ python tests/bench.py
Results
CPU: Intel(R) Core(TM) i5-4690 CPU @ 3.50GHz
RAM: DDR3 1600MHz 16GB
Platform: Ubuntu 16.04 amd64
Python 2.7.12
| | Request (call/s) |
|---|---|
| chardet | 0.26 |
| cchardet | 1341.81 |
Python 3.6.0
| | Request (call/s) |
|---|---|
| chardet | 0.26 |
| cchardet | 1472.43 |
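The figures above are reported in calls per second. As a rough illustration of how such a number can be measured, here is a small sketch. It is not the project's tests/bench.py, whose contents are not shown here; the function name calls_per_second and the repeat parameter are hypothetical.

```python
import time


def calls_per_second(func, arg, repeat=100):
    """Call func(arg) `repeat` times and return the average call rate."""
    start = time.perf_counter()
    for _ in range(repeat):
        func(arg)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse timers.
    return repeat / elapsed if elapsed > 0 else float("inf")
```

With both packages installed, calls_per_second(chardet.detect, msg) and calls_per_second(cchardet.detect, msg) could be compared on the same input. Absolute numbers will differ by machine and by input file.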
LICENSE
See COPYING file.
Contact
CHANGES
2.0.0 (2017-04-06)
Improve tests
2.0a4 (2017-04-05)
Update uchardet repo (Fix buffer overflow)
2.0a3 (2017-03-29)
Implement UniversalDetector (like chardet)
2.0a2 (2017-03-28)
Update uchardet repo (Fix memory leak)
2.0a1 (2017-03-28)
Replace uchardet-enhanced with uchardet
Remove Detector class
1.1.3 (2017-02-26)
Support AArch64
1.1.2 (2017-01-08)
Support Python 3.6
1.1.1 (2016-11-05)
Use len() function (9e61cb9e96b138b0d18e5f9e013e144202ae4067)
Remove detect function in _cchardet.pyx (25b581294fc0ae8f686ac9972c8549666766f695)
Support manylinux1 wheel
1.1.0 (2016-10-17)
Add Detector class
Improve unit tests
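The 2.0a3 entry above mentions a UniversalDetector "like chardet". The following sketch shows how a detector of that kind is typically driven when the input is read in chunks. It assumes the detector object offers feed(), close(), a done flag, and a result attribute, as chardet's UniversalDetector does; the exact cChardet interface should be checked against the installed version. The helper name detect_file_incrementally and the chunk_size parameter are hypothetical.

```python
def detect_file_incrementally(path, detector, chunk_size=4096):
    """Feed a file to a chardet-style incremental detector, chunk by chunk.

    `detector` is assumed to expose feed(bytes), close(), `done` and
    `result`, mirroring chardet's UniversalDetector.
    """
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            detector.feed(chunk)
            if detector.done:
                # The detector is already confident; skip the rest of the file.
                break
    detector.close()
    return detector.result
```

Assuming cChardet is installed and exposes the class under that name, a call might look like detect_file_incrementally("some_file.txt", cchardet.UniversalDetector()). Stopping early once done is set avoids reading large files in full.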