cChardet is a high-speed universal character encoding detector.
Project description
cChardet
Work In Progress Branch
cChardet is a high-speed universal character encoding detector, implemented as a binding to uchardet.
Supported Languages/Encodings
International (Unicode)
UTF-8
UTF-16BE / UTF-16LE
UTF-32BE / UTF-32LE / X-ISO-10646-UCS-4-34121 / X-ISO-10646-UCS-4-21431
Arabic
ISO-8859-6
WINDOWS-1256
Bulgarian
ISO-8859-5
WINDOWS-1251
Chinese
ISO-2022-CN
BIG5
EUC-TW
GB18030
HZ-GB-2312
Croatian
ISO-8859-2
ISO-8859-13
ISO-8859-16
Windows-1250
IBM852
MAC-CENTRALEUROPE
Czech
Windows-1250
ISO-8859-2
IBM852
MAC-CENTRALEUROPE
Danish
ISO-8859-1
ISO-8859-15
WINDOWS-1252
English
ASCII
Esperanto
ISO-8859-3
Estonian
ISO-8859-4
ISO-8859-13
Windows-1252
Windows-1257
Finnish
ISO-8859-1
ISO-8859-4
ISO-8859-9
ISO-8859-13
ISO-8859-15
WINDOWS-1252
French
ISO-8859-1
ISO-8859-15
WINDOWS-1252
German
ISO-8859-1
WINDOWS-1252
Greek
ISO-8859-7
WINDOWS-1253
Hebrew
ISO-8859-8
WINDOWS-1255
Hungarian
ISO-8859-2
WINDOWS-1250
Irish Gaelic
ISO-8859-1
ISO-8859-9
ISO-8859-15
WINDOWS-1252
Italian
ISO-8859-1
ISO-8859-3
ISO-8859-9
ISO-8859-15
WINDOWS-1252
Japanese
ISO-2022-JP
SHIFT_JIS
EUC-JP
Korean
ISO-2022-KR
EUC-KR / UHC
Lithuanian
ISO-8859-4
ISO-8859-10
ISO-8859-13
Latvian
ISO-8859-4
ISO-8859-10
ISO-8859-13
Maltese
ISO-8859-3
Polish
ISO-8859-2
ISO-8859-13
ISO-8859-16
Windows-1250
IBM852
MAC-CENTRALEUROPE
Portuguese
ISO-8859-1
ISO-8859-9
ISO-8859-15
WINDOWS-1252
Romanian
ISO-8859-2
ISO-8859-16
Windows-1250
IBM852
Russian
ISO-8859-5
KOI8-R
WINDOWS-1251
MAC-CYRILLIC
IBM866
IBM855
Slovak
Windows-1250
ISO-8859-2
IBM852
MAC-CENTRALEUROPE
Slovene
ISO-8859-2
ISO-8859-16
Windows-1250
IBM852
M
Example
# -*- coding: utf-8 -*-
import cchardet as chardet

# Read the sample as raw bytes; detect() works on bytes, not decoded text.
with open(r"src/tests/samples/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt", "rb") as f:
    msg = f.read()

result = chardet.detect(msg)
print(result)
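The printed result can then drive the actual decoding. The following is a minimal sketch, assuming the result is a dictionary with an `encoding` key that may be `None` when detection fails, as in chardet's interface. `decode_with_detection` and its `fallback` parameter are hypothetical names, not part of cChardet. The detector is passed in as a callable, and the helper also falls back when the reported name has no matching Python codec.

```python
import codecs


def decode_with_detection(data, detect, fallback="utf-8"):
    """Decode bytes using the encoding reported by a detect() callable.

    `detect` is assumed to return a dict with an "encoding" key. That value
    may be None, or may name a charset Python has no codec for; `fallback`
    is used in both cases.
    """
    encoding = detect(data).get("encoding") or fallback
    try:
        codecs.lookup(encoding)
    except LookupError:
        encoding = fallback
    # errors="replace" keeps decoding going even if the guess was wrong.
    return data.decode(encoding, errors="replace")
```

With cChardet installed, this would be called as `decode_with_detection(msg, chardet.detect)`, using the names from the example above.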
Benchmark
$ cd src/
$ pip install chardet
$ python tests/bench.py
Results
CPU: Intel(R) Core(TM) i5-4690 CPU @ 3.50GHz
RAM: DDR3 1600 MHz 16GB
Platform: Ubuntu 16.04 amd64
Python 2.7.12
Library | Request (call/s) |
---|---|
chardet | 0.26 |
cchardet | 1341.81 |
Python 3.6.0
Library | Request (call/s) |
---|---|
chardet | 0.26 |
cchardet | 1472.43 |
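The contents of `tests/bench.py` are not shown here. As a rough illustration of how a calls-per-second figure like those above could be measured, the sketch below times repeated calls to a detection function. `calls_per_second` and its parameters are hypothetical and are not taken from the project's benchmark script.

```python
import time


def calls_per_second(func, data, repeat=100):
    """Return how many func(data) calls complete per second, averaged over `repeat` calls."""
    start = time.perf_counter()
    for _ in range(repeat):
        func(data)
    # Guard against a zero interval on very fast functions or coarse clocks.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return repeat / elapsed
```

Passing `chardet.detect` and `cchardet.detect` in turn, with the same sample bytes, would give two comparable rates.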
LICENSE
See COPYING file.
Contact
CHANGES
2.0a4 (2017-04-05)
Update uchardet repo (Fix buffer overflow)
2.0a3 (2017-03-29)
Implement UniversalDetector (like chardet)
2.0a2 (2017-03-28)
Update uchardet repo (Fix memory leak)
2.0a1 (2017-03-28)
Replace uchardet-enhanced with uchardet
Remove Detector class
1.1.3 (2017-02-26)
Support AArch64
1.1.2 (2017-01-08)
Support Python 3.6
1.1.1 (2016-11-05)
Use len() function (9e61cb9e96b138b0d18e5f9e013e144202ae4067)
Remove detect function in _cchardet.pyx (25b581294fc0ae8f686ac9972c8549666766f695)
Support manylinux1 wheel
1.1.0 (2016-10-17)
Add Detector class
Improve unit tests
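The 2.0a3 entry mentions a UniversalDetector modeled on chardet's. Assuming it follows chardet's incremental interface (`feed()`, `done`, `close()`, `result`), which the changelog implies but does not spell out, feeding data in chunks might look like the sketch below. `detect_incrementally` is a hypothetical helper that accepts any detector object with that interface.

```python
def detect_incrementally(chunks, detector):
    """Feed byte chunks to a chardet-style detector, stopping once it reports done."""
    for chunk in chunks:
        detector.feed(chunk)
        if detector.done:
            # The detector has seen enough; skip the remaining chunks.
            break
    # close() finalizes detection so that `result` is populated.
    detector.close()
    return detector.result
```

Iterating over a file opened in binary mode yields suitable chunks, so a detector instance and an open file are all the helper needs.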
Download files
Built Distributions
Hashes for cchardet-2.0a4-cp36-cp36m-win_amd64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 83157e64b69d4901f690a8d19d2cb2e06b80b0be3819ea56e1ce2f70ab7e8bb4 |
MD5 | f19be30b0f69be74e6ec6c3a5cf9d042 |
BLAKE2b-256 | 1a19069c7f6d0799af93847d3e0220f94c34766513d4efdf761f2f0476fdcd4b |
Hashes for cchardet-2.0a4-cp36-cp36m-win32.whl
Algorithm | Hash digest |
---|---|
SHA256 | 883b14eac809f0e8e60106190a8beb45d15f8ab4234e5ea7e5abe2be9c4847c8 |
MD5 | 4a202d2d7a31e148490dddb685037b14 |
BLAKE2b-256 | 188d45d94ba9b06f6c09052ef35ffa1d498f94d9cdf1b9ac81dd03e3d0c96139 |
Hashes for cchardet-2.0a4-cp36-cp36m-manylinux1_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | bed20b38acc2275a2b5a7e1f065f390a56c907aa1ca6df6fb18dfaeea59455ca |
MD5 | 47dcea25ee6c46199fb1ae5201024b9b |
BLAKE2b-256 | 2800b8c503529eae456fbf7406cbf1a3288f409deda9a5f6157ede0681ee8ba0 |
Hashes for cchardet-2.0a4-cp36-cp36m-manylinux1_i686.whl
Algorithm | Hash digest |
---|---|
SHA256 | a8b2cc002362266e9e7fa2067429446443b7cfeab3fb422e26b5c20d3dd05874 |
MD5 | 606d67e1e4bcee8eb3ea01a0b8fb8cd9 |
BLAKE2b-256 | 169fe1e646b79daef68341e958e4263a2c3c0efd50f96ca2d927a02c235b4ce3 |
Hashes for cchardet-2.0a4-cp35-cp35m-win_amd64.whl
Algorithm | Hash digest |
---|---|
SHA256 | f437b93a866f3d2098905d5ea9291fbab09acc6d67baf380680c07191915e4ad |
MD5 | f5b22b4ac07e25331852cced35a02db3 |
BLAKE2b-256 | be6efbf68995f8da510ea68143fdb3d8b2aa26fedfb92a18ee6a9a37f369d24f |
Hashes for cchardet-2.0a4-cp35-cp35m-win32.whl
Algorithm | Hash digest |
---|---|
SHA256 | 44c536efdcf6bb35dec21cc183f13615162033e88e9404d48f1178647f7532da |
MD5 | 3c4756d7c13ef823dbd56fbcfa94530d |
BLAKE2b-256 | 01861e2770527cb6e6eba452587c050ced26de27cc2af38493bdc6e520af77ca |
Hashes for cchardet-2.0a4-cp35-cp35m-manylinux1_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 5b1cc98add9f2f5ec54a4781329109f97b86d46ec7b9a106c4fa0ec733cf35c5 |
MD5 | e3ca784250e03d4460e9afd7804458b3 |
BLAKE2b-256 | 97cb4d6c4d503e2abd883c8651cf280867e76af5aad4538e73babb96da5ddf94 |
Hashes for cchardet-2.0a4-cp35-cp35m-manylinux1_i686.whl
Algorithm | Hash digest |
---|---|
SHA256 | 79d38e10083a80fd8dec0a5a8268130d6cde1a94efbc38fa31341971ffae53e3 |
MD5 | b5c0d3437f37ecaca1ed1ec31a7bf50d |
BLAKE2b-256 | ad8e9c9575bd5d9fe9f217ca13f54b36e6fcf894b59b3a126e1720ad2c3ae870 |
Hashes for cchardet-2.0a4-cp34-cp34m-win_amd64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 8b6bb1d81054a05931d4a05b78ecd734e11f8bd153f4579428588931391412f2 |
MD5 | 79c8dd39761a18be4a360d6d1543feae |
BLAKE2b-256 | 04b3d081c70724419da10c651bd9b3248858510d70e2b4e6403951222585e7d3 |
Hashes for cchardet-2.0a4-cp34-cp34m-win32.whl
Algorithm | Hash digest |
---|---|
SHA256 | 912568b0dc8eacb7000bf5a61a47e07274a9628b0e0edf0e027324bf6571a94c |
MD5 | e504962fb660f35022c754b79adbe628 |
BLAKE2b-256 | b30f37bd425be34e8d3a9c49bd8622814d13dbcd51a8477b76c361a22fcf9995 |
Hashes for cchardet-2.0a4-cp34-cp34m-manylinux1_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | ae2453e29f9a1e62f1e5da0fee46297ca8be81f11b51e49dfc5308e766016043 |
MD5 | 24354c2b7e54ec52288e8c676fa92a7f |
BLAKE2b-256 | 53ef05295f206743a1daa7ac46875fbaea4320fb795dd95a704a8e00e382c02d |
Hashes for cchardet-2.0a4-cp34-cp34m-manylinux1_i686.whl
Algorithm | Hash digest |
---|---|
SHA256 | 148d1c16bdc693b7649e42cff4af6d9e9d30636211e5018f59c1de776e57e74c |
MD5 | b9e7a6c520b8fae7a8c723b93095b5cc |
BLAKE2b-256 | 04d7df2f17093ca4fe94124e47ccc512b28634d87b09b23e58f4944e5261df55 |
Hashes for cchardet-2.0a4-cp27-cp27mu-manylinux1_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | f772783885626a8e5f549c98b1a21d03d6090bc16e2c87ffe20e053bb4dcde58 |
MD5 | 47226a55bb2342744c3fcd1bed29e4ac |
BLAKE2b-256 | d16966fba4fc78d91937522679aaaf6574f7df34e98a0066e3123e37764deb79 |
Hashes for cchardet-2.0a4-cp27-cp27mu-manylinux1_i686.whl
Algorithm | Hash digest |
---|---|
SHA256 | 3dc5bb8a71d3fabcd1d2cedb6874324f348c5c0427bb284ab90558824115b378 |
MD5 | 24bfc81d4375ce29b2cbf90b11aec332 |
BLAKE2b-256 | 28e0980cb3b422e90346dc06632189c6b83795da42d948b6d35965b1080e6f45 |
Hashes for cchardet-2.0a4-cp27-cp27m-win_amd64.whl
Algorithm | Hash digest |
---|---|
SHA256 | af84d477ca4d78e05f57e24d0654052db9db143af88fa33420d2fcb791820382 |
MD5 | 03d096e74dbd0f79a0d2b7466cf249de |
BLAKE2b-256 | 850640d8678c719a207043aeec56bf8149671864ced052b02ba95d25156ef1b6 |
Hashes for cchardet-2.0a4-cp27-cp27m-win32.whl
Algorithm | Hash digest |
---|---|
SHA256 | abc82b9c2fa2a1854015d160f2d890b00911c3c6f80ab1827a37a8a98b2d96ac |
MD5 | 0ed1684b2c1f47af2bdef48d6524590f |
BLAKE2b-256 | 4ef234920eb957bdf5afbb8128fa55872adfe15ca1bc68c9a65ad0f10ab4f6bd |
Hashes for cchardet-2.0a4-cp27-cp27m-manylinux1_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | a06ed7596d6905496ac01b95067b320a3ee66ea1963c8bdc033f4cf643d09aea |
MD5 | e6a88daeeee73559ce38192e5b735af4 |
BLAKE2b-256 | 9e4c87f8b7f587a65eb10c1e906c8cf64b3ab486b195138a96289f578dadbb72 |
Hashes for cchardet-2.0a4-cp27-cp27m-manylinux1_i686.whl
Algorithm | Hash digest |
---|---|
SHA256 | 61489acd53aafe952a4a0e7e8458ee496870b04fd7ddbd140d8ac169a6efe5a0 |
MD5 | a27d6913e42922a8a9da1c4e6953041e |
BLAKE2b-256 | 74e152f3d324df1de35e24793cfabd0901cd923917fd2996e0c75485f913d53a |
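The digests listed above can be used to check a downloaded wheel before installing it. The following is a minimal sketch using only the standard library's `hashlib`. `file_digest` and `matches_digest` are hypothetical helper names, and the path passed in is assumed to be wherever the file was saved.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file, reading it in chunks to limit memory use."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()


def matches_digest(path, expected_hex, algorithm="sha256"):
    """Compare a file's digest with a published value, ignoring case and surrounding spaces."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

For example, `matches_digest("cchardet-2.0a4-cp36-cp36m-manylinux1_x86_64.whl", "bed20b38acc2275a2b5a7e1f065f390a56c907aa1ca6df6fb18dfaeea59455ca")` should return `True` for an intact copy of that wheel.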