cChardet
cChardet is a high-speed universal character encoding detector, built as a binding to uchardet.
Supported Languages/Encodings
International (Unicode)
UTF-8
UTF-16BE / UTF-16LE
UTF-32BE / UTF-32LE / X-ISO-10646-UCS-4-34121 / X-ISO-10646-UCS-4-21431
Arabic
ISO-8859-6
WINDOWS-1256
Bulgarian
ISO-8859-5
WINDOWS-1251
Chinese
ISO-2022-CN
BIG5
EUC-TW
GB18030
HZ-GB-2312
Croatian
ISO-8859-2
ISO-8859-13
ISO-8859-16
Windows-1250
IBM852
MAC-CENTRALEUROPE
Czech
Windows-1250
ISO-8859-2
IBM852
MAC-CENTRALEUROPE
Danish
ISO-8859-1
ISO-8859-15
WINDOWS-1252
English
ASCII
Esperanto
ISO-8859-3
Estonian
ISO-8859-4
ISO-8859-13
Windows-1252
Windows-1257
Finnish
ISO-8859-1
ISO-8859-4
ISO-8859-9
ISO-8859-13
ISO-8859-15
WINDOWS-1252
French
ISO-8859-1
ISO-8859-15
WINDOWS-1252
German
ISO-8859-1
WINDOWS-1252
Greek
ISO-8859-7
WINDOWS-1253
Hebrew
ISO-8859-8
WINDOWS-1255
Hungarian
ISO-8859-2
WINDOWS-1250
Irish Gaelic
ISO-8859-1
ISO-8859-9
ISO-8859-15
WINDOWS-1252
Italian
ISO-8859-1
ISO-8859-3
ISO-8859-9
ISO-8859-15
WINDOWS-1252
Japanese
ISO-2022-JP
SHIFT_JIS
EUC-JP
Korean
ISO-2022-KR
EUC-KR / UHC
Lithuanian
ISO-8859-4
ISO-8859-10
ISO-8859-13
Latvian
ISO-8859-4
ISO-8859-10
ISO-8859-13
Maltese
ISO-8859-3
Polish
ISO-8859-2
ISO-8859-13
ISO-8859-16
Windows-1250
IBM852
MAC-CENTRALEUROPE
Portuguese
ISO-8859-1
ISO-8859-9
ISO-8859-15
WINDOWS-1252
Romanian
ISO-8859-2
ISO-8859-16
Windows-1250
IBM852
Russian
ISO-8859-5
KOI8-R
WINDOWS-1251
MAC-CYRILLIC
IBM866
IBM855
Slovak
Windows-1250
ISO-8859-2
IBM852
MAC-CENTRALEUROPE
Slovene
ISO-8859-2
ISO-8859-16
Windows-1250
IBM852
M
Example
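# -*- coding: utf-8 -*-
import cchardet as chardet

# Read the sample as raw bytes; detect() works on bytes, not on decoded text.
with open(r"src/tests/samples/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt", "rb") as f:
    msg = f.read()

# detect() returns a dict with the guessed encoding and a confidence score.
result = chardet.detect(msg)
print(result)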
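Once detection has run, the reported encoding can be used to decode the bytes. The following is a minimal sketch, assuming that `detect()` returns a dict with `encoding` and `confidence` keys and that `encoding` may be `None` when nothing is recognised. The helper `decode_detected` and its `fallback` parameter are hypothetical names used for illustration; they are not part of cChardet.

```python
def decode_detected(data, detect, fallback="utf-8"):
    """Decode bytes using the encoding reported by a chardet-style detect()."""
    result = detect(data)
    # Assumption: the detector may report None when it cannot decide.
    encoding = result.get("encoding") or fallback
    try:
        text = data.decode(encoding, errors="replace")
    except LookupError:
        # Python has no codec under the reported name; use the fallback.
        text = data.decode(fallback, errors="replace")
    return text, result


try:
    import cchardet
except ImportError:  # keep the sketch usable where cChardet is not installed
    cchardet = None

if cchardet is not None:
    # Illustrative Shift_JIS sample, not one of the project's test files.
    sample = "千夜一夜物語は、イスラム世界における説話集である。".encode("shift_jis")
    text, result = decode_detected(sample, cchardet.detect)
    print(result["encoding"], result["confidence"])
    print(text)
```

Passing the detector in as an argument keeps the helper independent of which library is installed, so `chardet.detect` could be swapped in without changing the function.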
Benchmark
$ cd src/
$ pip install chardet
$ python tests/bench.py
Results
CPU: Intel(R) Core(TM) i5-4690 CPU @ 3.50GHz
RAM: DDR3 1600 MHz, 16 GB
Platform: Ubuntu 16.04 amd64
Python 2.7.13
Library | Request (call/s) |
---|---|
chardet v3.0.2 | 0.36 |
cchardet v2.0.1 | 1396.42 |
Python 3.6.1
Library | Request (call/s) |
---|---|
chardet v3.0.2 | 0.35 |
cchardet v2.0.1 | 1467.77 |
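A calls-per-second figure like the ones above could be measured roughly as follows. This is only a sketch for Python 3, not the contents of `tests/bench.py`: the helper `calls_per_second`, the `repeat` count, and the synthetic payload are all assumptions made for illustration.

```python
import importlib
import time


def calls_per_second(func, data, repeat=100):
    """Return how many times per second func(data) completes."""
    start = time.perf_counter()
    for _ in range(repeat):
        func(data)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse timers.
    return repeat / elapsed if elapsed > 0 else float("inf")


# Hypothetical payload; it is not taken from the project's sample files.
payload = ("Character encoding detection benchmark. " * 200).encode("utf-8")

for name in ("chardet", "cchardet"):
    try:
        module = importlib.import_module(name)
    except ImportError:
        print(f"{name}: not installed")
        continue
    rate = calls_per_second(module.detect, payload, repeat=20)
    print(f"{name}: {rate:.2f} call/s")
```

Numbers obtained this way depend heavily on the input and the machine, so they are not directly comparable with the table.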
LICENSE
See COPYING file.
CHANGES
2.1.3
Support Python 3.7
2.1.2
Enable LTO for wheel builds
Update Cython
2.1.1 (2017-07-01)
Fix differing results with different chunk sizes
Fix assignments to nsSMState in nsCodingStateMachine that resulted in unspecified behavior
Include COPYING in package
2.1.0 (2017-05-15)
2.0.1 (2017-04-25)
2.0.0 (2017-04-06)
Improve tests
2.0a4 (2017-04-05)
Update uchardet repo (Fix buffer overflow)
2.0a3 (2017-03-29)
Implement UniversalDetector (like chardet)
2.0a2 (2017-03-28)
Update uchardet repo (Fix memory leak)
2.0a1 (2017-03-28)
Replace uchardet-enhanced with uchardet
Remove Detector class
1.1.3 (2017-02-26)
Support AArch64
1.1.2 (2017-01-08)
Support Python 3.6
1.1.1 (2016-11-05)
Use len() function (9e61cb9e96b138b0d18e5f9e013e144202ae4067)
Remove detect function in _cchardet.pyx (25b581294fc0ae8f686ac9972c8549666766f695)
Support manylinux1 wheel
1.1.0 (2016-10-17)
Add Detector class
Improve unit tests
Download files
Built Distributions
Hashes for cchardet-2.1.3-cp37-cp37m-win_amd64.whl
Algorithm | Hash digest |
---|---|
SHA256 | dd3067aaaf7d73a9bd8ee316523bcb40fe99ab2dfb7234cba8d31ecb59c7b039 |
MD5 | a09ba688a68fe0099e4746f68f0cd3cd |
BLAKE2b-256 | 4d4146787ccbdd110eaab665829d35f3d4beb382d2a2badb9a4dfef831876d68 |
Hashes for cchardet-2.1.3-cp37-cp37m-win32.whl
Algorithm | Hash digest |
---|---|
SHA256 | 17683b5e2a0a67ffaa142379eb98177aa4d93622804e6b856102440e6da1233e |
MD5 | 146386b1716cdcb55f9eed2f322bbce9 |
BLAKE2b-256 | 9af5d4ab68e4eef9d95a9c1016a617bcb6df491a0d83c70cfe2fdae4ba34e84d |
Hashes for cchardet-2.1.3-cp37-cp37m-manylinux1_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | d0f628838322ffbe6da9c7b7bc89f3860c31fd8ecfc8f7f066d133279e43309c |
MD5 | e64eecbeb49c54b7abe52b04ba2d99f5 |
BLAKE2b-256 | a2e66d848168e3d44e8ffc7dddb87f2a5a4d661c68d2fd9ff593103dd83b0425 |
Hashes for cchardet-2.1.3-cp37-cp37m-manylinux1_i686.whl
Algorithm | Hash digest |
---|---|
SHA256 | ad8614d0e01b89d94878da12af1f717dba372c73167298becafbd040f72b2c20 |
MD5 | 86802eaba433cb6813977ad91b6c01bf |
BLAKE2b-256 | 9965ef8cd997afb897c51dfbcfb2268d5be3acba782e94857432cad39a7d85dd |
Hashes for cchardet-2.1.3-cp36-cp36m-win_amd64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 6d161a6f884e92a3dac6b71a0e6bbd67700bc4adddfa5f6683a8276972ac1d1b |
MD5 | 83ea33c1420f47c5b303608fbf317e2a |
BLAKE2b-256 | 68a7cbdbe1a9371dc9cad7eb08283925972afd45f95f255ee16aa0ffc62c9bfd |
Hashes for cchardet-2.1.3-cp36-cp36m-win32.whl
Algorithm | Hash digest |
---|---|
SHA256 | 549956a5e9caae2092818f48a3f4538d200ff79dcedd240eddb01cc7a73af0ea |
MD5 | e9f5bdda0b6246c15a3aedc28d0510b1 |
BLAKE2b-256 | ba072330cc0fa19c20a668df55544c708fa0321f497f0fcf1a51e0250aae63c8 |
Hashes for cchardet-2.1.3-cp36-cp36m-manylinux1_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 4cd6b44a366d4a6119a6100b697429821711ddeb5353463d6a1e6265dce4a413 |
MD5 | 962640e1a26e2ce4472823c0eb2c0be7 |
BLAKE2b-256 | d975070919f520a1dc2483f49089d2b334761ca577e972bfe35ef8cf2983275b |
Hashes for cchardet-2.1.3-cp36-cp36m-manylinux1_i686.whl
Algorithm | Hash digest |
---|---|
SHA256 | 00b2a64cc6f22d28322edbdb432c2c329fab509ba1ccb91e617519d58563677c |
MD5 | 77901204e3334f836e9d439f587cc86e |
BLAKE2b-256 | 8877be72a7d8a3682594d8a040993ed175db1b359cc1345c8a0228f09d93a9d9 |
Hashes for cchardet-2.1.3-cp35-cp35m-win_amd64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 60cdb75b258ae85c59d02336344039f951d4d10697bfa05b3cd29715c380327e |
MD5 | 1947b2f3c66d82d5d67fa38377c8a0df |
BLAKE2b-256 | f42cadb92726cbd36dbbe7c40a7bb8d562fb7085072fe5548aed9e2593f270bb |
Hashes for cchardet-2.1.3-cp35-cp35m-win32.whl
Algorithm | Hash digest |
---|---|
SHA256 | 216e78cb4c75887c6fd8bb79f98c8126dfca0c38338362fb4310b8be89e6346a |
MD5 | 432ee224bc374b5f2c236147ed51d97a |
BLAKE2b-256 | d1cbe0915924ebe24e028d69a3529c757a4be4f300abbe45f573bd51aded8e8d |
Hashes for cchardet-2.1.3-cp35-cp35m-manylinux1_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 57856cd0e9b8586e34808038b15e819bb1ebdd310c2d068d29be79c9f0e58cf3 |
MD5 | 6553e63adce0528d8a393afa28836290 |
BLAKE2b-256 | f313da3c99a5af3a12a75831432f0ba55fe2a587cff5065af8de527a6b42a6c4 |
Hashes for cchardet-2.1.3-cp35-cp35m-manylinux1_i686.whl
Algorithm | Hash digest |
---|---|
SHA256 | ac58cae7382a9c154db2cac48ef8f44c878290d391631f22a53ec2f6771bba10 |
MD5 | 81b5038f9b25a1db25424be92bd25af9 |
BLAKE2b-256 | 83164ab0123d3ebdc16da297372be82ea186ef1d4a245c6bf8f6d79488009509 |
Hashes for cchardet-2.1.3-cp34-cp34m-win_amd64.whl
Algorithm | Hash digest |
---|---|
SHA256 | c13c1550a1c6b34888838c9893f0cf38decf44bd207c3f47dfa19b9be4fcf416 |
MD5 | 8e14b2e931f4e1e2a49f6ae84639d917 |
BLAKE2b-256 | a71b87ce9260c7cac024acc2ec4cd69f3af839a006a7384b2a47c92ef98f7f86 |
Hashes for cchardet-2.1.3-cp34-cp34m-win32.whl
Algorithm | Hash digest |
---|---|
SHA256 | 42937102fa277ccd27ddb74fc8062f9cfbf19b35f719c16c9b735421a6603914 |
MD5 | 657d131a7cb3efc2186d1b3558324d59 |
BLAKE2b-256 | 72e2323e2d40cbe2563666fe2db956e8a39f7ff2ffd815aa7fa53f265736d907 |
Hashes for cchardet-2.1.3-cp34-cp34m-manylinux1_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 52143059a6a5162a8b2e4e1680ccceefcce102d8b83f1a6267334c60e3c25019 |
MD5 | 7a8bcb3800a3d10208ab03c917e18910 |
BLAKE2b-256 | 45ac41041c2703fef3bf9f4d94899aab5f360e291c2cdb2021ca88a4db47dc48 |
Hashes for cchardet-2.1.3-cp34-cp34m-manylinux1_i686.whl
Algorithm | Hash digest |
---|---|
SHA256 | 48be2d9f0154b8cde44289a29f3e170c0226528c2c32562ccb03532c5ef79b64 |
MD5 | d0ab23e59df568756a51797eaf19a3f8 |
BLAKE2b-256 | 1d43cfc8a09d3ee3fc84a1a8b5784eae738996a35a67254d0b3b8cb4b969c25c |
Hashes for cchardet-2.1.3-cp27-cp27mu-manylinux1_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 9f654a0adb4019b8c48a60c0518da0e0c3015b6fb0c1247ab67942286fd7b4d7 |
MD5 | 4bb7cd901f10ca4dd449427eeb7a7a5d |
BLAKE2b-256 | bb7ce3816ff0cc9578db34b67e37fa40fee695578c4531846fe922be493a2162 |
Hashes for cchardet-2.1.3-cp27-cp27mu-manylinux1_i686.whl
Algorithm | Hash digest |
---|---|
SHA256 | 7bccad48f8bdbd68633dbeaffbd43a1e07709be5f76e04c634f20c0c6a805c91 |
MD5 | 748948327f93978d9df2977cbbacfad4 |
BLAKE2b-256 | e372cc029a4210839be34b905821e870e8569dfa81eb68de21cc260084f4d478 |
Hashes for cchardet-2.1.3-cp27-cp27m-win_amd64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 4b4f4791e07b85a53cd0142abe6ddceb5a38d76d80f0aefa1987237832cfa35a |
MD5 | 7646c4cd1acf8899198e6d833e38a10f |
BLAKE2b-256 | 5a0e4876a1f702ea5454719cd10a99a68cc0faf0d1dc9638e1564e1f19a4371a |
Hashes for cchardet-2.1.3-cp27-cp27m-win32.whl
Algorithm | Hash digest |
---|---|
SHA256 | 4bda2e7efe6f8c934193a86eca003690340c0e768cdf489ca491919d223a38f5 |
MD5 | fad886852a34a449993a4a483a2f6333 |
BLAKE2b-256 | 4fe89bd9bd5886a2ed0c980315c36b9f2edabe41a93e75502c2ce6606b0710ec |
Hashes for cchardet-2.1.3-cp27-cp27m-manylinux1_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 495acfc450f2bdc3aaa1b5298ac3a5baa93c8cdb42d1a4a0059d72476a6755d4 |
MD5 | 8b748dd6882fb19b6d34277b6e9f5e73 |
BLAKE2b-256 | 2c6049dc568c7bd2437659a4193aec1ec6f071b09657774c6fad9b7e399cccda |
Hashes for cchardet-2.1.3-cp27-cp27m-manylinux1_i686.whl
Algorithm | Hash digest |
---|---|
SHA256 | a7aece42655a58a324db2d79d127a9c285c759418d4ed92005c239af8451189e |
MD5 | da1293ab74723e466adee63ef264b3b3 |
BLAKE2b-256 | 023267b44637b519a3656545d34f5463c3ec1e3c4c562edb531adc07fb4fc471 |
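The SHA256 digests listed above can be compared against a downloaded file before installing it. The following is a minimal sketch using only Python's standard `hashlib`; the helper names `sha256_of_file` and `matches_digest`, and the local file path in the commented call, are hypothetical.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 16):
    """Return the hex SHA256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Example call, assuming the wheel was saved to the current directory:
# matches_digest(
#     "cchardet-2.1.3-cp37-cp37m-win_amd64.whl",
#     "dd3067aaaf7d73a9bd8ee316523bcb40fe99ab2dfb7234cba8d31ecb59c7b039",
# )
```

Reading in chunks keeps memory use flat regardless of file size, which matters little for these wheels but makes the helper reusable for larger downloads.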