cChardet
cChardet is a high-speed universal character encoding detector; it is a binding to uchardet.
Supported Languages/Encodings
International (Unicode)
UTF-8
UTF-16BE / UTF-16LE
UTF-32BE / UTF-32LE / X-ISO-10646-UCS-4-34121 / X-ISO-10646-UCS-4-21431
Arabic
ISO-8859-6
WINDOWS-1256
Bulgarian
ISO-8859-5
WINDOWS-1251
Chinese
ISO-2022-CN
BIG5
EUC-TW
GB18030
HZ-GB-2312
Croatian
ISO-8859-2
ISO-8859-13
ISO-8859-16
Windows-1250
IBM852
MAC-CENTRALEUROPE
Czech
Windows-1250
ISO-8859-2
IBM852
MAC-CENTRALEUROPE
Danish
ISO-8859-1
ISO-8859-15
WINDOWS-1252
English
ASCII
Esperanto
ISO-8859-3
Estonian
ISO-8859-4
ISO-8859-13
Windows-1252
Windows-1257
Finnish
ISO-8859-1
ISO-8859-4
ISO-8859-9
ISO-8859-13
ISO-8859-15
WINDOWS-1252
French
ISO-8859-1
ISO-8859-15
WINDOWS-1252
German
ISO-8859-1
WINDOWS-1252
Greek
ISO-8859-7
WINDOWS-1253
Hebrew
ISO-8859-8
WINDOWS-1255
Hungarian
ISO-8859-2
WINDOWS-1250
Irish Gaelic
ISO-8859-1
ISO-8859-9
ISO-8859-15
WINDOWS-1252
Italian
ISO-8859-1
ISO-8859-3
ISO-8859-9
ISO-8859-15
WINDOWS-1252
Japanese
ISO-2022-JP
SHIFT_JIS
EUC-JP
Korean
ISO-2022-KR
EUC-KR / UHC
Lithuanian
ISO-8859-4
ISO-8859-10
ISO-8859-13
Latvian
ISO-8859-4
ISO-8859-10
ISO-8859-13
Maltese
ISO-8859-3
Polish
ISO-8859-2
ISO-8859-13
ISO-8859-16
Windows-1250
IBM852
MAC-CENTRALEUROPE
Portuguese
ISO-8859-1
ISO-8859-9
ISO-8859-15
WINDOWS-1252
Romanian
ISO-8859-2
ISO-8859-16
Windows-1250
IBM852
Russian
ISO-8859-5
KOI8-R
WINDOWS-1251
MAC-CYRILLIC
IBM866
IBM855
Slovak
Windows-1250
ISO-8859-2
IBM852
MAC-CENTRALEUROPE
Slovene
ISO-8859-2
ISO-8859-16
Windows-1250
IBM852
M
Example
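# -*- coding: utf-8 -*-
import cchardet as chardet

# Read the sample as raw bytes; detect() works on bytes, not decoded text.
with open(r"src/tests/samples/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt", "rb") as f:
    msg = f.read()

# Detect the encoding of the byte string and print the result.
result = chardet.detect(msg)
print(result)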
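A common next step after detection is decoding the bytes. The following is a minimal sketch, not part of cChardet itself: `decode_detected` and its `fallback` parameter are hypothetical names, and it assumes `detect()` returns a dict with an `encoding` key that may be `None` when nothing was recognised. The detector is passed in as an argument so the helper can also be tried with a stand-in function.

```python
def decode_detected(data, detect=None, fallback="utf-8"):
    """Decode bytes using a detected encoding; return (text, encoding_used)."""
    if detect is None:
        # Imported lazily so the helper can be exercised with a stand-in detector.
        import cchardet
        detect = cchardet.detect

    result = detect(data)  # assumed shape: {"encoding": ..., "confidence": ...}
    encoding = result.get("encoding") or fallback
    try:
        return data.decode(encoding, errors="replace"), encoding
    except LookupError:
        # Python has no codec registered under the reported name.
        return data.decode(fallback, errors="replace"), fallback
```

With the sample above, `text, used = decode_detected(msg)` would return the decoded text together with the encoding name that was actually used.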
Benchmark
$ cd src/
$ pip install chardet
$ python tests/bench.py
Results
CPU: Intel(R) Core(TM) i5-4690 CPU @ 3.50GHz
RAM: DDR3 1600 MHz 16 GB
Platform: Ubuntu 16.04 amd64
Python 2.7.12
Library | Request (call/s) |
---|---|
chardet | 0.26 |
cchardet | 1341.81 |
Python 3.6.0
Library | Request (call/s) |
---|---|
chardet | 0.26 |
cchardet | 1472.43 |
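The contents of tests/bench.py are not reproduced here. As a rough illustration of how a call/s figure like those above can be measured, here is a minimal sketch; `calls_per_second` and its `repeat` parameter are hypothetical names, and `time.perf_counter` requires Python 3.3 or later.

```python
import time


def calls_per_second(func, data, repeat=100):
    """Return how many func(data) calls complete per second, averaged over `repeat` calls."""
    start = time.perf_counter()
    for _ in range(repeat):
        func(data)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading from a coarse clock on very fast calls.
    return repeat / elapsed if elapsed > 0 else float("inf")
```

Passing `cchardet.detect` and then `chardet.detect` with the same byte string gives two comparable rates; the absolute numbers depend on the machine and the input.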
LICENSE
See COPYING file.
Contact
CHANGES
2.0.1 (2017-04-25)
2.0.0 (2017-04-06)
Improve tests
2.0a4 (2017-04-05)
Update uchardet repo (Fix buffer overflow)
2.0a3 (2017-03-29)
Implement UniversalDetector (like chardet)
2.0a2 (2017-03-28)
Update uchardet repo (Fix memory leak)
2.0a1 (2017-03-28)
Replace uchardet-enhanced with uchardet
Remove Detector class
1.1.3 (2017-02-26)
Support AArch64
1.1.2 (2017-01-08)
Support Python 3.6
1.1.1 (2016-11-05)
Use len() function (9e61cb9e96b138b0d18e5f9e013e144202ae4067)
Remove detect function in _cchardet.pyx (25b581294fc0ae8f686ac9972c8549666766f695)
Support manylinux1 wheel
1.1.0 (2016-10-17)
Add Detector class
Improve unit tests
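The 2.0a3 entry above mentions a UniversalDetector modelled on chardet. A minimal sketch of incremental detection follows, assuming the class mirrors chardet's `feed()`, `close()`, `done` and `result` interface; `detect_incrementally` is a hypothetical helper, and the detector is injectable so the loop can be tried without cChardet installed.

```python
def detect_incrementally(chunks, detector=None):
    """Feed byte chunks to a detector until it is confident, then return its result."""
    if detector is None:
        # Assumption: cchardet exposes UniversalDetector with a chardet-like interface.
        import cchardet
        detector = cchardet.UniversalDetector()

    for chunk in chunks:
        detector.feed(chunk)
        if detector.done:
            # The detector has seen enough data; stop reading early.
            break
    detector.close()
    return detector.result
```

This is useful for large files: iterating over a file opened in binary mode yields byte chunks, and the loop stops as soon as the detector reports that it is done.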
Download files
Source Distribution
Built Distributions
Hashes for cchardet-2.0.1-cp36-cp36m-win_amd64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 4090704c6f229fe52f5c089656df6e63a6e27e231fc051a1a50be56fb7c50f7f |
MD5 | fda21be0ae08dabf8b00b29bc1ef7976 |
BLAKE2b-256 | c987421a0863ec27eaa2d64c759184c72ca5d22fe4a5f8c851564773a6413ac8 |
Hashes for cchardet-2.0.1-cp36-cp36m-win32.whl
Algorithm | Hash digest |
---|---|
SHA256 | fc6de5c77b102360bb1ed1f0e1e738a7a12037eb6f525a86ef2b0f3c3da57d9d |
MD5 | 0feb54c43ab5a35b3a2ec05d23c3534f |
BLAKE2b-256 | bcdbd545f3554ba14b2a2eb3c30647a281a8f8ea74c5e133c08fc378a772ab1f |
Hashes for cchardet-2.0.1-cp36-cp36m-manylinux1_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 3aa9a61574a205477333fe5cfa2862692bbaffdce4ffca8f19d112c6d239f10c |
MD5 | 391c28dd7a5d96baf2e9fccfdcb6b372 |
BLAKE2b-256 | 6e4f698e136eb9fc6b922254eb8de85b021562bd0c5cbd43568918b67c96cc91 |
Hashes for cchardet-2.0.1-cp36-cp36m-manylinux1_i686.whl
Algorithm | Hash digest |
---|---|
SHA256 | 4cf19163d001eb8910206f511daec67e1300b0fcffdd386c213fce3cdae38bb4 |
MD5 | bb99b0aa8d51fb78a4d9e8208e5e4376 |
BLAKE2b-256 | d51e997d5d20b21bc6209556f91e36fc2ef2e942d9b12df8612d989d4702154b |
Hashes for cchardet-2.0.1-cp35-cp35m-win_amd64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 096ab4c0ee868249d5dcdb9c6dfa72bdfc1f393e876fdaf2ec60657c7664c908 |
MD5 | dda818b233ae3fc092cc10e236459ebf |
BLAKE2b-256 | 618baed2d4d406a55aeab0ed1a8e8c642df2788c526bccddc79915829fab2881 |
Hashes for cchardet-2.0.1-cp35-cp35m-win32.whl
Algorithm | Hash digest |
---|---|
SHA256 | 065a14bb8d083bfc28c33e7b87a0b79dceefa352d5c40a39cb20c9497aafe8be |
MD5 | dd4be4b2999c66419a3410630b5702d3 |
BLAKE2b-256 | 6f41103a2320ad7a0e6c795bd145d827f30f16acc088dab91d332629a71e043a |
Hashes for cchardet-2.0.1-cp35-cp35m-manylinux1_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 5d0f259a3eee0e7d89da4428f3015d1fe670e9350f4a1f3bf48551fc590958a4 |
MD5 | cc569eaeb3902655a1494b544e95bd94 |
BLAKE2b-256 | 311a463e786ce9a12c326e2f2f81bb191a1827cd6b2323f56f183cc31a0a98fc |
Hashes for cchardet-2.0.1-cp35-cp35m-manylinux1_i686.whl
Algorithm | Hash digest |
---|---|
SHA256 | d6c4799927719f002c8e2d9bbf5c82625c40a06ab34d5256bca2f07292001baa |
MD5 | 0ca7a9166b41d82ab4b8f0a553bd4711 |
BLAKE2b-256 | be1053aaf3082da79dda6478b764787802250e750bb28c6e602d4d73b1cb66da |
Hashes for cchardet-2.0.1-cp34-cp34m-win_amd64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 81f7c3db4394e6811c01fd907a2782dc305d2bb87f864842cc49bbca804cccb8 |
MD5 | 78573ccf2836fcdf45fa5f320e46f4ec |
BLAKE2b-256 | a0ce2ae7d39e2565c6d12d335db870b81bd43e04a8ca75e72f503acd701f78f5 |
Hashes for cchardet-2.0.1-cp34-cp34m-win32.whl
Algorithm | Hash digest |
---|---|
SHA256 | 64aa48e6a95537115d651c7943f0a9c35011b68880fc093f09c7b5970f15d173 |
MD5 | 75d708650088a734be866a7de1d3a1cd |
BLAKE2b-256 | c6dfa65fd5416962433945f4f1e506a9cb07ca23c4624fb04af51511c012c28f |
Hashes for cchardet-2.0.1-cp34-cp34m-manylinux1_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | ff9ccb38b3c0eb0ec29a4256d139fc7801620c70600515c23365d60b3d8894e0 |
MD5 | 0698793085197e2047431ef03ed657d9 |
BLAKE2b-256 | 9aca20f87ec1a2d6a987dcd436730f90786e1df7c6e675c189879a1ab0879ed8 |
Hashes for cchardet-2.0.1-cp34-cp34m-manylinux1_i686.whl
Algorithm | Hash digest |
---|---|
SHA256 | 4704ac375ea879352600bff1bceafd3ad381b56bef5feef5ea0061a30fba1f13 |
MD5 | 55cee9488d125ed397351817cd172336 |
BLAKE2b-256 | 9df8e5a0e7e297370efc9ff0417150133a6c3a3a25164084e2a43e395f95f7b6 |
Hashes for cchardet-2.0.1-cp27-cp27mu-manylinux1_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 2e0f36bbe98d4ec55493a1586f3327eeddb15fcbf8dfea2d701433f2ca285a35 |
MD5 | 945c8ca07bf3ffd36ea7d254fab3c291 |
BLAKE2b-256 | ebf830764da9a95b3a870bec88a467b915d3612522fb2f26ae5faa7831457de8 |
Hashes for cchardet-2.0.1-cp27-cp27mu-manylinux1_i686.whl
Algorithm | Hash digest |
---|---|
SHA256 | 32bbf0ce47315fa4c296a6ce4ef924c825e4e109f2b6ef8df084e82613fb0f40 |
MD5 | e104f444b57005b0c81010dd67aa2332 |
BLAKE2b-256 | 5c1190789ff8688495f17a4031806df4d23004e568fc452d68bb0b542907921f |
Hashes for cchardet-2.0.1-cp27-cp27m-win_amd64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 2295c167820a07a77ee872ace95b89b0d4b3e97e16ab646634dfa840d84ce6c4 |
MD5 | 5b40a00c98b4207db589ada8161b7683 |
BLAKE2b-256 | e64fc1c808e73f805d5fa4a5bab5ddb7c4bc6d6b9c3341862e46c964117479b4 |
Hashes for cchardet-2.0.1-cp27-cp27m-win32.whl
Algorithm | Hash digest |
---|---|
SHA256 | 8c95f77d9a2d2fad08656f35f3ffd04b7f008aaa5a15d16d36f2d297af8f69d2 |
MD5 | 0237e1a85fbe0fbdcb11fb1d2d03abc4 |
BLAKE2b-256 | 35400ea8141332bfc155be31d91b76460643f33e96f0faf3ff28548204d01a56 |
Hashes for cchardet-2.0.1-cp27-cp27m-manylinux1_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | d9c06bc139c76273967f26030675637d3acd19129622651426a082dac0fa0103 |
MD5 | 3c8e5b4379cef0feacbbe3739816c3ff |
BLAKE2b-256 | a511ce99b44a021289246a33d04fca0bf458ec2d1af7ee75200212025c40e1fc |
Hashes for cchardet-2.0.1-cp27-cp27m-manylinux1_i686.whl
Algorithm | Hash digest |
---|---|
SHA256 | d660c810e1e29c5795114b270fac5b99bf7ab4d409fa9644f862ddb49e2104dd |
MD5 | 4d28132c1cf3938a9a2c1da3556fcd4f |
BLAKE2b-256 | 7a079c0fd9f1b56419fd6cc9afae276b00e7ed167520535b8efde08989d43e07 |
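The digests listed above can be compared against a downloaded wheel. The following is a minimal sketch using only Python's standard `hashlib`; `file_digests` is a hypothetical helper name, and it assumes that "BLAKE2b-256" means BLAKE2b with a 32-byte digest. `hashlib.blake2b` requires Python 3.6 or later.

```python
import hashlib


def file_digests(path, chunk_size=65536):
    """Return hex digests of a file, keyed by the algorithm names used in the tables."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),  # 32 bytes = 256 bits
    }
    with open(path, "rb") as f:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

Comparing `file_digests(path)["SHA256"]` with the SHA256 row for the matching file name is enough to confirm that the download is intact.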