cChardet
cChardet is a high-speed universal character encoding detector: a binding to uchardet.
Supported Languages/Encodings
International (Unicode)
UTF-8
UTF-16BE / UTF-16LE
UTF-32BE / UTF-32LE / X-ISO-10646-UCS-4-34121 / X-ISO-10646-UCS-4-21431
Arabic
ISO-8859-6
WINDOWS-1256
Bulgarian
ISO-8859-5
WINDOWS-1251
Chinese
ISO-2022-CN
BIG5
EUC-TW
GB18030
HZ-GB-2312
Croatian
ISO-8859-2
ISO-8859-13
ISO-8859-16
Windows-1250
IBM852
MAC-CENTRALEUROPE
Czech
Windows-1250
ISO-8859-2
IBM852
MAC-CENTRALEUROPE
Danish
ISO-8859-1
ISO-8859-15
WINDOWS-1252
English
ASCII
Esperanto
ISO-8859-3
Estonian
ISO-8859-4
ISO-8859-13
Windows-1252
Windows-1257
Finnish
ISO-8859-1
ISO-8859-4
ISO-8859-9
ISO-8859-13
ISO-8859-15
WINDOWS-1252
French
ISO-8859-1
ISO-8859-15
WINDOWS-1252
German
ISO-8859-1
WINDOWS-1252
Greek
ISO-8859-7
WINDOWS-1253
Hebrew
ISO-8859-8
WINDOWS-1255
Hungarian
ISO-8859-2
WINDOWS-1250
Irish Gaelic
ISO-8859-1
ISO-8859-9
ISO-8859-15
WINDOWS-1252
Italian
ISO-8859-1
ISO-8859-3
ISO-8859-9
ISO-8859-15
WINDOWS-1252
Japanese
ISO-2022-JP
SHIFT_JIS
EUC-JP
Korean
ISO-2022-KR
EUC-KR / UHC
Lithuanian
ISO-8859-4
ISO-8859-10
ISO-8859-13
Latvian
ISO-8859-4
ISO-8859-10
ISO-8859-13
Maltese
ISO-8859-3
Polish
ISO-8859-2
ISO-8859-13
ISO-8859-16
Windows-1250
IBM852
MAC-CENTRALEUROPE
Portuguese
ISO-8859-1
ISO-8859-9
ISO-8859-15
WINDOWS-1252
Romanian
ISO-8859-2
ISO-8859-16
Windows-1250
IBM852
Russian
ISO-8859-5
KOI8-R
WINDOWS-1251
MAC-CYRILLIC
IBM866
IBM855
Slovak
Windows-1250
ISO-8859-2
IBM852
MAC-CENTRALEUROPE
Slovene
ISO-8859-2
ISO-8859-16
Windows-1250
IBM852
M
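The labels above are uchardet's encoding names, and not all of them are names Python's `codecs` registry accepts. For example, the standard library has no ISO-2022-CN or EUC-TW codec. The following is a minimal sketch, using only the standard library, of normalising a detected label before decoding. The `EXTRA_ALIASES` table, `python_codec` and `decode_with_label` are hypothetical helpers, not part of cChardet, and the alias table covers only one illustrative case.

```python
import codecs
from typing import Optional

# Hypothetical extra mappings for labels that codecs.lookup() may not
# resolve by itself; extend as needed for the labels you actually see.
EXTRA_ALIASES = {
    "MAC-CENTRALEUROPE": "mac_latin2",  # Python's Mac Central European codec
}


def python_codec(label: Optional[str]) -> Optional[str]:
    """Map a detected encoding label to a Python codec name, or None."""
    if not label:
        return None
    candidate = EXTRA_ALIASES.get(label.upper(), label)
    try:
        # CodecInfo.name is Python's canonical spelling, e.g. "cp1252".
        return codecs.lookup(candidate).name
    except LookupError:
        # No Python codec by that name (e.g. ISO-2022-CN, EUC-TW).
        return None


def decode_with_label(data: bytes, label: Optional[str],
                      fallback: str = "utf-8") -> str:
    """Decode bytes with a detected label, falling back if it is unusable."""
    codec = python_codec(label)
    if codec is not None:
        try:
            return data.decode(codec)
        except (LookupError, UnicodeDecodeError):
            pass  # not a usable text codec, or it rejects these bytes
    return data.decode(fallback, errors="replace")
```

For example, `python_codec("WINDOWS-1252")` returns `"cp1252"`, while `python_codec("ISO-2022-CN")` returns `None`, so `decode_with_label` falls back to UTF-8 with replacement characters. Note that a successful decode does not prove the guess was right: single-byte codecs accept almost any input.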
Example
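# -*- coding: utf-8 -*-
import cchardet as chardet
# Read the sample as raw bytes: detect() works on bytes, not str.
with open(r"src/tests/samples/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt", "rb") as f:
    msg = f.read()
# detect() returns a dict with "encoding" and "confidence" keys.
result = chardet.detect(msg)
print(result)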
Benchmark
$ cd src/
$ pip install chardet
$ python tests/bench.py
Results
CPU: Intel(R) Core(TM) i5-4690 CPU @ 3.50GHz
RAM: 16 GB DDR3 1600 MHz
Platform: Ubuntu 16.04 amd64
Python 2.7.13
Library | Request (call/s)
---|---
chardet v3.0.2 | 0.36
cchardet v2.0.1 | 1396.42
Python 3.6.1
Library | Request (call/s)
---|---
chardet v3.0.2 | 0.35
cchardet v2.0.1 | 1467.77
LICENSE
See COPYING file.
Contact
CHANGES
2.1.0 (2017-05-15)
2.0.1 (2017-04-25)
2.0.0 (2017-04-06)
Improve tests
2.0a4 (2017-04-05)
Update uchardet repo (Fix buffer overflow)
2.0a3 (2017-03-29)
Implement UniversalDetector (like chardet)
2.0a2 (2017-03-28)
Update uchardet repo (Fix memory leak)
2.0a1 (2017-03-28)
Replace uchardet-enhanced with uchardet
Remove Detector class
1.1.3 (2017-02-26)
Support AArch64
1.1.2 (2017-01-08)
Support Python 3.6
1.1.1 (2016-11-05)
Use len() function (9e61cb9e96b138b0d18e5f9e013e144202ae4067)
Remove detect function in _cchardet.pyx (25b581294fc0ae8f686ac9972c8549666766f695)
Support manylinux1 wheel
1.1.0 (2016-10-17)
Add Detector class
Improve unit tests
Download files
Source Distribution
Built Distributions
Hashes for cchardet-2.1.0-cp36-cp36m-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | 8d4bf1a381f45983b34f3c13ad247e624df6470b2fc787c260fa753a9265b790
MD5 | f0062c8ba70bf66198e9dae35385227e
BLAKE2b-256 | b6f90b3694a07505f69d52439111665671f3a732764d9c276b9f1d76f59a4990
Hashes for cchardet-2.1.0-cp36-cp36m-win32.whl
Algorithm | Hash digest
---|---
SHA256 | 427c99fcf9ab4d11ecb45fb31a53fa682f9123440206f0331342178c941938b6
MD5 | b389042538496ce70892f7ccc1b5d82a
BLAKE2b-256 | 0b9a192cdc459e98a09af783813ccdaaee164e35a6280a03bb111799b8c77afb
Hashes for cchardet-2.1.0-cp36-cp36m-manylinux1_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 53d63c6c232fcbe1957b90c79edd96d91071aa3aa133df86903c0f6476de4e02
MD5 | 892d5b419c2ac25d52141d8aee98f010
BLAKE2b-256 | 57046ae5abbf9b75112918665f8577dd71e902f978e564bcbc27418c3b5012b8
Hashes for cchardet-2.1.0-cp36-cp36m-manylinux1_i686.whl
Algorithm | Hash digest
---|---
SHA256 | fc2615445acca8629d2d33ffa08b720c7d6c34150769f04770006c750c89382f
MD5 | 2a26183480524bca5e008c1162c1b6e6
BLAKE2b-256 | 243e1edf93410d0db5c4c8383670ea532c9bdb491df742a53a55738f41eccaae
Hashes for cchardet-2.1.0-cp35-cp35m-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | b21f937bfec8128cd8280d1522b5f87ef321e3ed846609e17fc040afb9596d45
MD5 | d3955de974b738806ad91b899010ebaa
BLAKE2b-256 | d81ed2342e452e3e0df9a0d4ef7e9af5836af0f491db18c721456502f5c9459a
Hashes for cchardet-2.1.0-cp35-cp35m-win32.whl
Algorithm | Hash digest
---|---
SHA256 | 675ee8b386a18b6cb419b32a0baff1c95fd3cfd7e074669f2b8b35d8f70c8ea4
MD5 | 17a4edf0873ca19c6704b630f815225a
BLAKE2b-256 | 48fb61f6047fbcb2773c0c03bf3441867e21bf9707d4a8ece4b6d2103da6685c
Hashes for cchardet-2.1.0-cp35-cp35m-manylinux1_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | b4cc05d7258d4517f9ea224d7a19ac8a18c8609aeae47607eb797efef0f1b7e6
MD5 | 624b9d844fb302db3888a4a874d68253
BLAKE2b-256 | db2e4d6bb672c8b475b54214d1da12e30f399c728b9e6f01b266bd1631b0dcf2
Hashes for cchardet-2.1.0-cp35-cp35m-manylinux1_i686.whl
Algorithm | Hash digest
---|---
SHA256 | 9cd9aa6a887ffc46cc2a120fc46f5126b0d6bba85601fa86bcfe4b33c8db5e84
MD5 | 2efc15c031119b4136ed335d50096c75
BLAKE2b-256 | b2508527b65920287e6cb197514ec1bd68ff54ba9418740e45e2942985a535e3
Hashes for cchardet-2.1.0-cp34-cp34m-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | d266fd72da5970fa4e1a37966091de0be178d4abb3152b6458a04d171204a4f7
MD5 | 69fa37a70a5ce436e20eb3ddbcb92d1b
BLAKE2b-256 | 1e938ce99170c1ea1e06371577f9421465566ddea0ee9dd8accfa57f65ed6cd6
Hashes for cchardet-2.1.0-cp34-cp34m-win32.whl
Algorithm | Hash digest
---|---
SHA256 | 8595c205919a8d6363f8d0558171c88c32bb054d15f9bcb6e9abb4fd32739e08
MD5 | d5f4c8b6fc6de4d134fd6fc5e0dd8933
BLAKE2b-256 | 169d333d2075174713333b824a4bc60f4b27a6be4acd370022812bc715aae055
Hashes for cchardet-2.1.0-cp34-cp34m-manylinux1_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | b644fe62355cf4384a89e86a706846f067bd855c09451a3f01e69fee0823baa6
MD5 | acb257257372f20e45dacbd312bf7b9d
BLAKE2b-256 | d3b440b641e281d8640b4c584e471aaab4fde7711ba47112ce126e0880a1c6a0
Hashes for cchardet-2.1.0-cp34-cp34m-manylinux1_i686.whl
Algorithm | Hash digest
---|---
SHA256 | 5b070185c706d818647fb58060ccfdea33fbb7f6c7d6650f9b070163fef3a08b
MD5 | 704ee2d08ec00069260de9a11687e6e0
BLAKE2b-256 | ce4547b65a02a81a734b28dcb8feaf06c791aad879f1cc69004ed98f2e4ef41c
Hashes for cchardet-2.1.0-cp27-cp27mu-manylinux1_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 8289875be2cc7bf7c7b9727ceafef4d94348389c47730179ba8bef3c28570b83
MD5 | cfeb69b3382b6e20f7caf67ef20db673
BLAKE2b-256 | 53f58265aa2a693cd396d745670acc984388e1a60e4444961443bd610913d7d0
Hashes for cchardet-2.1.0-cp27-cp27mu-manylinux1_i686.whl
Algorithm | Hash digest
---|---
SHA256 | 363b3336ba3dee9138a300584261c92d2f8f7dd06f6d22cd3cbf8d73ee47decc
MD5 | 66094d1bbf5f758b87b01d7848f221db
BLAKE2b-256 | 9697a1cdafa63e202e26a392e4dea983107638db1929946db622bd0b2bba778d
Hashes for cchardet-2.1.0-cp27-cp27m-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | 13e6af0c9dc877e3d26c745a22384c01f7f1a2448d6e3eb52421bc1c44b10b40
MD5 | bcc7bf4f1b2021669f2da6491c37795f
BLAKE2b-256 | c7190b97edc864dff552ce280f8237d332bd824f957b4ce4d52d0ef036d6611b
Hashes for cchardet-2.1.0-cp27-cp27m-win32.whl
Algorithm | Hash digest
---|---
SHA256 | 228854958a698e80c1b18c064c51ff6b60d4831a452e1f59a148bd07cb3a3d2b
MD5 | eff7e0017ee7389de1ccbdbee466e1aa
BLAKE2b-256 | 1823501ff63bd48b0aa0a755bdc98b07847a8b8f03a6b21a52377ae8111fa387
Hashes for cchardet-2.1.0-cp27-cp27m-manylinux1_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | bee7ed9fcd6a5155500073ed8bb40ee25c585bd979720af8c6a806971aa02789
MD5 | 9bd9dc47a62c8da1ce6bf1dbb6626f7b
BLAKE2b-256 | 9290567b5af04b893849c7271aae2bcf72428708a4760f53906deff120d89d89
Hashes for cchardet-2.1.0-cp27-cp27m-manylinux1_i686.whl
Algorithm | Hash digest
---|---
SHA256 | 4edbc89715c7527f02b119eaaf079e8d246434928c25097a19152fc859459eeb
MD5 | 8d5d0546159ef83956086ba96f34bd89
BLAKE2b-256 | 6caa6f395bb931d550b0434b4884fb9bda7f982e3f5184c887ea5f50aafa376e
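The digests above let you check a downloaded file before installing it. The following is a minimal sketch using the standard `hashlib` module. `sha256_of` and `verify_sha256` are hypothetical helper names, and the expected digest must be copied from the table for the exact file you downloaded.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: str, expected_hex: str) -> bool:
    """True if the file's SHA256 matches a published hex digest."""
    # hexdigest() is lowercase, so normalise the expected value to match.
    return sha256_of(path) == expected_hex.strip().lower()
```

For instance, `verify_sha256("cchardet-2.1.0-cp36-cp36m-win_amd64.whl", "8d4bf1a381f45983b34f3c13ad247e624df6470b2fc787c260fa753a9265b790")` should return `True` for an intact copy of that wheel. pip's hash-checking mode (`--require-hashes` with `--hash=sha256:...` entries in a requirements file) performs the same check at install time.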