cChardet
Work In Progress Branch
cChardet is a high-speed universal character encoding detector, implemented as a binding to uchardet.
Supported Languages/Encodings
International (Unicode)
UTF-8
UTF-16BE / UTF-16LE
UTF-32BE / UTF-32LE / X-ISO-10646-UCS-4-34121 / X-ISO-10646-UCS-4-21431
Arabic
ISO-8859-6
WINDOWS-1256
Bulgarian
ISO-8859-5
WINDOWS-1251
Chinese
ISO-2022-CN
BIG5
EUC-TW
GB18030
HZ-GB-2312
Croatian
ISO-8859-2
ISO-8859-13
ISO-8859-16
Windows-1250
IBM852
MAC-CENTRALEUROPE
Czech
Windows-1250
ISO-8859-2
IBM852
MAC-CENTRALEUROPE
Danish
ISO-8859-1
ISO-8859-15
WINDOWS-1252
English
ASCII
Esperanto
ISO-8859-3
Estonian
ISO-8859-4
ISO-8859-13
Windows-1252
Windows-1257
Finnish
ISO-8859-1
ISO-8859-4
ISO-8859-9
ISO-8859-13
ISO-8859-15
WINDOWS-1252
French
ISO-8859-1
ISO-8859-15
WINDOWS-1252
German
ISO-8859-1
WINDOWS-1252
Greek
ISO-8859-7
WINDOWS-1253
Hebrew
ISO-8859-8
WINDOWS-1255
Hungarian
ISO-8859-2
WINDOWS-1250
Irish Gaelic
ISO-8859-1
ISO-8859-9
ISO-8859-15
WINDOWS-1252
Italian
ISO-8859-1
ISO-8859-3
ISO-8859-9
ISO-8859-15
WINDOWS-1252
Japanese
ISO-2022-JP
SHIFT_JIS
EUC-JP
Korean
ISO-2022-KR
EUC-KR / UHC
Lithuanian
ISO-8859-4
ISO-8859-10
ISO-8859-13
Latvian
ISO-8859-4
ISO-8859-10
ISO-8859-13
Maltese
ISO-8859-3
Polish
ISO-8859-2
ISO-8859-13
ISO-8859-16
Windows-1250
IBM852
MAC-CENTRALEUROPE
Portuguese
ISO-8859-1
ISO-8859-9
ISO-8859-15
WINDOWS-1252
Romanian
ISO-8859-2
ISO-8859-16
Windows-1250
IBM852
Russian
ISO-8859-5
KOI8-R
WINDOWS-1251
MAC-CYRILLIC
IBM866
IBM855
Slovak
Windows-1250
ISO-8859-2
IBM852
MAC-CENTRALEUROPE
Slovene
ISO-8859-2
ISO-8859-16
Windows-1250
IBM852
M
Example
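# -*- coding: utf-8 -*-
import cchardet as chardet

# Read the sample as raw bytes; detect() works on bytes, not on decoded text.
with open(r"src/tests/samples/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt", "rb") as f:
    msg = f.read()

# The result reports the guessed encoding and a confidence value.
result = chardet.detect(msg)
print(result)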
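Detection is usually only the first step; the bytes still have to be decoded. The following is a minimal sketch of that follow-up step, assuming Python 3 and assuming the detector returns a chardet-style dictionary with an "encoding" key. The helper name decode_bytes and its fallback parameter are hypothetical and not part of cChardet. Some names reported by the detector may have no matching Python codec, so the sketch falls back to a default in that case.

```python
try:
    import cchardet  # the real detector, if it is installed
except ImportError:
    cchardet = None


def decode_bytes(data, detect, fallback="utf-8"):
    """Decode bytes using the encoding reported by a chardet-style detect().

    `detect` is any callable returning a dict with an "encoding" key
    (assumed result shape). `fallback` is used when no encoding is
    reported or when Python has no codec under the reported name.
    """
    result = detect(data)
    encoding = result.get("encoding") or fallback
    try:
        return data.decode(encoding, errors="replace")
    except LookupError:
        # The reported name is not a codec Python knows about.
        return data.decode(fallback, errors="replace")


if cchardet is not None:
    sample = "千夜一夜物語".encode("shift_jis")
    print(decode_bytes(sample, cchardet.detect))
```

Passing the detector in as an argument keeps the helper usable with either chardet or cchardet, since both expose a detect() function.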
Benchmark
$ cd src/
$ pip install chardet
$ python tests/bench.py
Results
CPU: Intel(R) Core(TM) i5-4690 CPU @ 3.50GHz
RAM: DDR3 1600 MHz 16GB
Platform: Ubuntu 16.04 amd64
Python 2.7.12
Library | Request (call/s) |
---|---|
chardet | 0.26 |
cchardet | 1341.81 |
Python 3.6.0
Library | Request (call/s) |
---|---|
chardet | 0.26 |
cchardet | 1472.43 |
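The figures above are reported in calls per second. The contents of tests/bench.py are not reproduced here, so the following is only a sketch of how such a number could be measured, assuming Python 3. The function calls_per_second, its min_seconds parameter, and the sample payload are hypothetical and are not taken from the project's benchmark script.

```python
import importlib
import time


def calls_per_second(func, payload, min_seconds=0.2):
    """Call func(payload) repeatedly for at least min_seconds and
    return the observed rate in calls per second."""
    if min_seconds <= 0:
        raise ValueError("min_seconds must be positive")
    calls = 0
    elapsed = 0.0
    start = time.perf_counter()
    while elapsed < min_seconds:
        func(payload)
        calls += 1
        elapsed = time.perf_counter() - start
    return calls / elapsed


def main():
    # Illustrative payload; the real benchmark uses the project's sample files.
    payload = ("これはテストです。" * 200).encode("shift_jis")
    for name in ("chardet", "cchardet"):
        try:
            module = importlib.import_module(name)
        except ImportError:
            print(name, "is not installed, skipping")
            continue
        rate = calls_per_second(module.detect, payload)
        print(name, round(rate, 2), "call/s")


if __name__ == "__main__":
    main()
```

Absolute numbers depend heavily on the input size and the machine, so results from this sketch are not comparable with the table above.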
LICENSE
See COPYING file.
CHANGES
2.0a3 (2017-03-29)
Implement UniversalDetector (like chardet)
2.0a2 (2017-03-28)
Update uchardet repo (Fix memory leak)
2.0a1 (2017-03-28)
Replace uchardet-enhanced with uchardet
Remove Detector class
1.1.3 (2017-02-26)
Support AArch64
1.1.2 (2017-01-08)
Support Python 3.6
1.1.1 (2016-11-05)
Use len() function (9e61cb9e96b138b0d18e5f9e013e144202ae4067)
Remove detect function in _cchardet.pyx (25b581294fc0ae8f686ac9972c8549666766f695)
Support manylinux1 wheel
1.1.0 (2016-10-17)
Add Detector class
Improve unit tests