cChardet is a high-speed universal character encoding detector.
cChardet
cChardet is a high-speed universal character encoding detector, implemented as a binding to uchardet.
Supported Languages/Encodings
International (Unicode)
UTF-8
UTF-16BE / UTF-16LE
UTF-32BE / UTF-32LE / X-ISO-10646-UCS-4-34121 / X-ISO-10646-UCS-4-21431
Arabic
ISO-8859-6
WINDOWS-1256
Bulgarian
ISO-8859-5
WINDOWS-1251
Chinese
ISO-2022-CN
BIG5
EUC-TW
GB18030
HZ-GB-2312
Croatian
ISO-8859-2
ISO-8859-13
ISO-8859-16
Windows-1250
IBM852
MAC-CENTRALEUROPE
Czech
Windows-1250
ISO-8859-2
IBM852
MAC-CENTRALEUROPE
Danish
ISO-8859-1
ISO-8859-15
WINDOWS-1252
English
ASCII
Esperanto
ISO-8859-3
Estonian
ISO-8859-4
ISO-8859-13
Windows-1252
Windows-1257
Finnish
ISO-8859-1
ISO-8859-4
ISO-8859-9
ISO-8859-13
ISO-8859-15
WINDOWS-1252
French
ISO-8859-1
ISO-8859-15
WINDOWS-1252
German
ISO-8859-1
WINDOWS-1252
Greek
ISO-8859-7
WINDOWS-1253
Hebrew
ISO-8859-8
WINDOWS-1255
Hungarian
ISO-8859-2
WINDOWS-1250
Irish Gaelic
ISO-8859-1
ISO-8859-9
ISO-8859-15
WINDOWS-1252
Italian
ISO-8859-1
ISO-8859-3
ISO-8859-9
ISO-8859-15
WINDOWS-1252
Japanese
ISO-2022-JP
SHIFT_JIS
EUC-JP
Korean
ISO-2022-KR
EUC-KR / UHC
Lithuanian
ISO-8859-4
ISO-8859-10
ISO-8859-13
Latvian
ISO-8859-4
ISO-8859-10
ISO-8859-13
Maltese
ISO-8859-3
Polish
ISO-8859-2
ISO-8859-13
ISO-8859-16
Windows-1250
IBM852
MAC-CENTRALEUROPE
Portuguese
ISO-8859-1
ISO-8859-9
ISO-8859-15
WINDOWS-1252
Romanian
ISO-8859-2
ISO-8859-16
Windows-1250
IBM852
Russian
ISO-8859-5
KOI8-R
WINDOWS-1251
MAC-CYRILLIC
IBM866
IBM855
Slovak
Windows-1250
ISO-8859-2
IBM852
MAC-CENTRALEUROPE
Slovene
ISO-8859-2
ISO-8859-16
Windows-1250
IBM852
M
Example
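# -*- coding: utf-8 -*-
import cchardet as chardet

# Read the sample as raw bytes; detect() works on bytes, not decoded text.
with open(r"src/tests/samples/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt", "rb") as f:
    msg = f.read()

# detect() returns a dict with the guessed encoding and a confidence score.
result = chardet.detect(msg)
print(result)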
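The detection result is usually only a first step toward decoding the bytes. The following is a minimal sketch of that step, assuming the result is a dict with an "encoding" key that may be None when nothing is detected. The helper name decode_bytes, its detect parameter, and the "utf-8" fallback are hypothetical choices made for illustration and are not part of cChardet.

```python
try:
    import cchardet
except ImportError:  # cchardet is optional here; a detector can be passed in instead
    cchardet = None


def decode_bytes(data, detect=None, fallback="utf-8"):
    """Decode `data` with the detected encoding; return (text, encoding_used).

    `detect` is any callable that takes bytes and returns a dict with an
    "encoding" key (hypothetical parameter; defaults to cchardet.detect).
    """
    if detect is None:
        if cchardet is None:
            raise RuntimeError("cchardet is not installed; pass a detect callable")
        detect = cchardet.detect

    # Assumption: "encoding" may be None when detection fails.
    encoding = detect(data).get("encoding") or fallback
    try:
        return data.decode(encoding), encoding
    except (LookupError, UnicodeDecodeError):
        # Unknown codec name or a wrong guess: decode with the fallback
        # and replace undecodable bytes instead of raising.
        return data.decode(fallback, errors="replace"), fallback
```

With cChardet installed, calling decode_bytes(msg) on the bytes read in the example above would return the decoded text together with the encoding name that was actually used.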
Benchmark
$ cd src/
$ pip install chardet
$ python tests/bench.py
Results
CPU: Intel(R) Core(TM) i5-4690 CPU @ 3.50GHz
RAM: DDR3 1600MHz 16GB
Platform: Ubuntu 16.04 amd64
Python 2.7.13
Library | Request (call/s) |
---|---|
chardet v3.0.2 | 0.36 |
cchardet v2.0.1 | 1396.42 |
Python 3.6.1
Library | Request (call/s) |
---|---|
chardet v3.0.2 | 0.35 |
cchardet v2.0.1 | 1467.77 |
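A "call/s" figure like the ones above can be approximated by timing repeated detect calls on the same input. The sketch below is not the project's tests/bench.py; the function name calls_per_second and the min_seconds parameter are hypothetical, and the numbers it produces depend on the machine and the sample data.

```python
import time


def calls_per_second(detect, data, min_seconds=0.2):
    """Call detect(data) repeatedly for at least `min_seconds`; return calls/s."""
    calls = 0
    start = time.perf_counter()
    elapsed = 0.0
    while elapsed < min_seconds:
        detect(data)
        calls += 1
        elapsed = time.perf_counter() - start
    # elapsed >= min_seconds here, so the division is safe for min_seconds > 0.
    return calls / elapsed
```

Passing cchardet.detect and chardet.detect in turn, with the same bytes, gives a rough side-by-side comparison of the two libraries.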
LICENSE
See COPYING file.
Contact
Platform
Supported
Windows i686, x86_64
Linux i686, x86_64
macOS x86_64
Not supported
CHANGES
2.1.x
2.1.5
update language models (uchardet)
add ISO-8859-2 test (currently disabled)
support Python 3.8
drop support for Python 3.4
2.1.4 (2018-09-27)
disable LTO because it degraded performance
2.1.3 (2018-09-26)
support Python 3.7
2.1.2 (2018-09-26)
enable LTO for wheel builds
update Cython
2.1.1 (2017-07-01)
fix different results being returned for different chunk sizes
fix assignments to nsSMState in nsCodingStateMachine that resulted in unspecified behavior
include COPYING in package
2.1.0 (2017-05-15)
2.0.1 (2017-04-25)
2.0.0 (2017-04-06)
Improve tests
2.0a4 (2017-04-05)
Update uchardet repo (Fix buffer overflow)
2.0a3 (2017-03-29)
Implement UniversalDetector (like chardet)
2.0a2 (2017-03-28)
Update uchardet repo (Fix memory leak)
2.0a1 (2017-03-28)
Replace uchardet-enhanced with uchardet
Remove Detector class
1.1.3 (2017-02-26)
Support AArch64
1.1.2 (2017-01-08)
Support Python 3.6
1.1.1 (2016-11-05)
Use len() function (9e61cb9e96b138b0d18e5f9e013e144202ae4067)
Remove detect function in _cchardet.pyx (25b581294fc0ae8f686ac9972c8549666766f695)
Support manylinux1 wheel
1.1.0 (2016-10-17)
Add Detector class
Improve unit tests
Download files
Built Distributions
Hashes for cchardet-2.1.5-cp38-cp38m-win_amd64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 0f7ec49fcd28088c387d4afcc02c0549434d9e07deb2519365a6baa5b6c7ebb4 |
MD5 | ded94401feb3556f5a1c58baa1ae49a2 |
BLAKE2b-256 | b9f70de674d276c833a7e1f7c79a847885afeb84c37747f49010c88ec9974fc9 |
Hashes for cchardet-2.1.5-cp38-cp38m-win32.whl
Algorithm | Hash digest |
---|---|
SHA256 | 4001620ba761b2ddd51caef6194444b5cd2f131de7c8c51a0f4896cb1ea1111a |
MD5 | 4c7037dcae4f73fb408757318db94ffe |
BLAKE2b-256 | 49d37cac166979453dcaf41f789cb64e363591bdf366d5626c2d3df100474009 |
Hashes for cchardet-2.1.5-cp38-cp38-manylinux1_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | b5cebf47f498e5ad4a9a5ef089b7ab6ef7926eaeea0b239c8e54f8217ce81cf2 |
MD5 | f43bc06897a2ad87bde32ce48e49755b |
BLAKE2b-256 | 9745448d7e055cac8bf523a1bfb7181a4c48cc4b8326adc61bbf1b0dc6de7148 |
Hashes for cchardet-2.1.5-cp38-cp38-manylinux1_i686.whl
Algorithm | Hash digest |
---|---|
SHA256 | 950fb40918772efe5779747a2f6c83a053a26b623a674f1d4f271b35331a9968 |
MD5 | 0d14a55c93f1eaa6e6d373951afa6126 |
BLAKE2b-256 | 7defe3afcfce735d27aa52fd52548ea65e1c19af201e034d4fdfbabf95a8b76f |
Hashes for cchardet-2.1.5-cp37-cp37m-win_amd64.whl
Algorithm | Hash digest |
---|---|
SHA256 | b7cad0a062675acb42eb5170b07be774a5d9ca35a24388e918e5b78cb40ccbf2 |
MD5 | c7d2ff8305039e328d9c5ddfe2877278 |
BLAKE2b-256 | 709a36f5acd759bf95674eb125f4cb3887ce5714dee488576bd8e62808317b46 |
Hashes for cchardet-2.1.5-cp37-cp37m-win32.whl
Algorithm | Hash digest |
---|---|
SHA256 | 8e3a50bcad2ca0921fbbd46d29cc215dcc0d6d360570d594aeb7b0e2de716e8c |
MD5 | ac28ead93935120bb96a881cf1933aba |
BLAKE2b-256 | aa243808f36f0681334459fcbd269152c487103a0c3d82d6d1adcd638847d8d9 |
Hashes for cchardet-2.1.5-cp37-cp37m-manylinux1_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 5011ab33557913489c98d2fbdd7d88f06736f0bb456c60952fc5e52886b2a410 |
MD5 | fd82d1c29553f1a9b2031792de37ce62 |
BLAKE2b-256 | c620905b6c5664736d884a40ac3b1204ab874c3c4a8ce86f7b2e28abc1fc6ee4 |
Hashes for cchardet-2.1.5-cp37-cp37m-manylinux1_i686.whl
Algorithm | Hash digest |
---|---|
SHA256 | 4d015296e96c0b2022495e4685b6fc0f3c9feed88fb062135f7f4748df7e0921 |
MD5 | 07eb369d7b5b793f9e5418222098c4e9 |
BLAKE2b-256 | c4e7601d858781b61e96f70fad33dbf2db8ff0343182315b3d194e21ebfaca70 |
Hashes for cchardet-2.1.5-cp36-cp36m-win_amd64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 6a192cce3009c9cd671588574ad0cb81322c78265ebcb33b2def63c15e44ea47 |
MD5 | 7454eeef444e00b10aff6b04dbb0beb7 |
BLAKE2b-256 | c8462688840b44ea32c8f04be479203a52bfb190e85d0cbce921eeb66f2a813d |
Hashes for cchardet-2.1.5-cp36-cp36m-win32.whl
Algorithm | Hash digest |
---|---|
SHA256 | 4bd54ff3a239b4fe598ba262d8730372e339fdd314286ceb6706a003d3e03d7b |
MD5 | 342ac11b65ab7cb7b9fdebf625d365ba |
BLAKE2b-256 | 2288563f05ee807a80e95a9ba13c08390e2674318c897369a5213487097d5cef |
Hashes for cchardet-2.1.5-cp36-cp36m-manylinux1_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 379a0bbd630bca677990df7509672a2ca43faf928939fd4b063fc2215b025b91 |
MD5 | cb95f34a970e10f05e499c9a1bc5298e |
BLAKE2b-256 | fa4e847feebfc3e71c773b23ee06c74687b8c50a5a6d6aaff452a0a4f4eb9a32 |
Hashes for cchardet-2.1.5-cp36-cp36m-manylinux1_i686.whl
Algorithm | Hash digest |
---|---|
SHA256 | 1a6d00b7cbd8acfc5e3093cb5f983a667d0752dc328123c8dcb293e252bfb024 |
MD5 | 26a6d514c7ad5347a92b4e7bee8b5923 |
BLAKE2b-256 | 0970014b7d10f073e58ebcb0997c31da52c260054e77b533278e8068b8002bf5 |
Hashes for cchardet-2.1.5-cp35-cp35m-win_amd64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 92341348fed2fb53899e9cccf030da5377beb8ed26dfddc6acf87f1f0ce4b80e |
MD5 | 0120338375b56d6eb92b09b6eda5d54e |
BLAKE2b-256 | cc91da0fdc416e67e4f699cfe65204dc5d82bfff97c1fb31bb9e7837d50463c9 |
Hashes for cchardet-2.1.5-cp35-cp35m-win32.whl
Algorithm | Hash digest |
---|---|
SHA256 | a4e346151042b5cfae34fff65911842f04849be4a74f22bc52b1e99c11650210 |
MD5 | 364dfe7906737a774a1903bde5181498 |
BLAKE2b-256 | d4b9a1dd62c95898a5cd6b657f049eb28fc7eed177fd8c9b9a1b34c76bda5bea |
Hashes for cchardet-2.1.5-cp35-cp35m-manylinux1_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 7f22a8194c4e696cea3eff28723f77858495dec52baf93261943c8bb8ce08035 |
MD5 | f10fb46d4bafbed76526dd3416a02539 |
BLAKE2b-256 | 204c41fcbccf22bcbba5184a2a48a78d864a4f641d6c40f720109e420100c4bd |
Hashes for cchardet-2.1.5-cp35-cp35m-manylinux1_i686.whl
Algorithm | Hash digest |
---|---|
SHA256 | bb05580cd40f4cb7ccda5f90163fc43e27820046a6d0af11c1747d515fc69859 |
MD5 | cc0ee685a1fd1b41723615052c427c22 |
BLAKE2b-256 | 434733e5779a6f57be6e5580e86cc38b6846e6efe0f9e7d90272fe25e4e92880 |
Hashes for cchardet-2.1.5-cp27-cp27mu-manylinux1_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 3ae84e6ee215925cd06a772d87c17d5485d862e2f1677aa0d6c295ea9313f117 |
MD5 | 97f7d65ff3c2b6c6f2369708f99071c2 |
BLAKE2b-256 | 65e01d48d60aca42fcfd85873b2cd96fa9f380c462618176efa3a61773e0a99f |
Hashes for cchardet-2.1.5-cp27-cp27mu-manylinux1_i686.whl
Algorithm | Hash digest |
---|---|
SHA256 | 30f461d876cf3ea40c6fd949b9725c7c6e2522a3e87d33817d221e9f478d7e4d |
MD5 | 570e465deca97900ccaf1a6f6f45a2ff |
BLAKE2b-256 | 76c0f51bb1a7c46e73f0257cf7c8b04620fa294f7022012e6e6b12832efcc573 |
Hashes for cchardet-2.1.5-cp27-cp27m-win_amd64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 6bf07931fa81238d9174266aaf83605204192977671ef230d5651a8f9d4acf56 |
MD5 | 63b03dd38ec89cd377cf37dbeb05ec5a |
BLAKE2b-256 | 8c7c24da1650f950a0b1051e375aa50d270404d74c6cf4507dbc37bb70ebb94c |
Hashes for cchardet-2.1.5-cp27-cp27m-win32.whl
Algorithm | Hash digest |
---|---|
SHA256 | 8126798ec34b9fb444472d849b6510817939347809b898a0d6d6463e41c5901a |
MD5 | 2b98efa8f117088dee7361a664c6d8b6 |
BLAKE2b-256 | 4b360862d12fd48608bf3935f681cd7de5d9868d020cef4d47c289b4713c745b |
Hashes for cchardet-2.1.5-cp27-cp27m-manylinux1_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | af48965b752490d8e330e41a46ba47f07c63f22ac5c7f4c396b7efd3958daa2e |
MD5 | 5eab9400d787fdec17ccf15922b50ffa |
BLAKE2b-256 | 8c170bda950f2f3f221b2ca52eedcff9c183839841dd208a12c87f482457edcd |
Hashes for cchardet-2.1.5-cp27-cp27m-manylinux1_i686.whl
Algorithm | Hash digest |
---|---|
SHA256 | f87bdef26758a0a8de93bbfd7651ac4fcf798a7a06c049c347a0103279698b23 |
MD5 | b7ae0d7f3b490c77a98c6a65542fc5fe |
BLAKE2b-256 | f708380c6c5cc7507315b52d86569c83481d04763e67ecdd807cf6c5eb101011 |