cChardet
NOTICE: This is a fork of the original project at https://github.com/PyYoshi/cChardet since the original project is no longer maintained.
To install:
pip install faust-cchardet
cChardet is a high-speed universal character encoding detector, implemented as a binding to uchardet.
Supported Languages/Encodings
International (Unicode)
UTF-8
UTF-16BE / UTF-16LE
UTF-32BE / UTF-32LE / X-ISO-10646-UCS-4-34121 / X-ISO-10646-UCS-4-21431
Arabic
ISO-8859-6
WINDOWS-1256
Bulgarian
ISO-8859-5
WINDOWS-1251
Chinese
ISO-2022-CN
BIG5
EUC-TW
GB18030
HZ-GB-2312
Croatian
ISO-8859-2
ISO-8859-13
ISO-8859-16
Windows-1250
IBM852
MAC-CENTRALEUROPE
Czech
Windows-1250
ISO-8859-2
IBM852
MAC-CENTRALEUROPE
Danish
ISO-8859-1
ISO-8859-15
WINDOWS-1252
English
ASCII
Esperanto
ISO-8859-3
Estonian
ISO-8859-4
ISO-8859-13
Windows-1252
Windows-1257
Finnish
ISO-8859-1
ISO-8859-4
ISO-8859-9
ISO-8859-13
ISO-8859-15
WINDOWS-1252
French
ISO-8859-1
ISO-8859-15
WINDOWS-1252
German
ISO-8859-1
WINDOWS-1252
Greek
ISO-8859-7
WINDOWS-1253
Hebrew
ISO-8859-8
WINDOWS-1255
Hungarian
ISO-8859-2
WINDOWS-1250
Irish Gaelic
ISO-8859-1
ISO-8859-9
ISO-8859-15
WINDOWS-1252
Italian
ISO-8859-1
ISO-8859-3
ISO-8859-9
ISO-8859-15
WINDOWS-1252
Japanese
ISO-2022-JP
SHIFT_JIS
EUC-JP
Korean
ISO-2022-KR
EUC-KR / UHC
Lithuanian
ISO-8859-4
ISO-8859-10
ISO-8859-13
Latvian
ISO-8859-4
ISO-8859-10
ISO-8859-13
Maltese
ISO-8859-3
Polish
ISO-8859-2
ISO-8859-13
ISO-8859-16
Windows-1250
IBM852
MAC-CENTRALEUROPE
Portuguese
ISO-8859-1
ISO-8859-9
ISO-8859-15
WINDOWS-1252
Romanian
ISO-8859-2
ISO-8859-16
Windows-1250
IBM852
Russian
ISO-8859-5
KOI8-R
WINDOWS-1251
MAC-CYRILLIC
IBM866
IBM855
Slovak
Windows-1250
ISO-8859-2
IBM852
MAC-CENTRALEUROPE
Slovene
ISO-8859-2
ISO-8859-16
Windows-1250
IBM852
M
Example
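# -*- coding: utf-8 -*-
import cchardet as chardet

# Read the sample as raw bytes; detect() works on bytes, not str.
with open(r"src/tests/samples/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt", "rb") as f:
    msg = f.read()

# detect() returns a dict with "encoding" and "confidence" keys.
result = chardet.detect(msg)
print(result)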
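A common next step after detection is to decode the bytes with the reported encoding. The sketch below is not part of cChardet: `decode_with` and its `fallback` parameter are hypothetical names, and the detector is passed in as a callable so that `cchardet.detect` can be plugged in. It assumes only that the callable returns a dict with an `encoding` key, which may be `None` when nothing could be detected.

```python
from typing import Callable, Optional


def decode_with(detect: Callable[[bytes], dict], data: bytes,
                fallback: str = "utf-8") -> str:
    """Decode `data` using the encoding reported by `detect`.

    `detect` is expected to behave like cchardet.detect: it takes bytes
    and returns a dict with an "encoding" key (possibly None).
    `decode_with` and `fallback` are illustrative names, not library API.
    """
    result = detect(data)
    # Fall back when the detector could not identify an encoding.
    encoding: Optional[str] = result.get("encoding") or fallback
    try:
        return data.decode(encoding, errors="replace")
    except LookupError:
        # The detector reported a name Python has no codec for.
        return data.decode(fallback, errors="replace")
```

With the package installed, this would be called as `decode_with(chardet.detect, msg)` using the names from the example above. Some encoding names reported by a detector may not map to a Python codec, which is why the `LookupError` branch exists.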
Benchmark
$ cd src/
$ pip install chardet
$ python tests/bench.py
Results
CPU: Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz
RAM: DDR4-3200 64GB
Platform: Ubuntu 20.04 amd64
Python 3.9.0
Library | Request (call/s) |
---|---|
chardet v3.0.4 | 0.46 |
cchardet v2.1.7 | 1404.05 |
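For readers who want a comparable number on their own data, a calls-per-second figure can be measured along the following lines. This is a minimal sketch and not the contents of `tests/bench.py`; the function name `calls_per_second` and the `repeat` parameter are assumptions made for illustration, and the detector is passed in as a callable (for example `cchardet.detect` or `chardet.detect`).

```python
import time
from typing import Callable


def calls_per_second(detect: Callable[[bytes], dict], data: bytes,
                     repeat: int = 100) -> float:
    """Return how many times per second `detect(data)` can be called.

    Hypothetical helper for illustration; not taken from the project's
    benchmark script.
    """
    start = time.perf_counter()
    for _ in range(repeat):
        detect(data)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very fast calls or coarse timers.
    return repeat / elapsed if elapsed > 0 else float("inf")
```

Results depend heavily on the input size and the machine, so figures obtained this way are only comparable when both libraries are given the same bytes on the same system.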
LICENSE
See COPYING file.
Contact
Platform
Supported
Windows i686, x86_64
Linux i686, x86_64
macOS x86_64
Not supported
CHANGES
2.x.x
2.1.7 (2020-10-27)
support Python 3.9
drop support for Python 3.5
2.1.6 (2020-03-17)
drop support for Python 2.7
support GitHub Actions
update dev-dependencies
2.1.5 (2019-09-27)
update language models (uchardet)
add ISO-8859-2 test (currently disabled)
support Python 3.8
drop support for Python 3.4
2.1.4 (2018-09-27)
disable LTO because it degraded performance
2.1.3 (2018-09-26)
support Python 3.7
2.1.2 (2018-09-26)
enable LTO for wheel builds
update Cython
2.1.1 (2017-07-01)
fix different results being returned for different chunk sizes
fix assignments to nsSMState in nsCodingStateMachine that resulted in unspecified behavior
include COPYING in package
2.1.0 (2017-05-15)
2.0.1 (2017-04-25)
2.0.0 (2017-04-06)
Improve tests
2.0a4 (2017-04-05)
Update uchardet repo (Fix buffer overflow)
2.0a3 (2017-03-29)
Implement UniversalDetector (like chardet)
2.0a2 (2017-03-28)
Update uchardet repo (Fix memory leak)
2.0a1 (2017-03-28)
Replace uchardet-enhanced with uchardet
Remove Detector class
1.1.3 (2017-02-26)
Support AArch64
1.1.2 (2017-01-08)
Support Python 3.6
1.1.1 (2016-11-05)
Use len() function (9e61cb9e96b138b0d18e5f9e013e144202ae4067)
Remove detect function in _cchardet.pyx (25b581294fc0ae8f686ac9972c8549666766f695)
Support manylinux1 wheel
1.1.0 (2016-10-17)
Add Detector class
Improve unit tests