cChardet is a high-speed universal character encoding detector.
Reason this release was yanked:
Does not import
cChardet
NOTICE: This is a fork of the original project at https://github.com/PyYoshi/cChardet since the original project is no longer maintained.
To install:
pip install faust-cchardet
cChardet is a high-speed universal character encoding detector, implemented as a binding to uchardet.
Supported Languages/Encodings
International (Unicode)
UTF-8
UTF-16BE / UTF-16LE
UTF-32BE / UTF-32LE / X-ISO-10646-UCS-4-34121 / X-ISO-10646-UCS-4-21431
Arabic
ISO-8859-6
WINDOWS-1256
Bulgarian
ISO-8859-5
WINDOWS-1251
Chinese
ISO-2022-CN
BIG5
EUC-TW
GB18030
HZ-GB-2312
Croatian
ISO-8859-2
ISO-8859-13
ISO-8859-16
Windows-1250
IBM852
MAC-CENTRALEUROPE
Czech
Windows-1250
ISO-8859-2
IBM852
MAC-CENTRALEUROPE
Danish
ISO-8859-1
ISO-8859-15
WINDOWS-1252
English
ASCII
Esperanto
ISO-8859-3
Estonian
ISO-8859-4
ISO-8859-13
Windows-1252
Windows-1257
Finnish
ISO-8859-1
ISO-8859-4
ISO-8859-9
ISO-8859-13
ISO-8859-15
WINDOWS-1252
French
ISO-8859-1
ISO-8859-15
WINDOWS-1252
German
ISO-8859-1
WINDOWS-1252
Greek
ISO-8859-7
WINDOWS-1253
Hebrew
ISO-8859-8
WINDOWS-1255
Hungarian
ISO-8859-2
WINDOWS-1250
Irish Gaelic
ISO-8859-1
ISO-8859-9
ISO-8859-15
WINDOWS-1252
Italian
ISO-8859-1
ISO-8859-3
ISO-8859-9
ISO-8859-15
WINDOWS-1252
Japanese
ISO-2022-JP
SHIFT_JIS
EUC-JP
Korean
ISO-2022-KR
EUC-KR / UHC
Lithuanian
ISO-8859-4
ISO-8859-10
ISO-8859-13
Latvian
ISO-8859-4
ISO-8859-10
ISO-8859-13
Maltese
ISO-8859-3
Polish
ISO-8859-2
ISO-8859-13
ISO-8859-16
Windows-1250
IBM852
MAC-CENTRALEUROPE
Portuguese
ISO-8859-1
ISO-8859-9
ISO-8859-15
WINDOWS-1252
Romanian
ISO-8859-2
ISO-8859-16
Windows-1250
IBM852
Russian
ISO-8859-5
KOI8-R
WINDOWS-1251
MAC-CYRILLIC
IBM866
IBM855
Slovak
Windows-1250
ISO-8859-2
IBM852
MAC-CENTRALEUROPE
Slovene
ISO-8859-2
ISO-8859-16
Windows-1250
IBM852
M
Example
# -*- coding: utf-8 -*-
import cchardet as chardet

# Read the sample as raw bytes; detect() works on bytes, not str.
with open(r"src/tests/samples/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt", "rb") as f:
    msg = f.read()
result = chardet.detect(msg)
print(result)
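The result printed above is a dictionary that can drive decoding. The following is a minimal sketch, assuming the result has `encoding` and `confidence` keys as in chardet and that `encoding` can be `None` when nothing is detected. The helper name `decode_with_detection`, the `fallback` codec and the `min_confidence` threshold are hypothetical and not part of cChardet; the detector is passed in as a callable so the helper does not depend on a particular package.

```python
from typing import Callable, Optional


def decode_with_detection(
    data: bytes,
    detect: Callable[[bytes], dict],
    fallback: str = "utf-8",
    min_confidence: float = 0.5,
) -> str:
    """Decode bytes with a chardet-style detect() callable (hypothetical helper)."""
    result = detect(data)
    encoding: Optional[str] = result.get("encoding")
    confidence = result.get("confidence") or 0.0

    # Assumption: fall back when nothing was detected or the detector is unsure.
    if not encoding or confidence < min_confidence:
        encoding = fallback

    try:
        return data.decode(encoding)
    except (LookupError, UnicodeDecodeError):
        # Python does not know the codec name, or the bytes do not fit it:
        # decode leniently with the fallback instead of raising.
        return data.decode(fallback, errors="replace")
```

With the example above, this could be called as `decode_with_detection(msg, chardet.detect)`. The lenient fallback is a design choice for the sketch; code that must not silently lose data may prefer to re-raise instead.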
Benchmark
$ cd src/
$ pip install chardet
$ python tests/bench.py
Results
CPU: Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz
RAM: DDR4-3200 64GB
Platform: Ubuntu 20.04 amd64
Python 3.9.0
Library | Request (call/s) |
---|---|
chardet v3.0.4 | 0.46 |
cchardet v2.1.7 | 1404.05 |
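A call/s figure like the one in the table is a throughput: the number of detect calls divided by the elapsed wall-clock time. The sketch below illustrates that calculation only; it is not the project's tests/bench.py, and the function name and the `repeat` default are assumptions.

```python
import time
from typing import Callable


def calls_per_second(
    detect: Callable[[bytes], dict], data: bytes, repeat: int = 100
) -> float:
    """Return how many detect(data) calls complete per second (illustrative)."""
    start = time.perf_counter()
    for _ in range(repeat):
        detect(data)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse clocks.
    return repeat / elapsed if elapsed > 0 else float("inf")
```

Passing `chardet.detect` and `cchardet.detect` with the same sample bytes would give two comparable rates; absolute numbers depend on the machine and the sample.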
LICENSE
See COPYING file.
Contact
Platform
Support
Windows i686, x86_64
Linux i686, x86_64
macOS x86_64
Do not Support
CHANGES
2.x.x
2.1.7 (2020-10-27)
support Python 3.9
drop support for Python 3.5
2.1.6 (2020-03-17)
drop support for Python 2.7
support GitHub Actions
update dev-dependencies
2.1.5 (2019-09-27)
update language models (uchardet)
add iso8859-2 test but disabled it
support Python 3.8
drop support for Python 3.4
2.1.4 (2018-09-27)
disable LTO because it degraded performance
2.1.3 (2018-09-26)
support Python 3.7
2.1.2 (2018-09-26)
enable LTO for wheel builds
update Cython
2.1.1 (2017-07-01)
fix differing results with different chunk sizes
fix assignments to nsSMState in nsCodingStateMachine that resulted in unspecified behavior
include COPYING in package
2.1.0 (2017-05-15)
2.0.1 (2017-04-25)
2.0.0 (2017-04-06)
Improve tests
2.0a4 (2017-04-05)
Update uchardet repo (Fix buffer overflow)
2.0a3 (2017-03-29)
Implement UniversalDetector (like chardet)
2.0a2 (2017-03-28)
Update uchardet repo (Fix memory leak)
2.0a1 (2017-03-28)
Replace uchardet-enhanced with uchardet
Remove Detector class
1.1.3 (2017-02-26)
Support AArch64
1.1.2 (2017-01-08)
Support Python 3.6
1.1.1 (2016-11-05)
Use len() function (9e61cb9e96b138b0d18e5f9e013e144202ae4067)
Remove detect function in _cchardet.pyx (25b581294fc0ae8f686ac9972c8549666766f695)
Support manylinux1 wheel
1.1.0 (2016-10-17)
Add Detector class
Improve unit tests
Download files
Built Distributions
Hashes for faust_cchardet-2.1.10rc0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | a3223567ca3659b2bd5af3c5efa2a4778f99012183ce65e411212f0283b3758d |
MD5 | 83ef3f975b973e58a4640a746225f02a |
BLAKE2b-256 | 6bf97d3aeee0045860a0d64b1d6f185f2f7dcf4430757b72edfcecf07ab9a931 |
Hashes for faust_cchardet-2.1.10rc0-cp311-cp311-macosx_10_9_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 046b7cfc24c6ee00c35cb8b30d7fe0b776ef883a764fe548ff5b0213caf16bce |
MD5 | a563c436c395c40029756d7bd90c0cff |
BLAKE2b-256 | eb53e289e78494c0a3734c29eeaa328c228d619c8f5cc392ea9dfefaa160e3f6 |
Hashes for faust_cchardet-2.1.10rc0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 85664ce3eb05f196004733a19e85b3703eddd842b8e672d6d4692d2ec9da5312 |
MD5 | 04c9f86c4af9cc79e3f9fc87df7c2ad0 |
BLAKE2b-256 | 14842e5ed046edab7d376e4203fc409dfe56601e20dedf799d9059538ca72d1c |
Hashes for faust_cchardet-2.1.10rc0-cp310-cp310-macosx_10_9_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 9d9f943ea1e2d822cec72ac4cd077c974dc61f62ebf0ae0a20279e2ca21983e3 |
MD5 | 9f7f8a45a8cb5d82eb76676aa8c41e3f |
BLAKE2b-256 | 15650ac35bcc05eb387d4a70308f3726f8f7e5aee48c28ed2550c912959a891d |
Hashes for faust_cchardet-2.1.10rc0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 81512166e95dd5472806fa1b26655efe4bb3535d00fccab572ab8be6fd481373 |
MD5 | ef0ace65d684b2721a06b3ee305d7a76 |
BLAKE2b-256 | e38485f7fa62afb5f78b56d1778ea5c4a0b5ea28984813790971437a2bfaf84b |
Hashes for faust_cchardet-2.1.10rc0-cp39-cp39-macosx_10_9_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 5bfc6524f917cec3d5cc204848297e50e862b3a5da3169c8a2e00794be27c2f1 |
MD5 | 914360787f4ee5d50e61ae07ee6be397 |
BLAKE2b-256 | a3b96bfdbf1a58055ba64a2c87573018d964ca0005395c754efdd9318abef040 |
Hashes for faust_cchardet-2.1.10rc0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 2a830a6bdd60a390d837a1c6e27b89cfbccc1296e1783c5a51bac483762d72df |
MD5 | 1fca58f8e1df4fe913e149222bb541bb |
BLAKE2b-256 | 0e23cff5ca46f0a6d5caa62059015121ad59923c76fa568e99f86c431dce5312 |
Hashes for faust_cchardet-2.1.10rc0-cp38-cp38-macosx_10_9_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 1c9700016cdaa844b5d1a64bcde66531ca26c17e1dc1fb5323bb455a85f04401 |
MD5 | b4d8f2095821d587a2a4b0970b663b12 |
BLAKE2b-256 | 7509d568969deb8e5f14497ebee5b63333dfdb749b7be40881cd87ff6282abec |
Hashes for faust_cchardet-2.1.10rc0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | afb596519538e9a6a683d751533d6b2392468713c0b369f78fe6c7fbac7d928a |
MD5 | 9d1c845eb8b275805fb6e3b1d64a68be |
BLAKE2b-256 | b97a7967170d785035e3913d9bf2508d9d22ccf5a87ad79e716683cb0ccb7950 |
Hashes for faust_cchardet-2.1.10rc0-cp37-cp37m-macosx_10_9_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | d92c40246f4a5d1ca81e282fb135c48ee281446859c87c062c0cc626e6b12752 |
MD5 | 0fad301cede117fc967b49ad741db5de |
BLAKE2b-256 | 8048adc6609410377e831d11f6081e5ff2a37d7d7bc25916ac7f567d39dd6dc3 |
Hashes for faust_cchardet-2.1.10rc0-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 01d108d3c1f5ccac4337d095361341f9dfee09e67be804eff5dec92f279bf735 |
MD5 | eabbc317afffb4fa4c6788c58bf2e747 |
BLAKE2b-256 | 65b00603f14f696c6124079d2952e8132b8223d8b3f8dba902cb8cab8b09514e |
Hashes for faust_cchardet-2.1.10rc0-cp36-cp36m-macosx_10_9_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 60586c12fa0259b0a70659f366a1ad069b681e3e9af9d9b8c8ad83a58ab0d503 |
MD5 | 818f0b1b8b9c9f16e9bb8206da79ab13 |
BLAKE2b-256 | 8a163d7d6a1ed20ed6acf81cbe3b4a8df8d6b55d425b56c0fb8a58188b44a73f |
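The digests listed for each file can be compared against a downloaded copy. The following is a minimal sketch using Python's standard `hashlib`; the function name `file_digests` and the chunk size are hypothetical, and it assumes that the BLAKE2b-256 column corresponds to BLAKE2b with a 32-byte digest.

```python
import hashlib
from pathlib import Path


def file_digests(path: str, chunk_size: int = 1 << 20) -> dict:
    """Compute SHA256, MD5 and BLAKE2b-256 hex digests of a file (illustrative)."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        # Assumption: "BLAKE2b-256" means BLAKE2b truncated to 32 bytes.
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),
    }
    with Path(path).open("rb") as f:
        # Read in chunks so large wheels are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

Comparing `file_digests(path)["SHA256"]` with the SHA256 row for the matching file name is usually sufficient; MD5 is shown only because it appears in the tables.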