cChardet is a high-speed universal character encoding detector.
Reason this release was yanked:
Does not import
Project description
cChardet
NOTICE: This is a fork of the original project at https://github.com/PyYoshi/cChardet since the original project is no longer maintained.
To install:
pip install faust-cchardet
cChardet is a high-speed universal character encoding detector, implemented as a binding to uchardet.
Supported Languages/Encodings
International (Unicode)
UTF-8
UTF-16BE / UTF-16LE
UTF-32BE / UTF-32LE / X-ISO-10646-UCS-4-34121 / X-ISO-10646-UCS-4-21431
Arabic
ISO-8859-6
WINDOWS-1256
Bulgarian
ISO-8859-5
WINDOWS-1251
Chinese
ISO-2022-CN
BIG5
EUC-TW
GB18030
HZ-GB-2312
Croatian
ISO-8859-2
ISO-8859-13
ISO-8859-16
Windows-1250
IBM852
MAC-CENTRALEUROPE
Czech
Windows-1250
ISO-8859-2
IBM852
MAC-CENTRALEUROPE
Danish
ISO-8859-1
ISO-8859-15
WINDOWS-1252
English
ASCII
Esperanto
ISO-8859-3
Estonian
ISO-8859-4
ISO-8859-13
Windows-1252
Windows-1257
Finnish
ISO-8859-1
ISO-8859-4
ISO-8859-9
ISO-8859-13
ISO-8859-15
WINDOWS-1252
French
ISO-8859-1
ISO-8859-15
WINDOWS-1252
German
ISO-8859-1
WINDOWS-1252
Greek
ISO-8859-7
WINDOWS-1253
Hebrew
ISO-8859-8
WINDOWS-1255
Hungarian
ISO-8859-2
WINDOWS-1250
Irish Gaelic
ISO-8859-1
ISO-8859-9
ISO-8859-15
WINDOWS-1252
Italian
ISO-8859-1
ISO-8859-3
ISO-8859-9
ISO-8859-15
WINDOWS-1252
Japanese
ISO-2022-JP
SHIFT_JIS
EUC-JP
Korean
ISO-2022-KR
EUC-KR / UHC
Lithuanian
ISO-8859-4
ISO-8859-10
ISO-8859-13
Latvian
ISO-8859-4
ISO-8859-10
ISO-8859-13
Maltese
ISO-8859-3
Polish
ISO-8859-2
ISO-8859-13
ISO-8859-16
Windows-1250
IBM852
MAC-CENTRALEUROPE
Portuguese
ISO-8859-1
ISO-8859-9
ISO-8859-15
WINDOWS-1252
Romanian
ISO-8859-2
ISO-8859-16
Windows-1250
IBM852
Russian
ISO-8859-5
KOI8-R
WINDOWS-1251
MAC-CYRILLIC
IBM866
IBM855
Slovak
Windows-1250
ISO-8859-2
IBM852
MAC-CENTRALEUROPE
Slovene
ISO-8859-2
ISO-8859-16
Windows-1250
IBM852
M
Example
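# -*- coding: utf-8 -*-
import cchardet as chardet

# Read the sample as raw bytes; detection runs on bytes, not on decoded text.
with open(r"src/tests/samples/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt", "rb") as f:
    msg = f.read()

# detect() returns a dict describing the guessed encoding and its confidence.
result = chardet.detect(msg)
print(result)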
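The result of the example can then be used to decode the bytes. What follows is a minimal sketch and not part of the project: `decode_detected` and its `fallback` parameter are hypothetical names, and the code assumes that `detect` returns a dict with an `encoding` key that may be `None` when nothing is recognized. The detector is passed in as an argument so the helper does not depend on how the package is imported.

```python
from typing import Callable


def decode_detected(data: bytes, detect: Callable[[bytes], dict], fallback: str = "utf-8") -> str:
    """Decode data with the encoding reported by detect (hypothetical helper)."""
    result = detect(data)
    # Assumption: "encoding" may be None when no encoding is recognized.
    encoding = result.get("encoding") or fallback
    try:
        return data.decode(encoding)
    except (LookupError, UnicodeDecodeError):
        # Unknown codec name or a wrong guess: decode leniently with the fallback.
        return data.decode(fallback, errors="replace")
```

With the package installed, this could be called as `decode_detected(msg, chardet.detect)`, reusing `msg` and the `chardet` alias from the example.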
Benchmark
$ cd src/
$ pip install chardet
$ python tests/bench.py
Results
CPU: Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz
RAM: DDR4-3200 64GB
Platform: Ubuntu 20.04 amd64
Python 3.9.0
Library | Request (call/s) |
---|---|
chardet v3.0.4 | 0.46 |
cchardet v2.1.7 | 1404.05 |
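To put the two rates in proportion, the snippet below divides one by the other. The `calls_per_second` helper is a hypothetical sketch of how such a rate could be measured; it is not the project's `tests/bench.py`, whose contents are not shown here.

```python
import time
from typing import Callable


def calls_per_second(detect: Callable[[bytes], dict], data: bytes, repeat: int = 100) -> float:
    """Time `repeat` calls of detect(data) and return calls per second."""
    start = time.perf_counter()
    for _ in range(repeat):
        detect(data)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading from a very fast callable.
    return repeat / elapsed if elapsed > 0 else float("inf")


# Ratio of the two rates reported in the results.
speedup = 1404.05 / 0.46
print(f"cchardet handled about {speedup:.0f}x as many calls per second as chardet")
```

The ratio only describes the machine and library versions listed under Results; other hardware or inputs may give different figures.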
LICENSE
See COPYING file.
Contact
Platform
Supported
Windows i686, x86_64
Linux i686, x86_64
macOS x86_64
Not supported
CHANGES
2.x.x
2.1.7 (2020-10-27)
support Python 3.9
drop support for Python 3.5
2.1.6 (2020-03-17)
drop support for Python 2.7
support GitHub Actions
update dev-dependencies
2.1.5 (2019-09-27)
update language models (uchardet)
add ISO-8859-2 test (currently disabled)
support Python 3.8
drop support for Python 3.4
2.1.4 (2018-09-27)
disable LTO because it degraded performance
2.1.3 (2018-09-26)
support Python 3.7
2.1.2 (2018-09-26)
enable LTO for wheel builds
update Cython
2.1.1 (2017-07-01)
fix different results being returned for different chunk sizes
fix assignments to nsSMState in nsCodingStateMachine that resulted in unspecified behavior
include COPYING in package
2.1.0 (2017-05-15)
2.0.1 (2017-04-25)
2.0.0 (2017-04-06)
Improve tests
2.0a4 (2017-04-05)
Update uchardet repo (Fix buffer overflow)
2.0a3 (2017-03-29)
Implement UniversalDetector (like chardet)
2.0a2 (2017-03-28)
Update uchardet repo (Fix memory leak)
2.0a1 (2017-03-28)
Replace uchardet-enhanced with uchardet
Remove Detector class
1.1.3 (2017-02-26)
Support AArch64
1.1.2 (2017-01-08)
Support Python 3.6
1.1.1 (2016-11-05)
Use len() function (9e61cb9e96b138b0d18e5f9e013e144202ae4067)
Remove detect function in _cchardet.pyx (25b581294fc0ae8f686ac9972c8549666766f695)
Support manylinux1 wheel
1.1.0 (2016-10-17)
Add Detector class
Improve unit tests
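The 2.0a3 entry describes `UniversalDetector` as working like chardet's, which suggests incremental use: feed chunks until the detector is confident, then close it and read the result. The sketch below assumes that chardet-style interface (`feed()`, `done`, `close()`, `result`); `detect_chunks` is a hypothetical helper, and the detector is passed in so the function makes no assumption about how it is constructed.

```python
from typing import Any, Iterable


def detect_chunks(chunks: Iterable[bytes], detector: Any) -> dict:
    """Feed byte chunks to a chardet-style detector and return its result."""
    for chunk in chunks:
        detector.feed(chunk)
        # Assumption: `done` becomes true once the detector is confident.
        if detector.done:
            break
    # close() finalizes detection before the result is read.
    detector.close()
    return detector.result
```

Assuming the package is installed and `f` is a file opened in binary mode, a call might look like `detect_chunks(f, cchardet.UniversalDetector())`; iterating over a binary file yields one line of bytes at a time.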
Download files
Built Distributions
Hashes for faust_cchardet-2.1.10-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 1ce14e456235898fcd5ea30b88a82dea447b0cfd7a9153174ad08d96f43525ef |
MD5 | abe2c5e8e8e9bc1b3264efcab82eb124 |
BLAKE2b-256 | ad690926143dac10d2795f889940cf736b4f14bad9ece78bae6370832219bf47 |
Hashes for faust_cchardet-2.1.10-cp311-cp311-macosx_10_9_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 5578096a93a67e5b32c16db91e26ad1d15f80f6a3c0cba3255309000b60c644a |
MD5 | 2e087aa90752959a895549240237a510 |
BLAKE2b-256 | ae153addf7fea8d4e8afaea0ecd9d9c20d7e565928586b1bb58d78145109c0a1 |
Hashes for faust_cchardet-2.1.10-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 3705534d1feef9a6f30a16323a00b14cf4eff2c964ee2d4ea80e540bbdf698ba |
MD5 | b03ea8976af9a2ddaab98a1d238ba4f9 |
BLAKE2b-256 | 5831e1e839418cf67138ce71531bf2685fbcfa1fbac6dda1270786aaf4c078c0 |
Hashes for faust_cchardet-2.1.10-cp310-cp310-macosx_10_9_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | ffecea1a26fbfee7c168e16b688dd6545314ee5e5a7b6d40f7c12bf59ad9da86 |
MD5 | 99a73877ecb14a808aadabad8645c458 |
BLAKE2b-256 | bf43226128a82c6f00f8dad7d7a56f964a4e9b1432676dd2fd3bbc970ead187b |
Hashes for faust_cchardet-2.1.10-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | cb3453339cf2cf21728d56772f862ff04ecb45f60d5b975548e1a5ba68b75941 |
MD5 | 406b20521b954e5f90f3f49051644c1c |
BLAKE2b-256 | faf83ec784b2bded428b745a106a237499da2f24243bd210152afb9529da6ba7 |
Hashes for faust_cchardet-2.1.10-cp39-cp39-macosx_10_9_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | f2e3c7ebe851357988500b5cd292fe88bd3d104f5233a7577a5bfb9935a5c676 |
MD5 | f23229d317561a288b275576c1ece17d |
BLAKE2b-256 | d804218acf417fb02ff4b4b7444f5d26dce2cea5d7b798e9dea62e7197d508a3 |
Hashes for faust_cchardet-2.1.10-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 221871abee6e62e70d0510635981c7001d87f700cae6b63cc081f1e9c4c765f0 |
MD5 | b5841f1134bd264ece93100e40fb7c34 |
BLAKE2b-256 | 852cedce887b4a5c996b99c7ba0fc0ca0218626f1f433d97a5a1a59a627f12fc |
Hashes for faust_cchardet-2.1.10-cp38-cp38-macosx_10_9_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 75a7671e013c82d5b75d4d56940893ee751505b21d825f8250d64d6ffe20ef06 |
MD5 | 917321067a83cc58b8a653d7242a4214 |
BLAKE2b-256 | 0101a15867849a269554052227851efa846d34cfb4494e494fcacf2dcbe590a2 |
Hashes for faust_cchardet-2.1.10-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 1676949888f8367805f445f5ff1cfd2f5ea3fbb555513fc51cfd604ef70a8147 |
MD5 | df00f23c025d3c8bf694d880ec1e23c6 |
BLAKE2b-256 | 4a4c75cf460105c71aecf461514ab032bfcaa5efe45ff6fc3ed26903d99229f5 |
Hashes for faust_cchardet-2.1.10-cp37-cp37m-macosx_10_9_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 531cdfff636a7d1aae111ea3ad8a5cfcbb5a5741bdacb9e629c02bbe0cd9fffe |
MD5 | 2dc36ad939801984cafe61c589fdca29 |
BLAKE2b-256 | 5c634845ff711aeae079e5c69b632cc3958192a8d7693ec66797c3d9a005d564 |
Hashes for faust_cchardet-2.1.10-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | fc1587e5f9515960d67fa0cc4c5ec5b0c3f2a2a80aa92f2fbd706a75c6fb4c1f |
MD5 | 76e9dc287050e87e0ce74abb3ed20e0d |
BLAKE2b-256 | 1f9b7041fb1e3e94d092bc3658874a6d70c0403a850ec9ee5ec005a68b6f0ebf |
Hashes for faust_cchardet-2.1.10-cp36-cp36m-macosx_10_9_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 5432430972a39c6e35576caa9fbcb4e0544ce3372ecdc108c65d827e580cc537 |
MD5 | 5a3e6e16a031bb8239d63e2468c84df6 |
BLAKE2b-256 | 87806a7eacc155760f6cab1ef215055d44f1a503232bc5108826713830f11a73 |