cChardet is a high-speed universal character encoding detector.
cChardet
NOTICE: This is a fork of the original project at https://github.com/PyYoshi/cChardet since the original project is no longer maintained.
To install:
pip install faust-cchardet
cChardet is a high-speed universal character encoding detector, implemented as a binding to uchardet.
Supported Languages/Encodings
International (Unicode)
UTF-8
UTF-16BE / UTF-16LE
UTF-32BE / UTF-32LE / X-ISO-10646-UCS-4-34121 / X-ISO-10646-UCS-4-21431
Arabic
ISO-8859-6
WINDOWS-1256
Bulgarian
ISO-8859-5
WINDOWS-1251
Chinese
ISO-2022-CN
BIG5
EUC-TW
GB18030
HZ-GB-2312
Croatian
ISO-8859-2
ISO-8859-13
ISO-8859-16
Windows-1250
IBM852
MAC-CENTRALEUROPE
Czech
Windows-1250
ISO-8859-2
IBM852
MAC-CENTRALEUROPE
Danish
ISO-8859-1
ISO-8859-15
WINDOWS-1252
English
ASCII
Esperanto
ISO-8859-3
Estonian
ISO-8859-4
ISO-8859-13
Windows-1252
Windows-1257
Finnish
ISO-8859-1
ISO-8859-4
ISO-8859-9
ISO-8859-13
ISO-8859-15
WINDOWS-1252
French
ISO-8859-1
ISO-8859-15
WINDOWS-1252
German
ISO-8859-1
WINDOWS-1252
Greek
ISO-8859-7
WINDOWS-1253
Hebrew
ISO-8859-8
WINDOWS-1255
Hungarian
ISO-8859-2
WINDOWS-1250
Irish Gaelic
ISO-8859-1
ISO-8859-9
ISO-8859-15
WINDOWS-1252
Italian
ISO-8859-1
ISO-8859-3
ISO-8859-9
ISO-8859-15
WINDOWS-1252
Japanese
ISO-2022-JP
SHIFT_JIS
EUC-JP
Korean
ISO-2022-KR
EUC-KR / UHC
Lithuanian
ISO-8859-4
ISO-8859-10
ISO-8859-13
Latvian
ISO-8859-4
ISO-8859-10
ISO-8859-13
Maltese
ISO-8859-3
Polish
ISO-8859-2
ISO-8859-13
ISO-8859-16
Windows-1250
IBM852
MAC-CENTRALEUROPE
Portuguese
ISO-8859-1
ISO-8859-9
ISO-8859-15
WINDOWS-1252
Romanian
ISO-8859-2
ISO-8859-16
Windows-1250
IBM852
Russian
ISO-8859-5
KOI8-R
WINDOWS-1251
MAC-CYRILLIC
IBM866
IBM855
Slovak
Windows-1250
ISO-8859-2
IBM852
MAC-CENTRALEUROPE
Slovene
ISO-8859-2
ISO-8859-16
Windows-1250
IBM852
M
Example
# -*- coding: utf-8 -*-
import cchardet as chardet

# Read the sample as raw bytes; detection works on bytes, not on decoded text.
with open(r"src/tests/samples/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt", "rb") as f:
    msg = f.read()
    result = chardet.detect(msg)
    print(result)
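A common next step is to decode the bytes with the detected encoding. The following is a minimal sketch, assuming `detect` returns a dict with `encoding` and `confidence` keys (the chardet-compatible shape) and that both may be `None` when nothing is detected. The names `decode_with_detection`, `min_confidence`, and `fallback` are hypothetical helpers for illustration, not part of cChardet.

```python
import codecs


def decode_with_detection(raw, detect=None, min_confidence=0.5, fallback="utf-8"):
    """Decode bytes using a detector's guess, falling back when the guess is weak.

    Hypothetical helper: `detect` defaults to cchardet.detect, but any callable
    returning {"encoding": ..., "confidence": ...} can be passed in.
    """
    if detect is None:
        # Imported lazily so the helper can also be exercised with a stub detector.
        import cchardet
        detect = cchardet.detect

    result = detect(raw)
    encoding = result.get("encoding")
    confidence = result.get("confidence") or 0.0

    # Assumption: a missing or low-confidence guess is not worth trusting.
    if encoding is None or confidence < min_confidence:
        encoding = fallback

    try:
        codecs.lookup(encoding)
    except LookupError:
        # The detector may report a name Python has no codec for.
        encoding = fallback

    # errors="replace" keeps the call from raising on a wrong guess.
    return raw.decode(encoding, errors="replace"), encoding
```

With the sample above, `text, used = decode_with_detection(msg)` would return the decoded text together with the encoding name that was actually used. The 0.5 threshold is an arbitrary choice for the sketch and should be tuned to the data.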
Benchmark
$ cd src/
$ pip install chardet
$ python tests/bench.py
Results
CPU: Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz
RAM: DDR4-3200 64GB
Platform: Ubuntu 20.04 amd64
Python 3.9.0
Library | Request (call/s) |
---|---|
chardet v3.0.4 | 0.46 |
cchardet v2.1.7 | 1404.05 |
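The figures above are calls per second, which puts cchardet at roughly 3,000 times the throughput of chardet on this sample (1404.05 / 0.46). The sketch below shows one way such a number could be measured. It is not the code in `tests/bench.py`; `calls_per_second` and `min_seconds` are hypothetical names.

```python
import time


def calls_per_second(detect, data, min_seconds=1.0):
    """Call detect(data) repeatedly for at least min_seconds and return calls/s."""
    calls = 0
    start = time.perf_counter()
    while True:
        detect(data)
        calls += 1
        elapsed = time.perf_counter() - start
        if elapsed >= min_seconds:
            return calls / elapsed
```

For a comparison, the same bytes would be passed to both detectors, for example `calls_per_second(cchardet.detect, msg)` and `calls_per_second(chardet.detect, msg)`. Results depend heavily on the input size and the machine.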
LICENSE
See COPYING file.
Contact
Platform
Support
Windows i686, x86_64
Linux i686, x86_64
macOS x86_64
Do not Support
CHANGES
2.x.x
2.1.7 (2020-10-27)
support Python 3.9
drop support for Python 3.5
2.1.6 (2020-03-17)
drop support for Python 2.7
support GitHub Actions
update dev-dependencies
2.1.5 (2019-09-27)
update language models (uchardet)
add iso8859-2 test but disabled it
support Python 3.8
drop support for Python 3.4
2.1.4 (2018-09-27)
disable LTO because it degraded performance
2.1.3 (2018-09-26)
support Python 3.7
2.1.2 (2018-09-26)
enable LTO for wheel builds
update Cython
2.1.1 (2017-07-01)
fix different results with different chunk sizes
fix assignments to nsSMState in nsCodingStateMachine that resulted in unspecified behavior
include COPYING in package
2.1.0 (2017-05-15)
2.0.1 (2017-04-25)
2.0.0 (2017-04-06)
Improve tests
2.0a4 (2017-04-05)
Update uchardet repo (Fix buffer overflow)
2.0a3 (2017-03-29)
Implement UniversalDetector (like chardet)
2.0a2 (2017-03-28)
Update uchardet repo (Fix memory leak)
2.0a1 (2017-03-28)
Replace uchardet-enhanced with uchardet
Remove Detector class
1.1.3 (2017-02-26)
Support AArch64
1.1.2 (2017-01-08)
Support Python 3.6
1.1.1 (2016-11-05)
Use len() function (9e61cb9e96b138b0d18e5f9e013e144202ae4067)
Remove detect function in _cchardet.pyx (25b581294fc0ae8f686ac9972c8549666766f695)
Support manylinux1 wheel
1.1.0 (2016-10-17)
Add Detector class
Improve unit tests
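The 2.0a3 entry above mentions a UniversalDetector modelled on chardet's. A minimal sketch of incremental detection with it follows, assuming the chardet-style interface of `feed()`, `done`, `close()`, and `result`. The wrapper `detect_chunks` and its `detector` parameter are hypothetical and exist only so the loop can be shown in isolation.

```python
def detect_chunks(chunks, detector=None):
    """Feed byte chunks to a detector until it is confident, then return its result.

    Hypothetical wrapper: `detector` defaults to cchardet.UniversalDetector(),
    but any object with feed(), done, close(), and result can be passed in.
    """
    if detector is None:
        # Imported lazily so the loop can also be exercised with a stand-in object.
        from cchardet import UniversalDetector
        detector = UniversalDetector()

    for chunk in chunks:
        detector.feed(chunk)
        if detector.done:
            # The detector has seen enough; skip the remaining input.
            break

    # close() finalizes detection so that result is populated.
    detector.close()
    return detector.result
```

For a large file, the chunks could come from `iter(lambda: f.read(4096), b"")` on a file opened in `"rb"` mode, which avoids reading the whole file into memory.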
Download files
Hashes for faust_cchardet-2.1.13-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 84cf2c319e3fa94a398766a8fbf7595a9aec413d2dd67895860319bbd642bf51 |
MD5 | 1551d9d8962e2c33381fdd9089dfceeb |
BLAKE2b-256 | e943171c41aa8c60b916ee011dddc639dde1e96518e5d6f3583da1f7715d275e |
Hashes for faust_cchardet-2.1.13-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm | Hash digest |
---|---|
SHA256 | ccd6dc856c66039c4bdcd3f325a49fbb7f84d7c309b5a904f53bd10437bb5e50 |
MD5 | f9a83610a8c0f463780c0124e87e1737 |
BLAKE2b-256 | 86d6e4f6df534c9ec513a1d8bc583a8de546e20b7c45bf6f1f85c20f2bb3901f |
Hashes for faust_cchardet-2.1.13-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 6dac40963d1956922114fb411fb337bb7b884e1fe613ffc9ba44d22bcc9ebc9d |
MD5 | 121eef8412aabb68431b5f1766b7448e |
BLAKE2b-256 | 11fde795e89bffc76e2e4a146df32709699e67305ea59ee9897a4d6e8bf8e583 |
Hashes for faust_cchardet-2.1.13-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 900507c72ffa199068f0d1c5feac30d213d30d708be183c5ee1c9b469a823b94 |
MD5 | 4ac81f8219ef32750c019a1030d3476d |
BLAKE2b-256 | 5345c6872b018cb67da847fc07413d5b7628630822cbb9c6416251206956866d |
Hashes for faust_cchardet-2.1.13-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 84c9915d7aef572357cdf7d96ae9f0f1fa42986339c7ab07595d0a5867712e1c |
MD5 | 10a97d66c8c33b7a7886ef4cd27aa64a |
BLAKE2b-256 | b9e8ff57de2ed705a8057815123387e537e39b9fc010351e5f22afac4d942a14 |
Hashes for faust_cchardet-2.1.13-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm | Hash digest |
---|---|
SHA256 | ab8843a3b91009c2220f7ae212c542c1e0e7bf7d03b1d22d533e59fc563f51a5 |
MD5 | 7aa0bc62b5eaa1c1e306ca1d391b0dd5 |
BLAKE2b-256 | 15610607228d9939b2bd2f4cee9b231be74239fc4c584efcf31b07c00782f9a9 |
Hashes for faust_cchardet-2.1.13-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | dca845f632ac0231962689e4c2d1acb76ea0babeae9cd8215208a3c7ee22de67 |
MD5 | 522eb6551e8f45257c46486818151cf0 |
BLAKE2b-256 | c56faba2cbc87686efbf0ff81e8df188008fd3075cef15279d9e079b22485f59 |
Hashes for faust_cchardet-2.1.13-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 860fc99321ab29d5489279d3c7362a128e205b2c45ea7c23939f20b0d35c4996 |
MD5 | be92cd5803ce33876f1ef6c0fad4b49d |
BLAKE2b-256 | d27c4d738aef77d437289cb30cdffcacf18dd9c705f1f0c1eccd81031ce0d76b |
Hashes for faust_cchardet-2.1.13-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 965dcc318af9fd42f5c49ac269dc528e46e66e2762d2b5325538e3a50208490d |
MD5 | 3ee84a8fa9ba763f7833bcd409d05665 |
BLAKE2b-256 | 9135e93474d537087efe3da7ab06d97ec13e9ab347e91a214ae998271bdb5f14 |
Hashes for faust_cchardet-2.1.13-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm | Hash digest |
---|---|
SHA256 | ae2770d4083af509912b0efd573006e7cbd79099494f276a6dc1892942ea922f |
MD5 | 54275ea9067ea42c6bea7c567f85b15c |
BLAKE2b-256 | 16b21b97cbdc5ba02d50a62505a4965057e5a8e2d240c334386884b677a3e034 |
Hashes for faust_cchardet-2.1.13-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 9d1ac5712578a6f5644eed8b1c6c26b7d11af46837de8b67ccb79b4a1218a846 |
MD5 | 26cbcb3c754559271698461e81a0dbc4 |
BLAKE2b-256 | 51df70ca39e47d62465a3761298f7e0a4cfa6da3d85890b6568e014fd15e0e71 |
Hashes for faust_cchardet-2.1.13-cp36-cp36m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 675b9940fcd35bc4e67dc107920123aef37a75e33ab7c7d0e6cfcd895c45c719 |
MD5 | 87d73a53daf011d670d0d7ac14c07dde |
BLAKE2b-256 | b60dc49fed9fcdeb476e44113e75d4006d3c1b9feb902278eb2b52f77c316c3f |
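The digests listed above can be checked against a downloaded wheel. The sketch below computes all three with the standard `hashlib` module, assuming that BLAKE2b-256 means BLAKE2b with a 32-byte digest. `wheel_digests` is a hypothetical helper name.

```python
import hashlib


def wheel_digests(fileobj, chunk_size=1 << 20):
    """Return hex digests for a binary file object, keyed like the tables above."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        # Assumption: BLAKE2b-256 is BLAKE2b truncated to a 32-byte digest.
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),
    }
    # Read in chunks so large wheels are not loaded into memory at once.
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        for hasher in hashers.values():
            hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

To use it, open the downloaded wheel in `"rb"` mode, pass the file object to `wheel_digests`, and compare the `"SHA256"` value with the row for that file. A mismatch suggests a corrupted or altered download.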