Universal encoding detector. This library is faster than chardet.
Project description
cChardet is a high-speed universal character encoding detector, implemented as a binding to charsetdetect.
Supported codecs
Big5
EUC-JP
EUC-KR
GB18030
HZ-GB-2312
IBM855
IBM866
ISO-2022-CN
ISO-2022-JP
ISO-2022-KR
ISO-8859-2
ISO-8859-5
ISO-8859-7
ISO-8859-8
KOI8-R
Shift_JIS
TIS-620
UTF-8
UTF-16BE
UTF-16LE
UTF-32BE
UTF-32LE
WINDOWS-1250
WINDOWS-1251
WINDOWS-1252
WINDOWS-1253
WINDOWS-1255
EUC-TW
X-ISO-10646-UCS-4-2143
X-ISO-10646-UCS-4-3412
x-mac-cyrillic
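Not every label in this list necessarily corresponds to a codec that Python itself can decode. The following is a minimal sketch, assuming Python 3, of how a detected label could be checked against the standard `codecs` registry before decoding. The helper name `python_codec_for` is hypothetical and not part of cChardet.

```python
import codecs


def python_codec_for(detected_name):
    """Return Python's canonical codec name for a detected label, or None.

    Hypothetical helper: it only consults Python's codec registry and
    does not call cChardet at all.
    """
    try:
        return codecs.lookup(detected_name).name
    except LookupError:
        # The label is valid for the detector but unknown to Python.
        return None


# Check a few labels taken from the list above.
for label in ("Shift_JIS", "WINDOWS-1252", "EUC-TW"):
    print(label, "->", python_codec_for(label))
```

A `None` result means the bytes need another decoding route (for example an external converter) even though detection succeeded.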
Requires
Cython: http://www.cython.org/
Example (Ubuntu 12.04):
$ sudo apt-get install build-essential python-dev cython
Installation
$ cd /tmp
$ git clone git://github.com/PyYoshi/cChardet.git
$ cd cChardet
$ python setup.py build
$ sudo python setup.py install
or
$ sudo easy_install cchardet
Example
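# -*- coding: utf-8 -*-
import cchardet as chardet

# Read the sample as raw bytes; detection works on undecoded data.
with open(r"test/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt", "rb") as f:
    msg = f.read()

# Guess the encoding of the raw bytes and show the result.
result = chardet.detect(msg)
print(result)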
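A common next step after detection is to decode the bytes. The sketch below assumes Python 3 and assumes the detection result is a dict with `encoding` and `confidence` keys, where `encoding` may be `None` when nothing is recognized. The function `decode_detected`, its `min_confidence` threshold, and the stand-in `fake_detect` are hypothetical names used only for illustration.

```python
def decode_detected(data, detect, min_confidence=0.5, fallback="utf-8"):
    """Decode bytes using a detector's guess, with a safe fallback.

    `detect` is any callable that takes bytes and returns a dict with
    "encoding" and "confidence" keys (assumed result shape).
    """
    result = detect(data)
    encoding = result.get("encoding")
    confidence = result.get("confidence") or 0.0
    # Distrust missing or low-confidence guesses.
    if encoding is None or confidence < min_confidence:
        encoding = fallback
    try:
        return data.decode(encoding)
    except (LookupError, UnicodeDecodeError):
        # Unknown codec name or a wrong guess: decode leniently instead.
        return data.decode(fallback, errors="replace")


def fake_detect(data):
    """Stand-in detector so the sketch runs without cChardet installed."""
    return {"encoding": "SHIFT_JIS", "confidence": 0.99}


sample = "日本語".encode("shift_jis")
text = decode_detected(sample, fake_detect)
print(text == "日本語")
```

With cChardet installed, `chardet.detect` from the example above could be passed in place of `fake_detect`.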
Test
Install chardet and nose with pip (or easy_install), then run the tests:
$ sudo pip install -U chardet nose
$ cd test
$ nosetests --nocapture tests.py
Benchmark
code: tests.TestCchardetSpeed
sample: test/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt
Environment:
CPU: Intel Core i7 860 2.8GHz
RAM: DDR3-1333 16GB
Platform: Kubuntu 12.04 amd64, Python 2.7.3 64-bit
Result:
chardet: 0.32 (call/s)
cchardet: 975.32 (call/s)
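For readers who want to reproduce a calls-per-second figure on their own machine, here is a rough sketch, assuming Python 3. It is not the code in tests.TestCchardetSpeed; the function `calls_per_second`, the `repeat` parameter, and the stub detector are hypothetical.

```python
import time


def calls_per_second(detect, data, repeat=100):
    """Call detect(data) `repeat` times and return the average rate."""
    start = time.perf_counter()
    for _ in range(repeat):
        detect(data)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse timers.
    return repeat / elapsed if elapsed > 0 else float("inf")


def stub_detect(data):
    """Stand-in so the sketch runs without either library installed."""
    return {"encoding": "UTF-8", "confidence": 1.0}


rate = calls_per_second(stub_detect, b"x" * 1000, repeat=50)
print("stub: %.2f (call/s)" % rate)
```

Passing `chardet.detect` and `cchardet.detect` in turn, with the bytes of the sample file, would give figures comparable in kind to the ones above, though the absolute numbers depend on hardware and library versions.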
License
The MIT License: src/cchardet
Other libraries' licenses: see the src/ext directory.
Thanks
Contact
Sorry for my poor English :)
Source Distribution
File details
Details for the file cchardet-0.3.4.tar.gz.
File metadata
- Download URL: cchardet-0.3.4.tar.gz
- Upload date:
- Size: 619.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2a30a6be6cd9cca1c0ef4981467901ee0a9d66f516a901e385f7a59748e3b86b |
| MD5 | 5cc6f146288fa1883bb3769fe5178675 |
| BLAKE2b-256 | 4d11331dcaa6171d08ce7563e3ca06f71fdb45df936c95e3a62174c2c90703c3 |
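A downloaded archive can be checked against the SHA256 digest listed above. The following is a minimal sketch, assuming Python 3 and only the standard `hashlib` module. The helper names `sha256_of` and `matches_published` are hypothetical, and the local path passed in is whatever location the file was saved to.

```python
import hashlib

# SHA256 digest published for cchardet-0.3.4.tar.gz (copied from the table).
EXPECTED_SHA256 = "2a30a6be6cd9cca1c0ef4981467901ee0a9d66f516a901e385f7a59748e3b86b"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path):
    """True if the file at `path` has the published SHA256 digest."""
    return sha256_of(path) == EXPECTED_SHA256
```

For example, `matches_published("cchardet-0.3.4.tar.gz")` should return `True` for an intact download saved under that name in the current directory.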