Skip to main content

Universal encoding detector. This library is faster than chardet.

Project description

cChardet is high speed universal character encoding detector. - binding to charsetdetect.

Support codecs

  • Big5
  • EUC-JP
  • EUC-KR
  • GB18030
  • HZ-GB-2312
  • IBM855
  • IBM866
  • ISO-2022-CN
  • ISO-2022-JP
  • ISO-2022-KR
  • ISO-8859-2
  • ISO-8859-5
  • ISO-8859-7
  • ISO-8859-8
  • KOI8-R
  • Shift_JIS
  • TIS-620
  • UTF-8
  • UTF-16BE
  • UTF-16LE
  • UTF-32BE
  • UTF-32LE
  • WINDOWS-1250
  • WINDOWS-1251
  • WINDOWS-1252
  • WINDOWS-1253
  • WINDOWS-1255
  • EUC-TW
  • X-ISO-10646-UCS-4-2143
  • X-ISO-10646-UCS-4-3412
  • x-mac-cyrillic

Requires

e.g.) Ubuntu 12.04

$ sudo apt-get install build-essential python-dev cython

Installation

$ cd /tmp
$ git clone git://github.com/PyYoshi/cChardet.git
$ cd cChardet
$ python setup.py build
$ sudo python setup.py install

or

$ sudo easy_install cchardet

Example

# -*- coding: utf-8 -*-
import cchardet as chardet
with open(r"test/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt", "rb") as f:
    msg = f.read()
result = chardet.detect(msg)
print(result)

Test

$ sudo easy_install or pip install -U chardet nose
$ cd test
$ nosetests --nocapture tests.py

Benchmark

code: tests.TestCchardetSpeed

sample: test/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt

Performance:

CPU: Intel Core i7 860 2.8GHz

RAM: DDR3-1333 16GB

Platform: Kubuntu 12.04 amd64, Python 2.7.3 64-bit

Result:

chardet:        0.32 (call/s)

cchardet:       975.32 (call/s)

License

  • The MIT License: src/cchardet
  • Other Libraries License: Please, look at the src/ext directory.

Contact

My blog

Issues

Sorry for my poor English :)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Filename, size & hash SHA256 hash help File type Python version Upload date
cchardet-0.3.5.tar.gz (619.7 kB) Copy SHA256 hash SHA256 Source None

Supported by

Elastic Elastic Search Pingdom Pingdom Monitoring Google Google BigQuery Sentry Sentry Error logging AWS AWS Cloud computing DataDog DataDog Monitoring Fastly Fastly CDN SignalFx SignalFx Supporter DigiCert DigiCert EV certificate StatusPage StatusPage Status page