Universal encoding detector. This library is faster than chardet.

Project description

cChardet is a high-speed universal character encoding detector, implemented as a binding to charsetdetect.

Supported codecs

  • Big5

  • EUC-JP

  • EUC-KR

  • GB18030

  • HZ-GB-2312

  • IBM855

  • IBM866

  • ISO-2022-CN

  • ISO-2022-JP

  • ISO-2022-KR

  • ISO-8859-2

  • ISO-8859-5

  • ISO-8859-7

  • ISO-8859-8

  • KOI8-R

  • Shift_JIS

  • TIS-620

  • UTF-8

  • UTF-16BE

  • UTF-16LE

  • UTF-32BE

  • UTF-32LE

  • WINDOWS-1250

  • WINDOWS-1251

  • WINDOWS-1252

  • WINDOWS-1253

  • WINDOWS-1255

  • EUC-TW

  • X-ISO-10646-UCS-4-2143

  • X-ISO-10646-UCS-4-3412

  • x-mac-cyrillic
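
The names in this list are the detector's labels, and not all of them are guaranteed to be codec names that Python itself can decode (X-ISO-10646-UCS-4-2143, for instance, has no standard-library codec). The sketch below checks a name against Python's codec registry using only the standard library. The helper `python_codec_name` and the `DETECTOR_NAMES` sample are hypothetical names introduced for illustration; they are not part of cChardet.

```python
import codecs

# A sample of names copied from the supported-codecs list above.
DETECTOR_NAMES = [
    "Big5", "EUC-JP", "GB18030", "KOI8-R", "Shift_JIS",
    "UTF-8", "WINDOWS-1252", "X-ISO-10646-UCS-4-2143",
]


def python_codec_name(detector_name):
    """Return Python's canonical codec name, or None if Python has no such codec."""
    try:
        return codecs.lookup(detector_name).name
    except LookupError:
        return None


if __name__ == "__main__":
    for name in DETECTOR_NAMES:
        print(name, "->", python_codec_name(name))
```

A name that maps to None needs separate handling before the bytes can be decoded.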

Requirements

For example, on Ubuntu 12.04:

$ sudo apt-get install build-essential python-dev cython

Installation

$ cd /tmp
$ git clone https://github.com/PyYoshi/cChardet.git
$ cd cChardet
$ python setup.py build
$ sudo python setup.py install

or install the released package:

$ sudo easy_install cchardet

Example

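# -*- coding: utf-8 -*-
import cchardet as chardet

# Read the sample as raw bytes; detection works on undecoded data.
with open(r"test/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt", "rb") as f:
    msg = f.read()

# detect() returns a dict holding the guessed encoding and a confidence score.
result = chardet.detect(msg)
print(result)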
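
A common next step is to decode the bytes with the detected encoding. The following is a minimal sketch, assuming that `detect()` returns a dict with an `"encoding"` key (as in chardet) and that the value may be None or a name Python cannot decode. The helper `decode_detected` and its `fallback` parameter are hypothetical and are not part of cChardet.

```python
try:
    import cchardet
except ImportError:  # keeps the sketch importable where the extension is not built
    cchardet = None


def decode_detected(data, detect=None, fallback="utf-8"):
    """Decode bytes using the detected encoding; return (text, encoding_used).

    `detect` defaults to cchardet.detect. Passing another callable with the
    same return shape (for example chardet.detect) also works.
    """
    if detect is None:
        if cchardet is None:
            raise RuntimeError("cchardet is not installed; pass a detect callable")
        detect = cchardet.detect

    encoding = detect(data).get("encoding") or fallback
    try:
        return data.decode(encoding, errors="replace"), encoding
    except LookupError:
        # The detector reported a name that Python has no codec for.
        return data.decode(fallback, errors="replace"), fallback
```

With cChardet installed, `text, used = decode_detected(msg)` decodes the sample read in the example above. `errors="replace"` is a design choice that avoids exceptions on a wrong guess at the cost of substituted characters.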

Test

$ sudo pip install -U chardet nose  # or: sudo easy_install -U chardet nose
$ cd test
$ nosetests --nocapture tests.py

Benchmark

code: tests.TestCchardetSpeed

sample: test/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt

Test environment:

CPU: Intel Core i7 860 2.8GHz

RAM: DDR3-1333 16GB

Platform: Kubuntu 12.04 amd64, Python 2.7.3 64-bit

Result:

chardet:        0.32 (calls/s)

cchardet:       975.32 (calls/s)
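
A calls-per-second figure like the ones above can be approximated with a small timing loop. This is only a sketch of the general idea and does not reproduce tests.TestCchardetSpeed. The function `calls_per_second` and its `min_seconds` parameter are hypothetical, and `time.perf_counter` assumes Python 3.3 or later.

```python
import time


def calls_per_second(detect, data, min_seconds=1.0):
    """Call detect(data) repeatedly for at least min_seconds; return calls/s."""
    if min_seconds <= 0:
        raise ValueError("min_seconds must be positive")

    calls = 0
    elapsed = 0.0
    start = time.perf_counter()
    while elapsed < min_seconds:
        detect(data)
        calls += 1
        elapsed = time.perf_counter() - start
    return calls / elapsed


# Usage, assuming both libraries are installed and `msg` holds the sample bytes:
#   import chardet, cchardet
#   print("chardet:  %.2f (calls/s)" % calls_per_second(chardet.detect, msg))
#   print("cchardet: %.2f (calls/s)" % calls_per_second(cchardet.detect, msg))
```

Results depend heavily on the machine, the Python version, and the sample, so they are comparable only when both detectors run on the same input in the same environment.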

License

  • The MIT License: src/cchardet

  • Licenses of other bundled libraries: see the src/ext directory.

Thanks

Contact

My blog

Issues

Sorry for my poor English :)

Source Distribution

cchardet-0.3.5.tar.gz (619.7 kB)

File hashes

Hashes for cchardet-0.3.5.tar.gz

Algorithm      Hash digest
SHA256         51094c573d248a4908a968e75edd05bab136f10fcc8a70b87c8243b6d45731f8
MD5            b1e73ed1e6d6ab775c95f014b127df01
BLAKE2b-256    f3dfd506b026947666b9d4e4b5b084855fa65e46e040bfb43e845243aec83c1c
