Universal encoding detector. This library is faster than chardet.

Project description

cChardet is a high-speed universal character encoding detector, implemented as a binding to charsetdetect.

Supported codecs

  • Big5

  • EUC-JP

  • EUC-KR

  • GB18030

  • HZ-GB-2312

  • IBM855

  • IBM866

  • ISO-2022-CN

  • ISO-2022-JP

  • ISO-2022-KR

  • ISO-8859-2

  • ISO-8859-5

  • ISO-8859-7

  • ISO-8859-8

  • KOI8-R

  • Shift_JIS

  • TIS-620

  • UTF-8

  • UTF-16BE

  • UTF-16LE

  • UTF-32BE

  • UTF-32LE

  • WINDOWS-1250

  • WINDOWS-1251

  • WINDOWS-1252

  • WINDOWS-1253

  • WINDOWS-1255

  • EUC-TW

  • X-ISO-10646-UCS-4-2143

  • X-ISO-10646-UCS-4-3412

  • x-mac-cyrillic

Requires

For example, on Ubuntu 12.04:

$ sudo apt-get install build-essential python-dev cython

Installation

$ cd /tmp
$ git clone https://github.com/PyYoshi/cChardet.git
$ cd cChardet
$ python setup.py build
$ sudo python setup.py install

or

$ sudo easy_install cchardet

Example

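# -*- coding: utf-8 -*-
import cchardet as chardet

# Read the sample as raw bytes; detect() works on undecoded data.
with open(r"test/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt", "rb") as f:
    msg = f.read()

# detect() returns a dict describing the guessed encoding.
result = chardet.detect(msg)
print(result)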
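
The result can then be used to decode the bytes. The following is a minimal sketch, not part of cChardet itself: `decode_with_detection` and its `fallback` parameter are hypothetical names introduced here, and the sketch assumes the result dict exposes an "encoding" key that may be None when nothing is detected. The detector is passed in as an argument, so either `cchardet.detect` or `chardet.detect` can be used.

```python
import codecs


def decode_with_detection(data, detect, fallback="utf-8"):
    """Decode bytes using the encoding reported by a detect() function.

    `detect` is any callable that returns a dict with an "encoding" key,
    such as cchardet.detect. `fallback` is a hypothetical parameter used
    when no usable encoding is reported.
    """
    result = detect(data)
    encoding = result.get("encoding") or fallback
    try:
        # Some detected names may not be known to Python's codec registry.
        codecs.lookup(encoding)
    except LookupError:
        encoding = fallback
    # errors="replace" avoids an exception when the guess is wrong.
    return data.decode(encoding, errors="replace")


if __name__ == "__main__":
    try:
        import cchardet
    except ImportError:
        cchardet = None  # cChardet is not installed; skip the demo.
    if cchardet is not None:
        sample = "千夜一夜物語".encode("shift_jis")
        print(decode_with_detection(sample, cchardet.detect))
```

Checking the name with `codecs.lookup` before decoding is a design choice of this sketch: it keeps an unrecognized codec name from raising, at the cost of silently using the fallback.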

Test

$ sudo pip install -U chardet nose    # or: sudo easy_install -U chardet nose
$ cd test
$ nosetests --nocapture tests.py

Benchmark

code: tests.TestCchardetSpeed

sample: test/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt

Performance:

CPU: Intel Core i7 860 2.8GHz

RAM: DDR3-1333 16GB

Platform: Kubuntu 12.04 amd64, Python 2.7.3 64-bit

Result:

chardet:        0.32 (call/s)

cchardet:       975.32 (call/s)
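
A calls-per-second figure like the ones above can be measured with a small timing loop. This is a rough sketch only, not the code in tests.TestCchardetSpeed: `calls_per_second` and `repeat` are hypothetical names, the input is synthetic rather than the sample file, and `time.perf_counter` assumes Python 3.3 or later (the benchmark above ran on Python 2.7.3).

```python
import time


def calls_per_second(detect, data, repeat=100):
    """Return how many detect(data) calls complete per second.

    `detect` is any detection callable, e.g. cchardet.detect or
    chardet.detect; `repeat` is the number of timed calls.
    """
    start = time.perf_counter()
    for _ in range(repeat):
        detect(data)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse timers.
    return repeat / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    try:
        import cchardet
    except ImportError:
        cchardet = None  # cChardet is not installed; skip the demo.
    if cchardet is not None:
        # Synthetic Shift_JIS input standing in for the sample file.
        data = ("千夜一夜物語 " * 1000).encode("shift_jis")
        rate = calls_per_second(cchardet.detect, data)
        print("cchardet: %.2f (call/s)" % rate)
```

Results depend heavily on input size, hardware and interpreter version, so numbers from this sketch are not directly comparable with the figures reported above.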

License

  • The MIT License: src/cchardet

  • Other libraries: see the license files in the src/ext directory.

Thanks

Contact

My blog

Issues

Sorry for my poor English :)

Download files

Source Distribution

cchardet-0.3.4.tar.gz (619.4 kB view details)

Uploaded Source

File details

Details for the file cchardet-0.3.4.tar.gz.

File metadata

  • Download URL: cchardet-0.3.4.tar.gz
  • Upload date:
  • Size: 619.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for cchardet-0.3.4.tar.gz
Algorithm     Hash digest
SHA256        2a30a6be6cd9cca1c0ef4981467901ee0a9d66f516a901e385f7a59748e3b86b
MD5           5cc6f146288fa1883bb3769fe5178675
BLAKE2b-256   4d11331dcaa6171d08ce7563e3ca06f71fdb45df936c95e3a62174c2c90703c3