Universal encoding detector for Python 2 and 3
Chardet: The Universal Character Encoding Detector
- Detects
  - ASCII, UTF-8, UTF-16 (2 variants), UTF-32 (4 variants)
  - Big5, GB2312, EUC-TW, HZ-GB-2312, ISO-2022-CN (Traditional and Simplified Chinese)
  - EUC-JP, SHIFT_JIS, CP932, ISO-2022-JP (Japanese)
  - EUC-KR, ISO-2022-KR (Korean)
  - KOI8-R, MacCyrillic, IBM855, IBM866, ISO-8859-5, windows-1251 (Cyrillic)
  - ISO-8859-5, windows-1251 (Bulgarian)
  - ISO-8859-1, windows-1252 (Western European languages)
  - ISO-8859-7, windows-1253 (Greek)
  - ISO-8859-8, windows-1255 (Visual and Logical Hebrew)
  - TIS-620 (Thai)
Requires Python 2.7 or 3.5+.
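The Unicode variants in the list above are commonly told apart by their byte order mark (BOM). The following is a minimal sketch of BOM sniffing using only the standard library. It is not chardet's actual implementation: `sniff_bom`, the `BOMS` table, and the label strings are hypothetical names chosen for illustration, and it assumes that the four UTF-32 variants are the two standard byte orders plus the two unusual octet orders (3412 and 2143).

```python
import codecs

# Longer BOMs come first: the UTF-32-LE BOM (FF FE 00 00) begins with the
# UTF-16-LE BOM (FF FE), so checking UTF-16 first would misreport it.
BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32-LE"),
    (codecs.BOM_UTF32_BE, "UTF-32-BE"),
    # Assumed "unusual octet order" variants; labels are illustrative only.
    (b"\xfe\xff\x00\x00", "UCS-4 (octet order 3412)"),
    (b"\x00\x00\xff\xfe", "UCS-4 (octet order 2143)"),
    (codecs.BOM_UTF8, "UTF-8-SIG"),
    (codecs.BOM_UTF16_LE, "UTF-16-LE"),
    (codecs.BOM_UTF16_BE, "UTF-16-BE"),
]


def sniff_bom(data):
    """Return a label for the BOM that starts `data`, or None if there is none."""
    for bom, label in BOMS:
        if data.startswith(bom):
            return label
    return None
```

Input without a BOM returns `None` here; telling BOM-less encodings apart needs statistical analysis of the byte content, which this sketch does not attempt.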
Installation
Install from PyPI:
pip install chardet
Documentation
For users, docs are now available at https://chardet.readthedocs.io/.
Command-line Tool
chardet comes with a command-line script which reports on the encodings of one or more files:
    % chardetect somefile someotherfile
    somefile: windows-1252 with confidence 0.5
    someotherfile: ascii with confidence 1.0
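A similar report can be produced from Python. The sketch below relies on `chardet.detect`, which takes a bytes object and returns a dict that includes `encoding` and `confidence` keys. The helpers `format_report` and `detect_file` are hypothetical names introduced for illustration, and the output string only imitates the line shape shown above.

```python
def format_report(name, result):
    """Format a detection result dict as '<name>: <encoding> with confidence <n>'."""
    # chardet may report an encoding of None when it cannot decide.
    if result["encoding"] is None:
        return "{}: no result".format(name)
    return "{}: {} with confidence {}".format(
        name, result["encoding"], result["confidence"]
    )


def detect_file(path):
    """Read a file as raw bytes and return chardet's result dict for it."""
    # Third-party dependency (pip install chardet), imported lazily so that
    # format_report stays usable without it.
    import chardet

    with open(path, "rb") as handle:
        return chardet.detect(handle.read())


def report(paths):
    """Print one report line per file path."""
    for path in paths:
        print(format_report(path, detect_file(path)))
```

Calling `report(["somefile", "someotherfile"])` prints one line per file. The file must be opened in binary mode, because detection works on undecoded bytes.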
About
This is a continuation of Mark Pilgrim’s excellent chardet. Previously, two versions needed to be maintained: one that supported Python 2.x and one that supported Python 3.x. We’ve since merged with Ian Cordasco’s charade fork, so there is now one coherent version that works for Python 2.7 and 3.5+.
- maintainer: Dan Blanchard
Hashes for chardet-4.0.0-py2.py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | f864054d66fd9118f2e67044ac8981a54775ec5b67aed0441892edb553d21da5 |
MD5 | 504627b9b4fcd44720d5aa1345e29cc7 |
BLAKE2b-256 | 19c7fa589626997dd07bd87d9269342ccb74b1720384a4d739a1872bd84fbe68 |