charset-normalizer-rs

High-performance Rust implementation of charset-normalizer - universal character encoding detection.

Overview

charset-normalizer-rs is a Rust-powered Python library that provides fast and accurate character encoding detection for text files and byte streams.

Installation

pip install charset-normalizer-rs

Features

  • 🚀 High Performance: Built with Rust for maximum speed
  • 🔍 Accurate Detection: Reliably detect character encodings
  • 🌏 Universal Support: Handles encodings from around the world
  • 🔄 Compatible: Drop-in replacement for charset-normalizer
  • 🐍 Python 3.8+: Supports Python 3.8 through 3.14
  • 🌍 Cross-Platform: Pre-built wheels for Linux (x86-64, glibc 2.34+), macOS (ARM64, 11.0+), and Windows (x86-64)
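
Because the package is described as a drop-in replacement, one possible pattern is to prefer the Rust build and fall back to the pure-Python charset-normalizer when no wheel is installed. The sketch below is only an illustration: the helper name `first_available` is hypothetical, and it assumes both packages expose `from_bytes` (the Quick Start shows it for `charset_normalizer_rs`).

```python
import importlib


def first_available(module_names, attribute):
    """Return `attribute` from the first module in `module_names` that imports."""
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue  # not installed; try the next candidate
        return getattr(module, attribute)
    raise ImportError(f"none of {list(module_names)!r} could be imported")


# Hypothetical usage: prefer the Rust build, fall back to the pure-Python package.
# from_bytes = first_available(
#     ["charset_normalizer_rs", "charset_normalizer"], "from_bytes"
# )
```

Resolving the function once at import time keeps the rest of the code independent of which implementation is present.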

Quick Start

from charset_normalizer_rs import from_bytes, from_path

# Detect encoding from bytes
with open('mystery_file.txt', 'rb') as f:
    raw_data = f.read()

results = from_bytes(raw_data)
best_match = results.best()
# Assumption: as in charset-normalizer, best() may return None when nothing fits
if best_match is not None:
    print(f"Detected encoding: {best_match.encoding}")
    print(f"Decoded text: {best_match}")

# Detect encoding from file path
results = from_path('mystery_file.txt')
best_match = results.best()
if best_match is not None:
    print(f"Encoding: {best_match.encoding}")

Common Use Cases

  • File Processing: Automatically detect and decode text files with unknown encodings
  • Web Scraping: Handle web content with various character encodings
  • Data Migration: Convert legacy data with different encodings
  • Log Analysis: Process log files from different systems and locales
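
The file-processing and data-migration cases above usually reduce to "detect, decode, re-encode as UTF-8". A minimal sketch of that step follows. The helper `to_utf8` is hypothetical, and it assumes the detector behaves as in the Quick Start: calling it returns an object whose `best()` yields a match (or `None`, as in charset-normalizer) and `str(match)` gives the decoded text.

```python
def to_utf8(raw, detect):
    """Decode `raw` using the best match from `detect`, then re-encode as UTF-8.

    `detect` is any callable shaped like `from_bytes`: it takes bytes and
    returns an object with a `best()` method.
    """
    best = detect(raw).best()
    if best is None:
        # Assumption: no plausible encoding was found for this payload.
        raise ValueError("no plausible encoding found")
    return str(best).encode("utf-8")


# Hypothetical usage with the library's detector:
# from charset_normalizer_rs import from_bytes
# utf8_bytes = to_utf8(legacy_bytes, from_bytes)
```

Passing the detector in as an argument keeps the helper easy to test with a stub and lets the same code run against either implementation.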

Supported Encodings

Supports all major character encodings including:

  • UTF-8, UTF-16, UTF-32
  • ISO-8859 series
  • Windows code pages (cp1252, cp1251, etc.)
  • Asian encodings (GB2312, Big5, Shift-JIS, etc.)
  • And many more!
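
To illustrate why detection across these families matters, the standard-library sketch below decodes the same bytes under several encodings. It does not use charset-normalizer-rs at all, and the helper name `decode_attempts` is hypothetical. The point is that a wrong guess can either fail outright or silently produce mojibake.

```python
def decode_attempts(raw, encodings):
    """Map each encoding name to the decoded text, or None if decoding fails."""
    attempts = {}
    for name in encodings:
        try:
            attempts[name] = raw.decode(name)
        except UnicodeDecodeError:
            attempts[name] = None
    return attempts


raw = "Привет".encode("cp1251")  # Russian "hello" stored in a Windows code page
for name, text in decode_attempts(raw, ["utf-8", "cp1251", "cp1252"]).items():
    print(f"{name}: {text!r}")
# utf-8: None
# cp1251: 'Привет'
# cp1252: 'Ïðèâåò'
```

Only the cp1251 result is readable. The cp1252 decode succeeds without any error, which is the case a detector is meant to catch.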

Performance

charset-normalizer-rs provides significant performance improvements over pure Python implementations, especially when processing large files or analyzing many documents.
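
The size of the improvement will depend on the workload, so it is worth measuring on your own data. A rough timing sketch using only the standard library is shown below. The helper `time_detector` is hypothetical and makes no claim about actual results; `detect` can be any callable such as `from_bytes` from either implementation.

```python
import time


def time_detector(detect, payloads, repeats=5):
    """Return the fastest wall-clock time in seconds over `repeats` passes."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for raw in payloads:
            detect(raw)
        best = min(best, time.perf_counter() - start)
    return best


# Hypothetical usage, comparing two detectors on the same payloads:
# from charset_normalizer_rs import from_bytes as rs_from_bytes
# from charset_normalizer import from_bytes as py_from_bytes
# print(time_detector(rs_from_bytes, payloads), time_detector(py_from_bytes, payloads))
```

Taking the minimum over several passes reduces noise from other processes on the machine.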

License

MIT License

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

  • charset_normalizer_rs-0.1.1-cp314-cp314-win_amd64.whl (297.1 kB): CPython 3.14, Windows x86-64
  • charset_normalizer_rs-0.1.1-cp313-cp313-win_amd64.whl (297.1 kB): CPython 3.13, Windows x86-64
  • charset_normalizer_rs-0.1.1-cp312-cp312-win_amd64.whl (297.7 kB): CPython 3.12, Windows x86-64
  • charset_normalizer_rs-0.1.1-cp312-cp312-manylinux_2_34_x86_64.whl (443.1 kB): CPython 3.12, manylinux (glibc 2.34+) x86-64
  • charset_normalizer_rs-0.1.1-cp312-cp312-macosx_11_0_arm64.whl (399.1 kB): CPython 3.12, macOS 11.0+ ARM64
  • charset_normalizer_rs-0.1.1-cp311-cp311-win_amd64.whl (298.5 kB): CPython 3.11, Windows x86-64
  • charset_normalizer_rs-0.1.1-cp311-cp311-manylinux_2_34_x86_64.whl (443.0 kB): CPython 3.11, manylinux (glibc 2.34+) x86-64
  • charset_normalizer_rs-0.1.1-cp311-cp311-macosx_11_0_arm64.whl (399.5 kB): CPython 3.11, macOS 11.0+ ARM64
  • charset_normalizer_rs-0.1.1-cp310-cp310-win_amd64.whl (300.9 kB): CPython 3.10, Windows x86-64
  • charset_normalizer_rs-0.1.1-cp310-cp310-manylinux_2_34_x86_64.whl (443.0 kB): CPython 3.10, manylinux (glibc 2.34+) x86-64
  • charset_normalizer_rs-0.1.1-cp310-cp310-macosx_11_0_arm64.whl (399.5 kB): CPython 3.10, macOS 11.0+ ARM64
  • charset_normalizer_rs-0.1.1-cp39-cp39-win_amd64.whl (303.2 kB): CPython 3.9, Windows x86-64
  • charset_normalizer_rs-0.1.1-cp39-cp39-manylinux_2_34_x86_64.whl (443.6 kB): CPython 3.9, manylinux (glibc 2.34+) x86-64
  • charset_normalizer_rs-0.1.1-cp39-cp39-macosx_11_0_arm64.whl (399.8 kB): CPython 3.9, macOS 11.0+ ARM64
  • charset_normalizer_rs-0.1.1-cp38-cp38-manylinux_2_34_x86_64.whl (443.5 kB): CPython 3.8, manylinux (glibc 2.34+) x86-64
  • charset_normalizer_rs-0.1.1-cp38-cp38-macosx_11_0_arm64.whl (399.7 kB): CPython 3.8, macOS 11.0+ ARM64
