Read files with every codec available in your environment, so that you can pick the one that fits best!
Project description
Have you ever seen this?
UnicodeEncodeError: 'XXXXX' codec can't encode character 'XXXXX' in position 15: ordinal ...
Probably more than once, right? :) After spending too much time hunting for the right codec for my files, I wrote BruteCodecChecker. BruteCodecChecker (MIT license) opens a file with every codec available in your environment and prints the results. It also works on bytes objects.
If, like me, you work with a lot of text files, it will save you a lot of time.
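To illustrate the brute-force idea, here is a minimal sketch that uses only the standard library. It is not BruteCodecChecker's actual implementation: the helper name `try_all_codecs` is hypothetical, and the codec list is taken from `encodings.aliases`, which may differ from what the package enumerates.

```python
import encodings.aliases


def try_all_codecs(data: bytes) -> dict:
    """Decode `data` with every codec name Python lists and keep the successes.

    Hypothetical helper for illustration; not part of BruteCodecChecker.
    """
    results = {}
    for codec in sorted(set(encodings.aliases.aliases.values())):
        try:
            results[codec] = data.decode(codec)
        except Exception:
            # Wrong codec, non-text codec (e.g. rot_13), or a codec that is
            # unavailable on this platform: skip it and move on.
            continue
    return results


sample = "Grüße aus Köln".encode("utf-8")
for codec, text in try_all_codecs(sample).items():
    print(f"{codec:20} {text!r}")
```

Many codecs will decode the bytes without raising an error and still produce nonsense, which is why the results have to be inspected by eye.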
Install it:
pip install BruteCodecChecker
Try it:
from BruteCodecChecker import CodecChecker

# Sample text as a bytes object
teststuff = b"""This is a test!
Hi there!
A little test! """

# Write the sample to a file with a UTF-8 BOM (utf-8-sig)
testfilename = "test_utf8.tmp"
with open(testfilename, mode="w", encoding="utf-8-sig") as f:
    f.write(teststuff.decode("utf-8-sig"))

codechecker = CodecChecker()

# Try every codec on the file, with readlines=2
codechecker.try_open_file(testfilename, readlines=2).print_results(
    pause_after_interval=1, items_per_interval=10
)

# Try every codec on the file, using the default arguments
codechecker.try_open_file(testfilename).print_results()

# Try every codec on a bytes object instead of a file
codechecker.try_convert_bytes(teststuff.decode("cp850").encode()).print_results(
    pause_after_interval=1, items_per_interval=10
)
Output (excerpt)
Codec : palmos
Mode : strict
Length : 32
Converted :
Line: 0 This is a test!
Line: 1 Hi there!
Codec : ptcp154
Mode : strict
Length : 32
Converted :
Line: 0 п»ҝThis is a test!
Line: 1 Hi there!
Codec : punycode
Mode : strict
Codec : quopri_codec
Mode : strict
Codec : raw_unicode_escape
Mode : strict
Length : 32
Converted :
Line: 0 This is a test!
Line: 1 Hi there!
Codec : rot_13
Mode : strict
Codec : shift_jis
Mode : strict
Codec : shift_jisx0213
Mode : strict
Length : 31
Converted :
Line: 0 鬠ソThis is a test!
Line: 1 Hi there!
Codec : shift_jis_2004
Mode : strict
Length : 31
Converted :
Line: 0 鬠ソThis is a test!
Line: 1 Hi there!
Codec : tis_620
Mode : strict
Length : 32
Converted :
Line: 0 ๏ปฟThis is a test!
Line: 1 Hi there!
Codec : undefined
Mode : strict
Codec : unicode_escape
Mode : strict
Length : 32
Converted :
Line: 0 This is a test!
Line: 1 Hi there!
Codec : utf_16
Mode : strict
Codec : utf_16_be
Mode : strict
Codec : utf_16_le
Mode : strict
Codec : utf_32
Mode : strict
Codec : utf_32_be
Mode : strict
Codec : utf_32_le
Mode : strict
Codec : utf_7
Mode : strict
Codec : utf_8
Mode : strict
Length : 30
Converted :
Line: 0 This is a test!
Line: 1 Hi there!
Codec : utf_8_sig
Mode : strict
Length : 29
Converted :
Line: 0 This is a test!
Line: 1 Hi there!
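The differing Length values above appear to follow from how each codec treats the three-byte UTF-8 BOM that utf-8-sig writes at the start of the file. The sketch below reproduces that effect with the standard library alone. It assumes nothing about how BruteCodecChecker computes Length, and cp1252 is used only as a convenient single-byte codec for comparison.

```python
import codecs

# The UTF-8 BOM (EF BB BF) followed by the first line of the test text
payload = codecs.BOM_UTF8 + b"This is a test!"

lengths = {}
for codec in ("utf_8_sig", "utf_8", "cp1252"):
    text = payload.decode(codec)
    lengths[codec] = len(text)
    print(f"{codec:10} length={len(text):2} {text!r}")
```

utf_8_sig strips the BOM, utf_8 keeps it as the single character U+FEFF, and a single-byte codec such as cp1252 turns it into three stray characters, which matches the pattern of 29, 30 and 32 in the output.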
File details
Details for the file BruteCodecChecker-0.21.tar.gz.
File metadata
- Download URL: BruteCodecChecker-0.21.tar.gz
- Upload date:
- Size: 6.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | 3023af65fbb433d525bcd4e5c93a87f34dafd2ffb035c7ebf9346e766baed2bc
MD5 | 0a004f5cf84a3d65ecf7605e4999ac39
BLAKE2b-256 | 14e198fe9305f40b2982b8905d6c0cb9165ffe12d55e7dfbeedfa1ccb906d6e9
File details
Details for the file BruteCodecChecker-0.21-py3-none-any.whl.
File metadata
- Download URL: BruteCodecChecker-0.21-py3-none-any.whl
- Upload date:
- Size: 7.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | 7e25711794f0780c664d53d6eb619d7124d5a4ab26e4c18d999f4bbf13a447e4
MD5 | cce05643bcc9ab68a7bd18d9a242ee7a
BLAKE2b-256 | ddda67bc2c7b822ec783af442dde623c6b0daae1c0099b13bba4c8167b313e82