Read files in all available codecs in your environment, so that you can pick the one that fits best!
Have you ever seen this?
UnicodeEncodeError: 'XXXXX' codec can't encode character 'XXXXX' in position 15: ordinal ...
Probably more than once, right? :) After spending too much time hunting for the right codec for my files, I wrote BruteCodecChecker. BruteCodecChecker (MIT license) tries to open a file with every codec available in your environment and prints the results. It also works on bytes objects.
If you, like me, work with a lot of text files, it will save you a lot of time.
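The brute-force idea can be sketched with the standard library alone. This is only an illustration of the approach, not BruteCodecChecker's actual implementation: the function name `try_all_codecs` is hypothetical, and the codec list is assumed to come from `encodings.aliases`.

```python
from encodings.aliases import aliases


def try_all_codecs(data: bytes) -> dict:
    """Return {codec_name: decoded_text} for every codec that decodes `data`."""
    results = {}
    # aliases maps alias -> canonical codec name; the values are the codec names
    for codec in sorted(set(aliases.values())):
        try:
            results[codec] = data.decode(codec)
        except Exception:
            # Deliberately broad: skip codecs that reject the bytes, are not
            # text encodings (e.g. base64_codec), or do not exist on this platform.
            continue
    return results


sample = "Grüße".encode("utf-8")
results = try_all_codecs(sample)
for codec, text in results.items():
    print(f"{codec:20} {text!r}")
```

Most single-byte codecs will "succeed" and return mojibake, so the results still need a human look to pick the codec that fits.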
Install it:
pip install BruteCodecChecker
Try it:
from BruteCodecChecker import CodecChecker

# Sample bytes, used both for the test file and for direct decoding
teststuff = b"""This is a test!
Hi there!
A little test! """

# Write the sample as UTF-8 with a BOM
testfilename = "test_utf8.tmp"
with open(testfilename, mode="w", encoding="utf-8-sig") as f:
    f.write(teststuff.decode("utf-8-sig"))

codechecker = CodecChecker()

# Read only the first 2 lines of the file with every codec
codechecker.try_open_file(testfilename, readlines=2).print_results(
    pause_after_interval=1, items_per_interval=10
)

# Read the whole file with every codec
codechecker.try_open_file(testfilename).print_results()

# Check a bytes object instead of a file
codechecker.try_convert_bytes(teststuff.decode("cp850").encode()).print_results(
    pause_after_interval=1, items_per_interval=10
)
Output (excerpt)
Codec : palmos
Mode : strict
Length : 32
Converted :
Line: 0 This is a test!
Line: 1 Hi there!
Codec : ptcp154
Mode : strict
Length : 32
Converted :
Line: 0 п»ҝThis is a test!
Line: 1 Hi there!
Codec : punycode
Mode : strict
Codec : quopri_codec
Mode : strict
Codec : raw_unicode_escape
Mode : strict
Length : 32
Converted :
Line: 0 This is a test!
Line: 1 Hi there!
Codec : rot_13
Mode : strict
Codec : shift_jis
Mode : strict
Codec : shift_jisx0213
Mode : strict
Length : 31
Converted :
Line: 0 鬠ソThis is a test!
Line: 1 Hi there!
Codec : shift_jis_2004
Mode : strict
Length : 31
Converted :
Line: 0 鬠ソThis is a test!
Line: 1 Hi there!
Codec : tis_620
Mode : strict
Length : 32
Converted :
Line: 0 ๏ปฟThis is a test!
Line: 1 Hi there!
Codec : undefined
Mode : strict
Codec : unicode_escape
Mode : strict
Length : 32
Converted :
Line: 0 This is a test!
Line: 1 Hi there!
Codec : utf_16
Mode : strict
Codec : utf_16_be
Mode : strict
Codec : utf_16_le
Mode : strict
Codec : utf_32
Mode : strict
Codec : utf_32_be
Mode : strict
Codec : utf_32_le
Mode : strict
Codec : utf_7
Mode : strict
Codec : utf_8
Mode : strict
Length : 30
Converted :
Line: 0 This is a test!
Line: 1 Hi there!
Codec : utf_8_sig
Mode : strict
Length : 29
Converted :
Line: 0 This is a test!
Line: 1 Hi there!
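In the output above, the utf_8 result is one character longer than the utf_8_sig result (30 versus 29). That is consistent with the byte order mark written by utf-8-sig: utf_8 keeps it as a U+FEFF character, while utf_8_sig strips it. A minimal sketch using only the standard library (independent of BruteCodecChecker; the sample bytes are an assumption for illustration):

```python
import codecs

# Bytes as they would appear at the start of a file saved with "utf-8-sig"
data = codecs.BOM_UTF8 + b"This is a test!\n"

plain = data.decode("utf_8")      # BOM survives as U+FEFF
sig = data.decode("utf_8_sig")    # BOM is removed

print(len(plain), len(sig))
print(repr(plain[0]))
print(data.startswith(codecs.BOM_UTF8))
```

Checking `data.startswith(codecs.BOM_UTF8)` is a quick way to tell that utf_8_sig is probably the better fit before trying every codec.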