Read files with all codecs available in your environment, so that you can pick the one that fits best!

Project description

Have you ever seen this?

UnicodeEncodeError: 'XXXXX' codec can't encode character 'XXXXX' in position 15: ordinal ...

Probably more than once, right? :) After spending too much time hunting for the right codec for a file, I wrote BruteCodecChecker. BruteCodecChecker (MIT license) opens a file with every codec available in your environment and prints the results. It also works on bytes objects.

If you work, like me, with a lot of text files, it will save you a lot of time.
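To make the idea concrete, here is a minimal sketch of the brute-force approach using only the standard library. It is not BruteCodecChecker's own code: the function name `try_all_codecs` and the sample string are assumptions made for illustration, and the codec list is taken from `encodings.aliases`, which may differ from what the package uses internally.

```python
import encodings.aliases


def try_all_codecs(data: bytes) -> dict[str, str]:
    """Return {codec_name: decoded_text} for every codec that decodes `data`.

    Hypothetical helper for illustration; not part of BruteCodecChecker.
    """
    # encodings.aliases.aliases maps alias -> canonical name; de-duplicate the names.
    codec_names = sorted(set(encodings.aliases.aliases.values()))
    results = {}
    for name in codec_names:
        try:
            text = data.decode(name, errors="strict")
        except Exception:
            # Codecs fail in different ways (UnicodeDecodeError, LookupError for
            # missing or non-text codecs, ...), so every failure is just skipped.
            continue
        if isinstance(text, str):
            results[name] = text
    return results


if __name__ == "__main__":
    sample = "Grüße".encode("utf-8")
    for codec, text in try_all_codecs(sample).items():
        # ascii() keeps the printout safe on consoles with a limited encoding.
        print(f"{codec:<20} {text!a}")
```

Many codecs will decode the same bytes without raising an error, which is why the results still have to be inspected by eye: a codec that "works" is not necessarily the right one.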

Install it:

pip install BruteCodecChecker

Try it:

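from BruteCodecChecker import CodecChecker

# Sample bytes, used both for the test file and for the bytes check below.
teststuff = b"""This is a test! 
Hi there!
A little test! """
testfilename = "test_utf8.tmp"

# Write the sample to disk as UTF-8 with a BOM (utf-8-sig).
with open(testfilename, mode="w", encoding="utf-8-sig") as f:
    f.write(teststuff.decode("utf-8-sig"))

codechecker = CodecChecker()

# Try the file with every available codec, reading only the first two lines.
codechecker.try_open_file(testfilename, readlines=2).print_results(
    pause_after_interval=1, items_per_interval=10
)

# Try the whole file with every available codec.
codechecker.try_open_file(testfilename).print_results()

# Try a bytes object instead of a file.
codechecker.try_convert_bytes(teststuff.decode("cp850").encode()).print_results(
    pause_after_interval=1, items_per_interval=10
)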

Output


Codec               : palmos
Mode                : strict
Length              : 32
Converted           : 
Line: 0              This is a test! 
Line: 1                  Hi there!

Codec               : ptcp154
Mode                : strict
Length              : 32
Converted           : 
Line: 0              п»ҝThis is a test! 
Line: 1                  Hi there!

Codec               : punycode
Mode                : strict

Codec               : quopri_codec
Mode                : strict

Codec               : raw_unicode_escape
Mode                : strict
Length              : 32
Converted           : 
Line: 0              This is a test! 
Line: 1                  Hi there!

Codec               : rot_13
Mode                : strict

Codec               : shift_jis
Mode                : strict

Codec               : shift_jisx0213
Mode                : strict
Length              : 31
Converted           : 
Line: 0              鬠ソThis is a test! 
Line: 1                  Hi there!

Codec               : shift_jis_2004
Mode                : strict
Length              : 31
Converted           : 
Line: 0              鬠ソThis is a test! 
Line: 1                  Hi there!

Codec               : tis_620
Mode                : strict
Length              : 32
Converted           : 
Line: 0              ๏ปฟThis is a test! 
Line: 1                  Hi there!

Codec               : undefined
Mode                : strict

Codec               : unicode_escape
Mode                : strict
Length              : 32
Converted           : 
Line: 0              This is a test! 
Line: 1                  Hi there!

Codec               : utf_16
Mode                : strict

Codec               : utf_16_be
Mode                : strict

Codec               : utf_16_le
Mode                : strict

Codec               : utf_32
Mode                : strict

Codec               : utf_32_be
Mode                : strict

Codec               : utf_32_le
Mode                : strict

Codec               : utf_7
Mode                : strict

Codec               : utf_8
Mode                : strict
Length              : 30
Converted           : 
Line: 0              This is a test! 
Line: 1                  Hi there!

Codec               : utf_8_sig
Mode                : strict
Length              : 29
Converted           : 
Line: 0              This is a test! 
Line: 1                  Hi there!
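The differing Length values in the output (32, 31, 30, 29) are most likely explained by the byte order mark that "utf-8-sig" writes at the start of the file. The sketch below illustrates this with the standard library only; the variable names are made up for the example, and cp1252 is simply a convenient single-byte codec chosen here, not one taken from the listing above.

```python
import codecs

# Bytes as they would start a file written with encoding="utf-8-sig".
payload = codecs.BOM_UTF8 + b"Hi there!"

with_bom = payload.decode("utf-8")         # BOM survives as one character, U+FEFF
without_bom = payload.decode("utf-8-sig")  # BOM is recognised and stripped
as_cp1252 = payload.decode("cp1252")       # each of the 3 BOM bytes becomes its own character

print(len(with_bom), len(without_bom), len(as_cp1252))  # prints: 10 9 12
```

So a result that is a few characters longer than the utf_8_sig one, with stray symbols in front of the first line, is a hint that the file starts with a BOM and that a BOM-aware codec may be the better fit.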

Download files

Source Distribution: BruteCodecChecker-0.21.tar.gz (6.6 kB)

Built Distribution: BruteCodecChecker-0.21-py3-none-any.whl (7.6 kB, Python 3)
