Read text files with unknown or mixed encodings into UTF-8.

unisafe

A drop-in replacement for builtins.open (read mode only) that reads text files with unknown or mixed encodings into UTF-8. It can optionally convert UTF-8 or Windows-1252 smart quotes into UTF-8 or ASCII.

from unisafe import uread

# The API is the same as builtins.open; read() returns the whole file as one string
with uread('file.csv') as f:
    text = f.read()

# Use an iterator to get each text line
with uread('file.csv') as f:
    for line in f:
        print(line)
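
The optional smart-quote conversion mentioned in the description can be pictured with a small standard-library sketch. This is only an illustration of the idea, not unisafe's implementation: the names SMART_TO_ASCII and ascii_quotes are hypothetical, and how unisafe exposes this option is not shown here.

```python
# Hypothetical illustration: map Unicode smart quotes to their ASCII equivalents.
# This is NOT unisafe's code; it only shows what such a conversion could look like.
SMART_TO_ASCII = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
})


def ascii_quotes(text: str) -> str:
    """Return text with smart quotes replaced by plain ASCII quotes."""
    return text.translate(SMART_TO_ASCII)


if __name__ == "__main__":
    print(ascii_quotes("\u201cIt\u2019s fine\u201d"))
```

A translation table keeps the conversion to a single pass over the string and leaves every other character untouched.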

The uread function returns a TextIOWrapper, just like Python's built-in open in 'r' mode. Its behavior is the same as the built-in function, apart from the additional runtime encoding detection and conversion. The underlying file handle is opened in 'rb' (read binary) mode. Writing is not supported.

from unisafe import uread

f1 = open('test.txt', 'r', encoding='utf-8')
type(f1)
# -> _io.TextIOWrapper

f2 = uread('test.txt')
type(f2)
# -> _io.TextIOWrapper
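
To make the idea of reading binary data and handing back a UTF-8 TextIOWrapper more concrete, here is a minimal standard-library sketch. It assumes a simple per-line strategy (try UTF-8, fall back to Windows-1252); unisafe's actual detection logic may differ, and decode_line and read_mixed are hypothetical names used only for this illustration.

```python
import io


def decode_line(raw: bytes) -> str:
    """Decode one line as UTF-8, falling back to Windows-1252 (assumed strategy)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # A few bytes (e.g. 0x81) are undefined in cp1252, so replace those.
        return raw.decode("cp1252", errors="replace")


def read_mixed(path: str) -> io.TextIOWrapper:
    """Hypothetical helper: open in 'rb' mode and return a UTF-8 text stream."""
    with open(path, "rb") as fh:
        # Iterating a binary file yields one bytes object per line.
        text = "".join(decode_line(raw) for raw in fh)
    # Re-encode the normalized text as UTF-8 and wrap it like open(..., 'r') would.
    return io.TextIOWrapper(io.BytesIO(text.encode("utf-8")), encoding="utf-8")
```

This sketch reads the whole file into memory, which keeps it short; a streaming version would decode lazily instead.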

Works with the csv library and third-party libraries such as pandas:

from unisafe import uread
import pandas as pd
import csv

with uread('file.csv') as f:
    # csv.reader is lazy, so consume the rows before the file is closed
    rows = list(csv.reader(f))

with uread('file.csv') as f:
    df = pd.read_csv(f, encoding='utf-8')

License

The code in this project is released under the MIT License.

Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

unisafe-0.1.0-py3-none-any.whl (16.9 kB view hashes)

Uploaded Python 3
