
Read text files with unknown or mixed encodings into UTF-8.


unisafe


A drop-in replacement for builtins.open that reads text files with unknown or mixed encodings into UTF-8. It can also optionally convert UTF-8 or Windows-1252 smart quotes into UTF-8 or ASCII.
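The description does not say how unisafe detects encodings. As a rough illustration only, here is one common approach: try UTF-8 on each line and fall back to Windows-1252, then optionally map smart quotes to ASCII. The names decode_line, decode_mixed and SMART_QUOTES_TO_ASCII are hypothetical and are not part of unisafe's API.

```python
# Hypothetical sketch of mixed-encoding decoding; NOT unisafe's implementation.

# Map Unicode smart quotes to plain ASCII quotes.
SMART_QUOTES_TO_ASCII = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
})


def decode_line(raw: bytes) -> str:
    """Decode one line, trying UTF-8 first and falling back to Windows-1252."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # cp1252 leaves a few bytes (e.g. 0x81) undefined; replace them with U+FFFD.
        return raw.decode("cp1252", errors="replace")


def decode_mixed(data: bytes, ascii_quotes: bool = False) -> str:
    """Decode bytes that may mix UTF-8 and Windows-1252 lines into one str."""
    # Decoding per line lets each line pick its own encoding.
    text = "".join(decode_line(line) for line in data.splitlines(keepends=True))
    return text.translate(SMART_QUOTES_TO_ASCII) if ascii_quotes else text


# A UTF-8 line followed by a Windows-1252 line containing smart quotes.
sample = "café\n".encode("utf-8") + "\u201cnaïve\u201d\n".encode("cp1252")
print(repr(decode_mixed(sample)))
print(repr(decode_mixed(sample, ascii_quotes=True)))
```

A per-line fallback cannot tell Windows-1252 text apart from byte sequences that happen to be valid UTF-8, so a real detector may need extra heuristics.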

from unisafe import uread

# The API is the same as builtins.open; read() returns the whole file as one string
with uread('file.csv') as f:
    text = f.read()

# Iterate over the file object to get one line at a time
with uread('file.csv') as f:
    for line in f:
        print(line, end='')  # each line already ends with '\n'

The uread function returns a TextIOWrapper, just like Python's built-in open in 'r' mode. Its behavior is the same as the built-in function apart from the added runtime encoding detection and conversion. The underlying file is opened in 'rb' (read binary) mode. Writing is not supported.
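One way a function can open a file in binary mode yet still hand back a TextIOWrapper is to convert the bytes to UTF-8 and wrap an in-memory buffer. The sketch below assumes a whole-file UTF-8 attempt with a Windows-1252 fallback; open_as_utf8 is a hypothetical name, and unisafe may stream the file or detect encodings differently.

```python
import io
import os
import tempfile


def open_as_utf8(path: str) -> io.TextIOWrapper:
    """Hypothetical uread-like helper: binary read, decode, re-wrap as UTF-8 text."""
    # Read raw bytes in 'rb' mode, as the description says uread does.
    with open(path, "rb") as raw:
        data = raw.read()
    # Assumed detection step: UTF-8 first, Windows-1252 as the fallback.
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        text = data.decode("cp1252", errors="replace")
    # Wrap a UTF-8 byte buffer so callers get the same type open(path, 'r') returns.
    # newline=None (the default) gives universal-newline translation, like open().
    return io.TextIOWrapper(io.BytesIO(text.encode("utf-8")), encoding="utf-8")


# Usage: write a Windows-1252 file with a CRLF line ending, then read it back.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write("Zoë\r\n".encode("cp1252"))
with open_as_utf8(tmp.name) as f:
    content = f.read()
print(type(f).__name__, repr(content))  # TextIOWrapper 'Zoë\n'
os.remove(tmp.name)
```

Loading the whole file keeps the sketch short; a file that mixes encodings would need per-line decoding instead of a single whole-file fallback.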

from unisafe import uread

f1 = open('test.txt', 'r', encoding='utf-8')
type(f1)
# -> _io.TextIOWrapper

f2 = uread('test.txt')
type(f2)
# -> _io.TextIOWrapper

uread also works with the standard csv module and with third-party libraries such as pandas:

from unisafe import uread
import pandas as pd
import csv

with uread('file.csv') as f:
    # csv.reader is lazy, so consume it before the file is closed
    rows = list(csv.reader(f))

with uread('file.csv') as f:
    df = pd.read_csv(f, encoding='utf-8')

License

The code in this project is released under the MIT License.




Download files


Source Distributions

No source distribution files are available for this release.

Built Distribution

unisafe-0.1.0-py3-none-any.whl (16.9 kB), uploaded for Python 3

File details

Details for the file unisafe-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: unisafe-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 16.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.8.10

File hashes

Hashes for unisafe-0.1.0-py3-none-any.whl

  • SHA256: c6863c0bb4d95d7fec9f1abeda9f04aff9b6af032552e8716a868dff335719d0
  • MD5: 3192b2018bc3c9bf6bfec2e1d1da61de
  • BLAKE2b-256: 885da3febd547a9b0012f16207dedd9bc2b35d558a626dae6f496923c84dbfa8
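To check a downloaded wheel against the SHA256 digest listed above, the file can be hashed in chunks with Python's standard hashlib module. This is a minimal sketch; sha256_of is a hypothetical helper, and the wheel is assumed to sit in the current directory.

```python
import hashlib
import os

# SHA256 digest published above for unisafe-0.1.0-py3-none-any.whl.
EXPECTED_SHA256 = "c6863c0bb4d95d7fec9f1abeda9f04aff9b6af032552e8716a868dff335719d0"


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops once read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


wheel = "unisafe-0.1.0-py3-none-any.whl"  # assumed local download location
if os.path.exists(wheel):
    print("OK" if sha256_of(wheel) == EXPECTED_SHA256 else "MISMATCH")
```

pip can enforce the same check at install time through a requirements line such as unisafe==0.1.0 --hash=sha256:<digest>, where <digest> is the SHA256 value listed above.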

