Read text files with unknown or mixed encodings into UTF-8.
unisafe
A drop-in replacement for builtins.open that reads text files with unknown or mixed encodings into UTF-8.
It can optionally convert smart quotes (UTF-8 or Windows-1252 encoded) into UTF-8 or plain ASCII.
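For reference, the smart quotes in question are U+2018/U+2019 (single) and U+201C/U+201D (double). In Windows-1252 they are the bytes 0x91 to 0x94. The sketch below shows one way an ASCII conversion like this could be done with `str.translate`. The mapping table and the name `smart_quotes_to_ascii` are hypothetical and not part of unisafe's API. The library's own table may cover more characters.

```python
# Hypothetical mapping, not unisafe's own table: curly quotes -> ASCII quotes.
SMART_TO_ASCII = str.maketrans({
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK (cp1252 byte 0x91)
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK (cp1252 byte 0x92)
    "\u201c": '"',  # LEFT DOUBLE QUOTATION MARK (cp1252 byte 0x93)
    "\u201d": '"',  # RIGHT DOUBLE QUOTATION MARK (cp1252 byte 0x94)
})


def smart_quotes_to_ascii(text: str) -> str:
    """Replace curly single and double quotes with their ASCII equivalents."""
    return text.translate(SMART_TO_ASCII)


# Windows-1252 bytes decode to the same Unicode quotes, so one table covers both.
sample = b"\x93Hello,\x94 she said. It\x92s fine.".decode("cp1252")
print(smart_quotes_to_ascii(sample))  # prints: "Hello," she said. It's fine.
```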
from unisafe import uread
# The API is the same as builtins.open; read() returns the whole file as one string
with uread('file.csv') as f:
    text = f.read()
# Iterate over the file object to get one text line at a time
with uread('file.csv') as f:
    for line in f:
        print(line, end='')  # each line already ends with a newline
The uread function returns a TextIOWrapper, just like Python's built-in open in 'r' mode. Apart from the added runtime encoding detection and conversion, it behaves exactly like the built-in.
The underlying file handle is opened in 'rb' (read binary) mode. Writing is not supported.
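The library's detection logic is not described here. As a rough sketch of the general idea, assuming a per-line strategy that tries UTF-8 first and falls back to Windows-1252, a reader can open the file in 'rb' mode, decode each line on its own, and wrap the resulting UTF-8 bytes in an `io.TextIOWrapper`. The names `decode_mixed` and `open_mixed` are hypothetical; this is not unisafe's implementation.

```python
import io


def decode_mixed(data: bytes) -> str:
    """Decode each line as UTF-8, falling back to Windows-1252 for lines that fail.

    Assumed strategy for illustration only; unisafe may detect encodings differently.
    """
    parts = []
    for line in data.splitlines(keepends=True):
        try:
            parts.append(line.decode("utf-8"))
        except UnicodeDecodeError:
            # cp1252 leaves five bytes undefined (e.g. 0x81); replace them with U+FFFD.
            parts.append(line.decode("cp1252", errors="replace"))
    return "".join(parts)


def open_mixed(path: str) -> io.TextIOWrapper:
    """Hypothetical uread-like reader: read bytes, normalize to UTF-8, return a text wrapper."""
    with open(path, "rb") as raw:
        data = raw.read()
    utf8 = decode_mixed(data).encode("utf-8")
    return io.TextIOWrapper(io.BytesIO(utf8), encoding="utf-8")
```

With this sketch, a file whose first line is UTF-8 (`caf\xc3\xa9`) and whose second line is Windows-1252 (`na\xefve`) decodes to `'café\nnaïve\n'`. Decoding line by line is what lets a mixed file through, since a single strict UTF-8 decode of the whole file would raise `UnicodeDecodeError`.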
from unisafe import uread
f1 = open('test.txt', 'r', encoding='utf-8')
type(f1)
# -> _io.TextIOWrapper
f2 = uread('test.txt')
type(f2)
# -> _io.TextIOWrapper
Works with the csv module and third-party libraries such as pandas:
from unisafe import uread
import pandas as pd
import csv
with uread('file.csv') as f:
    # csv.reader is lazy, so read the rows before the file is closed
    table = list(csv.reader(f))
with uread('file.csv') as f:
    df = pd.read_csv(f, encoding='utf-8')
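Because `csv.reader` pulls lines from the file lazily, rows have to be read while the file is still open. Iterating the reader after the `with` block exits raises `ValueError: I/O operation on closed file`. The sketch below shows the pattern with an `io.TextIOWrapper` over in-memory UTF-8 bytes, standing in for the handle `uread` would return. `rows_from_utf8_bytes` is a hypothetical helper, and `newline=''` follows the csv module's recommendation for the file objects it reads.

```python
import csv
import io


def rows_from_utf8_bytes(data: bytes) -> list:
    """Parse CSV rows from UTF-8 bytes, consuming the reader before the file closes."""
    # io.TextIOWrapper stands in for the text handle that uread() returns.
    with io.TextIOWrapper(io.BytesIO(data), encoding="utf-8", newline="") as f:
        # list() drains the lazy csv.reader while f is still open.
        return list(csv.reader(f))


rows = rows_from_utf8_bytes(b"name,city\r\nJos\xc3\xa9,Z\xc3\xbcrich\r\n")
print(rows)  # [['name', 'city'], ['José', 'Zürich']]
```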
License
The code in this project is released under the MIT License.