Read text files with unknown or mixed encodings into UTF-8.
unisafe
A drop-in replacement for `builtins.open` that reads text files with unknown or mixed encodings into UTF-8.
Optionally, it also converts UTF-8 or Windows-1252 smart quotes to UTF-8 or ASCII equivalents.
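The smart-quote conversion can be pictured with a small sketch. The helper `to_ascii_quotes` and its mapping table are hypothetical and not part of unisafe's API; the sketch relies only on the fact that Windows-1252 stores curly quotes as the single bytes 0x91 to 0x94, which decode to U+2018 through U+201D.

```python
# Hypothetical helper, not part of unisafe: map curly quotes to straight
# ASCII quotes with str.translate.
SMART_TO_ASCII = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
})


def to_ascii_quotes(text: str) -> str:
    """Replace curly single and double quotes with straight ASCII quotes."""
    return text.translate(SMART_TO_ASCII)


# Windows-1252 bytes 0x93/0x94 and 0x91/0x92 are curly double and single quotes
raw = b"\x93Hello\x94 and \x91bye\x92"
print(to_ascii_quotes(raw.decode("cp1252")))  # prints: "Hello" and 'bye'
```

Decoding to Unicode first means the same table handles smart quotes that arrived as UTF-8 and as Windows-1252.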
```python
from unisafe import uread

# The API matches builtins.open; read() returns the whole file as one string
with uread('file.csv') as f:
    text = f.read()

# Or iterate over the file object to get one line at a time
with uread('file.csv') as f:
    for line in f:
        print(line, end='')  # each line already ends with a newline
```
The `uread` function returns a `TextIOWrapper`, just like Python's built-in `open` in `'r'` mode. It behaves exactly like the built-in function, apart from the added runtime encoding detection and conversion.
The file is opened in `'rb'` (read binary) mode; writing is not supported.
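A minimal sketch of how a binary read can be turned into a UTF-8 `TextIOWrapper`, assuming a simple per-line strategy: try UTF-8 first and fall back to Windows-1252. This is not necessarily how unisafe detects encodings; `read_mixed` is a hypothetical name used only for illustration.

```python
import io
import os
import tempfile


def read_mixed(path: str) -> io.TextIOWrapper:
    """Hypothetical sketch (not unisafe's code): decode each line as UTF-8,
    falling back to Windows-1252, and expose the result as UTF-8 text."""
    out = bytearray()
    with open(path, "rb") as f:  # read binary mode
        for raw_line in f:  # binary iteration splits on b"\n"
            try:
                text = raw_line.decode("utf-8")
            except UnicodeDecodeError:
                # cp1252 leaves five byte values undefined; replace them with U+FFFD
                text = raw_line.decode("cp1252", errors="replace")
            out += text.encode("utf-8")
    # Wrap the re-encoded bytes so callers get the same type open() returns
    return io.TextIOWrapper(io.BytesIO(bytes(out)), encoding="utf-8")


# Demo: one UTF-8 line followed by one Windows-1252 line in the same file
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write("caf\u00e9\n".encode("utf-8"))
    tmp.write("na\u00efve\n".encode("cp1252"))

with read_mixed(tmp.name) as f:
    print(f.read().splitlines())  # ['café', 'naïve']

os.remove(tmp.name)
```

Decoding line by line is what allows a single file to mix encodings. Splitting the raw bytes on `b"\n"` is safe for both encodings, because that byte never appears inside a multi-byte UTF-8 sequence and is always a newline in Windows-1252.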
```python
from unisafe import uread

f1 = open('test.txt', 'r', encoding='utf-8')
type(f1)
# -> _io.TextIOWrapper

f2 = uread('test.txt')
type(f2)
# -> _io.TextIOWrapper
```
It works with the standard `csv` module and with third-party libraries such as pandas:
```python
from unisafe import uread
import pandas as pd
import csv

# csv.reader is lazy, so consume it before the file is closed
with uread('file.csv') as f:
    rows = list(csv.reader(f))

with uread('file.csv') as f:
    df = pd.read_csv(f, encoding='utf-8')
```
License
The code in this project is released under the MIT License.
File details
Details for the file unisafe-0.1.0-py3-none-any.whl.
File metadata
- Download URL: unisafe-0.1.0-py3-none-any.whl
- Size: 16.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.8.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | c6863c0bb4d95d7fec9f1abeda9f04aff9b6af032552e8716a868dff335719d0
MD5 | 3192b2018bc3c9bf6bfec2e1d1da61de
BLAKE2b-256 | 885da3febd547a9b0012f16207dedd9bc2b35d558a626dae6f496923c84dbfa8