Automatic locale-aware CSV and Excel reader with encoding, delimiter, date format, and number locale detection.
csvmedic
One-line CSV reading that auto-detects encoding, delimiter, dates, number formats, and booleans, so you don't lose leading zeros or misread 03/04/2025.
import csvmedic
df = csvmedic.read("export.csv")
print(df.diagnosis) # What was detected and converted
Features
| Detects | Examples |
|---|---|
| Encoding | UTF-8, Windows-1252, ISO-8859-1, Shift-JIS, BOM |
| Delimiter | Comma, semicolon, tab, pipe |
| Dates | DD-MM vs MM-DD resolved from data; ISO, European, US formats |
| Numbers | European (1.234,56) vs US (1,234.56) |
| Booleans | Yes/No, Ja/Nein, Oui/Non, Sí/No, and more |
| Strings | Preserves leading zeros (e.g. IDs 00742) |
All decisions are recorded on the returned DataFrame’s .diagnosis attribute.
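To make the "Numbers" row concrete, here is a minimal sketch of one way to tell European from US number formatting in a column of strings. It is an illustration only, not csvmedic's implementation: the function names `guess_number_locale` and `parse_number` are hypothetical, and the rule used (whichever separator appears last in a value is the decimal mark) is an assumption.

```python
def guess_number_locale(values):
    """Return 'european', 'us', or None when the sample is not decisive."""
    votes = {"european": 0, "us": 0}
    for raw in values:
        text = raw.strip()
        last_dot, last_comma = text.rfind("."), text.rfind(",")
        if last_dot == -1 or last_comma == -1:
            # Only one kind of separator: "1,234" could be 1234 or 1.234,
            # so this sketch does not count it as evidence.
            continue
        # Assumption: the separator that appears last is the decimal mark.
        votes["european" if last_comma > last_dot else "us"] += 1
    if votes["european"] == votes["us"]:
        return None
    return max(votes, key=votes.get)


def parse_number(text, locale):
    """Convert one formatted string to float for the given locale label."""
    if locale == "european":
        # Drop thousands dots, then turn the decimal comma into a dot.
        text = text.replace(".", "").replace(",", ".")
    else:
        # Drop thousands commas; the decimal dot is already correct.
        text = text.replace(",", "")
    return float(text)


column = ["1.234,56", "12,5", "7.000,00"]
locale = guess_number_locale(column)
numbers = [parse_number(value, locale) for value in column]
```

Values with a single separator kind are skipped when voting, so a column that never mixes both separators stays undecided in this sketch.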
Installation
pip install csvmedic
Optional extras:
- csvmedic[fast]: better dialect detection (clevercsv)
- csvmedic[excel]: .xlsx support (openpyxl)
- csvmedic[all]: both
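For context on what dialect detection involves, the standard library's `csv.Sniffer` performs a basic version of it. The sketch below uses only the standard library and says nothing about how csvmedic or clevercsv detect dialects internally; the sample data is made up for illustration.

```python
import csv
import io

# Made-up sample: semicolon-delimited, with IDs that carry leading zeros.
sample = "id;name;city\n00742;Anna;Berlin\n00743;Ben;Hamburg\n"

# Restrict the candidates to the delimiters listed in the Features table.
dialect = csv.Sniffer().sniff(sample, delimiters=";,\t|")
detected = dialect.delimiter

# Reading with the sniffed dialect keeps every field as a string,
# so the leading zeros in the id column survive.
rows = list(csv.reader(io.StringIO(sample), dialect))
first_id = rows[1][0]
```

`csv.Sniffer` raises `csv.Error` when it cannot settle on a delimiter, so real code would normally wrap the `sniff` call in a `try` block.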
Usage
Override detection when you know better:
df = csvmedic.read(
"file.csv",
encoding="utf-8",
delimiter=";",
dayfirst=True,
preserve_strings=["ID"],
confidence_threshold=0.75,
)
Inspect without converting:
profile = csvmedic.read_raw("file.csv")
print(profile.summary())
Schema pinning (recurring files):
df = csvmedic.read("monthly_export.csv")
csvmedic.save_schema(df.attrs["diagnosis"].file_profile, "monthly_export.csvmedic.json")
# Next run: skip detection
df2 = csvmedic.read("monthly_export.csv", schema="monthly_export.csvmedic.json")
Batch with consensus:
dfs = csvmedic.read_batch(["jan.csv", "feb.csv", "mar.csv"], use_consensus=True)
Compare with pandas:
result = csvmedic.diff("file.csv")
print(result.summary())
# result.pandas_df, result.csvmedic_df, result.sample_differences
Date disambiguation
For ambiguous values like 03/04/2025, csvmedic looks at the whole column: if any value has a day greater than 12 (e.g. 25/03/2025), the column is treated as day-first. It also uses cross-column inference, separator hints (e.g. a period separator suggests a European format), and order. If the column is still ambiguous, it stays as strings and is marked in the diagnosis.
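The column-level rule can be sketched in a few lines. This is a simplified illustration under stated assumptions, not csvmedic's code: the name `infer_dayfirst` and the regular expression are hypothetical, only four-digit years are handled, and cross-column inference and separator hints are left out.

```python
import re

# Two 1-2 digit fields and a 4-digit year, separated by "/", "." or "-".
DATE_RE = re.compile(r"^(\d{1,2})[/.\-](\d{1,2})[/.\-](\d{4})$")


def infer_dayfirst(values):
    """Return True (day-first), False (month-first), or None (ambiguous)."""
    first_gt_12 = False
    second_gt_12 = False
    for raw in values:
        match = DATE_RE.match(raw.strip())
        if match is None:
            continue  # not a date in the shape this sketch understands
        first, second = int(match.group(1)), int(match.group(2))
        # A field above 12 cannot be a month, so it must be the day.
        if first > 12:
            first_gt_12 = True
        if second > 12:
            second_gt_12 = True
    if first_gt_12 and not second_gt_12:
        return True
    if second_gt_12 and not first_gt_12:
        return False
    # No decisive value, or conflicting evidence: leave the column alone.
    return None


european = infer_dayfirst(["03/04/2025", "25/03/2025"])
american = infer_dayfirst(["03/04/2025", "03/25/2025"])
unknown = infer_dayfirst(["03/04/2025", "05/06/2025"])
```

Returning `None` instead of guessing mirrors the behavior described above, where an undecided column is kept as strings.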
Documentation
License
MIT
File details
Details for the file csvmedic-0.1.2.tar.gz.
File metadata
- Download URL: csvmedic-0.1.2.tar.gz
- Upload date:
- Size: 21.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f7962f8c4a4a71f386bbc6a19b0197c7b63c85ec76bbea4a2817394e67e9c605 |
| MD5 | 4b6bfa40f09dca7439a89e254a64b25b |
| BLAKE2b-256 | 79480fdeae70d5b53ba01fd4007849f63b753f2409f130940df3a579d66177f3 |
File details
Details for the file csvmedic-0.1.2-py3-none-any.whl.
File metadata
- Download URL: csvmedic-0.1.2-py3-none-any.whl
- Upload date:
- Size: 29.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 817191253a07f97d0ffc6149978e1df6b4fde536384df3cb2378fc569e33c432 |
| MD5 | 3176ab4e22ef6cd2bd63dc6a7ab9bcea |
| BLAKE2b-256 | 61504c2c7f6bbb4f239bb19b9904436a29c1d049fc5af95adbad682b6a5ffca2 |