Automatic locale-aware CSV and Excel reader with encoding, delimiter, date format, and number locale detection.

csvmedic

One-line CSV reading that auto-detects encoding, delimiter, dates, number formats, and booleans, so you don't lose leading zeros or misread 03/04/2025 (3 April or 4 March, depending on locale).

import csvmedic

df = csvmedic.read("export.csv")
print(df.diagnosis)  # What was detected and converted

Features

Detects     Examples
Encoding    UTF-8, Windows-1252, ISO-8859-1, Shift-JIS, BOM
Delimiter   Comma, semicolon, tab, pipe
Dates       DD-MM vs MM-DD resolved from data; ISO, European, US formats
Numbers     European (1.234,56) vs US (1,234.56)
Booleans    Yes/No, Ja/Nein, Oui/Non, Sí/No, and more
Strings     Preserves leading zeros (e.g. IDs 00742)

All decisions are recorded on the returned DataFrame’s .diagnosis attribute.
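
To make the European vs US number distinction concrete, here is a minimal standalone sketch of one way such a decision could be made. It is not csvmedic's implementation: the function name parse_localized_number and its rules (the last separator wins when both are present; a single separator followed by exactly three digits is treated as ambiguous) are assumptions chosen for illustration.

```python
from typing import Optional


def parse_localized_number(text: str) -> Optional[float]:
    """Parse '1.234,56' (European) or '1,234.56' (US) style numbers.

    Hypothetical helper, not part of csvmedic. Returns None when the
    value is ambiguous (e.g. '1,234') or is not a number at all.
    """
    s = text.strip()
    dots, commas = s.count("."), s.count(",")

    if dots and commas:
        # Both separators present: whichever comes last is the decimal mark.
        decimal = "." if s.rfind(".") > s.rfind(",") else ","
    elif dots + commas > 1:
        # One separator repeated (1.234.567): it can only group thousands.
        decimal = None
    elif dots + commas == 1:
        sep = "." if dots else ","
        digits_after = len(s) - s.index(sep) - 1
        if digits_after == 3:
            # '1,234' could be 1234 or 1.234; refuse to guess.
            return None
        decimal = sep
    else:
        decimal = None

    # Drop grouping separators, then normalise the decimal mark to '.'.
    for ch in {".", ","} - {decimal}:
        s = s.replace(ch, "")
    if decimal:
        s = s.replace(decimal, ".")

    try:
        return float(s)
    except ValueError:
        return None
```

A real detector would decide per column rather than per value, so that one unambiguous value such as 1.234,56 settles the reading of ambiguous neighbours such as 1.234.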

Installation

pip install csvmedic

Optional extras:

  • csvmedic[fast] — better dialect detection (clevercsv)
  • csvmedic[excel] — .xlsx support (openpyxl)
  • csvmedic[all] — both

Usage

Override detection when you know better:

df = csvmedic.read(
    "file.csv",
    encoding="utf-8",
    delimiter=";",
    dayfirst=True,
    preserve_strings=["ID"],
    confidence_threshold=0.75,
)

Inspect without converting:

profile = csvmedic.read_raw("file.csv")
print(profile.summary())

Schema pinning (recurring files):

df = csvmedic.read("monthly_export.csv")
csvmedic.save_schema(df.attrs["diagnosis"].file_profile, "monthly_export.csvmedic.json")
# Next run: skip detection
df2 = csvmedic.read("monthly_export.csv", schema="monthly_export.csvmedic.json")

Batch with consensus:

dfs = csvmedic.read_batch(["jan.csv", "feb.csv", "mar.csv"], use_consensus=True)

Compare with pandas:

result = csvmedic.diff("file.csv")
print(result.summary())
# result.pandas_df, result.csvmedic_df, result.sample_differences

Date disambiguation

For ambiguous values like 03/04/2025, csvmedic looks at the rest of the column: if any value has a day greater than 12 (e.g. 25/03/2025), the whole column is treated as day-first. It also uses cross-column inference, separator hints (e.g. a period separator suggests a European format), and order. If the column is still ambiguous, it stays as strings and is marked in the diagnosis.
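
The column-level rule above (a day greater than 12 settles the order) can be sketched in a few lines of standard-library Python. This is an illustration of the idea only, not csvmedic's code: the name infer_dayfirst, the regular expression, and the choice to return None on conflicting evidence are assumptions, and the cross-column and separator hints are left out.

```python
import re
from typing import Iterable, Optional

# Two 1-2 digit fields and a 4-digit year, separated by '/', '.' or '-'.
_DATE = re.compile(r"^\s*(\d{1,2})[/.\-](\d{1,2})[/.\-](\d{4})\s*$")


def infer_dayfirst(values: Iterable[str]) -> Optional[bool]:
    """Return True for day-first, False for month-first, None if undecided.

    Hypothetical helper, not part of csvmedic.
    """
    saw_day_first = saw_month_first = False
    for value in values:
        match = _DATE.match(value)
        if match is None:
            continue  # not a numeric date in this shape; ignore it
        first, second = int(match.group(1)), int(match.group(2))
        if first > 12 and second <= 12:
            saw_day_first = True  # e.g. 25/03/2025 can only be day-first
        elif second > 12 and first <= 12:
            saw_month_first = True  # e.g. 03/25/2025 can only be month-first

    if saw_day_first and not saw_month_first:
        return True
    if saw_month_first and not saw_day_first:
        return False
    return None  # no decisive value, or the column contradicts itself


print(infer_dayfirst(["03/04/2025", "25/03/2025"]))  # True
print(infer_dayfirst(["03/04/2025", "04/05/2025"]))  # None
```

Returning None rather than a default mirrors the behaviour described above, where a column that stays ambiguous is left as strings instead of being guessed.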

License

MIT
