
Automatic locale-aware CSV and Excel reader with encoding, delimiter, date format, and number locale detection.


csvmedic


One-line CSV reading that auto-detects encoding, delimiter, dates, number formats, and booleans—so you don’t lose leading zeros or misinterpret 03/04/2025.

import csvmedic

df = csvmedic.read("export.csv")
print(df.diagnosis)  # What was detected and converted


Features

Detects    Examples
Encoding   UTF-8, Windows-1252, ISO-8859-1, Shift-JIS, BOM
Delimiter  Comma, semicolon, tab, pipe
Dates      DD-MM vs MM-DD resolved from data; ISO, European, US formats
Numbers    European (1.234,56) vs US (1,234.56)
Booleans   Yes/No, Ja/Nein, Oui/Non, Sí/No, and more
Strings    Preserves leading zeros (e.g. IDs 00742)

All decisions are recorded on the returned DataFrame’s .diagnosis attribute.
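
To make the Numbers row concrete, here is a minimal sketch of how a decimal separator could be inferred from a column of strings. It is not csvmedic's implementation: the names guess_decimal_separator and to_float are hypothetical, and the heuristic is an assumption (when both marks appear, the last one is the decimal mark; a mark that repeats is a thousands mark; a single mark followed by exactly three digits is left undecided).

```python
from collections import Counter


def guess_decimal_separator(values):
    """Return ',' or '.' as the likely decimal separator, or None if undecided."""
    votes = Counter()
    for raw in values:
        text = raw.strip()
        last_dot, last_comma = text.rfind("."), text.rfind(",")
        if last_dot != -1 and last_comma != -1:
            # Both marks present: the one that comes last is the decimal mark.
            votes["," if last_comma > last_dot else "."] += 1
        elif last_dot != -1 or last_comma != -1:
            sep = "." if last_dot != -1 else ","
            digits_after = len(text) - text.rfind(sep) - 1
            if text.count(sep) > 1:
                # A repeated mark can only be thousands grouping,
                # so the other mark must be the decimal one.
                votes["," if sep == "." else "."] += 1
            elif digits_after != 3:
                # A single mark not followed by exactly three digits
                # cannot be a thousands mark.
                votes[sep] += 1
            # A single mark followed by three digits (e.g. "1,234") is ambiguous: no vote.
    ranked = votes.most_common()
    if not ranked or (len(ranked) > 1 and ranked[0][1] == ranked[1][1]):
        return None
    return ranked[0][0]


def to_float(text, decimal_sep):
    """Convert a localized number string to float, given its decimal separator."""
    thousands = "." if decimal_sep == "," else ","
    return float(text.strip().replace(thousands, "").replace(decimal_sep, "."))


european = ["1.234,56", "12,5", "1.000.000"]
sep = guess_decimal_separator(european)
print(sep, [to_float(v, sep) for v in european])
```

Voting over the whole column, rather than parsing each cell in isolation, lets unambiguous values such as 12,5 settle the reading of ambiguous ones.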

Installation

pip install csvmedic

Optional extras:

  • csvmedic[fast] — better dialect detection (clevercsv)
  • csvmedic[excel] — .xlsx support (openpyxl)
  • csvmedic[all] — both
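
For a sense of what dialect detection involves, the standard library's csv.Sniffer can guess a delimiter from a text sample. This is only an illustration using the standard library, not a description of what csvmedic or clevercsv do internally; detect_delimiter is a hypothetical helper and the sample data is made up.

```python
import csv


def detect_delimiter(sample: str, candidates: str = ",;\t|") -> str:
    """Guess the delimiter of a CSV text sample, restricted to the candidate characters."""
    # Sniffer.sniff raises csv.Error when it cannot settle on a delimiter.
    dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
    return dialect.delimiter


# Semicolon-delimited sample whose amounts use a decimal comma.
sample = "id;name;amount\n00742;Anna;1.234,56\n00743;Ben;2.000,00\n"
print(repr(detect_delimiter(sample)))
```

Restricting the candidates matters here: the decimal commas in the amount column would otherwise compete with the real delimiter.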

Usage

Override detection when you know better:

df = csvmedic.read(
    "file.csv",
    encoding="utf-8",
    delimiter=";",
    dayfirst=True,
    preserve_strings=["ID"],
    confidence_threshold=0.75,
)

Inspect without converting:

profile = csvmedic.read_raw("file.csv")
print(profile.summary())

Schema pinning (recurring files):

df = csvmedic.read("monthly_export.csv")
csvmedic.save_schema(df.attrs["diagnosis"].file_profile, "monthly_export.csvmedic.json")
# Next run: skip detection
df2 = csvmedic.read("monthly_export.csv", schema="monthly_export.csvmedic.json")

Batch with consensus:

dfs = csvmedic.read_batch(["jan.csv", "feb.csv", "mar.csv"], use_consensus=True)

Compare with pandas:

result = csvmedic.diff("file.csv")
print(result.summary())
# result.pandas_df, result.csvmedic_df, result.sample_differences

Date disambiguation

For ambiguous values like 03/04/2025, csvmedic looks at the whole column: if any value has a day greater than 12 (e.g. 25/03/2025), the column is treated as day-first. It also uses cross-column inference, separator hints (e.g. period → European), and order. If the column is still ambiguous, it stays as strings and is marked in the diagnosis.
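
The day > 12 rule can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not csvmedic's code: infer_dayfirst and DATE_RE are hypothetical names, only numeric dates with a four-digit year are handled, and the cross-column, separator, and order hints mentioned above are left out.

```python
import re

# Matches dates like 03/04/2025, 3.4.2025 or 03-04-2025 (four-digit year last).
DATE_RE = re.compile(r"^\s*(\d{1,2})[/.\-](\d{1,2})[/.\-](\d{4})\s*$")


def infer_dayfirst(values):
    """Return True (day-first), False (month-first), or None (still ambiguous)."""
    first_gt_12 = second_gt_12 = False
    for value in values:
        match = DATE_RE.match(value)
        if not match:
            continue  # Skip values that are not in the expected shape.
        first, second = int(match.group(1)), int(match.group(2))
        # A component above 12 cannot be a month, so it must be the day.
        first_gt_12 = first_gt_12 or first > 12
        second_gt_12 = second_gt_12 or second > 12
    if first_gt_12 and not second_gt_12:
        return True
    if second_gt_12 and not first_gt_12:
        return False
    # No evidence, or conflicting evidence: leave the column undecided.
    return None


print(infer_dayfirst(["03/04/2025", "25/03/2025"]))
print(infer_dayfirst(["03/04/2025", "03/25/2025"]))
print(infer_dayfirst(["03/04/2025", "01/02/2025"]))
```

A single decisive value fixes the reading for the whole column, which is why 03/04/2025 can be resolved even though it is ambiguous on its own.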

License

MIT
