Detect, profile, normalize and repair delimiter-separated values files (CSV, TSV, pipe, semicolon).

dsvmonkey

Detect, profile, normalize and repair delimiter-separated-values files.

CSV is a polite lie. Real files are tab-separated, pipe-separated, or semicolon-separated; start with decorative title rows; carry BOMs and mixed encodings; include ragged rows and quoted newlines. dsvmonkey reads them anyway, tells you what it found, and hands you a clean stream of rows.

Status

Alpha. API is not yet stable.

Install

pip install dsvmonkey

For development (editable install with test tooling):

pip install -e ".[dev]"   # quoted so shells such as zsh do not glob the brackets
# or equivalently:
pip install -r requirements-dev.txt

Both requirements.txt and requirements-dev.txt are thin pointers to pyproject.toml — the single source of truth for dependency lists. Edit dependencies in pyproject.toml; the requirements files need no maintenance.

What it does

Detect encoding, delimiter, quote char, header row and line endings — each with a confidence score, runner-up alternatives and the reasoning behind the choice.
Normalize cells on read using cleanmonkey (BOMs, NBSPs, zero-width spaces, smart quotes, stray control chars).
Profile date columns via datemonkey.
Repair ragged rows, stray BOMs and inconsistent line endings.
Stream row-by-row; large files are fine.
Chain cleanly into pgmonkey (DB import), xlfilldown (Excel output) and typemonkey (type inference).
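
To make "confidence score" and "runner-up alternatives" concrete, here is a minimal standard-library sketch of delimiter ranking. It is not dsvmonkey's algorithm: the function name rank_delimiters, the candidate list and the scoring rule (the share of lines that agree on the most common non-zero delimiter count) are assumptions chosen for illustration, and the sketch ignores quoting entirely.

```python
from collections import Counter

# Hypothetical candidate set, mirroring the delimiters named in this README.
CANDIDATES = [",", "\t", "|", ";"]


def rank_delimiters(sample: str) -> list[tuple[str, float]]:
    """Rank candidate delimiters by how consistently they split the sample.

    Score = fraction of non-blank lines sharing the most common per-line
    count, or 0.0 when that most common count is zero.
    """
    lines = [ln for ln in sample.splitlines() if ln.strip()]
    scores = []
    for delim in CANDIDATES:
        counts = [ln.count(delim) for ln in lines]
        if not counts:
            scores.append((delim, 0.0))
            continue
        modal_count, agreeing = Counter(counts).most_common(1)[0]
        score = agreeing / len(lines) if modal_count > 0 else 0.0
        scores.append((delim, score))
    # Best candidate first; the second entry is the runner-up.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Semicolon-separated data with decimal commas: ";" wins, "," is the runner-up.
ranked = rank_delimiters("name;price\nA;1,5\nB;2,5\n")
print(ranked[:2])
```

A real detector also has to handle quoted fields, title rows and encodings, which is why reporting a score with alternatives is more useful than a single guess.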

CLI

dsvmonkey inspect   file.csv                       # human-readable detection report
dsvmonkey normalize file.csv -o clean.csv          # strip BOM, fix ragged rows, normalize endings
dsvmonkey convert   file.csv -o out.jsonl --to jsonl

Run dsvmonkey --help or dsvmonkey <command> --help for the full list. Flags are command-specific:

inspect: -v/--verbose, --no-columns, --sample-rows, --excel-serial-min, --no-deep-scan, --clean-sample, --strict (exit 3 instead of 0 when the profile recommends human review — the unattended-pipeline gate).
normalize: --encoding, --line-ending lf|crlf|cr, --delimiter, --field-count, --no-clean, --no-deep-scan, --keep-empty-rows, --sanitize-formulas, --strict (same gate semantics as inspect --strict: profile first, exit 3 with no output written when detection isn't confident enough).
convert: --to {csv,tsv,jsonl}, --no-clean, --no-deep-scan, --keep-empty-rows, --sanitize-formulas (applies on every output format, including jsonl — JSONL output is commonly transformed back to CSV/Excel later, where formula payloads surviving as JSON string values become live formulas), --strict (gate as above).
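
The --strict gate can be wired into an unattended pipeline by branching on the exit status. The sketch below assumes only what is stated above (exit 0 on success, exit 3 when the profile recommends human review). The helper names classify_exit and gate, the decision labels, and the handling of any other exit status as a generic failure are assumptions.

```python
import subprocess

REVIEW_EXIT_CODE = 3  # documented above for --strict


def classify_exit(returncode: int) -> str:
    """Map a dsvmonkey exit status to a pipeline decision (labels are hypothetical)."""
    if returncode == 0:
        return "proceed"
    if returncode == REVIEW_EXIT_CODE:
        return "needs-review"
    # Assumption: other statuses are not described here, so treat them as failures.
    return "failed"


def gate(path: str) -> str:
    """Run `dsvmonkey inspect --strict` on a file and classify the result."""
    result = subprocess.run(
        ["dsvmonkey", "inspect", "--strict", path],
        capture_output=True,
        text=True,
    )
    return classify_exit(result.returncode)
```

A caller might route "needs-review" files to a quarantine directory rather than loading them downstream.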

Python API

import dsvmonkey

# Profile a file — encoding, delimiter, headers, etc.
profile = dsvmonkey.profile_file("file.csv")

# Stream cleaned rows as dicts
for row in dsvmonkey.read("file.csv"):
    ...

# Write a cleaned version
report = dsvmonkey.repair("messy.csv", "clean.csv")

# Convert to JSON Lines
dsvmonkey.to_jsonl("file.csv", "file.jsonl")

# Per-column profiling (date-format detection via datemonkey)
columns = dsvmonkey.profile_columns("file.csv")

Limitations

Some behaviours are deliberate design tradeoffs rather than bugs (e.g. mixed-encoding detection requires UTF-8 multi-byte evidence to avoid false-positives on cp1252 files; duplicate header names in dict mode warn-and-collapse rather than raise). See LIMITATIONS.md for the full list with rationale and escape hatches.
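
As an illustration of what "warn-and-collapse" can mean for duplicate header names, the following sketch uses plain Python dict semantics, where a later duplicate key overwrites an earlier one. This is not dsvmonkey's code, and which value dsvmonkey actually keeps is not stated here; row_to_dict is a hypothetical helper.

```python
import warnings
from collections import Counter


def row_to_dict(header: list[str], row: list[str]) -> dict[str, str]:
    """Build a dict from one row, warning when header names repeat.

    With plain dict construction the last value for a repeated name wins.
    """
    duplicates = [name for name, n in Counter(header).items() if n > 1]
    if duplicates:
        warnings.warn(f"duplicate header names collapsed: {duplicates}")
    return dict(zip(header, row))


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    record = row_to_dict(["id", "name", "name"], ["1", "a", "b"])

print(record, len(caught))
```

The tradeoff is that a warning keeps a stream flowing where raising would stop it, at the cost of silently dropping one of the duplicated columns if the warning goes unread.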

Using with AI assistants

SKILL.md at the repo root is a drop-in Claude Code / agent skill that teaches LLMs how to call dsvmonkey correctly — decision tree, failure modes it already handles, worked examples, and a "don't" list so agents stop reinventing broken CSV parsing. Copy it to ~/.claude/skills/ or include it in a project's AGENTS.md / CLAUDE.md for automatic discovery.

License

MIT. See LICENSE.

This version: 0.1.0 (May 1, 2026)

Download files

Source Distribution

dsvmonkey-0.1.0.tar.gz (144.9 kB)

Uploaded May 1, 2026 Source

Built Distribution

dsvmonkey-0.1.0-py3-none-any.whl (75.8 kB)

Uploaded May 1, 2026 Python 3

File details

Details for the file dsvmonkey-0.1.0.tar.gz.

File metadata

Download URL: dsvmonkey-0.1.0.tar.gz
Upload date: May 1, 2026
Size: 144.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for dsvmonkey-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`a4191d5c0b7be44b9e2738112e17368c7dd257e87645b4d674bc40dfd4ada445`
MD5	`94f6cdc941718f40a848a66eef52b26e`
BLAKE2b-256	`ecbeb8eba82f12e3ee5c8ee37227b92ead786ed3b74cb8b5e053aff78f592361`

File details

Details for the file dsvmonkey-0.1.0-py3-none-any.whl.

File metadata

Download URL: dsvmonkey-0.1.0-py3-none-any.whl
Upload date: May 1, 2026
Size: 75.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for dsvmonkey-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7b1900e786b71b015ebff26a58738c988b1ca9ddebb7f985cf4292c1a13ce605`
MD5	`7327a3adb83af713210460f1bfcf07e3`
BLAKE2b-256	`b5774e51f4cc8fd2902efbf836d178ca0c15503d03ee49bf674c84d065ebb828`
