Small CSV utilities: row deduplication, classification, row filtering, and CLI helpers.
Project description
Small, focused CSV utilities for common data wrangling tasks.
csvsmith provides a handful of practical tools for working with CSV files, including cleaning numeric values, filtering rows, deduplicating records, classifying files, converting Excel spreadsheets to CSV, moving files by suffix, and finding matches inside CSV content.
Documentation
Read the full documentation at:
Features
Clean numeric strings into normalized values
Filter CSV rows by substring matching
Deduplicate row data and generate reports
Classify CSV files into folders based on headers/signatures
Convert Excel workbooks to CSV
Move files by suffix
Find matching values inside CSV files
Concatenate CSV files with identical headers
Use the tools either from Python or from the command line
Installation
Install the package from PyPI into the Python environment you use for your project.
Example:
pip install csvsmith
If you are developing locally, install it in editable mode from the project root:
pip install -e .
Quick start
You can use the library from Python:
from csvsmith.utils.clean_numeric import clean_currency_numeric
print(clean_currency_numeric("$1,234.56"))
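To give a rough idea of what this kind of cleaning involves, here is a standalone sketch that does not use csvsmith's code. The name strip_currency_number, the set of currency symbols, and the Decimal return type are all assumptions made for illustration; the library's actual behavior may differ.

```python
from decimal import Decimal

# Hypothetical helper for illustration only; not csvsmith's implementation.
CURRENCY_SYMBOLS = "$\u20ac\u00a3\u00a5"  # dollar, euro, pound, yen


def strip_currency_number(text: str, sep: str = ",", decimal: str = ".") -> Decimal:
    """Parse a string such as "$1,234.56" into a Decimal."""
    cleaned = text.strip()
    # Remember a leading minus sign, then drop it and any currency symbols.
    negative = cleaned.startswith("-")
    cleaned = cleaned.lstrip("-").strip()
    cleaned = cleaned.lstrip(CURRENCY_SYMBOLS).strip()
    # Remove thousands separators, then map the decimal mark to ".".
    cleaned = cleaned.replace(sep, "")
    if decimal != ".":
        cleaned = cleaned.replace(decimal, ".")
    value = Decimal(cleaned)
    return -value if negative else value


print(strip_currency_number("$1,234.56"))  # 1234.56
print(strip_currency_number("\u20ac1.234,56", sep=".", decimal=","))  # 1234.56
```

Returning Decimal rather than float avoids binary rounding surprises with money-like values, which is one plausible design choice for a helper like this.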
For command-line usage, start with the built-in help (and use single quotes around values containing $ so that the shell does not expand them):
csvsmith --help
Command-line usage
The package provides a CLI with several subcommands.
Clean numeric values:
csvsmith clean-numeric "1,234.56" --sep "," --decimal "."
Clean currency-prefixed numeric values:
csvsmith clean-currency-numeric '$1,234.56' --sep "," --decimal "."
Filter rows in a CSV:
csvsmith drop-rows input.csv notes spam --case-insensitive --drop-header
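The idea behind substring-based row filtering can be sketched with the standard library alone. This sketch assumes the two positional arguments above name a column (notes) and a substring (spam); that reading, and the function name drop_rows_containing, are assumptions and not taken from csvsmith itself.

```python
import csv
import io


def drop_rows_containing(rows, column, needle, case_insensitive=False):
    """Yield dict rows whose `column` value does not contain `needle`."""
    if case_insensitive:
        needle = needle.lower()
    for row in rows:
        value = row.get(column) or ""
        if case_insensitive:
            value = value.lower()
        # Keep the row only when the substring is absent.
        if needle not in value:
            yield row


sample = "id,notes\n1,hello\n2,This is SPAM\n3,spam again\n"
reader = csv.DictReader(io.StringIO(sample))
kept = list(drop_rows_containing(reader, "notes", "spam", case_insensitive=True))
print([row["id"] for row in kept])  # ['1']
```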
Deduplicate rows:
csvsmith dedupe input.csv -o out.csv --subset id --keep first
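For readers who want to see the logic behind --subset and --keep, the following is a minimal sketch of key-based deduplication in plain Python. It is not csvsmith's implementation; the function name dedupe_rows and the shape of the returned report (a list of dropped rows) are assumptions.

```python
def dedupe_rows(rows, subset=None, keep="first"):
    """Return (unique_rows, dropped_rows) for a list of dict rows.

    `subset` lists the columns that form the duplicate key; when omitted,
    every column is used. `keep` is "first" or "last".
    """
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first' or 'last'")
    rows = list(rows)
    # Walking backwards turns "keep last" into "keep first seen".
    ordered = rows if keep == "first" else rows[::-1]
    seen = set()
    unique, dropped = [], []
    for row in ordered:
        key = tuple(row[col] for col in (subset or sorted(row)))
        if key in seen:
            dropped.append(row)
        else:
            seen.add(key)
            unique.append(row)
    if keep == "last":
        # Restore the original row order.
        unique.reverse()
        dropped.reverse()
    return unique, dropped


rows = [
    {"id": "1", "value": "a"},
    {"id": "2", "value": "b"},
    {"id": "1", "value": "c"},
]
unique, dropped = dedupe_rows(rows, subset=["id"], keep="first")
print(unique)
print(dropped)
```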
Classify CSV files:
csvsmith classify src_dir dst_dir --mode relaxed --match subset --auto --dry-run
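Header-based classification can be pictured as comparing a file's header row against a set of known signatures. The sketch below is one possible interpretation, not the library's code: it assumes that "relaxed" means ignoring case and surrounding whitespace and that "subset" means every signature column must appear in the file. The signature names and the function classify_header are hypothetical.

```python
def normalize(header):
    """Relaxed comparison: ignore case and surrounding whitespace (assumed)."""
    return header.strip().lower()


def classify_header(header_row, signatures, match="subset"):
    """Return the name of the first matching signature, or None."""
    found = {normalize(h) for h in header_row}
    for name, required in signatures.items():
        wanted = {normalize(h) for h in required}
        if match == "exact" and found == wanted:
            return name
        if match == "subset" and wanted <= found:
            return name
    return None


# Hypothetical signatures mapping a destination folder to expected columns.
signatures = {
    "invoices": ["invoice_id", "amount"],
    "customers": ["customer_id", "email"],
}
print(classify_header([" Invoice_ID ", "Amount", "Date"], signatures))  # invoices
```

A dry run then amounts to reporting the chosen folder for each file without moving anything.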
Convert Excel to CSV:
csvsmith excel2csv input.xlsx
Move files by suffix:
csvsmith move-files src_dir dst_dir --suffixes .csv,.pdf
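Moving files by suffix is straightforward to sketch with pathlib and shutil. The function below is an illustrative stand-in, not csvsmith's move_by_suffix; its name, its dry_run parameter, and the case-insensitive suffix comparison are assumptions.

```python
import shutil
import tempfile
from pathlib import Path


def move_files_by_suffix(src_dir, dst_dir, suffixes, dry_run=False):
    """Move files whose suffix is in `suffixes`; return (source, target) pairs."""
    src, dst = Path(src_dir), Path(dst_dir)
    wanted = {s.lower() for s in suffixes}
    planned = []
    # sorted() materializes the listing, so moving files mid-loop is safe.
    for path in sorted(src.iterdir()):
        if path.is_file() and path.suffix.lower() in wanted:
            target = dst / path.name
            planned.append((path, target))
            if not dry_run:
                dst.mkdir(parents=True, exist_ok=True)
                shutil.move(str(path), str(target))
    return planned


# Small demonstration inside a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_src = Path(tmp) / "src"
    demo_src.mkdir()
    for name in ("a.csv", "b.pdf", "c.txt"):
        (demo_src / name).write_text("x")
    plan = move_files_by_suffix(demo_src, Path(tmp) / "dst", [".csv", ".pdf"])
    print([source.name for source, _ in plan])  # ['a.csv', 'b.pdf']
```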
Find matches in a CSV:
csvsmith find-matches input.csv target --ignore-case --ignore-whitespace
Concatenate CSV files:
csvsmith strict-concat file1.csv file2.csv -o combined.csv
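The "strict" part of concatenation, as described in the feature list, is that headers must be identical. A minimal sketch of that check, working on in-memory tables rather than files, might look like this. The function name strict_concat and the choice to raise ValueError on a mismatch are assumptions, not csvsmith's documented behavior.

```python
import csv
import io


def strict_concat(tables):
    """Concatenate tables (lists of rows, header first).

    Raises ValueError if a table is empty or its header differs
    from the header of the first table.
    """
    combined = []
    expected = None
    for index, table in enumerate(tables):
        if not table:
            raise ValueError(f"table {index} is empty")
        header, *body = table
        if expected is None:
            expected = header
            combined.append(header)
        elif header != expected:
            raise ValueError(f"table {index} header {header} != {expected}")
        combined.extend(body)
    return combined


first = list(csv.reader(io.StringIO("id,name\n1,Ann\n")))
second = list(csv.reader(io.StringIO("id,name\n2,Bob\n")))
print(strict_concat([first, second]))
```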
Find matches in a CSV
find_matches_in_csv searches a CSV file for a target value and returns match records containing coordinates and row context information.
Python API:
from csvsmith import find_matches_in_csv
results = find_matches_in_csv("input.csv", "target")
CLI:
csvsmith find-matches input.csv target
Options:
--ignore-case: ignore case while matching
--ignore-whitespace: ignore whitespace while matching
--no-nfkc: disable NFKC normalization
If matches are found, the CLI prints formatted JSON. If no matches are found, it prints a simple message.
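The matching options above can be illustrated with a standalone sketch built on the standard library. This is not the code behind find_matches_in_csv: the function names, the whole-cell equality comparison, the zero-based coordinates, and the keys of the returned records are all assumptions chosen for the example.

```python
import csv
import io
import unicodedata


def normalize_text(text, ignore_case=False, ignore_whitespace=False, nfkc=True):
    """Apply the optional normalizations before comparing two strings."""
    if nfkc:
        text = unicodedata.normalize("NFKC", text)
    if ignore_case:
        text = text.casefold()
    if ignore_whitespace:
        text = "".join(text.split())
    return text


def find_matches(rows, target, **options):
    """Return one record per cell that equals `target` after normalization."""
    wanted = normalize_text(target, **options)
    matches = []
    for row_index, row in enumerate(rows):
        for col_index, cell in enumerate(row):
            if normalize_text(cell, **options) == wanted:
                matches.append(
                    {"row": row_index, "col": col_index,
                     "value": cell, "row_values": list(row)}
                )
    return matches


# The second row holds full-width letters, which NFKC folds to plain "ABC".
sample = "name,code\nAlice,\uff21\uff22\uff23\nBob, a b c \n"
rows = list(csv.reader(io.StringIO(sample)))
hits = find_matches(rows, "abc", ignore_case=True, ignore_whitespace=True)
print([(hit["row"], hit["col"]) for hit in hits])  # [(1, 1), (2, 1)]
```

Disabling NFKC in this sketch (nfkc=False) leaves the full-width cell unmatched, which shows why normalization is on by default in tools of this kind.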
Other Python APIs
The package also exposes a few other helper functions and classes from its top-level API.
Numeric and row tools:
from csvsmith import (
    clean_numeric,
    count_duplicates_sorted,
    add_row_digest,
    find_duplicate_rows,
    dedupe_with_report,
    read_csv_rows,
    write_csv_rows,
)
CSV classification and filtering:
from csvsmith import CSVClassifier, DropRowsBySubstring, CSVCleaner
File and conversion helpers:
from csvsmith import excel_to_csv, move_by_suffix, strict_concat_rows, save_csv
String comparison utilities:
from csvsmith import StringDistance, Relation, Result, analyze_pair
Project structure
The code is organized into two main areas:
csvsmith.tools for higher-level CSV workflows
csvsmith.utils for reusable utility helpers
Testing
Run the test suite with your preferred Python test runner.
Example:
pytest
License
See the project license for details.