# csvTrim

Filter and trim large CSV files by column values — keep only the rows and columns you need.

csvTrim processes a single file or an entire folder of CSVs in one pass and keeps only the rows and columns you specify. It is optimised for large exports (e.g. Azure billing exports) but works with any structured CSV. Results can also be exported to Excel.
## Features

- Row filtering — keep only rows whose filter column matches a list of values
- Column trimming — drop every column not in your keep list
- Folder processing — pass a folder path to process all `.csv` files at once
- Preset system — save named filter configurations to `presets.json` and load them by name
- Auto-default preset — run with just `--input`/`--output` to use the preset marked as default
- Excel export — optional `.xlsx` output; splits automatically across sheets if rows exceed Excel's worksheet limit
- Memory-efficient — reads files in 100 000-row chunks so large exports don't run out of RAM
- Run summary — shows row counts, reduction percentage, per-value breakdown, and elapsed time
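The chunked row/column filtering described above can be sketched with pandas. This is an illustrative re-implementation, not csvTrim's actual code; `trim_csv` and its signature are invented for the example.

```python
# Illustrative sketch of chunked CSV trimming with pandas.
# trim_csv and its signature are hypothetical, not csvTrim's API.
import pandas as pd

def trim_csv(src, dst, filter_column, keep_values, keep_columns, chunksize=100_000):
    """Stream src in fixed-size chunks so large files never load fully into RAM;
    keep only matching rows and the listed columns, appending results to dst."""
    header = True
    for chunk in pd.read_csv(src, chunksize=chunksize):
        kept = chunk[chunk[filter_column].isin(keep_values)][keep_columns]
        kept.to_csv(dst, mode="a", header=header, index=False)
        header = False  # write the header only once, before the first chunk
```

Because each chunk is filtered and flushed before the next is read, peak memory stays proportional to the chunk size rather than the file size.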
## Quick start

```sh
# Install via Homebrew tap
brew tap KimTholstorf/csvtrim
brew install csvtrim

# Use the default preset to trim a single file
csvtrim --input data.csv --output trimmed.csv

# Process an entire folder and also produce Excel output
csvtrim --input ./exports --output trimmed.csv --excel

# Use a named preset
csvtrim --input data.csv --output trimmed.csv --preset Azure
```

The default `presets.json` is bundled with the package. To use a custom presets file, pass `--preset-file /path/to/your_presets.json`.
## Installation

### Homebrew

```sh
brew tap KimTholstorf/csvtrim
brew install csvtrim
```

### Python package

```sh
pip install csvtrim

# Or use pipx or uv for an isolated install that doesn't affect your system Python:
pipx install csvtrim
uv tool install csvtrim
```

After installation via Homebrew tap or pip, `csvtrim` is available as a shell command — no venv activation needed.
### Docker

Build the image locally:

```sh
docker build -t csvtrim .
```

Or pull the image from GitHub Container Registry, then mount a local folder to `/data` with `-v` to pass files in and retrieve output. All arguments work identically to the local script.

```sh
docker pull ghcr.io/kimtholstorf/csvtrim:latest
docker run --rm -it \
  -v /your/data:/data \
  ghcr.io/kimtholstorf/csvtrim:latest \
  --input /data/export.csv --output /data/trimmed.csv
```

The `-it` flag gives csvTrim a real terminal so the progress bar and ANSI output render correctly. `--rm` removes the container automatically when it exits.
### From source

Requirements:

- Python 3.9+
- `pandas`
- `openpyxl` (only needed for `--excel`)

```sh
# Clone the repo
git clone https://github.com/KimTholstorf/csvTrim.git
cd csvTrim

# One-time setup (creates .venv with pandas + openpyxl)
bash setup_python_env.sh

# Activate the environment
source .venv/bin/activate
```

The setup script installs uv if it isn't already present (via Homebrew if available, otherwise via curl).
## CLI reference

| Argument | Short | Description |
|---|---|---|
| `--input PATH` | `-i` | Single `.csv` file or folder of `.csv` files to process. Required unless `--preset-save` is used. |
| `--output FILE` | `-o` | Output CSV file path (e.g. `trimmed.csv`). Required unless `--preset-save` is used. |
| `--excel` | `-e` | Also write an `.xlsx` file alongside the output CSV. Splits into multiple sheets if the row count exceeds Excel's worksheet limit. |
| `--filter LIST` | `-f` | Python list of values to keep, matched against `--filter-column`. Omit to use the default preset. Example: `"['Compute', 'Storage']"` |
| `--filter-column COL` | `-fc` | Column name to match filter values against. Omit to use the default preset. |
| `--columns LIST` | `-c` | Python list of column names to keep in the output. Omit to use the default preset. Example: `"['meterCategory', 'quantity']"` |
| `--preset NAME` | `-p` | Load all filter settings from a named preset. Overrides `--filter`, `--filter-column`, and `--columns`. If no `--preset` and no individual flags are given, the `_default` preset is loaded automatically. |
| `--preset-file FILE` | `-pf` | Path to a custom JSON presets file. Defaults to `presets.json` next to the script. |
| `--preset-save NAME` | `-ps` | Save the current `--filter`, `--filter-column`, and `--columns` as a named preset (or overwrite an existing one). No CSV trimming is performed. |
| `--version` | `-v` | Print the version and exit. |
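The sheet-splitting behaviour behind `--excel` can be illustrated with a short sketch. `EXCEL_MAX_ROWS` is Excel's documented per-worksheet row limit; the helper itself is hypothetical, not csvTrim's code.

```python
# Illustrative sketch of splitting rows across worksheets. EXCEL_MAX_ROWS is
# Excel's documented per-worksheet row limit; split_for_excel is hypothetical.
EXCEL_MAX_ROWS = 1_048_576

def split_for_excel(rows, max_rows=EXCEL_MAX_ROWS):
    """Partition rows so each sheet holds at most max_rows - 1 data rows
    (one row per sheet is reserved for the header)."""
    per_sheet = max_rows - 1
    return [rows[i:i + per_sheet] for i in range(0, len(rows), per_sheet)]
```

Each returned partition would then be written to its own worksheet, e.g. with openpyxl.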
### Flag resolution order

When deciding which filter settings to use, csvTrim applies this priority:

1. `--preset NAME` — load everything from the named preset; individual flags are ignored.
2. No flags at all — auto-load the `_default` preset from `presets.json`.
3. One or more individual flags — load the `_default` preset as a base, then apply any explicitly passed flags on top.
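The priority above can be sketched as a small resolver. The function and parameter names are invented for illustration; this is not csvTrim's internals.

```python
# Hypothetical sketch of the flag-resolution priority; not csvTrim's code.
def resolve_settings(presets, preset=None, filter_=None, filter_column=None, columns=None):
    if preset is not None:
        # Rule 1: a named preset overrides everything else.
        return dict(presets[preset])
    # Rules 2 and 3: start from the preset named by "_default" ...
    settings = dict(presets[presets["_default"]])
    overrides = {"filter": filter_, "filter_column": filter_column, "columns": columns}
    for key, value in overrides.items():
        if value is not None:
            # ... then lay any explicitly passed flags on top.
            settings[key] = value
    return settings
```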
## Preset system

Presets are stored in a JSON file (`presets.json` by default, next to the script). Each preset holds three values: the column to filter on, which values to keep, and which output columns to retain.

The `"_default"` key names which preset to load when no `--preset` or individual flags are given. To change the default, edit the string value — no other changes needed.
### File format

```json
{
  "_default": "Azure",
  "Azure": {
    "filter_column": "serviceFamily",
    "filter": ["Compute", "Networking", "Storage"],
    "columns": [
      "serviceFamily",
      "meterCategory",
      "meterSubCategory",
      "meterName",
      "ProductName",
      "productOrderName",
      "meterRegion",
      "quantity",
      "pricingModel",
      "term",
      "unitOfMeasure",
      "ResourceId",
      "date"
    ]
  }
}
```
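Loading a preset from this format takes only a few lines of standard-library JSON handling. The sketch below is illustrative (`load_preset` is an invented helper, and the key check is this example's addition, not a documented csvTrim behaviour):

```python
import json

def load_preset(path, name=None):
    """Load a named preset from a presets file; fall back to the "_default" entry."""
    with open(path) as f:
        presets = json.load(f)
    name = name or presets["_default"]
    preset = presets[name]
    # Each preset holds exactly these three values.
    missing = {"filter_column", "filter", "columns"} - preset.keys()
    if missing:
        raise ValueError(f"preset {name!r} is missing keys: {sorted(missing)}")
    return preset
```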
### Using a preset

```sh
csvtrim --input data.csv --output out.csv --preset Azure
```
### Saving a new preset

Use `--preset-save` together with the individual flags. No trimming is performed — the preset is written to `presets.json` and the script exits.

```sh
# Save a brand-new preset
csvtrim --preset-save GCP \
  --filter-column "service.description" \
  --filter "['Compute Engine', 'Cloud Storage', 'BigQuery']" \
  --columns "['billing_account_id', 'service.description', 'cost', 'currency']"

# Copy an existing preset under a new name
csvtrim --preset Azure --preset-save AzureBackup
```

If the preset name already exists it is overwritten. The script prints a confirmation showing what was saved.
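Conceptually, saving a preset is a read-modify-write of the JSON file; assigning to an existing key is what makes overwrites work. A minimal sketch (`save_preset` is hypothetical, not csvTrim's code):

```python
import json, os

def save_preset(path, name, filter_column, filter_values, columns):
    """Write (or overwrite) a named preset in the presets file; no CSV trimming."""
    presets = {}
    if os.path.exists(path):
        with open(path) as f:
            presets = json.load(f)  # keep existing presets and "_default"
    presets[name] = {
        "filter_column": filter_column,
        "filter": filter_values,
        "columns": columns,
    }
    with open(path, "w") as f:
        json.dump(presets, f, indent=2)
```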
### Using a custom presets file

```sh
csvtrim --input data.csv --output out.csv \
  --preset MyPreset --preset-file /path/to/my_presets.json
```

`--preset-file` works with `--preset`, `--preset-save`, and the auto-default flow.
## Examples

```sh
# Default run — auto-loads the '_default' preset
csvtrim --input data.csv --output trimmed.csv

# Named preset
csvtrim --input data.csv --output trimmed.csv --preset Azure

# Folder of CSVs + Excel output
csvtrim --input ./monthly_exports --output combined.csv --excel

# Override only the filter values; other settings come from the default preset
csvtrim --input data.csv --output out.csv \
  --filter "['SaaS', 'Developer Tools', 'Containers', 'Databases']"

# Fully custom filter (no preset)
csvtrim --input data.csv --output out.csv \
  --filter-column meterCategory \
  --filter "['Virtual Machines', 'Storage']" \
  --columns "['meterCategory', 'quantity', 'date']"

# Save a preset, then use it
csvtrim --preset-save Prod \
  --filter-column serviceFamily \
  --filter "['Compute', 'Networking']" \
  --columns "['serviceFamily', 'meterCategory', 'quantity', 'date']"
csvtrim --input data.csv --output out.csv --preset Prod
```
### Docker examples

Same examples as above, run inside the container. Mount your data folder to `/data` and prefix paths accordingly. Use `--preset-file /data/presets.json` when saving or loading presets so changes persist to your local machine.

```sh
# Default run — auto-loads the '_default' preset
docker run --rm -it -v /your/data:/data ghcr.io/kimtholstorf/csvtrim:latest \
  --input /data/export.csv --output /data/trimmed.csv

# Named preset
docker run --rm -it -v /your/data:/data ghcr.io/kimtholstorf/csvtrim:latest \
  --input /data/export.csv --output /data/trimmed.csv --preset Azure

# Folder of CSVs + Excel output
docker run --rm -it -v /your/data:/data ghcr.io/kimtholstorf/csvtrim:latest \
  --input /data/monthly_exports --output /data/combined.csv --excel

# Override only the filter values; other settings come from the default preset
docker run --rm -it -v /your/data:/data ghcr.io/kimtholstorf/csvtrim:latest \
  --input /data/export.csv --output /data/out.csv \
  --filter "['SaaS', 'Developer Tools', 'Containers', 'Databases']"

# Fully custom filter (no preset)
docker run --rm -it -v /your/data:/data ghcr.io/kimtholstorf/csvtrim:latest \
  --input /data/export.csv --output /data/out.csv \
  --filter-column meterCategory \
  --filter "['Virtual Machines', 'Storage']" \
  --columns "['meterCategory', 'quantity', 'date']"

# Save a preset to the mounted folder, then use it
docker run --rm -it -v /your/data:/data ghcr.io/kimtholstorf/csvtrim:latest \
  --preset-save Prod \
  --filter-column serviceFamily \
  --filter "['Compute', 'Networking']" \
  --columns "['serviceFamily', 'meterCategory', 'quantity', 'date']" \
  --preset-file /data/presets.json
docker run --rm -it -v /your/data:/data ghcr.io/kimtholstorf/csvtrim:latest \
  --input /data/export.csv --output /data/out.csv \
  --preset Prod --preset-file /data/presets.json
```
## Output

After processing, csvTrim prints a run summary: row counts before and after, the reduction percentage, a per-value breakdown, and elapsed time.
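The reduction percentage in that summary is presumably the share of input rows dropped by filtering; a one-line illustration (the helper name is invented):

```python
def reduction_pct(total_rows, kept_rows):
    """Percentage of input rows removed by filtering."""
    return 100.0 * (total_rows - kept_rows) / total_rows

# e.g. keeping 250,000 of 1,000,000 rows is a 75% reduction
```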
## File details: csvtrim-1.0.3.tar.gz (source distribution)

- Size: 12.9 kB
- Uploaded using Trusted Publishing: yes
- Uploaded via: twine/6.1.0, CPython/3.13.7

| Algorithm | Hash digest |
|---|---|
| SHA256 | `e1ef226976d1a28411047b9d843c0f89adfc6b1f6b2d752bd22bc14f62e05d36` |
| MD5 | `126811034241f27e38661bb5323f2f8b` |
| BLAKE2b-256 | `1b5d84c656a64a7fbc346a50c38c858e73722cbed96fe60ca1908cc07b5f51e9` |

### Provenance

Attested via the `pypi-publish.yml` publication workflow on KimTholstorf/csvTrim (public repo, owner https://github.com/KimTholstorf), triggered by `push` on tag `refs/tags/v1.0.3` at commit `0a61e68434e41e48c14d0a9eafc675c571cd6652`, on a github-hosted runner with token issuer https://token.actions.githubusercontent.com. Statement type: `https://in-toto.io/Statement/v1`; predicate type: `https://docs.pypi.org/attestations/publish/v1`; subject digest matches the SHA256 above. Sigstore transparency entry: 1090443343.
## File details: csvtrim-1.0.3-py3-none-any.whl (built distribution)

- Size: 11.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing: yes
- Uploaded via: twine/6.1.0, CPython/3.13.7

| Algorithm | Hash digest |
|---|---|
| SHA256 | `8583c5baa76c3f9d48717262e4dfc244d661dc01ccfa3edd8d0eaadc7bba5fce` |
| MD5 | `bee29d4476c4dfbdb066e2bcf36f1173` |
| BLAKE2b-256 | `854da2efb4c68408b795e9ee3ad66068121c85dd4d749080c959b61bf58e4c35` |

### Provenance

Attested via the `pypi-publish.yml` publication workflow on KimTholstorf/csvTrim (public repo, owner https://github.com/KimTholstorf), triggered by `push` on tag `refs/tags/v1.0.3` at commit `0a61e68434e41e48c14d0a9eafc675c571cd6652`, on a github-hosted runner with token issuer https://token.actions.githubusercontent.com. Statement type: `https://in-toto.io/Statement/v1`; predicate type: `https://docs.pypi.org/attestations/publish/v1`; subject digest matches the SHA256 above. Sigstore transparency entry: 1090443347.