
Project description

csvTrim

Filter and trim large CSV files by column values — keep only the rows and columns you need.

csvTrim processes a single file or an entire folder of CSVs in one pass. It is optimised for large billing exports (e.g. Azure cost data) but works with any structured CSV. Results can also be exported to Excel.


Features

  • Row filtering — keep only rows whose filter column matches a list of values
  • Column trimming — drop every column not in your keep list
  • Folder processing — pass a folder path to process all .csv files at once
  • Preset system — save named filter configurations to presets.json and load them by name
  • Auto-default preset — run with just --input / --output to use the preset marked as default
  • Excel export — optional .xlsx output; splits automatically across sheets if rows exceed Excel's worksheet limit
  • Memory-efficient — reads files in 100,000-row chunks so large exports don't run out of RAM
  • Run summary — shows row counts, reduction percentage, per-value breakdown, and elapsed time
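The chunked filter-then-trim flow can be sketched with pandas. This is an illustrative sketch, not csvTrim's actual internals; the function name and signature are hypothetical:

```python
import pandas as pd

def trim_csv(in_path, out_path, filter_column, filter_values, keep_columns,
             chunksize=100_000):
    """Stream a large CSV in chunks, keeping only matching rows/columns."""
    first = True
    for chunk in pd.read_csv(in_path, chunksize=chunksize):
        kept = chunk[chunk[filter_column].isin(filter_values)][keep_columns]
        # Write the header only once, then append subsequent chunks.
        kept.to_csv(out_path, mode="w" if first else "a",
                    header=first, index=False)
        first = False
```

Because each chunk is filtered and written before the next is read, peak memory stays proportional to the chunk size rather than the file size.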

Requirements

  • Python 3.9+
  • pandas
  • openpyxl (only needed for --excel)

Install

# One-time setup (creates .venv with pandas + openpyxl)
bash setup_python_env.sh

# Activate the environment
source .venv/bin/activate

The setup script installs uv if it isn't already present (via Homebrew if available, otherwise via curl).


Install via pip

pip install csvtrim

# or, for an isolated install that won't affect your system Python:
pipx install csvtrim

After installation, csvtrim is available as a shell command — no venv activation needed:

csvtrim --input data.csv --output trimmed.csv

The default presets.json is bundled with the package. To use a custom presets file, pass --preset-file /path/to/your_presets.json.


Docker

Build

docker build -t csvtrim .

Run

Pull the image from GitHub Container Registry (or use the image you built above), then mount a local folder to /data with -v so the container can read your input files and write output back to your machine. All arguments work identically to the local script.

docker pull ghcr.io/kimtholstorf/csvtrim:latest

docker run --rm -it \
  -v /your/data:/data \
  ghcr.io/kimtholstorf/csvtrim:latest \
  --input /data/export.csv --output /data/trimmed.csv

The -it flag gives csvTrim a real terminal so the progress bar and ANSI output render correctly. --rm removes the container automatically when it exits.


Quick start

# Use the default preset, trim a single file
python3 csvTrim.py --input data.csv --output trimmed.csv

# Process an entire folder, also produce Excel output
python3 csvTrim.py --input ./exports --output trimmed.csv --excel

# Use a named preset
python3 csvTrim.py --input data.csv --output trimmed.csv --preset Azure

CLI reference

  • --input PATH / -i — Single .csv file or folder of .csv files to process. Required unless --preset-save is used.
  • --output FILE / -o — Output CSV file path (e.g. trimmed.csv). Required unless --preset-save is used.
  • --excel / -e — Also write an .xlsx file alongside the output CSV. Splits into multiple sheets if the row count exceeds Excel's worksheet limit.
  • --filter LIST / -f — Python list of values to keep, matched against --filter-column. Omit to use the default preset. Example: "['Compute', 'Storage']"
  • --filter-column COL / -fc — Column name to match filter values against. Omit to use the default preset.
  • --columns LIST / -c — Python list of column names to keep in the output. Omit to use the default preset. Example: "['meterCategory', 'quantity']"
  • --preset NAME / -p — Load all filter settings from a named preset. Overrides --filter, --filter-column, and --columns. If no --preset and no individual flags are given, the _default preset is loaded automatically.
  • --preset-file FILE / -pf — Path to a custom JSON presets file. Defaults to presets.json next to the script.
  • --preset-save NAME / -ps — Save the current --filter, --filter-column, and --columns as a named preset (or overwrite an existing one). No CSV trimming is performed.
  • --version / -v — Print the version and exit.
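The multi-sheet split that --excel performs can be sketched with pandas and openpyxl. The .xlsx format caps each worksheet at 1,048,576 rows (including the header); everything else here — the function name, sheet naming, and default — is an illustrative assumption:

```python
import pandas as pd

EXCEL_MAX_ROWS = 1_048_576  # hard per-worksheet row limit in .xlsx

def to_excel_split(df, path, sheet_rows=EXCEL_MAX_ROWS - 1):
    """Write df to one .xlsx, splitting across sheets when it is too tall.

    sheet_rows defaults to the worksheet limit minus one header row.
    """
    with pd.ExcelWriter(path, engine="openpyxl") as writer:
        for i, start in enumerate(range(0, len(df), sheet_rows), start=1):
            df.iloc[start:start + sheet_rows].to_excel(
                writer, sheet_name=f"data_{i}", index=False)
```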

Flag resolution order

When deciding which filter settings to use, csvTrim applies this priority:

  1. --preset NAME — load everything from the named preset; individual flags are ignored.
  2. No flags at all — auto-load the _default preset from presets.json.
  3. One or more individual flags — load the _default preset as a base, then apply any explicitly passed flags on top.
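This priority can be sketched as follows. The function and argument names are illustrative, not csvTrim's actual code:

```python
def resolve_settings(args, presets):
    """Pick filter settings: --preset wins; else _default base + overrides."""
    if args.get("preset"):
        # Case 1: named preset loads everything; individual flags ignored.
        return dict(presets[args["preset"]])
    # Cases 2 and 3: start from the _default preset as a base...
    settings = dict(presets[presets["_default"]])
    # ...then layer any explicitly passed flags on top.
    for key in ("filter", "filter_column", "columns"):
        if args.get(key) is not None:
            settings[key] = args[key]
    return settings
```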

Preset system

Presets are stored in a JSON file (presets.json by default, next to the script). Each preset holds three values: the column to filter on, which values to keep, and which output columns to retain. The "_default" key names which preset to load when no --preset or individual flags are given. To change the default, edit the string value — no other changes needed.

File format

{
  "_default": "Azure",
  "Azure": {
    "filter_column": "serviceFamily",
    "filter": ["Compute", "Networking", "Storage"],
    "columns": [
      "serviceFamily",
      "meterCategory",
      "meterSubCategory",
      "meterName",
      "ProductName",
      "productOrderName",
      "meterRegion",
      "quantity",
      "pricingModel",
      "term",
      "unitOfMeasure",
      "ResourceId",
      "date"
    ]
  }
}

Using a preset

python3 csvTrim.py --input data.csv --output out.csv --preset Azure

Saving a new preset

Use --preset-save together with the individual flags. No trimming is performed — the preset is written to presets.json and the script exits.

# Save a brand-new preset
python3 csvTrim.py --preset-save GCP \
  --filter-column "service.description" \
  --filter "['Compute Engine', 'Cloud Storage', 'BigQuery']" \
  --columns "['billing_account_id', 'service.description', 'cost', 'currency']"

# Copy an existing preset under a new name
python3 csvTrim.py --preset Azure --preset-save AzureBackup

If the preset name already exists it is overwritten. The script prints a confirmation showing what was saved.
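The save flow amounts to a read-merge-write on the JSON file. A minimal sketch, assuming csvTrim merges into an existing file and seeds _default when creating a new one (both assumptions, not confirmed internals):

```python
import json

def save_preset(path, name, filter_column, filter_values, columns):
    """Merge a named preset into the presets JSON file."""
    try:
        with open(path) as f:
            presets = json.load(f)
    except FileNotFoundError:
        # Assumed behavior: seed a fresh file with this preset as default.
        presets = {"_default": name}
    presets[name] = {"filter_column": filter_column,
                     "filter": filter_values,
                     "columns": columns}  # silently overwrites an existing name
    with open(path, "w") as f:
        json.dump(presets, f, indent=2)
```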

Using a custom presets file

python3 csvTrim.py --input data.csv --output out.csv \
  --preset MyPreset --preset-file /path/to/my_presets.json

--preset-file works with --preset, --preset-save, and the auto-default flow.


Examples

# Default run — auto-loads the '_default' preset
python3 csvTrim.py --input data.csv --output trimmed.csv

# Named preset
python3 csvTrim.py --input data.csv --output trimmed.csv --preset Azure

# Folder of CSVs + Excel output
python3 csvTrim.py --input ./monthly_exports --output combined.csv --excel

# Override only the filter values; other settings come from the default preset
python3 csvTrim.py --input data.csv --output out.csv \
  --filter "['SaaS', 'Developer Tools', 'Containers', 'Databases']"

# Fully custom filter (no preset)
python3 csvTrim.py --input data.csv --output out.csv \
  --filter-column meterCategory \
  --filter "['Virtual Machines', 'Storage']" \
  --columns "['meterCategory', 'quantity', 'date']"

# Save a preset then use it
python3 csvTrim.py --preset-save Prod \
  --filter-column serviceFamily \
  --filter "['Compute', 'Networking']" \
  --columns "['serviceFamily', 'meterCategory', 'quantity', 'date']"

python3 csvTrim.py --input data.csv --output out.csv --preset Prod

Docker examples

Same examples as above, run inside the container. Mount your data folder to /data and prefix paths accordingly. Use --preset-file /data/presets.json when saving or loading presets so changes persist to your local machine.

# Default run — auto-loads the '_default' preset
docker run --rm -it -v /your/data:/data ghcr.io/kimtholstorf/csvtrim:latest \
  --input /data/export.csv --output /data/trimmed.csv

# Named preset
docker run --rm -it -v /your/data:/data ghcr.io/kimtholstorf/csvtrim:latest \
  --input /data/export.csv --output /data/trimmed.csv --preset Azure

# Folder of CSVs + Excel output
docker run --rm -it -v /your/data:/data ghcr.io/kimtholstorf/csvtrim:latest \
  --input /data/monthly_exports --output /data/combined.csv --excel

# Override only the filter values; other settings come from the default preset
docker run --rm -it -v /your/data:/data ghcr.io/kimtholstorf/csvtrim:latest \
  --input /data/export.csv --output /data/out.csv \
  --filter "['SaaS', 'Developer Tools', 'Containers', 'Databases']"

# Fully custom filter (no preset)
docker run --rm -it -v /your/data:/data ghcr.io/kimtholstorf/csvtrim:latest \
  --input /data/export.csv --output /data/out.csv \
  --filter-column meterCategory \
  --filter "['Virtual Machines', 'Storage']" \
  --columns "['meterCategory', 'quantity', 'date']"

# Save a preset to the mounted folder, then use it
docker run --rm -it -v /your/data:/data ghcr.io/kimtholstorf/csvtrim:latest \
  --preset-save Prod \
  --filter-column serviceFamily \
  --filter "['Compute', 'Networking']" \
  --columns "['serviceFamily', 'meterCategory', 'quantity', 'date']" \
  --preset-file /data/presets.json

docker run --rm -it -v /your/data:/data ghcr.io/kimtholstorf/csvtrim:latest \
  --input /data/export.csv --output /data/out.csv \
  --preset Prod --preset-file /data/presets.json

Output

After processing, csvTrim prints a summary:

  ══════════════════════════════════════════════════════════
  Files:    3       Rows in:     2,841,504   Elapsed: 8.3s
  ──────────────────────────────────────────────────────────
  Columns kept:        13
  Columns removed:     51     (79.7%)
  Rows out:           312,847
  Rows removed:     2,528,657  (89.0% reduction)
  ──────────────────────────────────────────────────────────
  Rows by serviceFamily:
    Compute      241,003
    Networking    48,201
    Storage       23,643
  ══════════════════════════════════════════════════════════

Skipped files (missing columns, encoding errors, etc.) are listed below the summary with the reason.
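The summary figures are simple ratios over the row counts; a toy pandas illustration (not csvTrim's code) of how the reduction percentage and per-value breakdown are derived:

```python
import pandas as pd

# Pretend 10 rows were read and 4 survived the filter.
kept = pd.DataFrame(
    {"serviceFamily": ["Compute", "Compute", "Compute", "Storage"]})
rows_in = 10
rows_out = len(kept)
reduction = (rows_in - rows_out) / rows_in * 100   # percent of rows removed
breakdown = kept["serviceFamily"].value_counts()   # rows per filter value
```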

Download files

Download the file for your platform.

Source Distribution

csvtrim-1.0.1.tar.gz (12.8 kB)

Built Distribution

csvtrim-1.0.1-py3-none-any.whl (10.9 kB)

File details

Details for the file csvtrim-1.0.1.tar.gz.

File metadata

  • Download URL: csvtrim-1.0.1.tar.gz
  • Upload date:
  • Size: 12.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for csvtrim-1.0.1.tar.gz:

  • SHA256: 27f2c8468df61549ad44968b339ded7d2aa6c2e5b15adcd34ca43c7213afc0c4
  • MD5: bd31a71c4322a915dd83f4f17b0f87e6
  • BLAKE2b-256: 99130bc3095f6cee40676e959ff70803156bc9b2ca7278d4095f5f42d9119229

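To check that a downloaded archive matches the SHA256 published above, a small stdlib sketch (the helper name is ours, not part of csvTrim):

```python
import hashlib

def sha256_of(path, bufsize=1 << 20):
    """Compute the SHA256 hex digest of a file, reading in 1 MiB buffers."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(bufsize):
            h.update(chunk)
    return h.hexdigest()
```

Compare the returned digest against the SHA256 value listed for the file; any mismatch means the download is corrupt or tampered with.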

Provenance

The following attestation bundles were made for csvtrim-1.0.1.tar.gz:

Publisher: pypi-publish.yml on KimTholstorf/csvTrim

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file csvtrim-1.0.1-py3-none-any.whl.

File metadata

  • Download URL: csvtrim-1.0.1-py3-none-any.whl
  • Upload date:
  • Size: 10.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for csvtrim-1.0.1-py3-none-any.whl:

  • SHA256: 197f0994cb20d4fdbaa3100b976647181b7290dd06b5e1e543544570f837fd35
  • MD5: 893378a65aa3df332c0398e1e9762c21
  • BLAKE2b-256: fc76dc4f62d8f629875e1b04b332d8e0d9baacf3771d13ac17df02a31449e177


Provenance

The following attestation bundles were made for csvtrim-1.0.1-py3-none-any.whl:

Publisher: pypi-publish.yml on KimTholstorf/csvTrim

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
