Read all CSV files in a directory with one iterator.

📂 csvdir

A blazing-fast, lightweight toolkit for reading and iterating over entire directories of CSV files.

csvdir lets you treat a folder full of CSVs as if it were a single dataset — no tedious file loops, no clumsy header mismatches. Whether you’re working with a few files or thousands, csvdir is built for speed, simplicity, and flexibility.


✨ Features

  • 🔄 Directory-wide iteration – Read every CSV in a folder as a single stream of rows
  • 🧩 Header validation – Enforce matching headers or skip mismatched files
  • 📏 Chunked reading – Stream large datasets without blowing up memory
  • 🎯 Configurable dialect – Set delimiter, quotechar, encoding, and more
  • 📂 Recursive scanning – Optionally include subdirectories
  • 🐼 Pandas-ready – Use CsvDirFile directly with pandas.read_csv
  • 🚫 Hidden file handling – Easily skip or include hidden files
  • 🪶 Column selection – Iterate over just one column or a subset of columns
  • 📛 Flexible naming – Choose between file stems ("data") or full filenames ("data.csv") in enumerations

📦 Installation

pip install csvdir

🔹 Basic Usage

Iterate over all rows in a directory

from csvdir import read_dir

for row in read_dir("/data/csvs"):
    print(row)

Example output

{'id': '1', 'name': 'Alice', 'age': '30'}
{'id': '2', 'name': 'Bob', 'age': '25'}
{'id': '3', 'name': 'Charlie', 'age': '40'}
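
For readers who want to see what "one iterator over a directory" amounts to, here is a minimal standard-library sketch of the same idea. It is not csvdir's implementation: the helper name `iter_dir_rows`, the `*.csv` glob, the UTF-8 encoding, and the sorted file order are all assumptions made for illustration.

```python
import csv
from pathlib import Path
from typing import Dict, Iterator


def iter_dir_rows(directory: str) -> Iterator[Dict[str, str]]:
    """Yield every row of every *.csv file in `directory` as a dict.

    Hypothetical helper, not part of csvdir. Files are visited in sorted
    order, which is an assumption made here for reproducibility.
    """
    for path in sorted(Path(directory).glob("*.csv")):
        # newline="" is what the csv module recommends when opening files.
        with path.open(newline="", encoding="utf-8") as handle:
            yield from csv.DictReader(handle)
```

As in the csvdir output above, every value comes back as a string, because the csv module does no type conversion.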

Enforce matching headers across files

for row in read_dir("/data/csvs", strict_headers=True, on_mismatch="skip"):
    print(row)

Example output

{'id': '1', 'name': 'Alice', 'age': '30'}
{'id': '2', 'name': 'Bob', 'age': '25'}
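
The skipping behavior can be pictured with a small standard-library sketch. This is only an illustration under stated assumptions, not csvdir's code: `iter_matching_rows` is a hypothetical helper, the first file is assumed to define the reference header, and the `"raise"` option is invented here for contrast (only `"skip"` appears in this README).

```python
import csv
from pathlib import Path
from typing import Dict, Iterator, List, Optional


def iter_matching_rows(directory: str, on_mismatch: str = "skip") -> Iterator[Dict[str, str]]:
    """Yield rows only from files whose header equals the first file's header.

    Hypothetical helper, not csvdir's code. `on_mismatch` accepts "skip" or
    "raise" in this sketch.
    """
    expected: Optional[List[str]] = None
    for path in sorted(Path(directory).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle)
            header = reader.fieldnames  # reads the first line of the file
            if expected is None:
                expected = header  # the first file defines the reference header
            elif header != expected:  # list comparison, so column order matters
                if on_mismatch == "raise":
                    raise ValueError(f"{path.name}: header {header} != {expected}")
                continue  # "skip": ignore this file entirely
            yield from reader
```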

Chunked iteration for large files

for chunk in read_dir("/data/csvs", chunksize=2):
    print(chunk)

Example output

[{'id': '1', 'name': 'Alice'}, {'id': '2', 'name': 'Bob'}]
[{'id': '3', 'name': 'Charlie'}]
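
Chunking of this kind can be expressed generically with `itertools.islice`. The sketch below is a plain-Python illustration of the grouping shown in the output above; `chunked` is a hypothetical helper, and nothing here is taken from csvdir's source.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(rows: Iterable[T], chunksize: int) -> Iterator[List[T]]:
    """Group any iterable into lists of at most `chunksize` items."""
    if chunksize < 1:
        raise ValueError("chunksize must be at least 1")
    iterator = iter(rows)
    # islice pulls up to `chunksize` items; an empty list means we are done.
    while chunk := list(islice(iterator, chunksize)):
        yield chunk
```

Only one chunk is held in memory at a time, which is why the last chunk may be shorter than `chunksize`.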

Enumerating rows with names or paths

r = read_dir("/data/csvs")

for name, row in r.with_names():
    print(name, row)

Example output

data1 {'id': '1', 'name': 'Alice'}
data1 {'id': '2', 'name': 'Bob'}

for path, row in r.with_paths():
    print(path, row)

Example output

/data/csvs/data1.csv {'id': '1', 'name': 'Alice'}
/data/csvs/data1.csv {'id': '2', 'name': 'Bob'}
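
The difference between the stem, the filename, and the full path corresponds to standard `pathlib` attributes. The snippet below uses `PurePosixPath` only so the result is the same on every platform; the path itself is the example path from the output above.

```python
from pathlib import PurePosixPath

path = PurePosixPath("/data/csvs/data1.csv")

print(path.stem)  # data1
print(path.name)  # data1.csv
print(path)       # /data/csvs/data1.csv
```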

Selecting a single column

r = read_dir("/data/csvs")

for value in r.iter_column("name"):
    print(value)

Example output

Alice
Bob
Charlie

for values in read_dir("/data/csvs", chunksize=2).iter_column_chunks("name"):
    print(values)

Example output

['Alice', 'Bob']
['Charlie']

Selecting multiple columns

r = read_dir("/data/csvs")

for row in r.select_columns(["name", "age"]):
    print(row)

Example output

{'name': 'Alice', 'age': '30'}
{'name': 'Bob', 'age': '25'}

for chunk in read_dir("/data/csvs", chunksize=2).select_columns_chunks(["name", "age"]):
    print(chunk)

Example output

[{'name': 'Alice', 'age': '30'}, {'name': 'Bob', 'age': '25'}]
[{'name': 'Charlie', 'age': '40'}]
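
Column selection is a projection over row dicts. A minimal sketch of that idea in plain Python follows; `column` and `project` are hypothetical helpers, and the choice to raise `KeyError` on a missing column is an assumption, since this README does not say how csvdir treats missing columns.

```python
from typing import Dict, Iterable, Iterator, List


def column(rows: Iterable[Dict[str, str]], name: str) -> Iterator[str]:
    """Yield the value of one column from each row dict."""
    for row in rows:
        yield row[name]  # KeyError if the column is absent (assumption)


def project(rows: Iterable[Dict[str, str]], columns: List[str]) -> Iterator[Dict[str, str]]:
    """Keep only `columns` from each row dict, in the order requested."""
    for row in rows:
        yield {name: row[name] for name in columns}
```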

🆕 Pandas Compatibility — CsvDirFile

import pandas as pd
from csvdir import CsvDirFile

f = CsvDirFile("/data/csvs", strict_headers=True, on_mismatch="skip")
df = pd.read_csv(f)
print(df.head())

Example output

   id     name  age
0   1    Alice   30
1   2      Bob   25
2   3  Charlie   40
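
To illustrate why a whole directory can be handed to a reader that expects a single file, here is a rough standard-library sketch that concatenates the files into one in-memory buffer, keeping the header only once. It is not how `CsvDirFile` works internally (which this README does not describe): `combined_csv` is a hypothetical helper, it assumes every file shares the same single-line header, and it loads everything into memory.

```python
import io
from pathlib import Path


def _with_newline(line: str) -> str:
    """Return `line` terminated by a newline, adding one if it is missing."""
    return line if line.endswith("\n") else line + "\n"


def combined_csv(directory: str) -> io.StringIO:
    """Concatenate all *.csv files into one in-memory text buffer.

    Hypothetical helper. The header is taken from the first file and dropped
    from every later file, so all files are assumed to share that header.
    """
    buffer = io.StringIO()
    for index, path in enumerate(sorted(Path(directory).glob("*.csv"))):
        with path.open(newline="", encoding="utf-8") as handle:
            header = handle.readline()
            if index == 0 and header:
                buffer.write(_with_newline(header))
            for line in handle:
                buffer.write(_with_newline(line))
    buffer.seek(0)  # rewind so the consumer reads from the start
    return buffer
```

The returned buffer is an ordinary file-like object, so it can be passed to `csv.DictReader` or to `pandas.read_csv`, both of which accept objects with a `read()` method.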

📊 Iterator Quick Reference

| Method | Returns | Chunked Version | Naming Style |
|---|---|---|---|
| .with_names() | (stem, row_dict) | .enumerate() → (stem, list[row_dict]) | File stem ("data") |
| .with_paths() | (full_path, row_dict) | .with_paths_chunks() → (full_path, list[row_dict]) | Full path |
| .iter_column(col) | (stem, value) | .iter_column_chunks(col) → (stem, list[value]) | File stem |
| .select_columns(cols) | (stem, dict) | .select_columns_chunks(cols) → (stem, list[dict]) | File stem |
| Default (__iter__) | row_dict | Chunked default → list[row_dict] | N/A |

💡 Tips & Edge Cases

  • Hidden Files: By default, hidden files are ignored; set include_hidden=True to include them
  • Large Files: Use chunksize to prevent memory overload
  • Mixed Encodings: csvdir can detect BOMs and handle mixed encodings automatically
  • Header Order: strict_headers=True compares headers exactly, including column order, so the same columns in a different order count as a mismatch
  • Name vs Path: .with_names() and .enumerate() return the stem (file.stem), while .with_paths() returns the full path
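
As background for the BOM tip above, the sketch below shows one common way to pick an encoding from a byte-order mark using only the standard library. It is not csvdir's detection logic: `sniff_encoding` is a hypothetical helper, and falling back to UTF-8 when no BOM is present is an assumption.

```python
import codecs
from pathlib import Path


def sniff_encoding(path: Path, default: str = "utf-8") -> str:
    """Guess a text encoding from a leading byte-order mark, if any."""
    with path.open("rb") as handle:
        head = handle.read(4)  # the longest BOM checked here is 4 bytes
    # Check UTF-32 before UTF-16: the UTF-32 LE BOM starts with the UTF-16 LE BOM.
    if head.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # this codec strips the BOM while decoding
    return default
```

Opening each file with its own sniffed encoding, for example `path.open(encoding=sniff_encoding(path), newline="")`, is what lets files with different encodings sit in the same directory.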

📜 License

MIT License © 2025
