Read all CSV files in a directory with one iterator.
📂 csvdir
A blazing-fast, lightweight toolkit for reading and iterating over entire directories of CSV files.
csvdir lets you treat a folder full of CSVs as if it were a single dataset — no tedious file loops, no clumsy header mismatches. Whether you’re working with a few files or thousands, csvdir is built for speed, simplicity, and flexibility.
✨ Features
- 🔄 Directory-wide iteration – Read every CSV in a folder as a single stream of rows
- 🧩 Header validation – Enforce matching headers or skip mismatched files
- 📏 Chunked reading – Stream large datasets without blowing up memory
- 🎯 Configurable dialect – Set `delimiter`, `quotechar`, `encoding`, and more
- 📂 Recursive scanning – Optionally include subdirectories
- 🐼 Pandas-ready – Use `CsvDirFile` directly with `pandas.read_csv`
- 🚫 Hidden file handling – Easily skip or include hidden files
- 🪶 Column selection – Iterate over just one column or a subset of columns
- 📛 Flexible naming – Choose between file stems (`"data"`) or full filenames (`"data.csv"`) in enumerations
📦 Installation
pip install csvdir
🔹 Basic Usage
Iterate over all rows in a directory
from csvdir import read_dir
for row in read_dir("/data/csvs"):
    print(row)
Example output
{'id': '1', 'name': 'Alice', 'age': '30'}
{'id': '2', 'name': 'Bob', 'age': '25'}
{'id': '3', 'name': 'Charlie', 'age': '40'}
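For readers curious what directory-wide iteration amounts to, here is a minimal standard-library sketch of the same idea. It is not csvdir's implementation: the helper name `iter_csv_rows`, the UTF-8 encoding, and the choice to visit files in sorted name order are assumptions made for illustration.

```python
import csv
from pathlib import Path
from typing import Dict, Iterator


def iter_csv_rows(directory: str) -> Iterator[Dict[str, str]]:
    """Yield every row of every *.csv file in `directory` as a dict.

    Hypothetical helper for illustration only. Files are visited in sorted
    name order, which is an assumption, not documented csvdir behavior.
    """
    for path in sorted(Path(directory).glob("*.csv")):
        # newline="" is what the csv module recommends for file objects.
        with path.open(newline="", encoding="utf-8") as handle:
            # DictReader uses the first line of each file as the keys.
            yield from csv.DictReader(handle)
```

Because the function is a generator, only one file is open at a time and rows are produced lazily, which is the property that makes a single-iterator interface practical for large directories.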
Enforce matching headers across files
for row in read_dir("/data/csvs", strict_headers=True, on_mismatch="skip"):
    print(row)
Example output
{'id': '1', 'name': 'Alice', 'age': '30'}
{'id': '2', 'name': 'Bob', 'age': '25'}
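The skip-on-mismatch behavior can be pictured with a small standard-library sketch. This is only an illustration under stated assumptions, not csvdir's code: the helper `iter_rows_matching_headers` is hypothetical, and it assumes the first file (in sorted order) defines the reference header.

```python
import csv
from pathlib import Path
from typing import Dict, Iterator, List, Optional


def iter_rows_matching_headers(directory: str) -> Iterator[Dict[str, str]]:
    """Yield rows only from files whose header equals the first file's header.

    Hypothetical helper; the "first file wins" rule is an assumption.
    """
    expected: Optional[List[str]] = None
    for path in sorted(Path(directory).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle)
            header = reader.fieldnames  # reads the first line of the file
            if header is None:
                continue  # empty file: nothing to compare or yield
            if expected is None:
                expected = list(header)  # first file sets the reference
            elif list(header) != expected:
                continue  # list comparison is order-sensitive: skip the file
            yield from reader
```

Comparing the headers as lists rather than sets means a file with the same columns in a different order is also treated as a mismatch.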
Chunked iteration for large files
for chunk in read_dir("/data/csvs", chunksize=2):
    print(chunk)
Example output
[{'id': '1', 'name': 'Alice'}, {'id': '2', 'name': 'Bob'}]
[{'id': '3', 'name': 'Charlie'}]
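Chunking of this kind can be layered on top of any row iterator. The sketch below shows one common way to do it with `itertools.islice`; the function name `chunked` is hypothetical and this is not necessarily how csvdir implements `chunksize`.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(rows: Iterable[T], size: int) -> Iterator[List[T]]:
    """Group items from `rows` into lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(rows)
    while True:
        # islice pulls at most `size` items without reading further ahead.
        chunk = list(islice(iterator, size))
        if not chunk:
            return  # source exhausted
        yield chunk
```

Only one chunk is held in memory at a time, and the final chunk may be shorter than `size`, which matches the shape of the example output above.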
Enumerating rows with names or paths
r = read_dir("/data/csvs")
for name, row in r.with_names():
    print(name, row)
Example output
data1 {'id': '1', 'name': 'Alice'}
data1 {'id': '2', 'name': 'Bob'}
for path, row in r.with_paths():
    print(path, row)
Example output
/data/csvs/data1.csv {'id': '1', 'name': 'Alice'}
/data/csvs/data1.csv {'id': '2', 'name': 'Bob'}
Selecting a single column
r = read_dir("/data/csvs")
for value in r.iter_column("name"):
    print(value)
Example output
Alice
Bob
Charlie
for values in read_dir("/data/csvs", chunksize=2).iter_column_chunks("name"):
    print(values)
Example output
['Alice', 'Bob']
['Charlie']
Selecting multiple columns
r = read_dir("/data/csvs")
for row in r.select_columns(["name", "age"]):
    print(row)
Example output
{'name': 'Alice', 'age': '30'}
{'name': 'Bob', 'age': '25'}
for chunk in read_dir("/data/csvs", chunksize=2).select_columns_chunks(["name", "age"]):
    print(chunk)
Example output
[{'name': 'Alice', 'age': '30'}, {'name': 'Bob', 'age': '25'}]
[{'name': 'Charlie', 'age': '40'}]
🆕 Pandas Compatibility — CsvDirFile
import pandas as pd
from csvdir import CsvDirFile
f = CsvDirFile("/data/csvs", strict_headers=True, on_mismatch="skip")
df = pd.read_csv(f)
print(df.head())
Example output
id name age
0 1 Alice 30
1 2 Bob 25
2 3 Charlie 40
📊 Iterator Quick Reference
| Method | Returns | Chunked Version | Naming Style |
|---|---|---|---|
| `.with_names()` | `(stem, row_dict)` | `.enumerate()` → `(stem, list[row_dict])` | File stem (`"data"`) |
| `.with_paths()` | `(full_path, row_dict)` | `.with_paths_chunks()` → `(full_path, list[row_dict])` | Full path |
| `.iter_column(col)` | `(stem, value)` | `.iter_column_chunks(col)` → `(stem, list[value])` | File stem |
| `.select_columns(cols)` | `(stem, dict)` | `.select_columns_chunks(cols)` → `(stem, list[dict])` | File stem |
| Default (`__iter__`) | `row_dict` | Chunked default → `list[row_dict]` | N/A |
💡 Tips & Edge Cases
- Hidden Files: By default, hidden files are ignored; set `include_hidden=True` to include them
- Large Files: Use `chunksize` to prevent memory overload
- Mixed Encodings: `csvdir` can detect BOMs and handle mixed encodings automatically
- Header Order: `strict_headers=True` compares exact header order
- Name vs Path: `.with_names()` and `.enumerate()` return the stem (`file.stem`), while `.with_paths()` returns the full path
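To see why BOM handling matters for header matching, the standard-library sketch below decodes the same bytes two ways. It illustrates the general problem only and makes no claim about how csvdir detects encodings; the sample data and variable names are made up for the example.

```python
import codecs
import csv
import io

# A small CSV payload that starts with a UTF-8 byte order mark,
# as some spreadsheet exports produce.
raw = codecs.BOM_UTF8 + b"id,name\n1,Alice\n"

# Plain "utf-8" keeps the BOM as a character glued to the first header.
naive = csv.DictReader(io.StringIO(raw.decode("utf-8")))

# "utf-8-sig" strips a leading BOM if present and otherwise behaves like utf-8.
bom_aware = csv.DictReader(io.StringIO(raw.decode("utf-8-sig")))

print(naive.fieldnames)      # first column name carries a leading '\ufeff'
print(bom_aware.fieldnames)  # clean column names
```

A header that still carries the BOM would not compare equal to the same header from a BOM-free file, so a strict header comparison across files depends on stripping it first.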
📜 License
MIT License © 2025