Read all CSV files in a directory with one iterator.
📂 csvdir
A blazing-fast, lightweight toolkit for reading and iterating over entire directories of CSV files.
csvdir lets you treat a folder full of CSVs as if it were a single dataset — no tedious file loops, no clumsy header mismatches. Whether you’re working with a few files or thousands, csvdir is built for speed, simplicity, and flexibility.
✨ Features
- 🔄 Directory-wide iteration – Read every CSV in a folder as a single stream of rows
- 🧩 Header validation – Enforce matching headers or skip mismatched files
- 📏 Chunked reading – Stream large datasets without blowing up memory
- 🎯 Configurable dialect – Set `delimiter`, `quotechar`, `encoding`, and more
- 📂 Recursive scanning – Optionally include subdirectories
- 🐼 Pandas-ready – Use `CsvDirFile` directly with `pandas.read_csv`
- 🚫 Hidden file handling – Easily skip or include hidden files
📦 Installation
pip install csvdir
🔹 Basic Usage
Iterate over all rows in a directory
from csvdir import read_dir

for row in read_dir("/data/csvs"):
    print(row)  # Each row is a dict mapping column names to string values
Enforce matching headers across files
for row in read_dir("/data/csvs", strict_headers=True, on_mismatch="skip"):
    print(row)

- `strict_headers=True` → Uses the first file’s header as the standard
- `on_mismatch`:
  - `"skip"` → skip files with different headers
  - `"error"` → raise a `ValueError` if a mismatch is found
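To make the header rules above concrete, here is a minimal standard-library sketch of the same idea. It is not csvdir's implementation: `iter_rows_strict` is a hypothetical helper written only for illustration, and it assumes UTF-8 files with a header on the first line.

```python
import csv


def iter_rows_strict(paths, on_mismatch="skip"):
    """Yield dict rows from CSV files, treating the first header seen as the standard.

    Hypothetical helper for illustration only; not part of csvdir.
    """
    expected = None
    for path in paths:
        with open(path, newline="", encoding="utf-8") as fh:
            reader = csv.DictReader(fh)
            header = reader.fieldnames  # reads the header line of this file
            if expected is None:
                expected = header  # first file defines the standard
            elif header != expected:
                if on_mismatch == "error":
                    raise ValueError(
                        f"Header mismatch in {path}: {header!r} != {expected!r}"
                    )
                continue  # "skip": ignore every row of the mismatched file
            yield from reader
```

Because the comparison is a plain list equality, a file whose columns are the same but in a different order counts as a mismatch.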
Chunked iteration for large files
for chunk in read_dir("/data/csvs", chunksize=1000):
    # chunk is a list of up to 1000 rows
    process(chunk)
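The chunking behaviour described here, lists of at most `chunksize` rows with a shorter final list, can be sketched with `itertools.islice`. The `chunked` helper below is a hypothetical stand-in that works on any iterable; it only illustrates the shape of the output and makes no claim about how csvdir batches rows internally.

```python
from itertools import islice


def chunked(rows, chunksize):
    """Group any iterable of rows into lists of at most `chunksize` items."""
    if chunksize < 1:
        raise ValueError("chunksize must be a positive integer")
    it = iter(rows)
    while True:
        chunk = list(islice(it, chunksize))  # pulls at most `chunksize` items
        if not chunk:
            return  # source exhausted
        yield chunk


# 2500 stand-in "rows" split into chunks of 1000
sizes = [len(chunk) for chunk in chunked(range(2500), 1000)]
print(sizes)  # [1000, 1000, 500]
```

Only one chunk is held in memory at a time, which is why chunked iteration suits large inputs.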
🆕 Pandas Compatibility — CsvDirFile
CsvDirFile behaves like a file object that merges multiple CSVs into one continuous file-like stream — perfect for pandas.read_csv.
import pandas as pd
from csvdir import CsvDirFile
f = CsvDirFile("/data/csvs", strict_headers=True, on_mismatch="skip")
df = pd.read_csv(f)
print(df.head())
Advantages:
- Pandas reads multiple CSVs as if they were one file
- Automatically skips duplicate headers between files
- Honors header validation rules
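As a rough illustration of what "one continuous file-like stream" means, the sketch below concatenates several CSV files and keeps only the first header line. The names `merged_csv_lines` and `open_merged` are hypothetical, and unlike a true streaming object this version builds the merged text in memory. It also assumes each header fits on a single line.

```python
import io


def _with_newline(line):
    """Ensure a line ends with a newline so files do not run together."""
    return line if line.endswith("\n") else line + "\n"


def merged_csv_lines(paths, encoding="utf-8"):
    """Yield the lines of several CSV files, keeping only the first header."""
    for index, path in enumerate(paths):
        with open(path, newline="", encoding=encoding) as fh:
            header = fh.readline()
            if index == 0 and header:
                yield _with_newline(header)  # emit the header once
            for line in fh:
                yield _with_newline(line)


def open_merged(paths, encoding="utf-8"):
    """Return an in-memory text stream containing the merged CSV content."""
    return io.StringIO("".join(merged_csv_lines(paths, encoding)))
```

The returned `io.StringIO` has a `read()` method, so it can be passed to `csv.DictReader` or to `pandas.read_csv`, both of which accept file-like objects.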
📂 API Overview
read_dir(path, **options)
Iterates through rows (or chunks) of CSV files in a directory.
Parameters:
- `extension`: File extension (default `"csv"`)
- `delimiter`, `quotechar`, `escapechar`: CSV parsing options
- `encoding`: File encoding (default `"utf-8"`)
- `strict_headers`: Enforce header consistency (default `False`)
- `on_mismatch`: `"skip"` or `"error"`
- `chunksize`: If set, returns lists of rows instead of single rows
- `recurse`: Include subdirectories (default `False`)
- `case_insensitive`: Match extensions case-insensitively (default `True`)
- `include_hidden`: Include dotfiles (default `False`)
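The file-selection options (`extension`, `recurse`, `case_insensitive`, `include_hidden`) can be pictured with a small `pathlib` sketch. `find_csv_files` is a hypothetical function that reuses the documented parameter names and defaults; the sorted return order and the choice to treat files inside dot-directories as hidden are assumptions of this sketch, not documented csvdir behaviour.

```python
from pathlib import Path


def find_csv_files(path, extension="csv", recurse=False,
                   case_insensitive=True, include_hidden=False):
    """Return matching files under `path` (hypothetical illustration)."""
    root = Path(path)
    candidates = root.rglob("*") if recurse else root.iterdir()
    wanted = "." + extension.lstrip(".")
    matches = []
    for p in candidates:
        if not p.is_file():
            continue
        # Assumption: a file is hidden if it, or any folder above it
        # (relative to the root), starts with a dot.
        parts = p.relative_to(root).parts
        if not include_hidden and any(part.startswith(".") for part in parts):
            continue
        suffix = p.suffix
        if case_insensitive:
            matched = suffix.lower() == wanted.lower()
        else:
            matched = suffix == wanted
        if matched:
            matches.append(p)
    return sorted(matches)  # assumption: deterministic, sorted order
```

With the defaults, a folder containing `a.csv`, `B.CSV`, `.hidden.csv`, and `sub/c.csv` would yield only `a.csv` and `B.CSV`.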
💡 Tips & Edge Cases
- Hidden Files: By default, hidden files are ignored; set `include_hidden=True` to include them
- Large Files: Use `chunksize` to prevent memory overload
- Mixed Encodings: `csvdir` can detect BOMs and handle mixed encodings automatically
- Header Order: `strict_headers=True` compares exact header order
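Two of the tips above are easy to see with the standard library alone. The snippet below shows what an exact-order header comparison means, and why a byte order mark matters when reading a header. It uses Python's built-in `utf-8-sig` codec purely as an illustration and does not claim to mirror csvdir's own detection logic.

```python
import codecs
import csv
import io

# Exact-order comparison: the same column names in a different order differ.
first = ["id", "name"]
other = ["name", "id"]
print(first == other)            # False: order differs
print(set(first) == set(other))  # True: same names when order is ignored

# A UTF-8 BOM at the start of a file ends up in the first column name
# unless it is stripped; the "utf-8-sig" codec removes it while decoding.
raw = codecs.BOM_UTF8 + b"id,name\n1,x\n"
header = next(csv.reader(io.StringIO(raw.decode("utf-8-sig"))))
print(header)  # ['id', 'name']
```

Decoding the same bytes with plain `"utf-8"` would leave `"\ufeff"` at the front of the first column name, which is enough to make two otherwise identical headers compare as different.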
📜 License
MIT License © 2025
Download files
File details
Details for the file csvdir-0.7.0.tar.gz.
File metadata
- Download URL: csvdir-0.7.0.tar.gz
- Upload date:
- Size: 16.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | af921b957e38a9a3f28d25008d06cdb38d604e68e0bee7ddd3fc6f81d0651db9 |
| MD5 | 1b21c8b74bb676ef144112b07af0c183 |
| BLAKE2b-256 | 7d6e8338cffc5db8da9130c314b3dff327f77108a3179568d11687b5179aa252 |
File details
Details for the file csvdir-0.7.0-py3-none-any.whl.
File metadata
- Download URL: csvdir-0.7.0-py3-none-any.whl
- Upload date:
- Size: 25.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 866b82a7ffc328a043239ce9a2622bddc588f519a4029dc42c34b4d75ba8b983 |
| MD5 | 8a9f6be4b99c5ed72445ec52476328cb |
| BLAKE2b-256 | c0367a206554f03904262333e2bf4fb1bc203bcda0b6d6f601f07a4d84d6c404 |
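If you download one of these files manually, its SHA256 digest can be checked against the published value. The sketch below uses `hashlib` from the standard library; `sha256_of` is a hypothetical helper name, and the local file path you pass in depends on where you saved the download.

```python
import hashlib


def sha256_of(path, block_size=65536):
    """Return the hex SHA256 digest of a file, reading it in blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b""
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()
```

Compare the returned string with the SHA256 value listed for the file; any difference means the download does not match the published artifact.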