# CISV Python Bindings (nanobind)

High-performance Python bindings, built with nanobind, for CISV, a SIMD-optimized CSV parser.
## Performance

These bindings are 10-100x faster than the ctypes-based bindings because they:

- **Use the batch API**: all rows are parsed in C and returned at once, eliminating millions of per-field callbacks
- **Use nanobind**: much lower call overhead than ctypes or pybind11
- **Release the GIL**: parallel parsing runs without holding the Python GIL, so other Python threads keep running
| File Size | ctypes | nanobind | Speedup |
|---|---|---|---|
| 142MB (1M rows × 10 cols) | ~20s | <0.8s | 25x+ |
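As an illustration of the GIL point above, here is a minimal sketch that parses several small files from a thread pool. It uses `cisv.parse_file` when the package is installed and falls back to the stdlib `csv` module otherwise, so the snippet runs either way (the fallback does not release the GIL; it only stands in so the structure is runnable).

```python
import csv
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

try:
    import cisv  # the real bindings, if installed

    def parse(path):
        # parse_file releases the GIL, so these calls can overlap across threads
        return cisv.parse_file(path)
except ImportError:
    def parse(path):
        # stdlib stand-in so the sketch still runs without cisv
        with open(path, newline='') as f:
            return list(csv.reader(f))

# Create a few small CSV files to parse concurrently
paths = []
for i in range(4):
    fd, path = tempfile.mkstemp(suffix='.csv')
    with os.fdopen(fd, 'w') as f:
        f.write('a,b\n1,2\n')
    paths.append(path)

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(parse, paths))

for path in paths:
    os.remove(path)

print(len(results))  # one parsed result per file
```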
## Installation

### From PyPI (recommended)

```bash
pip install cisv
```

### From source

```bash
cd cisv
pip install .
```

### Development install

```bash
cd cisv
pip install -e .
```
## Usage

```python
import cisv

# Parse a file
rows = cisv.parse_file('data.csv')

# Parse with options
rows = cisv.parse_file(
    'data.csv',
    delimiter=';',
    quote="'",
    trim=True,
    skip_empty_lines=True,
)

# Parse large files in parallel (faster on multi-core systems)
rows = cisv.parse_file('large.csv', parallel=True)

# Parse a string
rows = cisv.parse_string("a,b,c\n1,2,3")

# Count rows quickly (SIMD-accelerated)
count = cisv.count_rows('data.csv')

# Row-by-row iteration (memory efficient, supports early exit)
with cisv.CisvIterator('large.csv') as reader:
    for row in reader:
        print(row)  # List[str]
        if row[0] == 'stop':
            break  # Early exit - no wasted work

# Or use the convenience function
for row in cisv.open_iterator('data.csv', delimiter=',', trim=True):
    process(row)
```
## API Reference

### `parse_file(path, delimiter=',', quote='"', *, trim=False, skip_empty_lines=False, parallel=False, num_threads=0)`

Parse a CSV file and return all rows.

Parameters:

- `path`: Path to the CSV file
- `delimiter`: Field delimiter character (default: `','`)
- `quote`: Quote character (default: `'"'`)
- `trim`: Whether to trim whitespace from fields
- `skip_empty_lines`: Whether to skip empty lines
- `parallel`: Use multi-threaded parsing (faster for large files)
- `num_threads`: Number of threads for parallel parsing (0 = auto-detect)

Returns: a list of rows, where each row is a list of field values.
### `parse_string(content, delimiter=',', quote='"', *, trim=False, skip_empty_lines=False)`

Parse a CSV string and return all rows.

Parameters:

- `content`: CSV content as a string
- `delimiter`: Field delimiter character (default: `','`)
- `quote`: Quote character (default: `'"'`)
- `trim`: Whether to trim whitespace from fields
- `skip_empty_lines`: Whether to skip empty lines

Returns: a list of rows, where each row is a list of field values.
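A short sketch of `parse_string` on input with quoted fields. If cisv is not installed, the stdlib `csv` module stands in, since both follow standard CSV quoting rules:

```python
import csv
import io

content = 'name,notes\nalice,"hello, world"\nbob,plain\n'

try:
    import cisv
    rows = cisv.parse_string(content)
except ImportError:
    # stdlib stand-in with equivalent default quoting behavior
    rows = list(csv.reader(io.StringIO(content)))

# The comma inside the quoted field does not split it
print(rows[1])
```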
### `count_rows(path)`

Count the number of rows in a CSV file without fully parsing it. This is very fast, as it only scans for newlines using SIMD instructions.

Parameters:

- `path`: Path to the CSV file

Returns: the number of rows in the file.
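A quick sketch of `count_rows`. The pure-Python fallback below counts newlines, which is conceptually what the SIMD scan does; whether the header row is included in the count is an assumption here, not something the API reference specifies.

```python
import os
import tempfile

# Write a small sample file: one header row plus two data rows
fd, path = tempfile.mkstemp(suffix='.csv')
with os.fdopen(fd, 'w') as f:
    f.write('a,b\n1,2\n3,4\n')

try:
    import cisv
    count = cisv.count_rows(path)
except ImportError:
    # Fallback: count newline bytes, the effect of a SIMD newline scan
    with open(path, 'rb') as f:
        count = f.read().count(b'\n')

os.remove(path)
print(count)
```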
### `CisvIterator(path, delimiter=',', quote='"', *, trim=False, skip_empty_lines=False)`

Row-by-row iterator for streaming CSV parsing with a minimal memory footprint.

Provides fgetcsv-style iteration that supports early exit: breaking out of the loop stops parsing immediately, with no wasted work.

Parameters:

- `path`: Path to the CSV file
- `delimiter`: Field delimiter character (default: `','`)
- `quote`: Quote character (default: `'"'`)
- `trim`: Whether to trim whitespace from fields
- `skip_empty_lines`: Whether to skip empty lines

Methods:

- `next()`: Get the next row as `List[str]`, or `None` if at end of file
- `close()`: Close the iterator and release resources
- `closed`: Property indicating whether the iterator has been closed

Protocols:

- Iterator protocol: use in `for` loops with `for row in iterator`
- Context manager: use with a `with` statement for automatic cleanup
Example:

```python
# Context manager (recommended)
with cisv.CisvIterator('data.csv') as reader:
    for row in reader:
        if row[0] == 'target':
            print(f"Found: {row}")
            break  # Early exit

# Manual iteration
reader = cisv.CisvIterator('data.csv')
try:
    while True:
        row = reader.next()
        if row is None:
            break
        process(row)
finally:
    reader.close()
```
### `open_iterator(path, delimiter=',', quote='"', *, trim=False, skip_empty_lines=False)`

Convenience function that returns a `CisvIterator`. Takes the same parameters as `CisvIterator`.

Example:

```python
for row in cisv.open_iterator('data.csv'):
    print(row)
```
## Running Tests

```bash
cd cisv
pip install -e ".[test]"
pytest
```
## Benchmarking

```bash
pip install -e ".[benchmark]"
```

```python
import cisv
import time

# Create a test file
with open('/tmp/test.csv', 'w') as f:
    f.write('col1,col2,col3\n')
    for i in range(100000):
        f.write(f'value{i}_1,value{i}_2,value{i}_3\n')

# Benchmark
start = time.time()
rows = cisv.parse_file('/tmp/test.csv')
print(f'Parsed {len(rows)} rows in {time.time()-start:.3f}s')
```
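For context, the same shape of file can be timed against the stdlib `csv` module as a baseline. This sketch uses only the standard library, so it runs with or without cisv installed:

```python
import csv
import os
import tempfile
import time

# Create a test file matching the benchmark above:
# one header row plus 100,000 data rows
fd, path = tempfile.mkstemp(suffix='.csv')
with os.fdopen(fd, 'w') as f:
    f.write('col1,col2,col3\n')
    for i in range(100000):
        f.write(f'value{i}_1,value{i}_2,value{i}_3\n')

# Time the stdlib parser as the baseline
start = time.time()
with open(path, newline='') as f:
    rows = list(csv.reader(f))
elapsed = time.time() - start
print(f'stdlib csv: parsed {len(rows)} rows in {elapsed:.3f}s')

os.remove(path)
```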
## File details

Details for the file `cisv-0.4.8.tar.gz`.

### File metadata

- Download URL: cisv-0.4.8.tar.gz
- Size: 57.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7

### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `fc049e5ab5a001911825d090ca2ab3018cec4af274998644526fe987d97cec13` |
| MD5 | `3e843d0ab6fd1cf7aec1db7a1189733e` |
| BLAKE2b-256 | `1df5e3fb93ec18c3de83a187b188ffc28602cfa375601c99c08a1582e031056c` |
## File details

Details for the file `cisv-0.4.8-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl`.

### File metadata

- Download URL: cisv-0.4.8-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
- Size: 108.1 kB
- Tags: CPython 3.12, manylinux: glibc 2.27+ x86-64, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7

### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `b5d1eebff14e7ecfd8be471e9ca983545965e3ddff5bf682f66ea72f0f98ba43` |
| MD5 | `702a246abbe7773512ed0bf9cfa13a3c` |
| BLAKE2b-256 | `5035a0a7437ec848c54b949724ec1d2f65970933180fadf160a55bba412a8aa0` |