High-performance CSV parser with SIMD optimizations

CISV Python Bindings (nanobind)

High-performance Python bindings for the CISV CSV parser using nanobind.

Performance

These bindings are 10-100x faster than the ctypes-based bindings because they:

  1. Use the batch API: All data is parsed in C and returned at once, eliminating millions of per-field callbacks
  2. Use nanobind: Much lower overhead than ctypes or pybind11
  3. Release the GIL: Parallel parsing runs without holding the Python GIL

File size                     ctypes   nanobind   Speedup
142 MB (1M rows × 10 cols)    ~20 s    <0.8 s     25x+
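
The table can be cross-checked with a little arithmetic that uses only the figures it quotes. 1M rows × 10 columns is 10M fields, which is the number of per-field callbacks the batch API avoids. Because "<0.8s" is an upper bound, the derived speedup and nanobind throughput below are lower bounds.

```python
# Figures taken from the benchmark table above.
file_mb = 142
rows, cols = 1_000_000, 10
ctypes_s = 20.0    # approximate ("~20s")
nanobind_s = 0.8   # upper bound ("<0.8s")

fields = rows * cols                      # one callback per field under ctypes
speedup = ctypes_s / nanobind_s           # lower bound, since nanobind_s is an upper bound
ctypes_mb_s = file_mb / ctypes_s
nanobind_mb_s = file_mb / nanobind_s      # lower bound on throughput
ctypes_ns_per_field = ctypes_s / fields * 1e9
nanobind_ns_per_field = nanobind_s / fields * 1e9

print(f"fields: {fields:,}")
print(f"speedup: >= {speedup:.0f}x")
print(f"throughput: ~{ctypes_mb_s:.1f} MB/s (ctypes) vs >= {nanobind_mb_s:.1f} MB/s (nanobind)")
print(f"time per field: ~{ctypes_ns_per_field:.0f} ns (ctypes) vs <= {nanobind_ns_per_field:.0f} ns (nanobind)")
```

The ~2 µs per field under ctypes is the average total cost per field, not the callback overhead alone.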

Installation

From PyPI (recommended)

pip install cisv

From source

cd bindings/python-nanobind
pip install .

Development install

cd bindings/python-nanobind
pip install -e .

Usage

import cisv

# Parse a file
rows = cisv.parse_file('data.csv')

# Parse with options
rows = cisv.parse_file(
    'data.csv',
    delimiter=';',
    quote="'",
    trim=True,
    skip_empty_lines=True
)

# Parse large files in parallel (faster on multi-core systems)
rows = cisv.parse_file('large.csv', parallel=True)

# Parse a string
rows = cisv.parse_string("a,b,c\n1,2,3")

# Count rows quickly (SIMD-accelerated)
count = cisv.count_rows('data.csv')

# Row-by-row iteration (memory efficient, supports early exit)
with cisv.CisvIterator('large.csv') as reader:
    for row in reader:
        print(row)  # List[str]
        if row[0] == 'stop':
            break  # Early exit: the remaining rows are never parsed

# Or use the convenience function
for row in cisv.open_iterator('data.csv', delimiter=',', trim=True):
    print(row)  # replace with your own row handling

API Reference

parse_file(path, delimiter=',', quote='"', *, trim=False, skip_empty_lines=False, parallel=False, num_threads=0)

Parse a CSV file and return all rows.

Parameters:

  • path: Path to the CSV file
  • delimiter: Field delimiter character (default: ',')
  • quote: Quote character (default: '"')
  • trim: Whether to trim whitespace from fields (default: False)
  • skip_empty_lines: Whether to skip empty lines (default: False)
  • parallel: Use multi-threaded parsing, which is faster for large files (default: False)
  • num_threads: Number of threads for parallel parsing (default: 0, meaning auto-detect)

Returns: List of rows, where each row is a list of field values.

parse_string(content, delimiter=',', quote='"', *, trim=False, skip_empty_lines=False)

Parse a CSV string and return all rows.

Parameters:

  • content: CSV content as a string
  • delimiter: Field delimiter character (default: ',')
  • quote: Quote character (default: '"')
  • trim: Whether to trim whitespace from fields (default: False)
  • skip_empty_lines: Whether to skip empty lines (default: False)

Returns: List of rows, where each row is a list of field values.
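
To check what output to expect, a rough pure-Python stand-in can be built on the standard library's csv module. reference_parse_string is a hypothetical helper, not part of cisv. It assumes that trim strips leading and trailing whitespace from every field and that skip_empty_lines drops rows with no fields; cisv's exact handling of those options, and of edge cases such as blank lines when skip_empty_lines is off, may differ.

```python
import csv
import io
from typing import List


def reference_parse_string(
    content: str,
    delimiter: str = ",",
    quote: str = '"',
    *,
    trim: bool = False,
    skip_empty_lines: bool = False,
) -> List[List[str]]:
    """Hypothetical stdlib-based approximation of cisv.parse_string (assumed semantics)."""
    # newline="" lets csv.reader handle line endings and quoted newlines itself.
    reader = csv.reader(io.StringIO(content, newline=""), delimiter=delimiter, quotechar=quote)
    rows = []
    for row in reader:
        if skip_empty_lines and not row:
            continue  # csv.reader yields [] for a blank line
        if trim:
            row = [field.strip() for field in row]  # assumption: strip both ends
        rows.append(row)
    return rows


print(reference_parse_string("a,b,c\n1,2,3"))
# [['a', 'b', 'c'], ['1', '2', '3']]
print(reference_parse_string("x ;'y;z'\n\n 1;2 ", delimiter=";", quote="'",
                             trim=True, skip_empty_lines=True))
# [['x', 'y;z'], ['1', '2']]
```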

count_rows(path)

Count the number of rows in a CSV file without full parsing.

It is fast because it does not parse fields: it only scans for newline characters, using SIMD instructions.

Parameters:

  • path: Path to the CSV file

Returns: Number of rows in the file.
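
The newline-scanning idea can be sketched in pure Python. count_newline_rows is a hypothetical helper: it reads the file in chunks and uses bytes.count rather than SIMD, and it assumes that a final line without a trailing newline still counts as a row. A scan that looks only for newline bytes cannot tell a record separator from a newline inside a quoted field, so in this sketch such fields inflate the count; how cisv.count_rows treats them is not stated here.

```python
import os
import tempfile


def count_newline_rows(path: str, chunk_size: int = 1 << 20) -> int:
    """Count lines by scanning for newline bytes, without parsing fields (hypothetical)."""
    count = 0
    last_byte = b""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            count += chunk.count(b"\n")
            last_byte = chunk[-1:]
    # Assumption: an unterminated final line is still a row.
    if last_byte and last_byte != b"\n":
        count += 1
    return count


# Demo on small temporary files.
samples = {
    "empty": b"",
    "terminated": b"a,b\n1,2\n",
    "unterminated": b"a,b\n1,2",
    "quoted_newline": b'a,"x\ny"\n',  # one record, but two newline bytes
}
results = {}
for name, data in samples.items():
    with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as tmp:
        tmp.write(data)
    results[name] = count_newline_rows(tmp.name)
    os.remove(tmp.name)
print(results)
```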

CisvIterator(path, delimiter=',', quote='"', *, trim=False, skip_empty_lines=False)

Row-by-row iterator for streaming CSV parsing with minimal memory footprint.

Provides fgetcsv-style iteration (one row at a time, as with PHP's fgetcsv) with support for early exit: breaking out of the loop stops parsing immediately, so the rest of the file is never parsed.

Parameters:

  • path: Path to the CSV file
  • delimiter: Field delimiter character (default: ',')
  • quote: Quote character (default: '"')
  • trim: Whether to trim whitespace from fields (default: False)
  • skip_empty_lines: Whether to skip empty lines (default: False)

Methods and properties:

  • next(): Get the next row as List[str], or None if at end of file
  • close(): Close the iterator and release resources
  • closed: Property indicating whether the iterator has been closed

Protocols:

  • Iterator protocol: usable directly in a for loop (for row in reader: ...)
  • Context manager: usable in a with statement, which closes the iterator automatically on exit

Example:

# Context manager (recommended)
with cisv.CisvIterator('data.csv') as reader:
    for row in reader:
        if row[0] == 'target':
            print(f"Found: {row}")
            break  # Early exit

# Manual iteration
reader = cisv.CisvIterator('data.csv')
try:
    while True:
        row = reader.next()
        if row is None:
            break
        print(row)  # replace with your own row handling
finally:
    reader.close()
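
The documented interface (a next() method that returns None at end of file, close(), a closed property, plus the iterator and context-manager protocols) can be mirrored in pure Python to show how the pieces fit together. RowIterator below is a hypothetical illustration built on the stdlib csv module, not cisv's implementation, and its trim and skip_empty_lines handling is an assumption. Early exit works here because csv.reader parses lazily, so breaking out leaves the rest of the file unparsed.

```python
import csv
import os
import tempfile
from typing import List, Optional


class RowIterator:
    """Hypothetical pure-Python analogue of the CisvIterator interface."""

    def __init__(self, path: str, delimiter: str = ",", quote: str = '"', *,
                 trim: bool = False, skip_empty_lines: bool = False) -> None:
        self._file = open(path, newline="")
        self._reader = csv.reader(self._file, delimiter=delimiter, quotechar=quote)
        self._trim = trim
        self._skip_empty_lines = skip_empty_lines

    @property
    def closed(self) -> bool:
        return self._file.closed

    def next(self) -> Optional[List[str]]:
        """Return the next row, or None at end of file (fgetcsv-style)."""
        if self.closed:
            return None
        for row in self._reader:  # parses lazily, one record per call
            if self._skip_empty_lines and not row:
                continue
            return [field.strip() for field in row] if self._trim else row
        return None

    def close(self) -> None:
        self._file.close()

    # Iterator protocol: for row in reader
    def __iter__(self) -> "RowIterator":
        return self

    def __next__(self) -> List[str]:
        row = self.next()
        if row is None:
            raise StopIteration
        return row

    # Context manager protocol: closes the file on exit, even after break
    def __enter__(self) -> "RowIterator":
        return self

    def __exit__(self, *exc_info) -> None:
        self.close()


with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    tmp.write("id,name\n1,alpha\ntarget,beta\n2,gamma\n")

found = None
with RowIterator(tmp.name) as reader:
    for row in reader:
        if row[0] == "target":
            found = row
            break  # the remaining rows are never parsed
print(found, reader.closed)
os.remove(tmp.name)
```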

open_iterator(path, delimiter=',', quote='"', *, trim=False, skip_empty_lines=False)

Convenience function that returns a CisvIterator. Same parameters as CisvIterator.

Example:

for row in cisv.open_iterator('data.csv'):
    print(row)

Running Tests

cd bindings/python-nanobind
pip install -e ".[test]"
pytest

Benchmarking

pip install -e ".[benchmark]"
python -c "
import cisv
import time

# Create test file
with open('/tmp/test.csv', 'w') as f:
    f.write('col1,col2,col3\n')
    for i in range(100000):
        f.write(f'value{i}_1,value{i}_2,value{i}_3\n')

# Benchmark (perf_counter is the appropriate clock for elapsed time)
start = time.perf_counter()
rows = cisv.parse_file('/tmp/test.csv')
print(f'Parsed {len(rows)} rows in {time.perf_counter()-start:.3f}s')
"

Download files

Source Distribution

  • cisv-0.4.3.tar.gz (48.8 kB)

Built Distributions

  • cisv-0.4.3-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (108.0 kB): CPython 3.13, manylinux glibc 2.27+ x86-64, manylinux glibc 2.28+ x86-64
  • cisv-0.4.3-cp313-cp313-macosx_11_0_x86_64.whl (74.4 kB): CPython 3.13, macOS 11.0+ x86-64
  • cisv-0.4.3-cp313-cp313-macosx_11_0_arm64.whl (70.7 kB): CPython 3.13, macOS 11.0+ ARM64
  • cisv-0.4.3-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (108.0 kB): CPython 3.12, manylinux glibc 2.27+ x86-64, manylinux glibc 2.28+ x86-64
  • cisv-0.4.3-cp312-cp312-macosx_11_0_x86_64.whl (74.4 kB): CPython 3.12, macOS 11.0+ x86-64
  • cisv-0.4.3-cp312-cp312-macosx_11_0_arm64.whl (70.7 kB): CPython 3.12, macOS 11.0+ ARM64
  • cisv-0.4.3-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (108.5 kB): CPython 3.11, manylinux glibc 2.27+ x86-64, manylinux glibc 2.28+ x86-64
  • cisv-0.4.3-cp311-cp311-macosx_11_0_x86_64.whl (74.8 kB): CPython 3.11, macOS 11.0+ x86-64
  • cisv-0.4.3-cp311-cp311-macosx_11_0_arm64.whl (71.3 kB): CPython 3.11, macOS 11.0+ ARM64
  • cisv-0.4.3-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (108.7 kB): CPython 3.10, manylinux glibc 2.27+ x86-64, manylinux glibc 2.28+ x86-64
  • cisv-0.4.3-cp310-cp310-macosx_11_0_x86_64.whl (75.0 kB): CPython 3.10, macOS 11.0+ x86-64
  • cisv-0.4.3-cp310-cp310-macosx_11_0_arm64.whl (71.5 kB): CPython 3.10, macOS 11.0+ ARM64
  • cisv-0.4.3-cp39-cp39-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (109.0 kB): CPython 3.9, manylinux glibc 2.27+ x86-64, manylinux glibc 2.28+ x86-64
  • cisv-0.4.3-cp39-cp39-macosx_11_0_x86_64.whl (75.2 kB): CPython 3.9, macOS 11.0+ x86-64
  • cisv-0.4.3-cp39-cp39-macosx_11_0_arm64.whl (71.7 kB): CPython 3.9, macOS 11.0+ ARM64

File details

Details for the file cisv-0.4.3.tar.gz.

File metadata

  • Download URL: cisv-0.4.3.tar.gz
  • Upload date:
  • Size: 48.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for cisv-0.4.3.tar.gz
Algorithm Hash digest
SHA256 fe001044f3299e477a88aab11b6f3d7bd18e41a1a821d0b6dafd78a46212ff3d
MD5 5d8654f61ba0ca327235ed5bf852be29
BLAKE2b-256 2bf29f40e6f7c94305f80a68f51c244e4951c82887d3a43aaa034084066a771b

File details

Details for the file cisv-0.4.3-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for cisv-0.4.3-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 7ecddb61c439de88d3016b3b4da1ba3dcbf588270281f3e35e0026de8b92b1f3
MD5 5b1ac81ec9d69084eb44625d8f27618c
BLAKE2b-256 19c54634e1b81aa9a0255b4b790765270c365cf60cf597664861169f8a26a334

File details

Details for the file cisv-0.4.3-cp313-cp313-macosx_11_0_x86_64.whl.

File hashes

Hashes for cisv-0.4.3-cp313-cp313-macosx_11_0_x86_64.whl
Algorithm Hash digest
SHA256 e9044ff82ce4866b46dea1889f54ab372fe0ab6e83a42851c6d900d4dd2d3e0b
MD5 df2755872136a62d61f956dfb4ac9d64
BLAKE2b-256 1aa8d57f479ad1b804b8fbb856c348744012c0c40c3a746dc36c7f719ba54157

File details

Details for the file cisv-0.4.3-cp313-cp313-macosx_11_0_arm64.whl.

File hashes

Hashes for cisv-0.4.3-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 fdc72340a3db7ac307ee56bad5ce7e8d81cae6fa7704e9fd5fb6fae361d1edd4
MD5 5c6c27ff9b03aba15f986dbe022f3b55
BLAKE2b-256 4367fe3afef9d3c629f696b96f34ab898cfc2fbb201fd15c00ab87347d0ad7a4

File details

Details for the file cisv-0.4.3-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for cisv-0.4.3-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 060a17dc518b5f9fe9960aff93c841bf5e9fe0c9ba8c8bd19f74e8d8e00a2b8d
MD5 079f1619f1a8e807c5bb69e185d0052f
BLAKE2b-256 978379997639b58f2acca232ffad1613c9643540356eb22c51433d8db777483b

File details

Details for the file cisv-0.4.3-cp312-cp312-macosx_11_0_x86_64.whl.

File hashes

Hashes for cisv-0.4.3-cp312-cp312-macosx_11_0_x86_64.whl
Algorithm Hash digest
SHA256 dd0e39f88c92263e8497ca434e2ea233777dbf4487cb71caee020124bf910a3d
MD5 c66f5959a40221f0d845302820e0fd08
BLAKE2b-256 26c8ed4c0399a15d720b22965431ee378e49140423bb22f4d79ec12a4d0415f1

File details

Details for the file cisv-0.4.3-cp312-cp312-macosx_11_0_arm64.whl.

File hashes

Hashes for cisv-0.4.3-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 55020ba002870a32f9d80078bdc541a139aaf7a2c1d6cea12c341ce504940c48
MD5 1ffffbd485486573243eff5859fa20e8
BLAKE2b-256 df4d8a0e212487253555920fbd35d68a645efb1a491a74b5d2b6ae9d09f2fc7c

File details

Details for the file cisv-0.4.3-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for cisv-0.4.3-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 66fc6a050e165c95a4ddc981543ce32eabde803692d9b0d59877b707e54a3680
MD5 ff443f0037e12de07fc4248f708aaa54
BLAKE2b-256 1b7e4ae0e394ba68b37d0ecd0ac8a02608a583f60dd00cc57a99e5ac0fd3cb22

File details

Details for the file cisv-0.4.3-cp311-cp311-macosx_11_0_x86_64.whl.

File hashes

Hashes for cisv-0.4.3-cp311-cp311-macosx_11_0_x86_64.whl
Algorithm Hash digest
SHA256 49ee7ccd325be927efae8c36cb532eefea5a1d7a4b31aec7bbb0968cca343135
MD5 52b9881f022ad8fe4fbcfdcd3a864043
BLAKE2b-256 d06c750d71792f7cb170248d5042080c32834db231036e3ef7c57e73ff5655f6

File details

Details for the file cisv-0.4.3-cp311-cp311-macosx_11_0_arm64.whl.

File hashes

Hashes for cisv-0.4.3-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 e156b53b6cd5fdf2980b81e185fd36a398893d5e3b0cba93229f400cc3b7731a
MD5 02ff1ab59380360de63ffd97bdaeaff3
BLAKE2b-256 24ac1bc87b7d33ff61d3715112a8d469ea3a66c47c92ed4fc9c6997c649d202f

File details

Details for the file cisv-0.4.3-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for cisv-0.4.3-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 53617f761ad34cb3ed877ee89f35d4926ce01a691bb6633506de30fafbb54adc
MD5 cd2440dcbe3b549d5b3b85f987eebd4b
BLAKE2b-256 1ec4375ee8b6128d2aba2b6742db198beaa05a299ab3df165fd6ec2037d9bfdf

File details

Details for the file cisv-0.4.3-cp310-cp310-macosx_11_0_x86_64.whl.

File hashes

Hashes for cisv-0.4.3-cp310-cp310-macosx_11_0_x86_64.whl
Algorithm Hash digest
SHA256 abc583932180b944c969ecbfe983f1b1239b289a117ecfd80e1d067479c6c101
MD5 60bb289871352669547c7579d70f6476
BLAKE2b-256 1406a10a456167d5fe3a9c1cd98fe2e6b380096ed504f250ecf2ddc958768c52

File details

Details for the file cisv-0.4.3-cp310-cp310-macosx_11_0_arm64.whl.

File hashes

Hashes for cisv-0.4.3-cp310-cp310-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 3365e30bf8244b07ba1af0a504afeaa8d7fcd33a3c8d0398780176a6a74d2eab
MD5 18ea4bd0fbcaeb973a41aea1fb73bcf7
BLAKE2b-256 b8017951a054bc51c2c1c05debac5271da414a8e95325674b03a6f723100a94f

File details

Details for the file cisv-0.4.3-cp39-cp39-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for cisv-0.4.3-cp39-cp39-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 c4303c4b42dacd359ca88830045ec4c04f1283a11b59e8d24d6a3c24162e83c7
MD5 0855e430427156ab38a97bf0ad57db43
BLAKE2b-256 b50ca84085ec02a38221bac8b6c16ba7427c787b06d64072fe627560b33ce844

File details

Details for the file cisv-0.4.3-cp39-cp39-macosx_11_0_x86_64.whl.

File metadata

  • Download URL: cisv-0.4.3-cp39-cp39-macosx_11_0_x86_64.whl
  • Upload date:
  • Size: 75.2 kB
  • Tags: CPython 3.9, macOS 11.0+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for cisv-0.4.3-cp39-cp39-macosx_11_0_x86_64.whl
Algorithm Hash digest
SHA256 5670fa6a2ca5e07cb496ffa963a2a748d65698d4926de69c3fd0be20efe8a182
MD5 0c784f11657259c680f4592c22fe66ff
BLAKE2b-256 ea5dadf6abd43792a1dfd693f7617882e084e1c386c79841356ab5b18297b9d3

File details

Details for the file cisv-0.4.3-cp39-cp39-macosx_11_0_arm64.whl.

File metadata

  • Download URL: cisv-0.4.3-cp39-cp39-macosx_11_0_arm64.whl
  • Upload date:
  • Size: 71.7 kB
  • Tags: CPython 3.9, macOS 11.0+ ARM64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for cisv-0.4.3-cp39-cp39-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 6a9109de21186f0758c2dfae77f13b76c508bb4aad8a11402b99913f9f9c6732
MD5 90d74b0ab3da03d460746a5f3610bb51
BLAKE2b-256 95babbea29e98cd56e90a2b2b458ce7d77f1e4427d050032288e3e45d517afc1
