High-performance CSV parser with SIMD optimizations

CISV Python Bindings (nanobind)

High-performance Python bindings for the CISV CSV parser using nanobind.

Performance

These bindings are 10-100x faster than the ctypes-based bindings because they:

  1. Use the batch API: All data is parsed in C and returned at once, eliminating millions of per-field callbacks
  2. Use nanobind: Much lower overhead than ctypes or pybind11
  3. Release the GIL: Parallel parsing runs without holding the Python GIL

File Size                    ctypes    nanobind    Speedup
142MB (1M rows × 10 cols)    ~20s      <0.8s       25x+

Installation

From PyPI (recommended)

pip install cisv

From source

cd bindings/python-nanobind
pip install .

Development install

cd bindings/python-nanobind
pip install -e .

Usage

import cisv

# Parse a file
rows = cisv.parse_file('data.csv')

# Parse with options
rows = cisv.parse_file(
    'data.csv',
    delimiter=';',
    quote="'",
    trim=True,
    skip_empty_lines=True
)

# Parse large files in parallel (faster on multi-core systems)
rows = cisv.parse_file('large.csv', parallel=True)

# Parse a string
rows = cisv.parse_string("a,b,c\n1,2,3")

# Count rows quickly (SIMD-accelerated)
count = cisv.count_rows('data.csv')

# Row-by-row iteration (memory efficient, supports early exit)
with cisv.CisvIterator('large.csv') as reader:
    for row in reader:
        print(row)  # List[str]
        if row[0] == 'stop':
            break  # Early exit - no wasted work

# Or use the convenience function
for row in cisv.open_iterator('data.csv', delimiter=',', trim=True):
    process(row)

API Reference

parse_file(path, delimiter=',', quote='"', *, trim=False, skip_empty_lines=False, parallel=False, num_threads=0)

Parse a CSV file and return all rows.

Parameters:

  • path: Path to the CSV file
  • delimiter: Field delimiter character (default: ',')
  • quote: Quote character (default: '"')
  • trim: Whether to trim whitespace from fields
  • skip_empty_lines: Whether to skip empty lines
  • parallel: Use multi-threaded parsing (faster for large files)
  • num_threads: Number of threads for parallel parsing (0 = auto-detect)

Returns: List of rows, where each row is a list of field values.
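
Because the result is a plain list of string lists, a header line (if the file has one) is simply the first element. The sketch below pairs a header with the remaining rows. rows_to_dicts is a hypothetical helper, not part of cisv, and it assumes the first row is a header.

```python
from typing import Dict, List


def rows_to_dicts(rows: List[List[str]]) -> List[Dict[str, str]]:
    """Pair each data row with the header row (assumed to be rows[0])."""
    if not rows:
        return []
    header, *data = rows
    # zip() stops at the shorter sequence, so extra fields in ragged rows are dropped.
    return [dict(zip(header, row)) for row in data]


# Intended use with cisv (not executed here):
#   records = rows_to_dicts(cisv.parse_file('data.csv'))
sample = [["id", "name"], ["1", "Ada"], ["2", "Grace"]]
print(rows_to_dicts(sample))
```

All values stay strings; any numeric conversion would be a separate step.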

parse_string(content, delimiter=',', quote='"', *, trim=False, skip_empty_lines=False)

Parse a CSV string and return all rows.

Parameters:

  • content: CSV content as a string
  • delimiter: Field delimiter character (default: ',')
  • quote: Quote character (default: '"')
  • trim: Whether to trim whitespace from fields
  • skip_empty_lines: Whether to skip empty lines

Returns: List of rows, where each row is a list of field values.
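
When the CSV content is built in memory rather than typed by hand, fields that contain the delimiter need quoting before they reach parse_string. A minimal sketch using only the standard library csv module to produce such a string; to_csv_string is a hypothetical helper, and passing its result to cisv.parse_string with the same delimiter and quote is the assumed use.

```python
import csv
import io
from typing import Iterable, Sequence


def to_csv_string(
    rows: Iterable[Sequence[str]], delimiter: str = ",", quote: str = '"'
) -> str:
    """Serialize rows to CSV text, quoting only the fields that need it."""
    buffer = io.StringIO()
    writer = csv.writer(
        buffer, delimiter=delimiter, quotechar=quote, lineterminator="\n"
    )
    writer.writerows(rows)
    return buffer.getvalue()


# Intended use with cisv (not executed here):
#   rows = cisv.parse_string(to_csv_string(data, delimiter=';'), delimiter=';')
print(to_csv_string([["a", "b"], ["x,y", "z"]]))
```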

count_rows(path)

Count the number of rows in a CSV file without full parsing.

This is fast because it only scans for newlines using SIMD instructions and does not split fields.

Parameters:

  • path: Path to the CSV file

Returns: Number of rows in the file.
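
To illustrate what a newline-only scan means, here is a pure-Python stand-in that counts newline bytes in fixed-size chunks. It is a sketch of the idea, not the cisv implementation: count_newlines is a hypothetical name, there is no SIMD here, and how cisv treats a final line without a trailing newline or newlines inside quoted fields is not stated above.

```python
import os
import tempfile


def count_newlines(path: str, chunk_size: int = 1 << 20) -> int:
    """Count b'\\n' bytes in a file without splitting it into fields."""
    total = 0
    with open(path, "rb") as f:
        # Read fixed-size chunks so memory use stays flat for large files.
        while chunk := f.read(chunk_size):
            total += chunk.count(b"\n")
    return total


# Demo on a throwaway file with three newline-terminated lines.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    tmp.write("a,b\n1,2\n3,4\n")
try:
    print(count_newlines(tmp.name))  # prints 3
finally:
    os.remove(tmp.name)
```

A scan like this never builds Python strings for individual fields, which is why counting can be much cheaper than parsing.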

CisvIterator(path, delimiter=',', quote='"', *, trim=False, skip_empty_lines=False)

Row-by-row iterator for streaming CSV parsing with minimal memory footprint.

Provides fgetcsv-style iteration that supports early exit - breaking out of iteration stops parsing immediately with no wasted work.

Parameters:

  • path: Path to the CSV file
  • delimiter: Field delimiter character (default: ',')
  • quote: Quote character (default: '"')
  • trim: Whether to trim whitespace from fields
  • skip_empty_lines: Whether to skip empty lines

Methods:

  • next(): Get the next row as List[str], or None if at end of file
  • close(): Close the iterator and release resources
  • closed: Property indicating whether the iterator has been closed

Protocols:

  • Iterator protocol: Can be used directly in a for loop (for row in iterator)
  • Context manager: Can be used in a with statement for automatic cleanup

Example:

# Context manager (recommended)
with cisv.CisvIterator('data.csv') as reader:
    for row in reader:
        if row[0] == 'target':
            print(f"Found: {row}")
            break  # Early exit

# Manual iteration
reader = cisv.CisvIterator('data.csv')
try:
    while True:
        row = reader.next()
        if row is None:
            break
        process(row)
finally:
    reader.close()
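
The manual loop above can also be written with the two-argument form of the built-in iter(), which keeps calling a function until it returns a sentinel value. The sketch below relies only on the documented next()/close() behavior. FakeReader is a hypothetical stand-in so the example runs without a CSV file, and drain is a hypothetical helper name.

```python
from typing import Iterator, List, Optional


class FakeReader:
    """Hypothetical stand-in exposing the documented next()/close() shape."""

    def __init__(self, rows: List[List[str]]) -> None:
        self._rows = iter(rows)
        self.closed = False

    def next(self) -> Optional[List[str]]:
        # Return None at the end, mirroring the documented end-of-file signal.
        return next(self._rows, None)

    def close(self) -> None:
        self.closed = True


def drain(reader) -> Iterator[List[str]]:
    """Yield rows until reader.next() returns None, then close the reader."""
    try:
        # iter(callable, sentinel) calls reader.next() until it returns None.
        yield from iter(reader.next, None)
    finally:
        reader.close()


reader = FakeReader([["a", "1"], ["b", "2"]])
for row in drain(reader):
    print(row)
```

With a real reader, the assumed call would be drain(cisv.CisvIterator('data.csv')).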

open_iterator(path, delimiter=',', quote='"', *, trim=False, skip_empty_lines=False)

Convenience function that returns a CisvIterator. Same parameters as CisvIterator.

Example:

for row in cisv.open_iterator('data.csv'):
    print(row)
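
Since the iterator yields rows lazily, previewing the first few rows of a large file does not require reading the rest. A minimal sketch with itertools.islice; head is a hypothetical helper that accepts any iterable of rows, so the demo below uses an in-memory list instead of a file.

```python
from itertools import islice
from typing import Iterable, List


def head(rows: Iterable[List[str]], n: int = 5) -> List[List[str]]:
    """Return at most the first n rows without consuming the remainder."""
    return list(islice(rows, n))


# Intended use with cisv (not executed here):
#   preview = head(cisv.open_iterator('large.csv'), 10)
source = iter([["id", "name"], ["1", "Ada"], ["2", "Grace"]])
print(head(source, 2))
```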

Running Tests

cd bindings/python-nanobind
pip install -e ".[test]"
pytest

Benchmarking

pip install -e ".[benchmark]"
python -c "
import cisv
import time

# Create test file
with open('/tmp/test.csv', 'w') as f:
    f.write('col1,col2,col3\n')
    for i in range(100000):
        f.write(f'value{i}_1,value{i}_2,value{i}_3\n')

# Benchmark (perf_counter is a monotonic, high-resolution clock)
start = time.perf_counter()
rows = cisv.parse_file('/tmp/test.csv')
print(f'Parsed {len(rows)} rows in {time.perf_counter()-start:.3f}s')
"
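
A single timed run can be skewed by a cold file cache or other system noise. The sketch below repeats a call several times and keeps the fastest run. best_of is a hypothetical helper, not part of cisv, and the repeat count is an arbitrary choice.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def best_of(fn: Callable[[], T], repeats: int = 5) -> Tuple[float, T]:
    """Call fn() `repeats` times; return (fastest elapsed seconds, last result)."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn()
        best = min(best, time.perf_counter() - start)
    return best, result


# Intended use with cisv (not executed here):
#   elapsed, rows = best_of(lambda: cisv.parse_file('/tmp/test.csv'))
elapsed, value = best_of(lambda: sum(range(100_000)), repeats=3)
print(f"best of 3: {elapsed:.6f}s")
```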
