High-performance CSV parser with SIMD optimizations

CISV Python Bindings (nanobind)

High-performance Python bindings for the CISV CSV parser using nanobind.

Performance

These bindings are 10-100x faster than the ctypes-based bindings because they:

  1. Use the batch API: All data is parsed in C and returned at once, eliminating millions of per-field callbacks
  2. Use nanobind: Much lower overhead than ctypes or pybind11
  3. Release the GIL: Parallel parsing runs without holding the Python GIL

| File Size | ctypes | nanobind | Speedup |
|---|---|---|---|
| 142 MB (1M rows × 10 cols) | ~20s | <0.8s | 25x+ |

Installation

From PyPI (recommended)

pip install cisv

From source

cd bindings/python-nanobind
pip install .

Development install

cd bindings/python-nanobind
pip install -e .

Usage

import cisv

# Parse a file
rows = cisv.parse_file('data.csv')

# Parse with options
rows = cisv.parse_file(
    'data.csv',
    delimiter=';',
    quote="'",
    trim=True,
    skip_empty_lines=True
)

# Parse large files in parallel (faster on multi-core systems)
rows = cisv.parse_file('large.csv', parallel=True)

# Parse a string
rows = cisv.parse_string("a,b,c\n1,2,3")

# Count rows quickly (SIMD-accelerated)
count = cisv.count_rows('data.csv')

# Row-by-row iteration (memory efficient, supports early exit)
with cisv.CisvIterator('large.csv') as reader:
    for row in reader:
        print(row)  # List[str]
        if row[0] == 'stop':
            break  # Early exit - no wasted work

# Or use the convenience function
for row in cisv.open_iterator('data.csv', delimiter=',', trim=True):
    print(row)

API Reference

parse_file(path, delimiter=',', quote='"', *, trim=False, skip_empty_lines=False, parallel=False, num_threads=0)

Parse a CSV file and return all rows.

Parameters:

  • path: Path to the CSV file
  • delimiter: Field delimiter character (default: ',')
  • quote: Quote character (default: '"')
  • trim: Whether to trim whitespace from fields
  • skip_empty_lines: Whether to skip empty lines
  • parallel: Use multi-threaded parsing (faster for large files)
  • num_threads: Number of threads for parallel parsing (0 = auto-detect)

Returns: List of rows, where each row is a list of field values.
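
Because the result is a plain list of lists, mapping fields onto column names is left to the caller. The following is a minimal sketch, assuming the first row is a header; rows_to_dicts is a hypothetical helper for illustration and is not part of cisv.

```python
from typing import Dict, List


def rows_to_dicts(rows: List[List[str]]) -> List[Dict[str, str]]:
    """Pair each data row with the header row (assumed to be rows[0])."""
    if not rows:
        return []
    header, *data = rows
    # zip() stops at the shorter sequence: extra fields in a row are dropped,
    # and missing trailing fields are simply absent from that record.
    return [dict(zip(header, row)) for row in data]


# Literal rows in the shape parse_file is documented to return.
demo = rows_to_dicts([["id", "name"], ["1", "alice"], ["2", "bob"]])
print(demo)  # [{'id': '1', 'name': 'alice'}, {'id': '2', 'name': 'bob'}]

# With cisv installed and an existing file, the same helper would apply:
#   import cisv
#   records = rows_to_dicts(cisv.parse_file("data.csv"))
```

Building dicts costs extra memory per row, so for very large files it may be preferable to keep the header separately and index rows by position.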

parse_string(content, delimiter=',', quote='"', *, trim=False, skip_empty_lines=False)

Parse a CSV string and return all rows.

Parameters:

  • content: CSV content as a string
  • delimiter: Field delimiter character (default: ',')
  • quote: Quote character (default: '"')
  • trim: Whether to trim whitespace from fields
  • skip_empty_lines: Whether to skip empty lines

Returns: List of rows, where each row is a list of field values.
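
When trying out delimiter and quote settings, it can help to compare against a known parser. The sketch below uses Python's standard csv module as that reference; reference_parse is a hypothetical helper, and the assumption that cisv produces the same rows for well-formed input is not something the documentation above states.

```python
import csv
import io
from typing import List


def reference_parse(content: str, delimiter: str = ",", quote: str = '"') -> List[List[str]]:
    """Parse CSV text with the standard library, for comparison purposes."""
    reader = csv.reader(
        io.StringIO(content, newline=""),
        delimiter=delimiter,
        quotechar=quote,
    )
    return [row for row in reader]


sample = "a;b;c\n1;'x;y';3\n"
expected = reference_parse(sample, delimiter=";", quote="'")
print(expected)  # [['a', 'b', 'c'], ['1', 'x;y', '3']]

# With cisv installed, the same input and options can be compared directly
# (agreement on edge cases such as empty lines is an assumption):
#   import cisv
#   assert cisv.parse_string(sample, delimiter=";", quote="'") == expected
```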

count_rows(path)

Count the number of rows in a CSV file without full parsing.

This is fast because it only scans for newlines using SIMD instructions instead of parsing fields.

Parameters:

  • path: Path to the CSV file

Returns: Number of rows in the file.
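
To illustrate what "scanning for newlines" means, here is a pure-Python sketch of the same idea without SIMD. count_newline_rows is a hypothetical helper, not the cisv implementation. How cisv treats a missing trailing newline or newlines inside quoted fields is not stated above, so the choices below are assumptions.

```python
import os
import tempfile


def count_newline_rows(path: str, chunk_size: int = 1 << 20) -> int:
    """Count rows by scanning for newline bytes in fixed-size chunks."""
    count = 0
    last_byte = b""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            count += chunk.count(b"\n")
            last_byte = chunk[-1:]
    # Assumption: a final line without a trailing newline still counts as a row.
    if last_byte and last_byte != b"\n":
        count += 1
    return count


# Small demonstration on a temporary file with three lines.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(b"a,b\n1,2\n3,4")
    demo_path = tmp.name
try:
    demo_count = count_newline_rows(demo_path)
finally:
    os.remove(demo_path)
print(demo_count)  # 3
```

A scan like this counts a newline inside a quoted field as a row boundary, so for files with multi-line fields the count can differ from the number of parsed rows.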

CisvIterator(path, delimiter=',', quote='"', *, trim=False, skip_empty_lines=False)

Row-by-row iterator for streaming CSV parsing with minimal memory footprint.

Provides fgetcsv-style iteration (as in PHP's fgetcsv) with early exit: breaking out of the loop stops parsing immediately, so no rows beyond that point are parsed.

Parameters:

  • path: Path to the CSV file
  • delimiter: Field delimiter character (default: ',')
  • quote: Quote character (default: '"')
  • trim: Whether to trim whitespace from fields
  • skip_empty_lines: Whether to skip empty lines

Methods:

  • next(): Get the next row as List[str], or None if at end of file
  • close(): Close the iterator and release resources
  • closed: Property indicating whether the iterator has been closed

Protocols:

  • Iterator protocol: use in a for loop (for row in iterator)
  • Context manager: use in a with statement for automatic cleanup

Example:

# Context manager (recommended)
with cisv.CisvIterator('data.csv') as reader:
    for row in reader:
        if row[0] == 'target':
            print(f"Found: {row}")
            break  # Early exit

# Manual iteration
reader = cisv.CisvIterator('data.csv')
try:
    while True:
        row = reader.next()
        if row is None:
            break
        print(row)
finally:
    reader.close()

open_iterator(path, delimiter=',', quote='"', *, trim=False, skip_empty_lines=False)

Convenience function that returns a CisvIterator. Same parameters as CisvIterator.

Example:

for row in cisv.open_iterator('data.csv'):
    print(row)
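
Since the iterator follows the standard iterator protocol, the usual itertools helpers should work with it. The sketch below previews the first few rows of a stream and shows that nothing beyond them is consumed. head and fake_rows are hypothetical names; fake_rows stands in for a cisv reader so the example runs without the package.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def head(rows: Iterable[List[str]], n: int) -> List[List[str]]:
    """Return at most the first n rows, consuming nothing beyond them."""
    return list(islice(rows, n))


def fake_rows(consumed: List[int]) -> Iterator[List[str]]:
    """Stand-in for a streaming reader; records each row index it produces."""
    for i in range(1_000_000):
        consumed.append(i)
        yield [str(i), f"value{i}"]


consumed: List[int] = []
first_three = head(fake_rows(consumed), 3)
print(first_three)    # [['0', 'value0'], ['1', 'value1'], ['2', 'value2']]
print(len(consumed))  # 3

# With cisv installed, the same helper would apply to a real reader:
#   import cisv
#   with cisv.CisvIterator("large.csv") as reader:
#       preview = head(reader, 3)
```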

Running Tests

cd bindings/python-nanobind
pip install -e ".[test]"
pytest

Benchmarking

pip install -e ".[benchmark]"
python -c "
import cisv
import time

# Create test file
with open('/tmp/test.csv', 'w') as f:
    f.write('col1,col2,col3\n')
    for i in range(100000):
        f.write(f'value{i}_1,value{i}_2,value{i}_3\n')

# Benchmark
start = time.perf_counter()
rows = cisv.parse_file('/tmp/test.csv')
print(f'Parsed {len(rows)} rows in {time.perf_counter()-start:.3f}s')
"

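A single timed run can be noisy because of disk caching and other processes. One way to get steadier numbers is to repeat the measurement and keep the fastest run. The sketch below is a generic harness; best_of is a hypothetical helper and the workload is a stand-in so that the example runs without cisv.

```python
import time
from typing import Any, Callable, Tuple


def best_of(fn: Callable[[], Any], repeats: int = 5) -> Tuple[float, Any]:
    """Run fn several times; return the fastest wall-clock time and the last result."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn()
        best = min(best, time.perf_counter() - start)
    return best, result


# Stand-in workload; any zero-argument callable can be timed the same way.
elapsed, total = best_of(lambda: sum(range(100_000)))
print(f"best of 5: {elapsed:.6f}s, result={total}")

# With cisv installed and the test file from the command above:
#   import cisv
#   elapsed, rows = best_of(lambda: cisv.parse_file("/tmp/test.csv"))
```
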
Download files

Source distribution:

  • cisv-0.4.5.tar.gz (48.8 kB)

Built distributions:

  • cisv-0.4.5-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (108.0 kB): CPython 3.13, manylinux (glibc 2.27+ / glibc 2.28+), x86-64
  • cisv-0.4.5-cp313-cp313-macosx_11_0_x86_64.whl (74.4 kB): CPython 3.13, macOS 11.0+, x86-64
  • cisv-0.4.5-cp313-cp313-macosx_11_0_arm64.whl (70.7 kB): CPython 3.13, macOS 11.0+, ARM64
  • cisv-0.4.5-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (108.0 kB): CPython 3.12, manylinux (glibc 2.27+ / glibc 2.28+), x86-64
  • cisv-0.4.5-cp312-cp312-macosx_11_0_x86_64.whl (74.4 kB): CPython 3.12, macOS 11.0+, x86-64
  • cisv-0.4.5-cp312-cp312-macosx_11_0_arm64.whl (70.7 kB): CPython 3.12, macOS 11.0+, ARM64
  • cisv-0.4.5-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (108.5 kB): CPython 3.11, manylinux (glibc 2.27+ / glibc 2.28+), x86-64
  • cisv-0.4.5-cp311-cp311-macosx_11_0_x86_64.whl (74.8 kB): CPython 3.11, macOS 11.0+, x86-64
  • cisv-0.4.5-cp311-cp311-macosx_11_0_arm64.whl (71.3 kB): CPython 3.11, macOS 11.0+, ARM64
  • cisv-0.4.5-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (108.7 kB): CPython 3.10, manylinux (glibc 2.27+ / glibc 2.28+), x86-64
  • cisv-0.4.5-cp310-cp310-macosx_11_0_x86_64.whl (75.0 kB): CPython 3.10, macOS 11.0+, x86-64
  • cisv-0.4.5-cp310-cp310-macosx_11_0_arm64.whl (71.5 kB): CPython 3.10, macOS 11.0+, ARM64
  • cisv-0.4.5-cp39-cp39-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (109.0 kB): CPython 3.9, manylinux (glibc 2.27+ / glibc 2.28+), x86-64
  • cisv-0.4.5-cp39-cp39-macosx_11_0_x86_64.whl (75.2 kB): CPython 3.9, macOS 11.0+, x86-64
  • cisv-0.4.5-cp39-cp39-macosx_11_0_arm64.whl (71.7 kB): CPython 3.9, macOS 11.0+, ARM64
