High-performance CSV parser with SIMD optimizations

CISV Python Bindings (nanobind)

High-performance Python bindings for the CISV CSV parser using nanobind.

Performance

These bindings are 10-100x faster than the ctypes-based bindings because they:

  1. Use the batch API: All data is parsed in C and returned at once, eliminating millions of per-field callbacks
  2. Use nanobind: Much lower overhead than ctypes or pybind11
  3. Release the GIL: Parallel parsing runs without holding the Python GIL

File Size                     ctypes    nanobind    Speedup
142 MB (1M rows × 10 cols)    ~20s      <0.8s       25x+
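
Because the bindings release the GIL, one way to take advantage of that is to parse several files from separate Python threads. The sketch below is only an illustration: parse_many is a hypothetical helper, not part of cisv, and it takes the parsing function as an argument. The source only states that parallel parsing runs without the GIL, so treat any speedup from calling cisv.parse_file this way as an assumption to measure.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List

Rows = List[List[str]]


def parse_many(
    paths: Iterable[str],
    parse_fn: Callable[[str], Rows],
    max_workers: int = 4,
) -> Dict[str, Rows]:
    """Parse several files concurrently and return {path: rows}.

    parse_fn is any callable that maps a path to a list of rows,
    for example cisv.parse_file (hypothetical usage, not verified here).
    """
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each path with its rows.
        return dict(zip(paths, pool.map(parse_fn, paths)))
```

A call might look like parse_many(['a.csv', 'b.csv'], cisv.parse_file). Threads only help while the parser is not holding the GIL, so measure before relying on it.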

Installation

From PyPI (recommended)

pip install cisv

From source

cd bindings/python-nanobind
pip install .

Development install

cd bindings/python-nanobind
pip install -e .

Usage

import cisv

# Parse a file
rows = cisv.parse_file('data.csv')

# Parse with options
rows = cisv.parse_file(
    'data.csv',
    delimiter=';',
    quote="'",
    trim=True,
    skip_empty_lines=True
)

# Parse large files in parallel (faster on multi-core systems)
rows = cisv.parse_file('large.csv', parallel=True)

# Parse a string
rows = cisv.parse_string("a,b,c\n1,2,3")

# Count rows quickly (SIMD-accelerated)
count = cisv.count_rows('data.csv')

# Row-by-row iteration (memory efficient, supports early exit)
with cisv.CisvIterator('large.csv') as reader:
    for row in reader:
        print(row)  # List[str]
        if row[0] == 'stop':
            break  # Early exit - no wasted work

# Or use the convenience function
# (process is a placeholder for your own row handler)
for row in cisv.open_iterator('data.csv', delimiter=',', trim=True):
    process(row)

API Reference

parse_file(path, delimiter=',', quote='"', *, trim=False, skip_empty_lines=False, parallel=False, num_threads=0)

Parse a CSV file and return all rows.

Parameters:

  • path: Path to the CSV file
  • delimiter: Field delimiter character (default: ',')
  • quote: Quote character (default: '"')
  • trim: Whether to trim whitespace from fields
  • skip_empty_lines: Whether to skip empty lines
  • parallel: Use multi-threaded parsing (faster for large files)
  • num_threads: Number of threads for parallel parsing (0 = auto-detect)

Returns: List of rows, where each row is a list of field values.
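
Since the result is a plain list of lists, turning it into records keyed by column name takes only a few lines. The helper below is a minimal sketch, and rows_to_dicts is a hypothetical name, not part of cisv. It assumes the first returned row is the header, which the source does not state explicitly.

```python
from typing import Dict, List


def rows_to_dicts(rows: List[List[str]]) -> List[Dict[str, str]]:
    """Treat the first row as a header and map each later row onto it."""
    if not rows:
        return []
    header, *body = rows
    # zip stops at the shorter sequence: extra fields in a long row are
    # dropped, and a short row simply omits the missing keys.
    return [dict(zip(header, row)) for row in body]
```

For example, records = rows_to_dicts(cisv.parse_file('data.csv')) would give one dict per data row under that assumption.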

parse_string(content, delimiter=',', quote='"', *, trim=False, skip_empty_lines=False)

Parse a CSV string and return all rows.

Parameters:

  • content: CSV content as a string
  • delimiter: Field delimiter character (default: ',')
  • quote: Quote character (default: '"')
  • trim: Whether to trim whitespace from fields
  • skip_empty_lines: Whether to skip empty lines

Returns: List of rows, where each row is a list of field values.
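
When adopting a new parser it can help to cross-check its output against the standard library on small inputs. The sketch below builds a reference parser on Python's csv module with the same option names as parse_string. reference_parse is a hypothetical helper, and whether cisv agrees with it on every edge case (quoting, line endings, what counts as an empty line) is an assumption to test, not something the source guarantees.

```python
import csv
import io
from typing import List


def reference_parse(
    content: str,
    delimiter: str = ",",
    quote: str = '"',
    *,
    trim: bool = False,
    skip_empty_lines: bool = False,
) -> List[List[str]]:
    """Parse CSV text with the stdlib csv module as a comparison baseline."""
    reader = csv.reader(
        io.StringIO(content, newline=""), delimiter=delimiter, quotechar=quote
    )
    rows = []
    for row in reader:
        # csv.reader yields an empty list for a completely blank line.
        if skip_empty_lines and not row:
            continue
        rows.append([field.strip() for field in row] if trim else row)
    return rows
```

A test suite could then assert that cisv.parse_string(text) equals reference_parse(text) for a handful of representative inputs.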

count_rows(path)

Count the number of rows in a CSV file without full parsing.

This is very fast as it only scans for newlines using SIMD instructions.

Parameters:

  • path: Path to the CSV file

Returns: Number of rows in the file.
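
To make the newline-scan idea concrete, here is a plain-Python sketch that counts newline bytes in fixed-size chunks. It is not the library's implementation and uses no SIMD. count_newlines is a hypothetical helper for illustration only. A pure newline scan also counts newlines inside quoted fields and ignores a final line that lacks a trailing newline. How count_rows treats those cases is not stated in the source.

```python
def count_newlines(path: str, chunk_size: int = 1 << 20) -> int:
    """Count b'\\n' bytes in a file, reading chunk_size bytes at a time."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes object means end of file
                break
            total += chunk.count(b"\n")
    return total
```

Comparing this number with cisv.count_rows on a sample file is a quick way to find out how headers, quoted newlines, and trailing lines are counted.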

CisvIterator(path, delimiter=',', quote='"', *, trim=False, skip_empty_lines=False)

Row-by-row iterator for streaming CSV parsing with minimal memory footprint.

Provides fgetcsv-style iteration that supports early exit - breaking out of iteration stops parsing immediately with no wasted work.

Parameters:

  • path: Path to the CSV file
  • delimiter: Field delimiter character (default: ',')
  • quote: Quote character (default: '"')
  • trim: Whether to trim whitespace from fields
  • skip_empty_lines: Whether to skip empty lines

Methods:

  • next(): Get the next row as List[str], or None if at end of file
  • close(): Close the iterator and release resources
  • closed: Property indicating whether the iterator has been closed

Protocols:

  • Iterator protocol: Can be used directly in a for loop (for row in iterator)
  • Context manager: Can be used in a with statement for automatic cleanup

Example:

# Context manager (recommended)
with cisv.CisvIterator('data.csv') as reader:
    for row in reader:
        if row[0] == 'target':
            print(f"Found: {row}")
            break  # Early exit

# Manual iteration (process is a placeholder for your own row handler)
reader = cisv.CisvIterator('data.csv')
try:
    while True:
        row = reader.next()
        if row is None:
            break
        process(row)
finally:
    reader.close()
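
Because the iterator follows the standard iterator protocol, itertools works on it unchanged. The sketch below previews the first few rows of any row iterable. head is a hypothetical helper, not part of cisv.

```python
from itertools import islice
from typing import Iterable, List


def head(rows: Iterable[List[str]], n: int = 5) -> List[List[str]]:
    """Return at most the first n rows without consuming the rest."""
    # islice stops pulling from the underlying iterator after n items.
    return list(islice(rows, n))
```

Inside a with cisv.CisvIterator('data.csv') as reader: block, head(reader, 5) would read only the first five rows.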

open_iterator(path, delimiter=',', quote='"', *, trim=False, skip_empty_lines=False)

Convenience function that returns a CisvIterator. Same parameters as CisvIterator.

Example:

for row in cisv.open_iterator('data.csv'):
    print(row)
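
Streaming rows pairs naturally with batch processing, for example bulk inserts into a database. The generator below groups any row iterable into fixed-size lists so that only one batch is held in memory at a time. in_batches is a hypothetical helper, not part of cisv.

```python
from typing import Iterable, Iterator, List

Row = List[str]


def in_batches(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Yield lists of up to `size` rows; the last batch may be shorter."""
    if size < 1:
        # Raised when iteration starts, because this is a generator.
        raise ValueError("size must be at least 1")
    batch: List[Row] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch
```

For instance, for batch in in_batches(cisv.open_iterator('data.csv'), 1000): would hand your code 1000 rows at a time.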

Running Tests

cd bindings/python-nanobind
pip install -e ".[test]"
pytest

Benchmarking

pip install -e ".[benchmark]"
python -c "
import cisv
import time

# Create a 100,000-row test file (plus a header line)
with open('/tmp/test.csv', 'w') as f:
    f.write('col1,col2,col3\n')
    for i in range(100000):
        f.write(f'value{i}_1,value{i}_2,value{i}_3\n')

# Benchmark with a monotonic, high-resolution clock
start = time.perf_counter()
rows = cisv.parse_file('/tmp/test.csv')
elapsed = time.perf_counter() - start
print(f'Parsed {len(rows)} rows in {elapsed:.3f}s')
"

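A single timed run can be noisy, since file caching and other processes affect it. A common remedy is to repeat the call and keep the best time. The helper below is a small sketch of that approach. best_of is a hypothetical name, not part of cisv, and it works with any callable.

```python
import time
from typing import Any, Callable, Tuple


def best_of(fn: Callable[..., Any], *args: Any, repeats: int = 3, **kwargs: Any) -> Tuple[float, Any]:
    """Call fn repeatedly; return (fastest wall time in seconds, last result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return best, result
```

For example, seconds, rows = best_of(cisv.parse_file, '/tmp/test.csv') times three runs. Passing a function built on the standard csv module gives a baseline on the same file.
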