High-performance CSV parser with SIMD optimizations

CISV Python Bindings (nanobind)

High-performance Python bindings for the CISV CSV parser using nanobind.

Performance

These bindings are 10-100x faster than the ctypes-based bindings because they:

  1. Use the batch API: All data is parsed in C and returned at once, eliminating millions of per-field callbacks
  2. Use nanobind: Much lower overhead than ctypes or pybind11
  3. Release the GIL: Parallel parsing runs without holding the Python GIL

File Size                    ctypes   nanobind   Speedup
142MB (1M rows × 10 cols)    ~20s     <0.8s      25x+
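
The figures above come from a single file, so it can be worth measuring the ratio on your own data. The harness below is only a sketch under stated assumptions, not the project's benchmark: time_parser and stdlib_parse are hypothetical helper names, and the standard library csv module stands in as the baseline. Any callable that takes a path and returns a list of rows can be passed in, so cisv.parse_file could be timed the same way once the package is installed.

```python
import csv
import time


def stdlib_parse(path):
    """Baseline: parse the whole file with the standard library."""
    with open(path, newline="") as f:
        return list(csv.reader(f))


def time_parser(parse, path, repeats=3):
    """Return (best_seconds, row_count) over `repeats` runs of parse(path)."""
    best = float("inf")
    row_count = 0
    for _ in range(repeats):
        start = time.perf_counter()
        rows = parse(path)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)  # keep the fastest run to reduce noise
        row_count = len(rows)
    return best, row_count
```

Calling time_parser(stdlib_parse, 'data.csv') gives a baseline; passing cisv.parse_file instead gives the figure to compare against it. Taking the best of several runs reduces noise from disk caching.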

Installation

From PyPI (recommended)

pip install cisv

From source

cd bindings/python-nanobind
pip install .

Development install

cd bindings/python-nanobind
pip install -e .

Usage

import cisv

# Parse a file
rows = cisv.parse_file('data.csv')

# Parse with options
rows = cisv.parse_file(
    'data.csv',
    delimiter=';',
    quote="'",
    trim=True,
    skip_empty_lines=True
)

# Parse large files in parallel (faster on multi-core systems)
rows = cisv.parse_file('large.csv', parallel=True)

# Parse a string
rows = cisv.parse_string("a,b,c\n1,2,3")

# Count rows quickly (SIMD-accelerated)
count = cisv.count_rows('data.csv')

# Row-by-row iteration (memory efficient, supports early exit)
with cisv.CisvIterator('large.csv') as reader:
    for row in reader:
        print(row)  # List[str]
        if row[0] == 'stop':
            break  # Early exit - no wasted work

# Or use the convenience function
for row in cisv.open_iterator('data.csv', delimiter=',', trim=True):
    process(row)

API Reference

parse_file(path, delimiter=',', quote='"', *, trim=False, skip_empty_lines=False, parallel=False, num_threads=0)

Parse a CSV file and return all rows.

Parameters:

  • path: Path to the CSV file
  • delimiter: Field delimiter character (default: ',')
  • quote: Quote character (default: '"')
  • trim: Whether to trim whitespace from fields
  • skip_empty_lines: Whether to skip empty lines
  • parallel: Use multi-threaded parsing (faster for large files)
  • num_threads: Number of threads for parallel parsing (0 = auto-detect)

Returns: List of rows, where each row is a list of field values.
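
Because the result is a plain list of lists, a header-keyed view can be layered on top in ordinary Python. The sketch below assumes the first row is a header, which is a property of your data rather than something parse_file is documented to handle; rows_to_dicts is a hypothetical helper, and the literal list stands in for a real parse_file result.

```python
from typing import Dict, Iterable, Iterator, List


def rows_to_dicts(rows: Iterable[List[str]]) -> Iterator[Dict[str, str]]:
    """Yield one dict per data row, keyed by the first row's values."""
    it = iter(rows)
    try:
        header = next(it)
    except StopIteration:
        return  # empty input: nothing to yield
    for row in it:
        # zip() stops at the shorter sequence, so surplus fields are dropped
        # and missing fields are simply absent from the dict.
        yield dict(zip(header, row))


# Stand-in for the result of cisv.parse_file('data.csv')
rows = [["id", "name"], ["1", "Ada"], ["2", "Linus"]]
records = list(rows_to_dicts(rows))
```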

parse_string(content, delimiter=',', quote='"', *, trim=False, skip_empty_lines=False)

Parse a CSV string and return all rows.

Parameters:

  • content: CSV content as a string
  • delimiter: Field delimiter character (default: ',')
  • quote: Quote character (default: '"')
  • trim: Whether to trim whitespace from fields
  • skip_empty_lines: Whether to skip empty lines

Returns: List of rows, where each row is a list of field values.
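
One way to sanity-check the options on a small input is to compare against the standard library. The function below is a stdlib approximation, not cisv's implementation: reference_parse is a hypothetical name, and it assumes that trim strips leading and trailing whitespace from each field and that skip_empty_lines drops rows that contain no fields. Neither assumption is confirmed by this page.

```python
import csv
import io
from typing import List


def reference_parse(content: str, delimiter: str = ",", quote: str = '"',
                    *, trim: bool = False,
                    skip_empty_lines: bool = False) -> List[List[str]]:
    """Stdlib approximation of the documented options (assumed semantics)."""
    reader = csv.reader(io.StringIO(content, newline=""),
                        delimiter=delimiter, quotechar=quote)
    rows = []
    for row in reader:
        if skip_empty_lines and not row:
            continue  # csv.reader yields [] for a blank line
        if trim:
            row = [field.strip() for field in row]
        rows.append(row)
    return rows
```

In a test suite, the output of cisv.parse_string could be compared with reference_parse on the same text. A mismatch is a prompt to investigate, not proof of a bug, since the two parsers may legitimately differ on edge cases.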

count_rows(path)

Count the number of rows in a CSV file without full parsing.

This is very fast as it only scans for newlines using SIMD instructions.

Parameters:

  • path: Path to the CSV file

Returns: Number of rows in the file.
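
To make "only scans for newlines" concrete, here is a scalar pure-Python version of the same idea. It is a sketch, not cisv's SIMD code, and count_newlines is a hypothetical name. This page does not say how count_rows treats a final line without a trailing newline, a header row, or newlines inside quoted fields, so the two may disagree on such files.

```python
def count_newlines(path: str, chunk_size: int = 1 << 20) -> int:
    """Count newline bytes in a file, reading fixed-size binary chunks."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break  # end of file
            total += chunk.count(b"\n")
    return total
```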

CisvIterator(path, delimiter=',', quote='"', *, trim=False, skip_empty_lines=False)

Row-by-row iterator for streaming CSV parsing with minimal memory footprint.

Provides fgetcsv-style iteration that supports early exit - breaking out of iteration stops parsing immediately with no wasted work.

Parameters:

  • path: Path to the CSV file
  • delimiter: Field delimiter character (default: ',')
  • quote: Quote character (default: '"')
  • trim: Whether to trim whitespace from fields
  • skip_empty_lines: Whether to skip empty lines

Methods:

  • next(): Get the next row as List[str], or None if at end of file
  • close(): Close the iterator and release resources
  • closed: Property indicating whether the iterator has been closed

Protocols:

  • Iterator protocol: Works in for loops, e.g. for row in iterator
  • Context manager: Works in a with statement for automatic cleanup

Example:

# Context manager (recommended)
with cisv.CisvIterator('data.csv') as reader:
    for row in reader:
        if row[0] == 'target':
            print(f"Found: {row}")
            break  # Early exit

# Manual iteration
reader = cisv.CisvIterator('data.csv')
try:
    while True:
        row = reader.next()
        if row is None:
            break
        process(row)
finally:
    reader.close()
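
For unit tests that run where the compiled extension is unavailable, a stand-in with the same documented surface can be handy. The class below is a hypothetical stdlib-backed substitute, not part of cisv: FakeRowIterator is an invented name, it ignores trim and skip_empty_lines, and its behaviour after close() is unspecified. It shows how next(), close(), closed, the iterator protocol, and the context-manager protocol fit together.

```python
import csv


class FakeRowIterator:
    """Stdlib-backed stand-in mirroring the documented CisvIterator surface."""

    def __init__(self, path, delimiter=",", quote='"'):
        self._file = open(path, newline="")
        self._reader = csv.reader(self._file, delimiter=delimiter,
                                  quotechar=quote)

    @property
    def closed(self):
        return self._file.closed

    def next(self):
        """Return the next row, or None at end of file."""
        return next(self._reader, None)

    def close(self):
        self._file.close()

    # Iterator protocol
    def __iter__(self):
        return self

    def __next__(self):
        row = self.next()
        if row is None:
            raise StopIteration
        return row

    # Context-manager protocol
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not swallow exceptions
```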

open_iterator(path, delimiter=',', quote='"', *, trim=False, skip_empty_lines=False)

Convenience function that returns a CisvIterator. Same parameters as CisvIterator.

Example:

for row in cisv.open_iterator('data.csv'):
    print(row)
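
Streaming rows are often consumed in groups, for example for bulk inserts. The helper below is a sketch that works on any iterable of rows, so the result of open_iterator could be passed in; batched_rows is a hypothetical name and not part of cisv.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batched_rows(rows: Iterable[List[str]],
                 size: int) -> Iterator[List[List[str]]]:
    """Yield lists of up to `size` rows without materialising the whole input."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(rows)
    while True:
        batch = list(islice(it, size))  # pulls at most `size` rows
        if not batch:
            return
        yield batch
```

Only one batch is held in memory at a time, which keeps the memory benefit of row-by-row iteration.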

Running Tests

cd bindings/python-nanobind
pip install -e ".[test]"
pytest

Benchmarking

pip install -e ".[benchmark]"
python -c "
import cisv
import time

# Create test file
with open('/tmp/test.csv', 'w') as f:
    f.write('col1,col2,col3\n')
    for i in range(100000):
        f.write(f'value{i}_1,value{i}_2,value{i}_3\n')

# Benchmark (perf_counter is a monotonic clock intended for timing)
start = time.perf_counter()
rows = cisv.parse_file('/tmp/test.csv')
print(f'Parsed {len(rows)} rows in {time.perf_counter()-start:.3f}s')
"
