High-performance CSV parser with SIMD optimizations

CISV Python Bindings (nanobind)

High-performance Python bindings for the CISV CSV parser using nanobind.

Performance

These bindings are 10-100x faster than the ctypes-based bindings because they:

  1. Use the batch API: All data is parsed in C and returned at once, eliminating millions of per-field callbacks
  2. Use nanobind: Much lower overhead than ctypes or pybind11
  3. Release the GIL: Parallel parsing runs without holding the Python GIL

File Size                   ctypes   nanobind   Speedup
142MB (1M rows × 10 cols)   ~20s     <0.8s      25x+
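
To make the first point concrete, the pure-Python sketch below counts how often control passes from a parser back into caller code under a per-field callback design versus a batch design. Both parser functions are hypothetical stand-ins written for this illustration. They are not part of cisv, and their comma splitting ignores quoting.

```python
def parse_with_callbacks(text, on_field):
    """Hypothetical callback-style parser: calls on_field once per field."""
    for line in text.splitlines():
        for field in line.split(","):
            on_field(field)


def parse_batch(text):
    """Hypothetical batch-style parser: builds every row, then returns once."""
    return [line.split(",") for line in text.splitlines()]


# 1,000 rows x 10 columns of synthetic data.
text = "\n".join(",".join(f"v{r}_{c}" for c in range(10)) for r in range(1000))

counter = {"calls": 0}


def count_field(_field):
    # Each invocation models one crossing from the parser into caller code.
    counter["calls"] += 1


parse_with_callbacks(text, count_field)
rows = parse_batch(text)

print(counter["calls"])  # one caller-side invocation per field
print(len(rows))         # all rows arrived in a single return value
```

With real bindings, each of those per-field invocations would also pay the cost of crossing between C and Python, which is the overhead the batch API is described as avoiding.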

Installation

From PyPI (recommended)

pip install cisv

From source

cd bindings/python-nanobind
pip install .

Development install

cd bindings/python-nanobind
pip install -e .

Usage

import cisv

# Parse a file
rows = cisv.parse_file('data.csv')

# Parse with options
rows = cisv.parse_file(
    'data.csv',
    delimiter=';',
    quote="'",
    trim=True,
    skip_empty_lines=True
)

# Parse large files in parallel (faster on multi-core systems)
rows = cisv.parse_file('large.csv', parallel=True)

# Parse a string
rows = cisv.parse_string("a,b,c\n1,2,3")

# Count rows quickly (SIMD-accelerated)
count = cisv.count_rows('data.csv')

# Row-by-row iteration (memory efficient, supports early exit)
with cisv.CisvIterator('large.csv') as reader:
    for row in reader:
        print(row)  # List[str]
        if row[0] == 'stop':
            break  # Early exit - no wasted work

# Or use the convenience function
# (process() is a placeholder for your own row handler)
for row in cisv.open_iterator('data.csv', delimiter=',', trim=True):
    process(row)

API Reference

parse_file(path, delimiter=',', quote='"', *, trim=False, skip_empty_lines=False, parallel=False, num_threads=0)

Parse a CSV file and return all rows.

Parameters:

  • path: Path to the CSV file
  • delimiter: Field delimiter character (default: ',')
  • quote: Quote character (default: '"')
  • trim: Whether to trim whitespace from fields
  • skip_empty_lines: Whether to skip empty lines
  • parallel: Use multi-threaded parsing (faster for large files)
  • num_threads: Number of threads for parallel parsing (0 = auto-detect)

Returns: List of rows, where each row is a list of field values.
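
Because the result is a plain list of lists, it is often convenient to pair each row with the column names. The helper below is a minimal sketch that assumes the first row is a header. `rows_to_dicts` is a hypothetical name, not a cisv function, and the sample data is a literal written in the documented return shape rather than real parser output.

```python
def rows_to_dicts(rows):
    """Pair each data row with the header row (assumed to be rows[0])."""
    if not rows:
        return []
    header, *data = rows
    # zip() stops at the shorter sequence, so ragged rows are truncated
    # to the header length instead of raising.
    return [dict(zip(header, row)) for row in data]


# Literal sample in the shape parse_file is documented to return.
rows = [["name", "age"], ["Ada", "36"], ["Linus", "54"]]
records = rows_to_dicts(rows)

print(records[0]["name"])  # Ada
print(records[1])          # {'name': 'Linus', 'age': '54'}
```

All values remain strings, so numeric columns still need an explicit conversion such as `int(record["age"])`.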

parse_string(content, delimiter=',', quote='"', *, trim=False, skip_empty_lines=False)

Parse a CSV string and return all rows.

Parameters:

  • content: CSV content as a string
  • delimiter: Field delimiter character (default: ',')
  • quote: Quote character (default: '"')
  • trim: Whether to trim whitespace from fields
  • skip_empty_lines: Whether to skip empty lines

Returns: List of rows, where each row is a list of field values.
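
The sketch below emulates one plausible reading of these options with the standard library `csv` module, which can serve as a baseline when checking results. `reference_parse` is a hypothetical helper, and cisv may treat edge cases differently (for example, whether `trim` also applies inside quoted fields).

```python
import csv
import io


def reference_parse(content, delimiter=",", quote='"', *,
                    trim=False, skip_empty_lines=False):
    """Stdlib emulation of the documented options (assumed semantics)."""
    reader = csv.reader(io.StringIO(content, newline=""),
                        delimiter=delimiter, quotechar=quote)
    rows = []
    for row in reader:
        # csv.reader yields an empty list for a blank line.
        if skip_empty_lines and not row:
            continue
        if trim:
            row = [field.strip() for field in row]
        rows.append(row)
    return rows


print(reference_parse("a,b,c\n1,2,3"))
# [['a', 'b', 'c'], ['1', '2', '3']]

print(reference_parse("a;b\n\n 1 ; 2 \n", delimiter=";",
                      trim=True, skip_empty_lines=True))
# [['a', 'b'], ['1', '2']]
```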

count_rows(path)

Count the number of rows in a CSV file without full parsing.

This is very fast as it only scans for newlines using SIMD instructions.

Parameters:

  • path: Path to the CSV file

Returns: Number of rows in the file.
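
For intuition, newline scanning can be sketched in pure Python by counting newline bytes chunk by chunk. This is only an illustration of the idea, not the SIMD implementation. The source does not state how `count_rows` treats header rows or newlines inside quoted fields, so the sample below simply shows where a raw newline count and a logical record count can diverge. `count_newlines` is a hypothetical helper.

```python
import os
import tempfile


def count_newlines(path, chunk_size=1 << 20):
    """Count newline bytes by reading the file in fixed-size chunks."""
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            total += chunk.count(b"\n")
    return total


# Header plus two records; the first record has a newline inside quotes.
content = 'a,b\n1,"x\ny"\n2,z\n'

with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False,
                                 newline="") as tmp:
    tmp.write(content)
    path = tmp.name

try:
    newline_count = count_newlines(path)
finally:
    os.remove(path)

print(newline_count)  # 4 newline bytes, although the file holds 3 logical records
```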

CisvIterator(path, delimiter=',', quote='"', *, trim=False, skip_empty_lines=False)

Row-by-row iterator for streaming CSV parsing with minimal memory footprint.

Provides fgetcsv-style iteration that supports early exit - breaking out of iteration stops parsing immediately with no wasted work.

Parameters:

  • path: Path to the CSV file
  • delimiter: Field delimiter character (default: ',')
  • quote: Quote character (default: '"')
  • trim: Whether to trim whitespace from fields
  • skip_empty_lines: Whether to skip empty lines

Methods:

  • next(): Get the next row as List[str], or None if at end of file
  • close(): Close the iterator and release resources
  • closed: Property indicating whether the iterator has been closed

Protocols:

  • Iterator protocol: Usable in for loops (for row in iterator)
  • Context manager: Usable in a with statement for automatic cleanup

Example:

# Context manager (recommended)
with cisv.CisvIterator('data.csv') as reader:
    for row in reader:
        if row[0] == 'target':
            print(f"Found: {row}")
            break  # Early exit

# Manual iteration
# (process() is a placeholder for your own row handler)
reader = cisv.CisvIterator('data.csv')
try:
    while True:
        row = reader.next()
        if row is None:  # next() returns None at end of file
            break
        process(row)
finally:
    reader.close()
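
When writing tests for code that consumes a CisvIterator, an in-memory stand-in with the same documented surface (next(), close(), closed, iteration, context manager) can be handy. `ListRowReader` and `find_first` below are hypothetical names for this sketch. What the real iterator does when next() is called after close() is not stated above, so returning None here is an assumption.

```python
class ListRowReader:
    """Hypothetical test double mirroring the documented CisvIterator surface."""

    def __init__(self, rows):
        self._it = iter(rows)
        self._closed = False

    @property
    def closed(self):
        return self._closed

    def next(self):
        """Return the next row, or None once the rows are exhausted."""
        if self._closed:
            return None  # assumption: a closed reader yields nothing
        return next(self._it, None)

    def close(self):
        self._closed = True

    def __iter__(self):
        return self

    def __next__(self):
        row = self.next()
        if row is None:
            raise StopIteration
        return row

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not swallow exceptions


def find_first(reader, key):
    """Return the first row whose first field equals key, else None."""
    for row in reader:
        if row and row[0] == key:
            return row  # early exit: later rows are never pulled
    return None


with ListRowReader([["a", "1"], ["target", "2"], ["b", "3"]]) as reader:
    hit = find_first(reader, "target")
    remaining = reader.next()  # the row after the match is still unread

print(hit)            # ['target', '2']
print(remaining)      # ['b', '3']
print(reader.closed)  # True
```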

open_iterator(path, delimiter=',', quote='"', *, trim=False, skip_empty_lines=False)

Convenience function that returns a CisvIterator. Same parameters as CisvIterator.

Example:

for row in cisv.open_iterator('data.csv'):
    print(row)
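
Since the returned object follows the iterator protocol, standard tools such as `itertools.islice` should compose with it. The sketch below previews the first few rows of a stream and confirms that nothing beyond them is produced. The generator is a stand-in for the real iterator so the example runs without a CSV file, and `head` is a hypothetical helper.

```python
from itertools import islice


def head(rows, n):
    """Return at most the first n rows from any row iterator."""
    return list(islice(rows, n))


def generated_rows(total, log):
    """Stand-in row source that records which rows were actually produced."""
    for i in range(total):
        log.append(i)
        yield [f"id{i}", str(i * i)]


produced = []
first_three = head(generated_rows(1_000_000, produced), 3)

print(first_three)  # [['id0', '0'], ['id1', '1'], ['id2', '4']]
print(produced)     # [0, 1, 2]
```

With the real iterator the call would look like `head(cisv.open_iterator('data.csv'), 3)`. Wrapping it in a with statement, as shown for CisvIterator, keeps cleanup explicit.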

Running Tests

cd bindings/python-nanobind
pip install -e ".[test]"
pytest

Benchmarking

pip install -e ".[benchmark]"
python -c "
import cisv
import time

# Create test file
with open('/tmp/test.csv', 'w') as f:
    f.write('col1,col2,col3\n')
    for i in range(100000):
        f.write(f'value{i}_1,value{i}_2,value{i}_3\n')

# Benchmark (perf_counter is a monotonic, high-resolution timer)
start = time.perf_counter()
rows = cisv.parse_file('/tmp/test.csv')
print(f'Parsed {len(rows)} rows in {time.perf_counter()-start:.3f}s')
"
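
A single timed run can be noisy. The sketch below repeats the measurement and keeps the best time, using the standard library `csv` module as the measured parser so that it runs without cisv installed. `best_of` and `parse_with_stdlib` are hypothetical helpers, and the row count is reduced to keep the run short.

```python
import csv
import os
import tempfile
import time


def best_of(func, repeats=3):
    """Run func several times; return (fastest wall time, last result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func()
        best = min(best, time.perf_counter() - start)
    return best, result


def parse_with_stdlib(path):
    """Baseline parser using csv.reader."""
    with open(path, newline="") as f:
        return list(csv.reader(f))


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "test.csv")
    with open(path, "w", newline="") as f:
        f.write("col1,col2,col3\n")
        for i in range(10_000):
            f.write(f"value{i}_1,value{i}_2,value{i}_3\n")

    elapsed, rows = best_of(lambda: parse_with_stdlib(path))
    print(f"csv.reader: {len(rows)} rows, best of 3: {elapsed:.4f}s")

    # To time cisv on the same file (requires the package to be installed):
    # import cisv
    # elapsed, rows = best_of(lambda: cisv.parse_file(path))
```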
