High-performance CSV parser with SIMD optimizations (AVX-512/AVX2)

CISV Python Binding

High-performance CSV parser with SIMD optimizations for Python.

Requirements

  • Python 3.7+
  • CISV core library (libcisv.so)

Installation

Build Core Library First

cd ../../core
make

Install Python Package

pip install -e .

Or using the Makefile:

make build
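
After installing, it can help to check that both pieces are discoverable. The sketch below is an assumption-laden sanity check, not part of CISV: `check_cisv_install` is a hypothetical helper, and the binding may locate `libcisv` through its own lookup rather than the system search used here, so a `None` result is only a hint.

```python
import ctypes.util
import importlib.util


def check_cisv_install() -> dict:
    """Report whether the Python package and the native library look discoverable."""
    return {
        # True if `import cisv` would find a module on sys.path.
        "python_package": importlib.util.find_spec("cisv") is not None,
        # Name/path of a system-visible libcisv, or None if the system
        # search does not find one. The binding may use a different lookup.
        "native_library": ctypes.util.find_library("cisv"),
    }


status = check_cisv_install()
for key, value in status.items():
    print(f"{key}: {value}")
```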

Quick Start

from cisv import CisvParser, parse_file, parse_string, count_rows

# Simple file parsing
rows = parse_file('data.csv')
for row in rows:
    print(row)

# Parse with custom options
parser = CisvParser(
    delimiter=';',
    quote="'",
    trim=True
)
rows = parser.parse_file('data.csv')

# Parse from string
csv_data = """name,age,email
John,30,john@example.com
Jane,25,jane@example.com"""

rows = parse_string(csv_data)

# Fast row counting (without full parsing)
total = count_rows('large.csv')
print(f"Total rows: {total}")
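
Since the parsing functions return plain lists of strings, mapping rows to dictionaries keyed by the header is left to the caller. A minimal sketch, assuming the first row is the header; `rows_to_dicts` is a hypothetical helper and the `rows` literal stands in for the result of `parse_string(csv_data)`:

```python
from typing import Dict, List


def rows_to_dicts(rows: List[List[str]]) -> List[Dict[str, str]]:
    """Treat the first row as the header and key every later row by it."""
    if not rows:
        return []
    header, *records = rows
    # zip() stops at the shorter sequence, so short rows simply lack keys.
    return [dict(zip(header, record)) for record in records]


# Stand-in for the list that parse_string(csv_data) is documented to return.
rows = [
    ["name", "age", "email"],
    ["John", "30", "john@example.com"],
    ["Jane", "25", "jane@example.com"],
]
people = rows_to_dicts(rows)
print(people[0]["email"])  # john@example.com
```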

API Reference

CisvParser Class

from typing import List, Optional

class CisvParser:
    def __init__(
        self,
        delimiter: str = ',',
        quote: str = '"',
        escape: Optional[str] = None,
        comment: Optional[str] = None,
        trim: bool = False,
        skip_empty_lines: bool = False,
    ):
        """
        Create a new CSV parser.

        Args:
            delimiter: Field separator character (default: ',')
            quote: Quote character for fields (default: '"')
            escape: Escape character (default: None for RFC4180 "" style)
            comment: Comment line prefix (default: None)
            trim: Strip whitespace from fields (default: False)
            skip_empty_lines: Skip empty lines (default: False)
        """

    def parse_file(self, path: str) -> List[List[str]]:
        """Parse a CSV file and return all rows."""

    def parse_string(self, content: str) -> List[List[str]]:
        """Parse a CSV string and return all rows."""

Convenience Functions

def parse_file(
    path: str,
    delimiter: str = ',',
    quote: str = '"',
    **kwargs
) -> List[List[str]]:
    """Parse a CSV file with the given options."""

def parse_string(
    content: str,
    delimiter: str = ',',
    quote: str = '"',
    **kwargs
) -> List[List[str]]:
    """Parse a CSV string with the given options."""

def count_rows(path: str) -> int:
    """Count rows in a CSV file without full parsing."""

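The convenience functions can be wrapped so that code still runs where CISV is not installed. The following is a sketch under assumptions: `load_rows` is a hypothetical helper, a missing package or native library is assumed to surface as `ImportError` or `OSError` at import time, and the standard-library fallback is not guaranteed to match CISV on every edge case.

```python
import csv
import os
import tempfile
from typing import List

try:
    # Documented convenience function; the import may fail if the package
    # or the native library is unavailable (assumed exception types).
    from cisv import parse_file as _cisv_parse_file
except (ImportError, OSError):
    _cisv_parse_file = None


def load_rows(path: str, delimiter: str = ",", quote: str = '"') -> List[List[str]]:
    """Parse with CISV when available, otherwise with the stdlib csv module."""
    if _cisv_parse_file is not None:
        return _cisv_parse_file(path, delimiter=delimiter, quote=quote)
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle, delimiter=delimiter, quotechar=quote))


# Small demonstration with a semicolon-delimited temporary file.
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, encoding="utf-8", newline=""
) as tmp:
    tmp.write("id;name\n1;foo\n")
rows = load_rows(tmp.name, delimiter=";")
os.remove(tmp.name)
print(rows)
```
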
Configuration Options

Option            Type           Default  Description
delimiter         str            ','      Field delimiter character
quote             str            '"'      Quote character
escape            Optional[str]  None     Escape character (None selects RFC 4180 "" doubling)
comment           Optional[str]  None     Comment line prefix
trim              bool           False    Trim whitespace from fields
skip_empty_lines  bool           False    Skip empty lines
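
The difference between the default RFC 4180 quote doubling and an explicit escape character is easiest to see on a concrete field. The sketch below uses Python's standard `csv` module purely to illustrate the two conventions; it does not call CISV, and whether CISV matches this behavior in every detail is an assumption.

```python
import csv
import io

# RFC 4180 style: a quote inside a quoted field is written as two quotes.
doubled = 'id,comment\n1,"She said ""hi"""\n'
rows_doubled = list(csv.reader(io.StringIO(doubled)))
print(rows_doubled[1][1])  # She said "hi"

# Escape-character style: a backslash precedes the embedded quote instead.
escaped = 'id,comment\n1,"She said \\"hi\\""\n'
rows_escaped = list(
    csv.reader(io.StringIO(escaped), doublequote=False, escapechar="\\")
)
print(rows_escaped[1][1])  # She said "hi"
```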

Examples

TSV Parsing

from cisv import CisvParser

parser = CisvParser(delimiter='\t')
rows = parser.parse_file('data.tsv')

Skip Comments and Empty Lines

parser = CisvParser(
    comment='#',
    skip_empty_lines=True,
    trim=True
)
rows = parser.parse_file('config.csv')

Parse CSV String

from cisv import parse_string

data = """
id,name,value
1,foo,100
2,bar,200
"""

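# skip_empty_lines drops the blank first line that the triple-quoted string starts with
rows = parse_string(data, trim=True, skip_empty_lines=True)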
# [['id', 'name', 'value'], ['1', 'foo', '100'], ['2', 'bar', '200']]
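
Every field comes back as a string, so numeric columns need an explicit conversion step. A minimal sketch, assuming the first row is the header; `convert_columns` is a hypothetical helper and the `rows` literal repeats the result shown above:

```python
from typing import Any, Callable, Dict, List


def convert_columns(
    rows: List[List[str]], converters: Dict[str, Callable[[str], Any]]
) -> List[List[Any]]:
    """Apply a per-column converter (chosen by header name) to each data row."""
    header, *records = rows
    # Columns without a converter are kept as strings.
    casts = [converters.get(name, str) for name in header]
    return [[cast(value) for cast, value in zip(casts, record)] for record in records]


rows = [["id", "name", "value"], ["1", "foo", "100"], ["2", "bar", "200"]]
typed = convert_columns(rows, {"id": int, "value": int})
print(typed)  # [[1, 'foo', 100], [2, 'bar', 200]]
```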

Performance

CISV uses SIMD optimizations (AVX-512, AVX2, SSE2) for high-performance parsing. The Python binding uses ctypes to call directly into the native C library with minimal overhead.

Typical performance on modern hardware:

  • CSV files of 500 MB or more parsed in under 1 second
  • 10-50x faster than pure-Python CSV parsers
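
These figures depend on hardware and data shape, so it is worth measuring on your own files. The harness below is a rough sketch, not an official benchmark: `best_time` and `stdlib_parse` are hypothetical helpers, the synthetic file is far smaller than the sizes quoted above, and the baseline is CPython's `csv` module, which is itself implemented in C and therefore a stricter comparison than a pure-Python parser.

```python
import csv
import os
import tempfile
import time
from typing import Callable, List


def best_time(parse: Callable[[str], List[List[str]]], path: str, repeats: int = 3) -> float:
    """Return the fastest wall-clock time (seconds) over several runs."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        parse(path)
        timings.append(time.perf_counter() - start)
    return min(timings)


def stdlib_parse(path: str) -> List[List[str]]:
    """Baseline: parse the whole file with the standard csv module."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle))


# Small synthetic file; substitute a real, larger file for meaningful numbers.
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, encoding="utf-8", newline=""
) as tmp:
    tmp.write("id,name,value\n")
    for i in range(10_000):
        tmp.write(f"{i},item{i},{i * 2}\n")

baseline = best_time(stdlib_parse, tmp.name)
print(f"csv.reader:      {baseline:.4f} s")

try:
    from cisv import parse_file  # only timed when CISV is importable
except (ImportError, OSError):
    parse_file = None
if parse_file is not None:
    print(f"cisv.parse_file: {best_time(parse_file, tmp.name):.4f} s")

os.remove(tmp.name)
```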

License

MIT

Built Distributions

Version 0.0.67 ships the following wheels:

  • cisv-0.0.67-cp312-cp312-manylinux_2_17_x86_64.whl (34.1 kB): CPython 3.12, manylinux (glibc 2.17+), x86-64
  • cisv-0.0.67-cp312-cp312-macosx_11_0_arm64.whl (20.9 kB): CPython 3.12, macOS 11.0+, ARM64
  • cisv-0.0.67-cp311-cp311-manylinux_2_17_x86_64.whl (34.1 kB): CPython 3.11, manylinux (glibc 2.17+), x86-64
  • cisv-0.0.67-cp311-cp311-macosx_11_0_arm64.whl (20.9 kB): CPython 3.11, macOS 11.0+, ARM64
  • cisv-0.0.67-cp310-cp310-manylinux_2_17_x86_64.whl (34.1 kB): CPython 3.10, manylinux (glibc 2.17+), x86-64
  • cisv-0.0.67-cp310-cp310-macosx_11_0_arm64.whl (20.9 kB): CPython 3.10, macOS 11.0+, ARM64
