High-performance CSV parser with SIMD optimizations (AVX-512/AVX2)

CISV Python Binding

High-performance CSV parser with SIMD optimizations for Python.

Requirements

  • Python 3.7+
  • CISV core library (libcisv.so)

Installation

Build Core Library First

cd ../../core
make

Install Python Package

pip install -e .

Or using the Makefile:

make build

Quick Start

from cisv import CisvParser, parse_file, parse_string, count_rows

# Simple file parsing
rows = parse_file('data.csv')
for row in rows:
    print(row)

# Parse with custom options
parser = CisvParser(
    delimiter=';',
    quote="'",
    trim=True
)
rows = parser.parse_file('data.csv')

# Parse from string
csv_data = """name,age,email
John,30,john@example.com
Jane,25,jane@example.com"""

rows = parse_string(csv_data)

# Fast row counting (without full parsing)
total = count_rows('large.csv')
print(f"Total rows: {total}")
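The parse functions return plain lists of strings, and the string example further down shows the header coming back as an ordinary first row. If you would rather work with dictionaries, a small helper along these lines may do. rows_to_dicts is a hypothetical name, not part of cisv, and the rows below are literals so the snippet runs without the library.

```python
from typing import Dict, List


def rows_to_dicts(rows: List[List[str]]) -> List[Dict[str, str]]:
    """Pair each data row with the header row (hypothetical helper, not part of cisv)."""
    if not rows:
        return []
    header, *data = rows
    # zip() stops at the shorter sequence, so extra fields in a long row are
    # dropped and a short row simply lacks the missing keys.
    return [dict(zip(header, row)) for row in data]


# Literal rows shaped like the Quick Start sample.
rows = [
    ["name", "age", "email"],
    ["John", "30", "john@example.com"],
    ["Jane", "25", "jane@example.com"],
]
records = rows_to_dicts(rows)
print(records[0]["name"])  # John
```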

API Reference

CisvParser Class

from typing import List, Optional

class CisvParser:
    def __init__(
        self,
        delimiter: str = ',',
        quote: str = '"',
        escape: Optional[str] = None,
        comment: Optional[str] = None,
        trim: bool = False,
        skip_empty_lines: bool = False,
    ):
        """
        Create a new CSV parser.

        Args:
            delimiter: Field separator character (default: ',')
            quote: Quote character for fields (default: '"')
            escape: Escape character (default: None, meaning RFC 4180 style, where a quote inside a quoted field is written twice)
            comment: Comment line prefix (default: None)
            trim: Strip whitespace from fields (default: False)
            skip_empty_lines: Skip empty lines (default: False)
        """

    def parse_file(self, path: str) -> List[List[str]]:
        """Parse a CSV file and return all rows."""

    def parse_string(self, content: str) -> List[List[str]]:
        """Parse a CSV string and return all rows."""

Convenience Functions

from typing import List

def parse_file(
    path: str,
    delimiter: str = ',',
    quote: str = '"',
    **kwargs
) -> List[List[str]]:
    """Parse a CSV file with the given options."""

def parse_string(
    content: str,
    delimiter: str = ',',
    quote: str = '"',
    **kwargs
) -> List[List[str]]:
    """Parse a CSV string with the given options."""

def count_rows(path: str) -> int:
    """Count rows in a CSV file without full parsing."""
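For a rough cross-check of count_rows, a naive pure-Python baseline can count newline bytes in chunks. This is not how cisv counts rows; count_newlines is a hypothetical helper, and its result can differ from count_rows for the reasons noted in the docstring. Whether count_rows includes the header row is not stated here, so compare the two on a file you know.

```python
def count_newlines(path: str, chunk_size: int = 1 << 20) -> int:
    """Count line-feed bytes in a file, reading fixed-size chunks.

    Naive baseline only: a newline inside a quoted field is counted too, and
    a final line without a trailing newline is not counted.
    """
    total = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            total += chunk.count(b"\n")
    return total
```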

Configuration Options

Option            Type           Default  Description
delimiter         str            ','      Field delimiter character
quote             str            '"'      Quote character
escape            Optional[str]  None     Escape character
comment           Optional[str]  None     Comment line prefix
trim              bool           False    Trim whitespace from fields
skip_empty_lines  bool           False    Skip empty lines
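Because every option is a keyword argument, a set of options can be built as a dictionary and unpacked into CisvParser. The sketch below picks a delimiter from the file extension. PRESETS and options_for are hypothetical names, not part of cisv, and the extension mapping is an assumption.

```python
import os
from typing import Any, Dict

# Hypothetical presets; the keys are option names from the table above.
PRESETS: Dict[str, Dict[str, Any]] = {
    ".csv": {"delimiter": ","},
    ".tsv": {"delimiter": "\t"},
}


def options_for(path: str, **overrides: Any) -> Dict[str, Any]:
    """Return parser keyword arguments for a path, with explicit overrides winning."""
    extension = os.path.splitext(path)[1].lower()
    options = dict(PRESETS.get(extension, {}))  # copy so PRESETS is not mutated
    options.update(overrides)
    return options


print(options_for("data.tsv", trim=True))
# Intended use (requires cisv):
#   CisvParser(**options_for("data.tsv", trim=True)).parse_file("data.tsv")
```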

Examples

TSV Parsing

from cisv import CisvParser

parser = CisvParser(delimiter='\t')
rows = parser.parse_file('data.tsv')

Skip Comments and Empty Lines

from cisv import CisvParser

parser = CisvParser(
    comment='#',
    skip_empty_lines=True,
    trim=True
)
rows = parser.parse_file('config.csv')

Parse CSV String

from cisv import parse_string

data = """id,name,value
1,foo,100
2,bar,200"""

rows = parse_string(data, trim=True)
# [['id', 'name', 'value'], ['1', 'foo', '100'], ['2', 'bar', '200']]

Performance

CISV uses SIMD optimizations (AVX-512, AVX2, SSE2) for high-performance parsing. The Python binding uses ctypes to call directly into the native C library with minimal overhead.
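For readers unfamiliar with ctypes, the general pattern looks like the sketch below. It calls strlen from the C runtime rather than anything in libcisv, since the core library's C function names are not documented here. It assumes a POSIX system.

```python
import ctypes
import ctypes.util

# Load the C runtime. find_library may return None, in which case CDLL(None)
# falls back to the symbols already loaded into the process (POSIX only).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the signature so ctypes converts the argument and result correctly.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

print(libc.strlen(b"name,age,email"))  # 14
```

Each call crosses from Python into C once, so the per-call cost matters less when a single call does a lot of work, such as parsing a whole file.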

Typical performance on modern hardware:

  • 500MB+ CSV files parsed in under 1 second
  • 10-50x faster than pure Python CSV parsers
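Numbers like these depend on hardware and data, so it is worth measuring on your own files. The sketch below times the standard library csv module and, if it is importable, cisv.parse_file. best_time and compare are hypothetical helpers. Note that the csv module is itself implemented largely in C, so the ratio you see may differ from a comparison against a pure Python parser.

```python
import csv
import time
from typing import Callable, Dict, List


def parse_with_stdlib(path: str) -> List[List[str]]:
    """Baseline: the standard library csv module."""
    with open(path, newline="") as handle:
        return list(csv.reader(handle))


def best_time(parse: Callable[[str], object], path: str, repeats: int = 3) -> float:
    """Return the fastest wall-clock time in seconds over several runs."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        parse(path)
        timings.append(time.perf_counter() - start)
    return min(timings)


def compare(path: str) -> Dict[str, float]:
    """Time the csv module and, when available, cisv on the same file."""
    results = {"csv module": best_time(parse_with_stdlib, path)}
    try:
        from cisv import parse_file  # needs the cisv package and core library
    except (ImportError, OSError):
        return results  # cisv unavailable: report the baseline only
    results["cisv"] = best_time(parse_file, path)
    return results
```

Calling compare('large.csv') returns a dictionary of timings in seconds. Taking the minimum over a few runs reduces noise from disk caching and other processes.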

License

MIT
