High-performance CSV parser with SIMD optimizations (AVX-512/AVX2)
CISV Python Binding
High-performance CSV parser with SIMD optimizations for Python.
Requirements
- Python 3.7+
- CISV core library (libcisv.so)
Installation
Build Core Library First
cd ../../core
make
Install Python Package
pip install -e .
Or using the Makefile:
make build
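After installing, a quick sanity check can help tell a missing Python package apart from a missing native library. The sketch below uses only the standard library; the helper name `cisv_status` is hypothetical. How the binding actually locates libcisv.so is not documented here, and `ctypes.util.find_library` only searches standard system locations, so a `None` result does not by itself mean the installation is broken.

```python
import ctypes.util
import importlib.util


def cisv_status() -> dict:
    """Report whether the cisv package and a system-wide libcisv are visible.

    Assumption: the core library may also live next to the package (for
    example inside a wheel), in which case find_library returns None even
    though the binding works.
    """
    return {
        # True if `import cisv` would find a module, without importing it.
        "package_importable": importlib.util.find_spec("cisv") is not None,
        # Name or path of a libcisv on the default search path, or None.
        "system_libcisv": ctypes.util.find_library("cisv"),
    }


status = cisv_status()
print(status)
```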
Quick Start
from cisv import CisvParser, parse_file, parse_string, count_rows
# Simple file parsing
rows = parse_file('data.csv')
for row in rows:
    print(row)
# Parse with custom options
parser = CisvParser(
    delimiter=';',
    quote="'",
    trim=True
)
rows = parser.parse_file('data.csv')
# Parse from string
csv_data = """name,age,email
John,30,john@example.com
Jane,25,jane@example.com"""
rows = parse_string(csv_data)
# Fast row counting (without full parsing)
total = count_rows('large.csv')
print(f"Total rows: {total}")
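Since the parse functions return plain lists of strings, with the header as an ordinary first row, a small helper can turn them into dictionaries. This is a minimal sketch that works on any `List[List[str]]`; the helper name `rows_to_dicts` is hypothetical and not part of cisv, and the rows are written out literally so the example runs without the native library.

```python
from typing import Dict, List


def rows_to_dicts(rows: List[List[str]]) -> List[Dict[str, str]]:
    """Treat the first row as a header and map each later row onto it.

    zip() stops at the shorter sequence, so rows with fewer fields than
    the header simply produce fewer keys.
    """
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


# Rows in the shape returned for the Quick Start string above.
rows = [
    ["name", "age", "email"],
    ["John", "30", "john@example.com"],
    ["Jane", "25", "jane@example.com"],
]
records = rows_to_dicts(rows)
print(records[0]["email"])  # john@example.com
```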
API Reference
CisvParser Class
class CisvParser:
    def __init__(
        self,
        delimiter: str = ',',
        quote: str = '"',
        escape: Optional[str] = None,
        comment: Optional[str] = None,
        trim: bool = False,
        skip_empty_lines: bool = False,
    ):
        """
        Create a new CSV parser.

        Args:
            delimiter: Field separator character (default: ',')
            quote: Quote character for fields (default: '"')
            escape: Escape character (default: None for RFC4180 "" style)
            comment: Comment line prefix (default: None)
            trim: Strip whitespace from fields (default: False)
            skip_empty_lines: Skip empty lines (default: False)
        """

    def parse_file(self, path: str) -> List[List[str]]:
        """Parse a CSV file and return all rows."""

    def parse_string(self, content: str) -> List[List[str]]:
        """Parse a CSV string and return all rows."""
Convenience Functions
def parse_file(
    path: str,
    delimiter: str = ',',
    quote: str = '"',
    **kwargs
) -> List[List[str]]:
    """Parse a CSV file with the given options."""

def parse_string(
    content: str,
    delimiter: str = ',',
    quote: str = '"',
    **kwargs
) -> List[List[str]]:
    """Parse a CSV string with the given options."""

def count_rows(path: str) -> int:
    """Count rows in a CSV file without full parsing."""
Configuration Options
| Option | Type | Default | Description |
|---|---|---|---|
| delimiter | str | ',' | Field delimiter character |
| quote | str | '"' | Quote character |
| escape | str | None | Escape character |
| comment | str | None | Comment line prefix |
| trim | bool | False | Trim whitespace from fields |
| skip_empty_lines | bool | False | Skip empty lines |
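To make the options concrete, they can be approximated with the standard library. The sketch below is a reference model built on `csv.reader`, not cisv's implementation. The helper name `reference_parse` is hypothetical, the `escape` option is left out, and the exact semantics (for example, that comment prefixes are matched at the very start of a line) are assumptions.

```python
import csv
from typing import List, Optional


def reference_parse(
    text: str,
    delimiter: str = ",",
    quote: str = '"',
    comment: Optional[str] = None,
    trim: bool = False,
    skip_empty_lines: bool = False,
) -> List[List[str]]:
    """Pure-Python model of the assumed option semantics."""
    lines = text.splitlines(keepends=True)
    if comment is not None:
        # Assumption: a comment line starts with the prefix in column 0.
        # Filtering physical lines would break quoted multi-line fields.
        lines = [ln for ln in lines if not ln.startswith(comment)]
    rows = []
    for row in csv.reader(lines, delimiter=delimiter, quotechar=quote):
        if skip_empty_lines and not row:
            continue  # csv.reader yields [] for a blank line
        if trim:
            row = [field.strip() for field in row]
        rows.append(row)
    return rows


sample = "# settings\nkey; value\n\n'a;b';1\n"
parsed = reference_parse(
    sample, delimiter=";", quote="'", comment="#",
    trim=True, skip_empty_lines=True,
)
print(parsed)  # [['key', 'value'], ['a;b', '1']]
```

Comparing this model's output with cisv's on a small sample is one way to find out how the native parser treats edge cases.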
Examples
TSV Parsing
from cisv import CisvParser
parser = CisvParser(delimiter='\t')
rows = parser.parse_file('data.tsv')
Skip Comments and Empty Lines
parser = CisvParser(
    comment='#',
    skip_empty_lines=True,
    trim=True
)
rows = parser.parse_file('config.csv')
Parse CSV String
from cisv import parse_string
data = """
id,name,value
1,foo,100
2,bar,200
"""
rows = parse_string(data, trim=True)
# [['id', 'name', 'value'], ['1', 'foo', '100'], ['2', 'bar', '200']]
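Every field comes back as a string, so numeric columns need converting after parsing. The following sketch starts from the result shown in the comment above and applies per-column converters; the `converters` mapping is an illustrative convention, not a cisv feature.

```python
# Result of the parse_string call above, written out literally.
rows = [["id", "name", "value"], ["1", "foo", "100"], ["2", "bar", "200"]]

header, *body = rows

# Columns not listed here stay as strings.
converters = {"id": int, "value": int}

typed = [
    {col: converters.get(col, str)(cell) for col, cell in zip(header, row)}
    for row in body
]

total = sum(record["value"] for record in typed)
print(typed[0])  # {'id': 1, 'name': 'foo', 'value': 100}
print(total)     # 300
```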
Performance
CISV uses SIMD optimizations (AVX-512, AVX2, SSE2) for high-performance parsing. The Python binding uses ctypes to call directly into the native C library with minimal overhead.
Typical performance on modern hardware:
- 500MB+ CSV files parsed in under 1 second
- 10-50x faster than pure Python CSV parsers
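Figures like these depend on hardware and data shape, so it is worth measuring on your own files. The sketch below times the standard library's `csv.reader` as a baseline on a generated sample and, only if `cisv` imports successfully, times `parse_file` on the same file. The helper names (`time_call`, `stdlib_parse_file`, `write_sample`) and the sample size are illustrative choices, not part of cisv.

```python
import csv
import os
import tempfile
import time


def time_call(fn, *args, repeats=3):
    """Return (best wall-clock seconds over `repeats` runs, last result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args)
        best = min(best, time.perf_counter() - start)
    return best, result


def stdlib_parse_file(path):
    """Baseline: read every row with the standard library parser."""
    with open(path, newline="", encoding="utf-8") as fh:
        return list(csv.reader(fh))


def write_sample(path, n_rows):
    """Write a header plus n_rows simple data rows."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["id", "name", "value"])
        for i in range(n_rows):
            writer.writerow([i, f"name{i}", i * 2])


sample_path = os.path.join(tempfile.mkdtemp(), "sample.csv")
write_sample(sample_path, 10_000)

baseline_seconds, baseline_rows = time_call(stdlib_parse_file, sample_path)
print(f"csv.reader: {baseline_seconds:.4f}s for {len(baseline_rows)} rows")

try:
    from cisv import parse_file  # needs the package and the native library
except (ImportError, OSError):
    parse_file = None

if parse_file is not None:
    cisv_seconds, cisv_rows = time_call(parse_file, sample_path)
    print(f"cisv: {cisv_seconds:.4f}s for {len(cisv_rows)} rows")
```

A 10,000-row sample is too small to show steady-state throughput; increase `n_rows` or point `sample_path` at a real file for a meaningful comparison.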
License
MIT
File details
Details for the file cisv-0.1.3.tar.gz.
File metadata
- Download URL: cisv-0.1.3.tar.gz
- Upload date:
- Size: 38.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 15a8ac9f29e07766e69ec3f520d948dc8cbb1f46b8a0f6e75fbe1a5a59ef75f8 |
| MD5 | dc84eecb10071a870ec7f0546f6e4c4e |
| BLAKE2b-256 | e2dd4c6009950e35c92aa067cb1078b670f218e62ef87669b91a4c43bcc68d02 |
File details
Details for the file cisv-0.1.3-py3-none-manylinux_2_35_x86_64.whl.
File metadata
- Download URL: cisv-0.1.3-py3-none-manylinux_2_35_x86_64.whl
- Upload date:
- Size: 38.9 kB
- Tags: Python 3, manylinux: glibc 2.35+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b31be84ba599ee97481806bab710e6b9fb47ad100fe9af3576426696f38d96ec |
| MD5 | dabb5223039620f31d7b95acf0c3b720 |
| BLAKE2b-256 | d4d73c2feebcb4aa7cc414ad44e737e5767001cb83f378fff5497bb43e82b5a4 |
File details
Details for the file cisv-0.1.3-py3-none-macosx_11_0_arm64.whl.
File metadata
- Download URL: cisv-0.1.3-py3-none-macosx_11_0_arm64.whl
- Upload date:
- Size: 23.4 kB
- Tags: Python 3, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 00d27851fed5563034dc36ffd843d9d7133096cc2df4a699c52cca13e157ff83 |
| MD5 | f432139be2622658daee335e2fbb2b1a |
| BLAKE2b-256 | c4830161b7b7bb1395fbebde90dae2fc6d5c83d4894b30ef9684d926d107643c |