zcsv

Blazing-fast drop-in replacement for Python's csv module, powered by Rust.

import zcsv as csv: same API, with reads 4-6x faster and writes about 1.4x faster (see Benchmarks).

License: MIT | Python 3.9+ | Rust



Why zcsv?

Python's built-in csv module is implemented in C but still creates Python objects for every field of every row. For a 100K-row file with 50 columns, that's 5 million string allocations just to iterate.

zcsv eliminates this. The Rust core parses CSV via SIMD instructions into a single contiguous buffer. The Python-facing Row object is a zero-copy cursor — it holds a pointer and an index, nothing more. Python strings are created only when you access a field, and only for the fields you actually use.
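
The idea of a lazy cursor can be sketched in pure Python. This is only an illustration of the pattern described above, not zcsv's implementation: the names LazyRow and index_simple_csv are hypothetical, and the indexer ignores quoting entirely.

```python
class LazyRow:
    """Toy cursor over one shared bytes buffer (illustrative, not zcsv's Row)."""

    def __init__(self, buffer: bytes, bounds):
        self._buffer = buffer   # all field bytes, stored once
        self._bounds = bounds   # per row: list of (start, end) field offsets
        self._index = 0         # which row the cursor currently points at

    def seek(self, index: int) -> None:
        self._index = index

    def __len__(self) -> int:
        return len(self._bounds[self._index])

    def __getitem__(self, i: int) -> str:
        # The str object is created here, on access, and only for this field.
        start, end = self._bounds[self._index][i]
        return self._buffer[start:end].decode("utf-8")


def index_simple_csv(data: bytes, delimiter: bytes = b","):
    """Record field offsets for unquoted CSV without creating any str objects."""
    bounds = []
    pos = 0
    for line in data.splitlines(keepends=True):
        body = line.rstrip(b"\r\n")
        row, start = [], pos
        for field in body.split(delimiter):
            row.append((start, start + len(field)))
            start += len(field) + len(delimiter)
        bounds.append(row)
        pos += len(line)
    return bounds


data = b"name,age\nAlice,30\nBob,25\n"
row = LazyRow(data, index_simple_csv(data))
row.seek(1)
print(row[0], row[-1], len(row))
```

Only the two fields that are actually read become Python strings; the rest of the file stays as bytes in the shared buffer.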

Features you won't find elsewhere

  • CSV Injection Protection: safe=True escapes =, +, -, @, \t, \r (OWASP best practice)
  • RFC 4180 Strict Mode: strict=True validates field counts and quoting rules, with line numbers in errors
  • Delimiter Autodetection: frequency analysis in Rust (, ; \t | :)
  • Encoding Autodetection: UTF-8, UTF-16 (LE/BE), Latin-1, BOM handling
  • Automatic Type Inference: zcsv.read() returns typed list[dict] (int, float, bool, str)
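
To make "frequency analysis" concrete, here is a rough Python sketch of one way delimiter autodetection could work. The scoring rule (prefer the candidate whose per-line count is most consistent) is an assumption for illustration; zcsv's Rust heuristic is not documented here and may differ. The function name guess_delimiter is hypothetical, and quoting is ignored.

```python
from collections import Counter

CANDIDATES = [",", ";", "\t", "|", ":"]


def guess_delimiter(sample: str, candidates=CANDIDATES) -> str:
    """Pick the candidate that appears a consistent number of times per line."""
    lines = [line for line in sample.splitlines() if line]
    best, best_score = ",", (0, 0)
    for cand in candidates:
        counts = [line.count(cand) for line in lines]
        if not counts or min(counts) == 0:
            continue  # a real delimiter should appear on every line
        # Most common per-line count, and how many lines share it.
        value, agree = Counter(counts).most_common(1)[0]
        score = (agree, value)
        if score > best_score:
            best, best_score = cand, score
    return best


print(repr(guess_delimiter("a;b;c\n1;2;3\n4;5;6\n")))
```

Ties fall back to the earlier candidate in the list, so a comma wins over a colon when both appear once per line.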

Install

pip install zcsv

From source (requires Rust toolchain):

git clone https://github.com/Seinarukiro2/zcsv.git
cd zcsv
pip install maturin
maturin develop --release

Quick Start

Drop-in replacement

import zcsv as csv

# Exactly like stdlib — but 4x faster
with open("data.csv") as f:
    for row in csv.reader(f):
        print(row[0], row[1])

# DictReader — 6x faster
with open("data.csv") as f:
    for row in csv.DictReader(f):
        print(row["name"], row["age"])

# Writer — 1.4x faster
with open("out.csv", "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["name", "age"])
    w.writerows([["Alice", "30"], ["Bob", "25"]])

zcsv extensions

import zcsv

# Read with automatic type inference
data = zcsv.read("data.csv")
# [{"name": "Alice", "age": 30, "active": True}, ...]

# Batch reading for large files
for batch in zcsv.read_batches("huge.csv", batch_size=10_000):
    process(batch)

# Write with CSV injection protection (safe=True by default)
zcsv.write("out.csv", data)

API Reference

zcsv.reader(csvfile, **kwargs)

Returns a cursor iterator. Each iteration advances to the next row. Access fields with row[0] (by index) or row.to_list() for a full list.

with open("data.csv") as f:
    for row in zcsv.reader(f):
        name = row[0]          # lazy — creates Python string only now
        last = row[-1]         # negative indexing works
        print(len(row))        # field count
        print(repr(row))       # ['Alice', '30', 'NYC']

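Storing rows: the reader returns the same cursor object on every iteration, so a plain list of rows would hold repeated references to one cursor. To collect rows, copy each one with snapshot() or to_list():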

with open("data.csv") as f:
    rows = [row.snapshot() for row in zcsv.reader(f)]
    # or: [row.to_list() for row in zcsv.reader(f)]
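
The pitfall is easier to see with a toy iterator that yields itself, as the cursor pattern does. ReusingReader below is a hypothetical stand-in written for this example; in particular, its snapshot() returning a plain list is an assumption, not a statement about what zcsv's snapshot() returns.

```python
class ReusingReader:
    """Toy iterator that returns itself on every step (illustrative only)."""

    def __init__(self, rows):
        self._rows = rows
        self._pos = -1

    def __iter__(self):
        return self

    def __next__(self):
        if self._pos + 1 >= len(self._rows):
            raise StopIteration
        self._pos += 1
        return self  # same object every time, now pointing at the next row

    def __getitem__(self, i):
        return self._rows[self._pos][i]

    def snapshot(self):
        return list(self._rows[self._pos])  # independent copy of the current row


data = [["Alice", "30"], ["Bob", "25"]]

kept = list(ReusingReader(data))  # two references to one cursor
copies = [row.snapshot() for row in ReusingReader(data)]

print(kept[0] is kept[1], kept[0][0])
print(copies)
```

Every entry in kept reads as the last row, while copies holds each row as it was when visited.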

Parameters: delimiter, quotechar, strict (RFC 4180 validation)
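
As a rough picture of what field-count validation with line numbers involves, the sketch below layers a check on top of the stdlib csv reader. It is not zcsv's strict mode: FieldCountError and strict_rows are hypothetical names, and the stdlib's own strict=True flag only covers malformed quoting.

```python
import csv
import io


class FieldCountError(ValueError):
    """Raised when a row's field count differs from the first row's."""


def strict_rows(lines):
    """Yield rows, failing with a line number on inconsistent field counts."""
    reader = csv.reader(lines, strict=True)  # stdlib strict: reject bad quoting
    expected = None
    for row in reader:
        if expected is None:
            expected = len(row)
        elif len(row) != expected:
            raise FieldCountError(
                f"line {reader.line_num}: expected {expected} fields, got {len(row)}"
            )
        yield row


good = list(strict_rows(io.StringIO("a,b\n1,2\n")))

message = None
try:
    list(strict_rows(io.StringIO("a,b\n1,2\n3\n")))
except FieldCountError as exc:
    message = str(exc)
print(message)
```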

zcsv.DictReader(f, fieldnames=None, **kwargs)

Same cursor pattern with dict-like access:

with open("data.csv") as f:
    for row in zcsv.DictReader(f):
        row["name"]           # by key
        row[0]                # also by index
        row.keys()            # column names
        row.values()          # all values
        row.items()           # (key, value) pairs
        row.get("x", "N/A")  # with default
        "name" in row         # membership test

zcsv.read(path, **kwargs) -> list[dict]

Read entire file with automatic type inference.

zcsv.read("data.csv",
    delimiter=None,       # None = autodetect
    has_header=True,
    schema={"id": int, "price": float},  # override types
    skip_rows=0,
    max_rows=None,
    columns=["name", "age"],  # select columns
    null_values=["", "NA", "null", "None"],
    encoding=None,        # None = autodetect
    strict=False,         # RFC 4180 validation
    n_threads=None,       # parallel type conversion
)
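
Column type inference can be sketched as "try the narrowest type first, then fall back". The version below is a simplified assumption about the approach, not zcsv's algorithm: infer_column is a hypothetical helper, nulls are mapped to None as an illustrative choice, and Python's own int() and float() rules decide what counts as numeric.

```python
NULL_VALUES = {"", "NA", "null", "None"}


def _parse_bool(text: str) -> bool:
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    raise ValueError(text)


# Tried in order: the first type that parses every non-null value wins.
_PARSERS = [(int, int), (float, float), (bool, _parse_bool)]


def infer_column(values, null_values=NULL_VALUES):
    """Return (type, converted values); null markers become None."""
    present = [v for v in values if v not in null_values]
    for typ, parse in _PARSERS:
        try:
            for v in present:
                parse(v)
        except ValueError:
            continue  # this type does not fit, try the next one
        if present:
            return typ, [None if v in null_values else parse(v) for v in values]
    # Nothing narrower fits (or the column is all nulls): keep strings.
    return str, [None if v in null_values else v for v in values]


print(infer_column(["30", "25"]))
print(infer_column(["1.5", "2", "NA"]))
```

A schema override such as the one shown in the call above would simply skip the inference step for the named columns.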

zcsv.write(path, data, **kwargs)

zcsv.write("out.csv", data,
    delimiter=",",
    safe=True,    # CSV injection protection (default: True)
    strict=False,
)
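
For readers unfamiliar with CSV injection, the sketch below shows one common mitigation using only the stdlib: prefixing risky cells with a single quote so spreadsheets treat them as text. The quote-prefix strategy is an assumption based on general OWASP guidance; how zcsv escapes such cells is not specified here. escape_cell and write_safe are hypothetical names.

```python
import csv
import io

# Leading characters that spreadsheet applications may interpret as a formula.
RISKY_PREFIXES = ("=", "+", "-", "@", "\t", "\r")


def escape_cell(value):
    """Prefix formula-like strings with a single quote; leave other values alone."""
    if isinstance(value, str) and value.startswith(RISKY_PREFIXES):
        return "'" + value
    return value


def write_safe(rows, out):
    """Write rows with the stdlib writer after escaping each cell."""
    writer = csv.writer(out)
    for row in rows:
        writer.writerow([escape_cell(cell) for cell in row])


buffer = io.StringIO()
write_safe([["name", "note"], ["Alice", "=1+1"]], buffer)
print(buffer.getvalue())
```

Note that this simple rule also prefixes legitimate strings such as "-5", which is one reason a stdlib-compatible writer might leave the protection off by default.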

zcsv.writer(csvfile, **kwargs) / zcsv.DictWriter(csvfile, fieldnames, **kwargs)

Stdlib-compatible streaming writers. Unlike zcsv.write(), safe defaults to False here so that output matches the stdlib csv module.

zcsv.read_batches(path, batch_size=10_000, **kwargs)

Memory-efficient iterator yielding list[dict] batches.
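
The batching behavior can be approximated with the stdlib, which may help when reasoning about memory use. This is a sketch of the general pattern, not zcsv's implementation; iter_batches is a hypothetical helper, and it yields untyped string dicts from csv.DictReader.

```python
import csv
import io
from itertools import islice


def iter_batches(rows, batch_size=10_000):
    """Yield lists of up to batch_size items from any row iterator."""
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


sample = "id,name\n" + "".join(f"{i},user{i}\n" for i in range(5))
batches = list(iter_batches(csv.DictReader(io.StringIO(sample)), batch_size=2))
print([len(batch) for batch in batches])
```

Only one batch is held in memory at a time when the generator is consumed in a loop rather than collected into a list.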


Architecture

Python API        ┌─ reader() ─── cursor Row (zero-copy, lazy strings)
                  ├─ DictReader() ─── cursor with field names
                  ├─ writer() / DictWriter() ─── raw FFI serialization
                  ├─ read() ─── type inference + parallel conversion
                  └─ write() ─── CSV injection protection

Rust Core         ┌─ simd-csv ─── SIMD-accelerated CSV parsing
(PyO3 + FFI)      ├─ SharedData ─── single Vec<u8> buffer for all rows
                  ├─ memmap2 ─── memory-mapped I/O for large files
                  ├─ rayon ─── parallel column type conversion
                  ├─ encoding_rs ─── charset detection + conversion
                  └─ fast_pyobjects ─── raw CPython FFI (PyUnicode_New, PyList_SET_ITEM)

Key design decisions

  • Zero-copy Row: All CSV data lives in one contiguous Vec<u8>. Row is Arc<SharedData> + u32 — 12 bytes, no per-row heap allocation. Python strings created only on field access via raw PyUnicode_New.
  • Cursor pattern: reader.__next__() returns self with Py_INCREF (~10ns) instead of allocating a new object (~900ns).
  • String dedup cache: Repeated values (countries, categories, booleans) are cached. Auto-disables after 200 samples if hit rate < 20%.
  • GIL release: File I/O, SIMD parsing, type inference, CSV serialization all run with GIL released.
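
The string dedup cache with auto-disable can be illustrated in a few lines of Python. The thresholds (200 samples, 20% hit rate) come from the description above; everything else, including the DedupCache name and the decision to clear the cache when disabling, is an assumption made for this sketch.

```python
class DedupCache:
    """Toy dedup cache that turns itself off when it is not paying for itself."""

    def __init__(self, sample_size=200, min_hit_rate=0.20):
        self._cache = {}
        self._hits = 0
        self._lookups = 0
        self._sample_size = sample_size
        self._min_hit_rate = min_hit_rate
        self.enabled = True

    def get(self, raw: bytes) -> str:
        if not self.enabled:
            return raw.decode("utf-8")  # cache is off: always build a new str
        self._lookups += 1
        cached = self._cache.get(raw)
        if cached is not None:
            self._hits += 1
        else:
            cached = self._cache[raw] = raw.decode("utf-8")
        # After the sampling window, disable the cache if it rarely hits.
        if (self._lookups == self._sample_size
                and self._hits / self._lookups < self._min_hit_rate):
            self.enabled = False
            self._cache.clear()
        return cached


hot = DedupCache()
first = hot.get(b"US")
second = hot.get(b"US")  # same str object, no new allocation

cold = DedupCache()
for i in range(200):
    cold.get(str(i).encode())  # all unique: hit rate stays at 0%

print(first is second, cold.enabled)
```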

Benchmarks

100,000 rows, Python 3.13, Apple Silicon M4:

Operation              stdlib csv   zcsv     Speedup
reader() 10 cols       0.080s       0.018s   4.4x
reader() 50 cols       0.335s       0.073s   4.6x
DictReader() 10 cols   0.124s       0.025s   5.0x
DictReader() 50 cols   0.491s       0.082s   6.0x
writer() 10 cols       0.160s       0.112s   1.4x
writer() 50 cols       0.780s       0.548s   1.4x
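
To reproduce numbers like these on your own machine, a small timing harness along the following lines could be used. The methodology behind the table above is not documented, so this is only one plausible setup: make_csv and time_reader are hypothetical helpers, and the data is synthetic. The example times the stdlib reader; any callable with the same reader(f) interface could be passed instead.

```python
import csv
import io
import time


def make_csv(rows: int, cols: int) -> str:
    """Build a synthetic CSV document in memory."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow([f"c{c}" for c in range(cols)])
    for r in range(rows):
        writer.writerow([f"r{r}c{c}" for c in range(cols)])
    return out.getvalue()


def time_reader(reader_factory, text: str, repeats: int = 3) -> float:
    """Return the best wall-clock time (seconds) to iterate every row."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for row in reader_factory(io.StringIO(text)):
            row[0]  # touch one field per row
        best = min(best, time.perf_counter() - start)
    return best


text = make_csv(rows=1_000, cols=10)
elapsed = time_reader(csv.reader, text)
print(f"stdlib csv.reader: {elapsed:.4f}s for 1,000 rows")
```

Taking the minimum over several repeats reduces noise from other processes; increase rows to 100,000 to match the scale of the table.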

License

MIT
