Fast CSV processing and data cleaning companion for pandas

Project description

arnio

Fast CSV loading and cleaning for Python, powered by C++.

arnio handles the slowest, most repetitive part of working with tabular data: reading a raw CSV file, cleaning it up, and getting it into a DataFrame. The parsing and cleaning run in C++ through pybind11. The output is a standard pandas DataFrame.


pip install arnio

import arnio as ar

# Load, clean, and convert in three calls
frame = ar.read_csv("customers.csv")

clean = ar.pipeline(frame, [
    ("strip_whitespace",),
    ("drop_nulls",),
    ("drop_duplicates",),
])

df = ar.to_pandas(clean)

Requires Python 3.9+. Wheels available for Linux, macOS, and Windows. Source builds require a C++17 compiler.

How arnio is different

CSV parsing runs in C++, not Python. On large files, ar.read_csv() takes less time and memory than pd.read_csv: roughly half the load time and nearly 40% less peak memory in the 1M-row benchmark under Performance.
Cleaning is built in, not bolted on. ar.pipeline() takes a list of named steps and runs them in sequence. No scattered method chains, no copy-paste between notebooks.
Preview before you load. ar.scan_csv("file.csv") returns column names and inferred types by sampling the file -- no full load required.
Exact memory tracking. frame.memory_usage() returns real byte counts from C++. No estimation, no deep=True.
Pandas is the output, not the engine. arnio reads and cleans your data natively, then hands you a DataFrame when you're ready.
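To make the sampling idea behind "preview before you load" concrete, here is a rough standard-library sketch. It does not call arnio and is not how arnio implements ar.scan_csv(); the names sketch_scan, guess_type, and sample_rows are hypothetical and exist only for this illustration. It reads the header plus a bounded number of rows and guesses a type per column.

```python
import csv
import io
from itertools import islice


def guess_type(values):
    """Guess a column type from sampled strings: int, float, or str."""
    non_empty = [v for v in values if v.strip() != ""]
    if not non_empty:
        return "str"
    for name, caster in (("int", int), ("float", float)):
        try:
            for v in non_empty:
                caster(v)
            return name
        except ValueError:
            continue
    return "str"


def sketch_scan(handle, sample_rows=100):
    """Read the header and at most `sample_rows` rows; return {column: type}.

    Hypothetical helper for illustration only, not part of arnio.
    """
    reader = csv.reader(handle)
    header = next(reader)
    sample = list(islice(reader, sample_rows))
    columns = {}
    for i, name in enumerate(header):
        columns[name] = guess_type([row[i] for row in sample if i < len(row)])
    return columns


# Only the first two data rows are inspected; the third is never parsed.
data = io.StringIO("id,price,city\n1,9.5,Oslo\n2,12,Lima\n3,,Pune\n")
schema = sketch_scan(data, sample_rows=2)
print(schema)
```

Because only a sample is read, the guessed types can be wrong for values that appear later in the file, which is the usual trade-off of any sampling-based preview.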

Performance

Benchmark: 1M-row CSV, 12 columns, mixed types.

Tool	Load time	Peak memory
pandas	~4.2s	~620 MB
arnio	~2.1s	~380 MB

Roughly 2x faster CSV ingestion (~4.2 s vs ~2.1 s) and about 39% lower peak memory (~620 MB vs ~380 MB) on this file.

Measured on an M2 MacBook Pro, Python 3.11. Your results will vary. Benchmark with your own data.
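One possible way to benchmark with your own data is a small harness like the sketch below. It is an assumption about how you might measure, not the script used for the numbers above. The helper names measure and stdlib_loader are hypothetical, and the baseline uses the standard csv module so the block runs without extra packages. Note that tracemalloc only sees allocations made through Python's allocator, so memory allocated natively in C++ is not counted; for native code, an OS-level peak RSS measurement is the better yardstick.

```python
import csv
import os
import tempfile
import time
import tracemalloc


def measure(loader, path):
    """Return (seconds, peak_bytes) for one call of loader(path).

    peak_bytes covers Python-level allocations only (tracemalloc).
    """
    tracemalloc.start()
    start = time.perf_counter()
    loader(path)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return elapsed, peak


def stdlib_loader(path):
    """Baseline loader: read every row with the csv module."""
    with open(path, newline="") as f:
        return list(csv.reader(f))


# Write a small synthetic file; replace this with a path to your own data.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    writer = csv.writer(tmp)
    writer.writerow(["id", "value"])
    writer.writerows((i, i * 0.5) for i in range(10_000))
    path = tmp.name

seconds, peak = measure(stdlib_loader, path)
os.remove(path)
print(f"{seconds:.4f}s, peak {peak / 1e6:.2f} MB")
```

To compare tools, pass a different callable as the loader, for example pd.read_csv or ar.read_csv, and run each several times on the same file before drawing conclusions.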

pandas vs arnio

pandas

import pandas as pd

df = pd.read_csv("sales.csv")

str_cols = df.select_dtypes(include="object").columns
df[str_cols] = df[str_cols].apply(lambda c: c.str.strip())

df = df.dropna()
df = df.drop_duplicates()

arnio

import arnio as ar

frame = ar.read_csv("sales.csv")

clean = ar.pipeline(frame, [
    ("strip_whitespace",),
    ("drop_nulls",),
    ("drop_duplicates",),
])

df = ar.to_pandas(clean)

Same result. Less code. Each step is explicit. The pipeline runs in C++.
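For readers who want to see what a list of named steps amounts to, here is a pure-Python sketch of the same three steps over a list of dicts. It is illustrative only: arnio runs its pipeline in C++, and the STEPS registry, run_pipeline, and the rule that None or an empty string counts as null are assumptions made for this example.

```python
def strip_whitespace(rows):
    """Trim leading/trailing whitespace from every string value."""
    return [
        {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
        for row in rows
    ]


def drop_nulls(rows):
    """Drop rows where any value is None or an empty string (assumed rule)."""
    return [row for row in rows if all(v not in (None, "") for v in row.values())]


def drop_duplicates(rows):
    """Keep the first occurrence of each row, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row.items())
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical registry mapping step names to functions.
STEPS = {
    "strip_whitespace": strip_whitespace,
    "drop_nulls": drop_nulls,
    "drop_duplicates": drop_duplicates,
}


def run_pipeline(rows, steps):
    """Apply each ("name", *args) step in order and return the result."""
    for name, *args in steps:
        rows = STEPS[name](rows, *args)
    return rows


raw = [
    {"name": " Ada ", "city": "London"},
    {"name": "Ada", "city": "London "},
    {"name": "Bob", "city": None},
    {"name": "Eve", "city": "Paris"},
]

cleaned = run_pipeline(raw, [
    ("strip_whitespace",),
    ("drop_nulls",),
    ("drop_duplicates",),
])
print(cleaned)
```

Step order matters in this sketch: the two "Ada" rows only become identical after whitespace is stripped, so running drop_duplicates first would keep both.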

When to use arnio

Use arnio when your bottleneck is loading and cleaning CSVs -- large files, messy columns, repeated preprocessing across projects.

Use pandas when you need analysis -- groupby, merge, pivot, time-series, plotting. arnio produces DataFrames; everything downstream stays the same.

arnio replaces the first steps of your notebook. It does that part faster and with less code. Everything after that is still pandas.

Roadmap

arnio is in active development. The core CSV reader and basic cleaning primitives are the current focus; the list below covers both that core work and planned features:

C++ CSV parser core
Basic cleaning API (drop_nulls, strip_whitespace, normalize_columns)
pandas DataFrame output
Streaming / chunked reads for very large files
Type inference and automatic dtype casting
Encoding detection and normalization
Schema validation and column contracts
Parallel parsing across CPU cores
CLI tool (arnio clean data.csv --output clean.csv)
Async-friendly API for use in async pipelines

Feedback on priorities is welcome — open a GitHub Discussion to share what matters most to you.

Contributing

Contributions are welcome and genuinely appreciated. arnio is early-stage, which means there's real space to shape how it grows.

To get started:

git clone https://github.com/yourusername/arnio.git
cd arnio
pip install -e ".[dev]"

Before submitting a pull request:

Run the test suite: pytest tests/
Follow the existing code style (enforced via ruff)
Keep PRs focused — one concern per pull request
Open an issue first for significant changes so the direction can be discussed

There's a CONTRIBUTING.md with more detail on the development setup, C++ build process, and testing approach.

License

arnio is released under the MIT License.

Built to make Python data work feel faster and cleaner — one CSV at a time.

Project details

Release history

0.1.2: May 3, 2026
0.1.1: May 3, 2026 (this version)
0.1.0: May 3, 2026

Download files


Source Distribution

arnio-0.1.1.tar.gz (11.1 MB)

Uploaded May 3, 2026 Source

Built Distribution


arnio-0.1.1-cp313-cp313-win_amd64.whl (434.7 kB)

Uploaded May 3, 2026, CPython 3.13, Windows x86-64

File details

Details for the file arnio-0.1.1.tar.gz.

File metadata

Download URL: arnio-0.1.1.tar.gz
Upload date: May 3, 2026
Size: 11.1 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for arnio-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`6928855138fa344ffe72450612a7653cdfa4a4c9ea19de234e740d7b2cceadb1`
MD5	`29043dd21eb649a18b66a358dc2d9b55`
BLAKE2b-256	`5cde11849d1a401e5de628cf50a0da1fb361e4e96ee811d2150620540c4a385f`
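As a hedged illustration of how a published digest can be checked, the sketch below computes a file's SHA256 with Python's hashlib so it can be compared against the value in the table above. The helper name sha256_of is hypothetical, and the demonstration hashes a small throwaway file rather than the real archive.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demonstration on a throwaway file containing the bytes b"abc".
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
    demo_path = tmp.name

demo_digest = sha256_of(demo_path)
os.remove(demo_path)
print(demo_digest)
```

For a real download, call sha256_of on the path of the downloaded arnio-0.1.1.tar.gz and compare the result with the SHA256 row listed for that file; any mismatch means the file should not be installed.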


File details

Details for the file arnio-0.1.1-cp313-cp313-win_amd64.whl.

File metadata

Download URL: arnio-0.1.1-cp313-cp313-win_amd64.whl
Upload date: May 3, 2026
Size: 434.7 kB
Tags: CPython 3.13, Windows x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for arnio-0.1.1-cp313-cp313-win_amd64.whl
Algorithm	Hash digest
SHA256	`0b0a30aea8763d5fbfe7ef902c3ef6b7c6f0cf467c6be0aaa411f37d1e0d996f`
MD5	`722c21589022c6a36d77bc29ac90e0cf`
BLAKE2b-256	`e059b7186db8e862acc9cb31ee293d9de6215ad6d9f577ebfe52308f1a1ddab6`

