Fast CSV processing and data cleaning companion for pandas

Project description

arnio

Fast CSV processing and data cleaning for Python — powered by C++.

arnio is a lightweight, C++-backed Python library designed to make CSV ingestion and data cleaning faster, simpler, and more memory-efficient. It sits comfortably alongside your existing pandas workflows, handling the heavy lifting before your data ever hits a DataFrame.

If you've ever waited too long for a large CSV to load, or spent more time wrangling messy columns than actually analyzing data — arnio is built for you.

import arnio as ar

df = ar.read_csv("sales_data.csv")
clean = ar.clean(df, drop_nulls=True, strip_whitespace=True)

Why arnio?

Python's data ecosystem is excellent — but reading and cleaning raw CSV files at scale has always been a rough edge. pandas.read_csv is versatile and reliable, but it isn't optimized for raw speed or low memory pressure. Preprocessing logic ends up scattered across notebooks, inconsistent between projects, and slow on large files.

arnio addresses this directly:

Reads large CSVs faster using a C++ parsing core with minimal Python overhead
Provides a clean, composable cleaning API so preprocessing logic is explicit, readable, and repeatable
Works with pandas — arnio outputs standard DataFrames; it's a faster on-ramp, not a replacement
Keeps memory usage low by parsing efficiently before data is materialized in Python

Key Features

Feature	Description
⚡ C++ CSV parser	Reads large files with significantly lower overhead than pure-Python parsers
🧹 Cleaning primitives	Drop nulls, strip whitespace, fix dtypes, normalize column names, and more
🐼 pandas-compatible output	Returns standard `pd.DataFrame` objects — drop arnio into any existing workflow
🧠 Memory-conscious	Designed to avoid unnecessary copies during ingestion and cleaning
🔗 Composable API	Chain cleaning operations in a clear, readable style
📦 Simple import	`import arnio as ar` — that's it

Installation

pip install arnio

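Note: arnio requires Python 3.9+. Prebuilt wheels avoid a local C++ build, but check the release's file list for your platform: the 0.1.0 release ships only a CPython 3.13 wheel for Windows x86-64. Where no matching wheel exists, pip builds from the source distribution, which requires a C++17-compatible compiler.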

Quick Start

Reading a CSV

import arnio as ar

# Fast CSV ingestion — returns a pandas DataFrame
df = ar.read_csv("data/transactions.csv")
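Speedups depend on file shape, column types, and hardware, so it is worth timing arnio against `pandas.read_csv` on your own files. A minimal standard-library sketch; `best_of` is a hypothetical helper, not part of arnio:

```python
import time
from typing import Callable


def best_of(fn: Callable[[], object], repeat: int = 5) -> float:
    """Return the fastest wall-clock time (seconds) over `repeat` calls to fn."""
    if repeat < 1:
        raise ValueError("repeat must be at least 1")
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    # The minimum is the least noisy estimate: slower runs mostly reflect
    # interference from other processes or caches, not the code under test.
    return min(timings)


# Usage (assumes arnio and pandas are installed and the file exists):
# import arnio as ar
# import pandas as pd
# path = "data/transactions.csv"
# print("arnio :", best_of(lambda: ar.read_csv(path)))
# print("pandas:", best_of(lambda: pd.read_csv(path)))
```

Because repeated reads of the same file are served from the operating system's page cache, this measures warm-cache parsing rather than disk I/O.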

Cleaning Data

import arnio as ar

df = ar.read_csv("data/customers.csv")

clean = ar.clean(
    df,
    drop_nulls=True,          # Remove rows with missing values
    strip_whitespace=True,    # Trim leading/trailing whitespace in string columns
    normalize_columns=True,   # Lowercase, underscore-separated column names
)
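The exact rules behind each option are not spelled out on this page, so the following is only a plain-Python sketch of what the comments describe, operating on rows from `csv.DictReader` rather than a DataFrame. Assumptions: `None`, empty, or whitespace-only strings count as missing, and column names are normalized by lowercasing and collapsing runs of non-alphanumeric characters into one underscore. `clean_rows`, `normalize_name`, and `is_missing` are hypothetical names, and arnio's own behavior may differ.

```python
import csv
import io
import re
from typing import Dict, Iterable, List, Optional


def normalize_name(name: str) -> str:
    """Lowercase a header and join its words with underscores (assumed rule)."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


def is_missing(value: object) -> bool:
    """Assumed rule: None, empty, or whitespace-only strings count as missing."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def clean_rows(
    rows: Iterable[Dict[Optional[str], object]],
    drop_nulls: bool = True,
    strip_whitespace: bool = True,
    normalize_columns: bool = True,
) -> List[Dict[Optional[str], object]]:
    """Plain-Python stand-in for the three ar.clean() options shown above."""
    cleaned = []
    for row in rows:
        if strip_whitespace:
            row = {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
        if drop_nulls and any(is_missing(v) for v in row.values()):
            continue  # skip any row with at least one missing value
        if normalize_columns:
            # A None key (DictReader's restkey for ragged rows) is left as-is.
            row = {normalize_name(k) if isinstance(k, str) else k: v for k, v in row.items()}
        cleaned.append(row)
    return cleaned


raw = io.StringIO("Customer Name, Email Address\n  Ada  ,ada@example.com\nBob,\n")
print(clean_rows(csv.DictReader(raw)))
# [{'customer_name': 'Ada', 'email_address': 'ada@example.com'}]
```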

Chaining Operations

import arnio as ar

result = (
    ar.read_csv("data/orders.csv")
      .pipe(ar.drop_nulls)
      .pipe(ar.normalize_columns)
      .pipe(ar.strip_whitespace)
)
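`DataFrame.pipe(func)` calls `func(df)` and returns the result, so the chain above is plain function composition applied top to bottom, and the order of steps can matter. The sketch below uses a hypothetical `pipeline` helper and two toy steps on lists of row dicts (not arnio's functions) to show that stripping before dropping empties removes a whitespace-only row, while the reverse order keeps it. Whether the same holds for `ar.drop_nulls` and `ar.strip_whitespace` depends on how arnio defines a missing value, which this page does not specify.

```python
from functools import reduce
from typing import Callable, Dict, List

Row = Dict[str, str]
Step = Callable[[List[Row]], List[Row]]


def pipeline(data: List[Row], *steps: Step) -> List[Row]:
    """Apply each step in order, like chaining DataFrame.pipe calls."""
    return reduce(lambda acc, step: step(acc), steps, data)


def strip_values(rows: List[Row]) -> List[Row]:
    # Toy step: trim surrounding whitespace from every value.
    return [{k: v.strip() for k, v in r.items()} for r in rows]


def drop_empty(rows: List[Row]) -> List[Row]:
    # Toy step: treat only the empty string as missing.
    return [r for r in rows if all(v != "" for v in r.values())]


rows = [{"city": "Oslo"}, {"city": "   "}]
print(pipeline(rows, strip_values, drop_empty))  # [{'city': 'Oslo'}]
print(pipeline(rows, drop_empty, strip_values))  # [{'city': 'Oslo'}, {'city': ''}]
```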

arnio operations return standard DataFrames, so you can pass results directly into any pandas, scikit-learn, or plotting workflow without modification.

arnio + pandas: Better Together

arnio is not a pandas replacement. It is the step that happens before pandas — and occasionally alongside it.

Think of it this way:

Raw CSV on disk
      │
      ▼
  ar.read_csv()        ← Fast C++ parsing, low memory
      │
      ▼
  ar.clean()           ← Consistent preprocessing
      │
      ▼
  pd.DataFrame         ← Your normal pandas workflow continues here
      │
      ▼
  Analysis, modeling, visualization — business as usual

Once your data is clean and in a DataFrame, pandas takes over completely. arnio simply makes the path from raw file to clean DataFrame faster and more predictable.

Philosophy: arnio doesn't compete with the pandas ecosystem — it strengthens it. The goal is to reduce the friction between raw data and productive analysis.

Roadmap

arnio is under active development. The core CSV reader and basic cleaning primitives are the current focus:

C++ CSV parser core
Basic cleaning API (drop_nulls, strip_whitespace, normalize_columns)
pandas DataFrame output

Planned work includes:

Streaming / chunked reads for very large files
Type inference and automatic dtype casting
Encoding detection and normalization
Schema validation and column contracts
Parallel parsing across CPU cores
CLI tool (arnio clean data.csv --output clean.csv)
Async-friendly API for use in async pipelines
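Chunked reads are not available in arnio yet. In the meantime, `pandas.read_csv` already accepts `chunksize=` and returns an iterator of DataFrames. The underlying technique, yielding fixed-size batches instead of materializing the whole file, looks like this in a standard-library sketch (`iter_chunks` is a hypothetical name, not a planned arnio API):

```python
import csv
import io
from itertools import islice
from typing import Iterator, List, TextIO


def iter_chunks(f: TextIO, chunk_size: int) -> Iterator[List[dict]]:
    """Yield lists of up to chunk_size rows without loading the whole file."""
    # Generator body: this check runs on the first next(), not at call time.
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    reader = csv.DictReader(f)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


data = io.StringIO("id,amount\n1,10\n2,20\n3,30\n4,40\n5,50\n")
for chunk in iter_chunks(data, chunk_size=2):
    print([row["id"] for row in chunk])
# ['1', '2']
# ['3', '4']
# ['5']
```

For a real file, open it with `open(path, newline="", encoding="utf-8")` as the `csv` module documentation recommends, and keep only aggregates between chunks so memory stays bounded by `chunk_size`.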

Feedback on priorities is welcome — open a GitHub Discussion to share what matters most to you.

Contributing

Contributions are welcome and genuinely appreciated. arnio is early-stage, which means there's real space to shape how it grows.

To get started:

git clone https://github.com/yourusername/arnio.git
cd arnio
pip install -e ".[dev]"

Before submitting a pull request:

Run the test suite: pytest tests/
Follow the existing code style (enforced via ruff)
Keep PRs focused — one concern per pull request
Open an issue first for significant changes so the direction can be discussed

There's a CONTRIBUTING.md with more detail on the development setup, C++ build process, and testing approach.

License

arnio is released under the MIT License.

Built to make Python data work feel faster and cleaner — one CSV at a time.

Project details

Release history

0.1.2 (May 3, 2026)
0.1.1 (May 3, 2026)
0.1.0 (May 3, 2026), this version

Download files

Source Distribution

arnio-0.1.0.tar.gz (18.6 kB)

Uploaded May 3, 2026 Source

Built Distribution

arnio-0.1.0-cp313-cp313-win_amd64.whl (426.4 kB)

Uploaded May 3, 2026 CPython 3.13, Windows x86-64
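Wheel filenames follow the PEP 427 pattern `name-version[-build]-python-abi-platform.whl`, so the 0.1.0 wheel above targets only CPython 3.13 on 64-bit Windows; elsewhere pip falls back to the source distribution. A small sketch that splits a filename into its tags and compares the Python tag with the running interpreter (`parse_wheel_filename` is a hypothetical helper; pip's real compatibility check also handles compressed tag sets and platform families):

```python
import sys
from typing import NamedTuple, Optional


class WheelTags(NamedTuple):
    name: str
    version: str
    build: Optional[str]
    python: str
    abi: str
    platform: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel filename into its PEP 427 components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python, abi, platform = parts
        build = None
    elif len(parts) == 6:
        name, version, build, python, abi, platform = parts
    else:
        raise ValueError(f"unexpected wheel filename: {filename}")
    return WheelTags(name, version, build, python, abi, platform)


tags = parse_wheel_filename("arnio-0.1.0-cp313-cp313-win_amd64.whl")
current = f"cp{sys.version_info.major}{sys.version_info.minor}"
print(tags.python, tags.platform)  # cp313 win_amd64
# Exact-match check only; it ignores tag sets such as "py2.py3".
print("interpreter matches:", tags.python == current)
```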

File details

Details for the file arnio-0.1.0.tar.gz.

File metadata

Download URL: arnio-0.1.0.tar.gz
Upload date: May 3, 2026
Size: 18.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for arnio-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`769f5c9f39b23f541f4c0fffe5cc74546b7a00eb3409e9e316b9c33d792f5290`
MD5	`0b0e1462899af058abd296246ff0b655`
BLAKE2b-256	`47ef12457ac600cfbee1a799cde44360ff4c5384db5451c8e625ba5c5a5bc795`
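To confirm that a downloaded file matches the published digest, hash it locally and compare. A minimal standard-library sketch (`sha256_of` and `verify` are hypothetical helper names); pip can also enforce this itself in hash-checking mode via `--hash=sha256:...` entries in a requirements file.

```python
import hashlib


def sha256_of(path: str, block_size: int = 1 << 16) -> str:
    """Hash a file in blocks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify(path: str, expected_hex: str) -> bool:
    # hexdigest() is lowercase, so normalize the published value before comparing.
    return sha256_of(path) == expected_hex.strip().lower()


EXPECTED = "769f5c9f39b23f541f4c0fffe5cc74546b7a00eb3409e9e316b9c33d792f5290"
# Usage, after downloading the sdist into the current directory:
# print(verify("arnio-0.1.0.tar.gz", EXPECTED))
```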

File details

Details for the file arnio-0.1.0-cp313-cp313-win_amd64.whl.

File metadata

Download URL: arnio-0.1.0-cp313-cp313-win_amd64.whl
Upload date: May 3, 2026
Size: 426.4 kB
Tags: CPython 3.13, Windows x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for arnio-0.1.0-cp313-cp313-win_amd64.whl
Algorithm	Hash digest
SHA256	`429f3266c3d313a38d42cb2cc5ddbc5df2dead6ca0a546a06d444e45f6bd5af4`
MD5	`252104f73baf323cffc49bef09f55662`
BLAKE2b-256	`f809c9c76efa039e1e42d89c4d1afe3f8705bec46b33f359c5e5242bdf60edde`
