C++ accelerated CSV preprocessing and data cleaning for pandas

Arnio is an open-source, C++-accelerated data preprocessing library for Python. It is built for speed and memory efficiency, and it is being actively optimized during GSSoC 2026.



Pandas is incredible for analysis, but it is notoriously slow and memory-hungry when ingesting and cleaning raw CSVs.
Arnio exists to do exactly one thing: intercept your messy CSVs, clean them natively in C++, and hand you a pristine pandas DataFrame. The current release matches pandas on peak memory; beating it on speed is the active optimization target (see Benchmarks).

🧨 The Problem

Every data project starts the same way. You load a CSV. It exhausts your RAM. You load it again in chunks. You find random nulls, inconsistent capitalization, and trailing whitespace. You write a 15-line script chaining .apply(), .dropna(), and .str.strip(). You copy-paste this script into your next 5 Jupyter notebooks.

It's slow. It's unreadable. It's error-prone.

✨ The Solution: Arnio

Arnio replaces your messy ingestion script with a high-performance, declarative pipeline powered by pybind11 and C++.

| ❌ The Old Way (Pandas) | ⚡ The Arnio Way |
|---|---|
| Memory Spikes: Python loads the entire raw string file before casting. | C++ Native: Parses and infers types directly into columnar memory. |
| Spaghetti Code: .apply() lambda functions scattered across cells. | Declarative: A strict, readable list of cleaning steps. |
| Slow Execution: Python loops over strings to strip whitespaces. | Blazing Fast: Cleaning primitives run at near metal speeds. |

🚀 Getting Started

If you have Python 3.9+, you are 5 seconds away from faster data pipelines.

pip install arnio

The 3-Step Workflow

Drop Arnio into the very top of your Jupyter Notebook or Python script.

import arnio as ar

# 1. Load the raw file using the C++ core (no Python overhead)
frame = ar.read_csv("messy_sales_data.csv")

# 2. Define a strict, readable cleaning pipeline
clean_frame = ar.pipeline(frame, [
    ("strip_whitespace",),
    ("normalize_case", {"case_type": "lower"}),
    ("fill_nulls", {"value": 0.0, "subset": ["revenue"]}),
    ("drop_nulls",),
    ("drop_duplicates",),
])

# 3. Export to a clean pandas DataFrame and start your analysis!
df = ar.to_pandas(clean_frame)

# -> Now, use `df` exactly like you always have.
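
To make the meaning of the step list concrete, here is a minimal pure-Python sketch that interprets the same list-of-tuples shape over a list of dict rows. It is not Arnio's implementation (the real steps run in C++ on columnar data). The `run_pipeline` function, the `STEPS` table, and the per-step argument handling are all assumptions made for illustration.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence

Row = Dict[str, Any]


def strip_whitespace(rows: List[Row]) -> List[Row]:
    # Trim leading/trailing whitespace from every string value.
    return [{k: v.strip() if isinstance(v, str) else v for k, v in r.items()}
            for r in rows]


def normalize_case(rows: List[Row], case_type: str = "lower") -> List[Row]:
    # Force every string value to lower or upper case.
    convert = str.lower if case_type == "lower" else str.upper
    return [{k: convert(v) if isinstance(v, str) else v for k, v in r.items()}
            for r in rows]


def fill_nulls(rows: List[Row], value: Any,
               subset: Optional[Sequence[str]] = None) -> List[Row]:
    # Replace None with `value`, restricted to `subset` columns when given.
    return [{k: value if v is None and (subset is None or k in subset) else v
             for k, v in r.items()} for r in rows]


def drop_nulls(rows: List[Row]) -> List[Row]:
    # Keep only rows that have no None left in any column.
    return [r for r in rows if all(v is not None for v in r.values())]


def drop_duplicates(rows: List[Row]) -> List[Row]:
    # Keep the first occurrence of each exactly matching row.
    seen = set()
    unique = []
    for r in rows:
        key = tuple(r.items())
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


# Hypothetical lookup table from step name to implementation.
STEPS: Dict[str, Callable[..., List[Row]]] = {
    "strip_whitespace": strip_whitespace,
    "normalize_case": normalize_case,
    "fill_nulls": fill_nulls,
    "drop_nulls": drop_nulls,
    "drop_duplicates": drop_duplicates,
}


def run_pipeline(rows: List[Row], steps: Sequence[tuple]) -> List[Row]:
    # Each step is (name,) or (name, {keyword arguments}); apply in order.
    for step in steps:
        kwargs = step[1] if len(step) > 1 else {}
        rows = STEPS[step[0]](rows, **kwargs)
    return rows


rows = [
    {"name": "  Alice ", "revenue": 10.0},
    {"name": "alice", "revenue": 10.0},
    {"name": "Bob", "revenue": None},
    {"name": None, "revenue": 5.0},
]
cleaned = run_pipeline(rows, [
    ("strip_whitespace",),
    ("normalize_case", {"case_type": "lower"}),
    ("fill_nulls", {"value": 0.0, "subset": ["revenue"]}),
    ("drop_nulls",),
    ("drop_duplicates",),
])
print(cleaned)
```

Step order matters in this sketch: because duplicates are matched exactly, `"  Alice "` and `"alice"` only collapse into one row when stripping and case normalization run before `drop_duplicates`.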

🏎️ Benchmarks

Tested on Ubuntu with Python 3.12 and a 1M-row CSV.
Run make benchmark to reproduce on your machine.

The Goal: Provide a native C++ parser and cleaning engine that matches or beats pandas on memory, while eliminating the need for slow Python .apply() loops.

| Metric | pandas | arnio v1.0.0 | Note |
|---|---|---|---|
| Peak RAM | 211 MB | 212 MB | Parity achieved. C++ native parsing prevents memory spikes. |
| Clean Syntax | Python Loops | Declarative | No more spaghetti .apply() lambdas. |
| Execution Time | 4.73 s | 5.75 s | Active optimization target for v0.2.0. |

Current state: Arnio achieves memory parity with pandas while offering a much cleaner declarative API. Speed optimization is our primary focus for v0.2.0 — specifically, C++ implementations of drop_duplicates and strip_whitespace are currently unoptimized and are the main contributors to the execution time gap.
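
If you want a quick sanity check before running the full benchmark, a helper along these lines can time a callable and record its peak Python-level memory. This is a rough sketch, not the project's benchmark harness. The `measure` name is hypothetical, and `tracemalloc` only sees allocations made through Python's allocator, so memory allocated inside a C++ extension is not counted. For a fair cross-library comparison, measure process-level memory instead.

```python
import time
import tracemalloc
from typing import Any, Callable, Tuple


def measure(fn: Callable[[], Any]) -> Tuple[Any, float, int]:
    """Run fn once and return (result, elapsed seconds, peak traced bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = fn()
        elapsed = time.perf_counter() - start
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, elapsed, peak


# Stand-in workload: a Python-level string cleaning loop.
result, elapsed, peak = measure(
    lambda: [f"  row {i} ".strip() for i in range(100_000)]
)
print(f"{elapsed:.3f} s, peak {peak / 1e6:.1f} MB (Python allocations only)")
```

Tracing adds overhead of its own, so treat the timings as relative rather than absolute.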

Help close the gap →

🔍 Want to peek at a massive file without loading it?

Arnio lets you instantly scan a massive CSV to infer its schema without loading the data into memory.

import arnio as ar

schema = ar.scan_csv("100GB_file.csv")
print(schema)
# {'id': 'INT64', 'name': 'STRING', 'is_active': 'BOOL'}
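
The idea behind a schema scan can be sketched with the standard library: read only a bounded sample of rows and infer a type per column. This is an illustration of the technique, not how `ar.scan_csv` is implemented. The `infer_schema` helper, the `sample_rows` parameter, and the `FLOAT64` type name are assumptions; only `INT64`, `STRING`, and `BOOL` appear in the output above.

```python
import csv
import io
from itertools import islice
from typing import Dict, Set, TextIO


def _infer_value_type(text: str) -> str:
    # Classify one non-empty cell, trying the narrowest type first.
    lowered = text.strip().lower()
    if lowered in ("true", "false"):
        return "BOOL"
    try:
        int(lowered)
        return "INT64"
    except ValueError:
        pass
    try:
        float(lowered)
        return "FLOAT64"
    except ValueError:
        return "STRING"


def infer_schema(stream: TextIO, sample_rows: int = 1000) -> Dict[str, str]:
    """Infer column types from at most `sample_rows` data rows."""
    reader = csv.DictReader(stream)
    seen: Dict[str, Set[str]] = {name: set() for name in (reader.fieldnames or [])}
    # islice stops reading after the sample, so the rest of the file is untouched.
    for row in islice(reader, sample_rows):
        for name, text in row.items():
            if name in seen and text not in (None, ""):
                seen[name].add(_infer_value_type(text))
    schema: Dict[str, str] = {}
    for name, kinds in seen.items():
        if len(kinds) == 1:
            schema[name] = next(iter(kinds))
        elif kinds == {"INT64", "FLOAT64"}:
            schema[name] = "FLOAT64"  # mixed ints and floats widen to float
        else:
            schema[name] = "STRING"   # empty or conflicting columns fall back
    return schema


sample = io.StringIO("id,name,is_active\n1,Alice,true\n2,Bob,false\n")
print(infer_schema(sample))
```

A sample-based scan trades accuracy for speed: a column that looks like integers in the first rows may still contain text further down, so the inferred schema is a hint rather than a guarantee.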

🛠️ What's Inside?

Arnio ships with a growing library of C++ cleaning primitives (drop_duplicates and strip_whitespace are still being optimized):

  • drop_nulls: Rip out bad rows instantly.
  • fill_nulls: Patch holes with scalar values.
  • drop_duplicates: Deduplicate rows based on exact matches.
  • strip_whitespace: Trim invisible spaces from string columns.
  • normalize_case: Force upper or lower case instantly.
  • rename_columns & cast_types: Shape your data exactly how you need it.
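
The last two primitives are not shown in the workflow above, so here is a pure-Python sketch of what renaming and casting mean on row data. The function signatures are assumptions chosen for illustration. This page does not document the arguments Arnio's own `rename_columns` and `cast_types` steps accept, so check the project docs before using them in a pipeline.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def rename_columns(rows: List[Row], mapping: Dict[str, str]) -> List[Row]:
    # Rename keys found in `mapping`; leave all other columns unchanged.
    return [{mapping.get(k, k): v for k, v in r.items()} for r in rows]


def cast_types(rows: List[Row],
               types: Dict[str, Callable[[Any], Any]]) -> List[Row]:
    # Apply a converter to each listed column, leaving None values as they are.
    converted = []
    for r in rows:
        new_row = dict(r)
        for column, convert in types.items():
            if new_row.get(column) is not None:
                new_row[column] = convert(new_row[column])
        converted.append(new_row)
    return converted


rows = [{"Cust ID": "7", "rev": "19.5"}, {"Cust ID": "8", "rev": None}]
rows = rename_columns(rows, {"Cust ID": "customer_id", "rev": "revenue"})
rows = cast_types(rows, {"customer_id": int, "revenue": float})
print(rows)
```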

🤝 Contributing

Arnio is a GSSoC 2026 project. We welcome contributors of all levels.

  • No C++ required: Add pipeline steps in pure Python
  • C++ contributors: Help optimize drop_duplicates and strip_whitespace,
    which are the current performance bottlenecks
  • Docs & examples: Always needed

Read the Contribution Guide → | Browse open issues →


🗺️ Roadmap

| Version | Focus | Status |
|---|---|---|
| v1.0.0 | Stable release, cross-platform wheels, Google Colab support, CI/CD pipeline | ✅ Released |
| v0.2.0 | C++ pipeline optimization, speed parity with pandas | 🔨 Active |
| v0.3.0 | Chunked processing, Parquet/JSON support | 📋 Planned |

Stop fighting your data. Let Arnio clean it.
