C++ accelerated CSV preprocessing and data cleaning for pandas

⚡ arnio

Arnio is an open-source, C++-accelerated data preprocessing library
for Python. It is built for speed and memory efficiency and is being actively optimized during GSSoC 2026.


Pandas is incredible for analysis, but it is notoriously slow and memory-hungry for ingesting and cleaning raw CSVs.
Arnio exists to do exactly one thing: intercept your messy CSVs, clean them natively in C++, and hand you a pristine pandas DataFrame.

🧨 The Problem

Every data project starts the same way. You load a CSV. It exhausts your RAM. You load it again in chunks. You find random nulls, inconsistent capitalization, and trailing whitespace. You write a 15-line script chaining .apply(), .dropna(), and .str.strip(). You copy-paste this script into your next five Jupyter notebooks.

It's slow. It's unreadable. It's error-prone.

✨ The Solution: Arnio

Arnio replaces your messy ingestion script with a high-performance, declarative pipeline powered by pybind11 and C++.

❌ The Old Way (Pandas) | ⚡ The Arnio Way
Memory Spikes: Python loads the entire raw string file before casting. | C++ Native: Parses and infers types directly into columnar memory.
Spaghetti Code: .apply() lambda functions scattered across cells. | Declarative: A strict, readable list of cleaning steps.
Slow Execution: Python loops over strings to strip whitespaces. | Blazing Fast: Cleaning primitives run at near metal speeds.

🚀 Getting Started

If you have Python 3.9+, you are 5 seconds away from faster data pipelines.

pip install arnio

The 3-Step Workflow

Drop Arnio into the very top of your Jupyter Notebook or Python script.

import arnio as ar

# 1. Load the raw file using the C++ core (no Python overhead)
frame = ar.read_csv("messy_sales_data.csv")

# 2. Define a strict, readable cleaning pipeline
clean_frame = ar.pipeline(frame, [
    ("strip_whitespace",),
    ("normalize_case", {"case_type": "lower"}),
    ("fill_nulls", {"value": 0.0, "subset": ["revenue"]}),
    ("drop_nulls",),
    ("drop_duplicates",),
])

# 3. Export to a clean pandas DataFrame and start your analysis!
df = ar.to_pandas(clean_frame)

# -> Now, use `df` exactly like you always have.
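
To make the meaning of each step concrete, here is a minimal pure-Python sketch of what the pipeline above is expected to do to a handful of rows. It is not Arnio's implementation (which runs in C++), and `run_pipeline`, `STEPS`, and the row-as-dict representation are hypothetical names chosen for illustration. Only the step names and their arguments are taken from the example above, and the exact semantics (for instance, whether `drop_nulls` drops a row on any null) are assumptions.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence

Row = Dict[str, Any]


def strip_whitespace(rows: List[Row]) -> List[Row]:
    # Trim leading and trailing whitespace from every string value.
    return [{k: v.strip() if isinstance(v, str) else v for k, v in r.items()}
            for r in rows]


def normalize_case(rows: List[Row], case_type: str = "lower") -> List[Row]:
    # Lower-case or upper-case every string value.
    convert = str.lower if case_type == "lower" else str.upper
    return [{k: convert(v) if isinstance(v, str) else v for k, v in r.items()}
            for r in rows]


def fill_nulls(rows: List[Row], value: Any,
               subset: Optional[Sequence[str]] = None) -> List[Row]:
    # Replace None with `value`, optionally only in the listed columns.
    return [{k: value if v is None and (subset is None or k in subset) else v
             for k, v in r.items()} for r in rows]


def drop_nulls(rows: List[Row]) -> List[Row]:
    # Assumption: a row is dropped if any of its values is None.
    return [r for r in rows if all(v is not None for v in r.values())]


def drop_duplicates(rows: List[Row]) -> List[Row]:
    # Keep the first occurrence of each exactly matching row.
    seen = set()
    unique = []
    for r in rows:
        key = tuple(r.items())
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


# Hypothetical registry mapping step names to functions.
STEPS: Dict[str, Callable[..., List[Row]]] = {
    "strip_whitespace": strip_whitespace,
    "normalize_case": normalize_case,
    "fill_nulls": fill_nulls,
    "drop_nulls": drop_nulls,
    "drop_duplicates": drop_duplicates,
}


def run_pipeline(rows: List[Row], steps: Sequence[tuple]) -> List[Row]:
    # Each step is ("name",) or ("name", {keyword arguments}).
    for step in steps:
        kwargs = step[1] if len(step) > 1 else {}
        rows = STEPS[step[0]](rows, **kwargs)
    return rows


raw = [
    {"name": "  Alice ", "revenue": 10.0},
    {"name": "alice", "revenue": 10.0},
    {"name": "BOB", "revenue": None},
    {"name": None, "revenue": 5.0},
]

cleaned = run_pipeline(raw, [
    ("strip_whitespace",),
    ("normalize_case", {"case_type": "lower"}),
    ("fill_nulls", {"value": 0.0, "subset": ["revenue"]}),
    ("drop_nulls",),
    ("drop_duplicates",),
])
print(cleaned)
# [{'name': 'alice', 'revenue': 10.0}, {'name': 'bob', 'revenue': 0.0}]
```

Step order matters in this sketch: the two "alice" rows only become exact duplicates after whitespace and case have been normalized, so `drop_duplicates` runs last.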

🏎️ Benchmarks

Tested on Ubuntu, Python 3.12, 1M row CSV.
Run make benchmark to reproduce on your machine.

Metric | pandas | arnio v1.0.0
Execution Time | 4.73s | 5.75s
Peak RAM | 211MB | 212MB

Current state: arnio's C++ CSV reader matches pandas on memory (212MB vs. 211MB)
but is currently slower on execution time (5.75s vs. 4.73s).
Speed parity is the active engineering goal for v0.2.0. Specifically,
drop_duplicates and strip_whitespace are unoptimized C++ and are
the primary contributors to the gap.
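
If you want a rough measurement of your own before running the project's benchmark, the sketch below times a callable and records peak memory using only the standard library. It is not what `make benchmark` does (that target is not shown here), and `measure` and `workload` are hypothetical names. Note that `tracemalloc` only sees allocations made through Python's allocator, so memory allocated natively inside a C++ extension may not be counted; treat the memory figure as a lower bound in that case.

```python
import time
import tracemalloc
from typing import Any, Callable, Tuple


def measure(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float, float]:
    """Run func once; return (result, elapsed seconds, peak traced MB)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, elapsed, peak_bytes / 1_000_000


def workload(n: int) -> list:
    # Stand-in for a load-and-clean step: build and strip n strings.
    return [f"  row {i}  ".strip() for i in range(n)]


result, seconds, peak_mb = measure(workload, 100_000)
print(f"rows={len(result)} time={seconds:.3f}s peak={peak_mb:.1f}MB")
```

To compare two approaches, wrap each one in its own function, call `measure` on both with the same input file, and repeat a few times, since a single run is noisy.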

Help close the gap →

🔍 Want to peek at a massive file without loading it?

Arnio lets you instantly scan a massive CSV to infer its schema without loading the data into memory.

import arnio as ar

schema = ar.scan_csv("100GB_file.csv")
print(schema) 
# {'id': 'INT64', 'name': 'STRING', 'is_active': 'BOOL'}
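
One common way to infer a schema without reading a whole file is to stream only the first rows and guess each column's type from that sample. The sketch below shows that idea with the standard `csv` module. It is not how `ar.scan_csv` is implemented (the source does not describe its internals), `scan_schema` and `infer_type` are hypothetical names, and the `FLOAT64` label is an assumption; only `INT64`, `STRING`, and `BOOL` appear in the output above.

```python
import csv
import io
from typing import Callable, Dict, Iterable, List


def _parses_as(caster: Callable[[str], object], text: str) -> bool:
    # True if caster(text) succeeds, False on ValueError.
    try:
        caster(text)
        return True
    except ValueError:
        return False


def infer_type(values: List[str]) -> str:
    # Guess the narrowest type that fits every non-empty sampled value.
    present = [v for v in values if v != ""]
    if not present:
        return "STRING"
    if all(v.lower() in ("true", "false") for v in present):
        return "BOOL"
    if all(_parses_as(int, v) for v in present):
        return "INT64"
    if all(_parses_as(float, v) for v in present):
        return "FLOAT64"  # assumed label, not shown in the source
    return "STRING"


def scan_schema(lines: Iterable[str], sample_rows: int = 1000) -> Dict[str, str]:
    # Stream rows lazily and stop after sample_rows, so the rest of the
    # file is never read into memory.
    reader = csv.DictReader(lines)
    samples: Dict[str, List[str]] = {name: [] for name in (reader.fieldnames or [])}
    for index, row in enumerate(reader):
        if index >= sample_rows:
            break
        for name in samples:
            samples[name].append(row.get(name) or "")
    return {name: infer_type(column) for name, column in samples.items()}


demo = io.StringIO("id,name,is_active\n1,Ann,true\n2,Bob,false\n")
schema = scan_schema(demo)
print(schema)
# {'id': 'INT64', 'name': 'STRING', 'is_active': 'BOOL'}
```

For a real file, pass an open file object (opened with `newline=""`) instead of the `io.StringIO` demo. A sample-based guess can be wrong when later rows contain values that do not fit the inferred type.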

🛠️ What's Inside?

Arnio ships with a growing library of C++ cleaning primitives:

  • drop_nulls: Rip out bad rows instantly.
  • fill_nulls: Patch holes with scalar values.
  • drop_duplicates: Deduplicate rows based on exact matches.
  • strip_whitespace: Trim invisible spaces from string columns.
  • normalize_case: Force upper or lower case instantly.
  • rename_columns & cast_types: Shape your data exactly how you need it.
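
The last two primitives are not used in the quickstart, so here is a small pure-Python sketch of what renaming and casting typically mean for tabular rows. The function names match the list above, but the parameters (`mapping`, `casts`) and the row-as-dict representation are hypothetical; Arnio's actual arguments for these steps are not documented here and may differ.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def rename_columns(rows: List[Row], mapping: Dict[str, str]) -> List[Row]:
    # Rename keys found in `mapping`; leave all other columns untouched.
    return [{mapping.get(k, k): v for k, v in r.items()} for r in rows]


def cast_types(rows: List[Row], casts: Dict[str, Callable[[Any], Any]]) -> List[Row]:
    # Apply a caster to each listed column, skipping null values.
    result = []
    for r in rows:
        converted = dict(r)
        for column, caster in casts.items():
            if converted.get(column) is not None:
                converted[column] = caster(converted[column])
        result.append(converted)
    return result


rows = [{"ID": "1", "Revenue": "19.5"}, {"ID": "2", "Revenue": None}]
renamed = rename_columns(rows, {"ID": "id", "Revenue": "revenue"})
typed = cast_types(renamed, {"id": int, "revenue": float})
print(typed)
# [{'id': 1, 'revenue': 19.5}, {'id': 2, 'revenue': None}]
```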

🤝 Contributing

Arnio is a GSSoC 2026 project. We welcome contributors of all levels.

  • No C++ required: Add pipeline steps in pure Python
  • C++ contributors: Help optimize drop_duplicates and strip_whitespace,
    which are the current performance bottlenecks
  • Docs & examples: Always needed

Read the Contribution Guide → | Browse open issues →


🗺️ Roadmap

Version | Focus | Status
v1.0.0 | Stable release, cross-platform wheels, Google Colab support, CI/CD pipeline | ✅ Released
v0.2.0 | C++ pipeline optimization, speed parity with pandas | 🔨 Active
v0.3.0 | Chunked processing, Parquet/JSON support | 📋 Planned

Stop fighting your data. Let Arnio clean it.
