Data augmentation library with Rust-accelerated operations

Additory Rust Python Bindings

High-performance Rust implementations of Additory's data augmentation functions with Python bindings.

Performance: roughly 2-5x faster than the pure Python implementation (see Performance Benchmarks)
Compatibility: 100% API compatible with Additory v0.1.1
Python Support: Python 3.8+ (abi3)

Features

  • Zero-copy DataFrame transfer via Apache Arrow IPC
  • Automatic backend detection (pandas/polars)
  • Graceful fallback to pure Python if the Rust extension is unavailable
  • Memory efficient with minimal overhead
  • Type safe with Rust's ownership system
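
The graceful fallback listed above is commonly implemented as an import-time probe. The following is a minimal sketch of that pattern, not Additory's actual code: `load_backend` and `describe_backend` are hypothetical helper names, and only the module name `additory_rust` comes from this README.

```python
import importlib


def load_backend(module_name="additory_rust"):
    """Try to import an optional accelerated module.

    Returns (module, True) on success and (None, False) if it is not installed.
    """
    try:
        return importlib.import_module(module_name), True
    except ImportError:
        return None, False


def describe_backend(module_name="additory_rust"):
    """Report which code path a caller would take (hypothetical helper)."""
    _, available = load_backend(module_name)
    return "rust" if available else "python"


if __name__ == "__main__":
    print(describe_backend())
```

Probing once at import time keeps the per-call cost of the fallback decision to a single boolean check.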

Installation

From PyPI (when published)

pip install additory-rust

From Source

# Install build dependencies
pip install maturin

# Navigate to bindings directory
cd rust-core/additory-py

# Build and install in development mode
maturin develop --release

# Or build a wheel
maturin build --release

Quick Start

The Additory Python wrapper uses the Rust bindings automatically when they are available:

import polars as pl
from additory.functions.to import to

# Create sample data
orders = pl.DataFrame({
    "order_id": [1, 2, 3, 4],
    "product_id": [101, 102, 101, 103]
})

products = pl.DataFrame({
    "product_id": [101, 102, 103],
    "price": [10.0, 20.0, 15.0]
})

# Lookup operation (automatically uses Rust if available)
result = to(orders, from_df=products, bring="price", against="product_id")
print(result.df)

Output:

┌──────────┬────────────┬───────┐
│ order_id │ product_id │ price │
├──────────┼────────────┼───────┤
│ 1        │ 101        │ 10.0  │
│ 2        │ 102        │ 20.0  │
│ 3        │ 101        │ 10.0  │
│ 4        │ 103        │ 15.0  │
└──────────┴────────────┴───────┘

Supported Operations

Lookup

Join DataFrames and bring columns from reference data.

result = to(df, from_df=ref, bring=["col1", "col2"], against="id")
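
To make the lookup semantics concrete, here is a pure-Python sketch on lists of dictionaries. It assumes left-join behaviour (every input row is kept, unmatched keys yield `None`) and that the first matching reference row wins; neither assumption is confirmed by this README, and the `lookup` function is illustrative rather than part of Additory.

```python
def lookup(rows, ref_rows, bring, against):
    """Left-join style lookup on lists of dicts (illustrative only)."""
    bring = [bring] if isinstance(bring, str) else list(bring)

    # Index reference rows by key; the first row seen for a key wins (assumption).
    index = {}
    for ref in ref_rows:
        index.setdefault(ref[against], ref)

    result = []
    for row in rows:
        match = index.get(row[against])
        # Unmatched keys produce None for every brought column (assumption).
        extra = {col: (match[col] if match is not None else None) for col in bring}
        result.append({**row, **extra})
    return result


orders = [
    {"order_id": 1, "product_id": 101},
    {"order_id": 2, "product_id": 102},
    {"order_id": 3, "product_id": 101},
    {"order_id": 4, "product_id": 103},
]
products = [
    {"product_id": 101, "price": 10.0},
    {"product_id": 102, "price": 20.0},
    {"product_id": 103, "price": 15.0},
]

if __name__ == "__main__":
    for row in lookup(orders, products, bring="price", against="product_id"):
        print(row)
```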

Merge

Combine DataFrames vertically or horizontally.

result = to(df1, from_df=df2, to="@merge", how="vertical")

Sort

Sort DataFrame by specified columns.

result = to(df, to="@sort", by="column", descending=False)

Summarize

Group and aggregate data.

result = to(df, to="@summarize", against="category", 
            aggregations={"sales": "sum", "quantity": "mean"})
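
The grouping step can be illustrated without any DataFrame library. The sketch below is a plain-Python approximation of a group-and-aggregate operation; the `summarize` function, the `AGGREGATORS` table, and the shape of the output (one dict per group, aggregated columns keeping their original names) are assumptions made for illustration, not Additory's documented behaviour.

```python
from collections import defaultdict
from statistics import fmean

# Aggregation names mapped to plain-Python reducers (illustrative subset).
AGGREGATORS = {"sum": sum, "mean": fmean}


def summarize(rows, against, aggregations):
    """Group rows by `against` and reduce each listed column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[against]].append(row)

    out = []
    for key, members in groups.items():
        summary = {against: key}
        for column, how in aggregations.items():
            summary[column] = AGGREGATORS[how]([m[column] for m in members])
        out.append(summary)
    return out


if __name__ == "__main__":
    sales = [
        {"category": "a", "sales": 10, "quantity": 1},
        {"category": "a", "sales": 20, "quantity": 3},
        {"category": "b", "sales": 5, "quantity": 2},
    ]
    print(summarize(sales, "category", {"sales": "sum", "quantity": "mean"}))
```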

Performance Benchmarks

Operation   Rows   Rust Time   Python Time   Speedup
Lookup      1k     0.020s      0.045s        2.3x
Lookup      10k    0.003s      0.012s        4.0x
Lookup      100k   0.015s      0.055s        3.7x
Sort        10k    0.002s      0.008s        4.0x
Sort        100k   0.020s      0.080s        4.0x
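
The speedup column is the Python time divided by the Rust time. The README does not show how the timings were collected, so the harness below is only a sketch of one common approach (best of several runs with `time.perf_counter`); `best_time` and `speedup` are hypothetical helpers, and the workload is a stand-in built-in sort rather than an Additory call.

```python
import time


def best_time(fn, repeats=5):
    """Return the fastest wall-clock time in seconds over several runs."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


if __name__ == "__main__":
    # Stand-in workload: sorting 100k integers with the built-in sort.
    data = list(range(100_000, 0, -1))
    print(f"sorted(): {best_time(lambda: sorted(data)):.4f}s")

    # Sort, 100k rows, using the timings from the table above.
    print(f"speedup: {speedup(0.080, 0.020):.1f}x")
```

Taking the minimum over repeats reduces noise from other processes; a mean or median would be a reasonable alternative.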

Pandas Compatibility

Works seamlessly with pandas DataFrames:

import pandas as pd
from additory.functions.to import to

orders_pd = pd.DataFrame({
    "order_id": [1, 2, 3],
    "product_id": [101, 102, 101]
})

products_pd = pd.DataFrame({
    "product_id": [101, 102],
    "price": [10.0, 20.0]
})

# The input is converted automatically and Rust acceleration is used if available
result = to(orders_pd, from_df=products_pd, bring="price", against="product_id")
# The result is also a pandas DataFrame
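
One way a wrapper can tell pandas from polars input without importing either library is to inspect the module of the object's type. This is a hedged sketch of that idea, not the detection logic Additory actually uses; `detect_backend` is a hypothetical name.

```python
def detect_backend(df):
    """Guess the DataFrame library from the top-level module of the object's type."""
    root = type(df).__module__.split(".")[0]
    if root in ("pandas", "polars"):
        return root
    raise TypeError(f"Unsupported DataFrame type: {type(df).__name__}")


if __name__ == "__main__":
    try:
        detect_backend([1, 2, 3])
    except TypeError as exc:
        print(exc)
```

Remembering the detected backend lets the wrapper convert the result back to the caller's original DataFrame type.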

Checking Rust Availability

from additory.functions.to import RUST_AVAILABLE

if RUST_AVAILABLE:
    print("🦀 Rust acceleration enabled!")
    import additory_rust
    print(f"Version: {additory_rust.__version__}")
else:
    print("🐍 Using pure Python implementation")

Direct Rust API (Advanced)

For advanced users who want to bypass the Python wrapper:

import additory_rust
import polars as pl
import io

# Convert DataFrame to Arrow IPC bytes
def df_to_bytes(df):
    buffer = io.BytesIO()
    df.write_ipc(buffer)
    return buffer.getvalue()

def bytes_to_df(data):
    buffer = io.BytesIO(data)
    return pl.read_ipc(buffer)

# Direct Rust call (`orders` and `products` are the DataFrames from Quick Start)
df_bytes = df_to_bytes(orders)
from_df_bytes = df_to_bytes(products)

result_bytes = additory_rust.to_lookup(
    df_bytes, from_df_bytes, ["price"], ["product_id"]
)

result_df = bytes_to_df(result_bytes)

Architecture

Python DataFrame (pandas/polars)
        ↓
Arrow IPC Serialization
        ↓
Rust Processing (zero-copy)
        ↓
Arrow IPC Deserialization
        ↓
Python DataFrame (original type)

Error Handling

All Rust errors are converted to appropriate Python exceptions:

try:
    result = to(df, from_df=ref, bring="invalid_col", against="id")
except ValueError as e:
    print(e)
    # ValueError: Bring columns not found in reference DataFrame: ['invalid_col'].
    # Available columns: ['id', 'price', 'name']
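
The message above suggests a column check that runs before any join work. The following sketch reproduces that message format in plain Python; `check_bring_columns` is a hypothetical helper, and where the real check lives (Rust or Python) is not stated in this README.

```python
def check_bring_columns(bring, available):
    """Raise ValueError if any requested column is missing from the reference frame."""
    missing = [col for col in bring if col not in available]
    if missing:
        raise ValueError(
            f"Bring columns not found in reference DataFrame: {missing}. "
            f"Available columns: {list(available)}"
        )


if __name__ == "__main__":
    try:
        check_bring_columns(["invalid_col"], ["id", "price", "name"])
    except ValueError as exc:
        print(exc)
```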

Development

Building

# Debug build
maturin develop

# Release build (optimized)
maturin develop --release

# Build wheel
maturin build --release

Testing

# Rust unit tests
cargo test

# Python integration tests
python test_phase4_integration.py

# Performance benchmarks
python benchmark_rust_performance.py

Documentation

# Generate Rust docs
cargo doc --open

# View API documentation
cat API_DOCUMENTATION.md

# View usage examples
cat USAGE_EXAMPLES.md

Platform Support

Platform   Architecture   Status          Wheel Size
Linux      x86_64         ✅ Built        14MB
Linux      aarch64        📝 Documented   -
macOS      x86_64         📝 Documented   -
macOS      aarch64        📝 Documented   -
Windows    x86_64         📝 Documented   -

See MULTI_PLATFORM_BUILD_GUIDE.md for build instructions.

Troubleshooting

See TROUBLESHOOTING.md for common issues and solutions.

Contributing

Contributions welcome! Please ensure:

  • All tests pass (cargo test and Python tests)
  • Code is formatted (cargo fmt)
  • No clippy warnings (cargo clippy)
  • Documentation is updated

License

MIT License - see LICENSE file for details

Version

Current Version: 0.2.0
Last Updated: 2025-02-04
Python Support: 3.8+
Polars Version: 0.44+
