Data augmentation library with Rust-accelerated operations
Additory Rust Python Bindings
High-performance Rust implementations of Additory's data augmentation functions with Python bindings.
Performance: typically 2-5x faster than the pure-Python implementation (2.3x-4.0x in the benchmarks below)
Compatibility: 100% API compatible with Additory v0.1.1
Python Support: Python 3.8+ (abi3)
Features
- ✅ Zero-copy DataFrame transfer via Apache Arrow IPC
- ✅ Automatic backend detection (pandas/polars)
- ✅ Graceful fallback to pure Python if Rust unavailable
- ✅ Memory efficient with minimal overhead
- ✅ Type- and memory-safe core, enforced by Rust's type system and ownership model
Installation
From PyPI (when published)
pip install additory-rust
From Source
# Install build dependencies
pip install maturin
# Navigate to bindings directory
cd rust-core/additory-py
# Build and install in development mode
maturin develop --release
# Or build a wheel
maturin build --release
Quick Start
When the Rust extension is installed, the Additory Python wrapper uses it automatically:
import polars as pl
from additory.functions.to import to
# Create sample data
orders = pl.DataFrame({
    "order_id": [1, 2, 3, 4],
    "product_id": [101, 102, 101, 103]
})
products = pl.DataFrame({
    "product_id": [101, 102, 103],
    "price": [10.0, 20.0, 15.0]
})
# Lookup operation (automatically uses Rust if available)
result = to(orders, from_df=products, bring="price", against="product_id")
print(result.df)
Output:
┌──────────┬────────────┬───────┐
│ order_id │ product_id │ price │
├──────────┼────────────┼───────┤
│ 1 │ 101 │ 10.0 │
│ 2 │ 102 │ 20.0 │
│ 3 │ 101 │ 10.0 │
│ 4 │ 103 │ 15.0 │
└──────────┴────────────┴───────┘
Supported Operations
Lookup
Join DataFrames and bring columns from reference data.
result = to(df, from_df=ref, bring=["col1", "col2"], against="id")
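Conceptually, a lookup behaves like a left join keyed on the against column(s): every input row is kept, and the bring columns are copied from the matching reference row. The sketch below models those semantics in pure Python over lists of row dicts. lookup_rows is a hypothetical helper, not part of Additory, and two behaviours are assumptions: unmatched rows receive None, and when several reference rows share a key the first one wins.

```python
from typing import Any, Dict, List, Sequence, Union

Row = Dict[str, Any]


def lookup_rows(
    rows: List[Row],
    ref_rows: List[Row],
    bring: Union[str, Sequence[str]],
    against: Union[str, Sequence[str]],
) -> List[Row]:
    """Hypothetical pure-Python model of a lookup (left join).

    Assumptions: unmatched rows get None for the brought columns,
    and when several reference rows share a key, the first one wins.
    """
    bring_cols = [bring] if isinstance(bring, str) else list(bring)
    key_cols = [against] if isinstance(against, str) else list(against)

    # Index the reference rows by their key tuple (first occurrence wins).
    index: Dict[tuple, Row] = {}
    for ref in ref_rows:
        index.setdefault(tuple(ref[k] for k in key_cols), ref)

    result = []
    for row in rows:
        match = index.get(tuple(row[k] for k in key_cols))
        new_row = dict(row)  # keep the original columns unchanged
        for col in bring_cols:
            new_row[col] = match[col] if match is not None else None
        result.append(new_row)
    return result


orders = [
    {"order_id": 1, "product_id": 101},
    {"order_id": 2, "product_id": 102},
    {"order_id": 3, "product_id": 101},
    {"order_id": 4, "product_id": 103},
]
products = [
    {"product_id": 101, "price": 10.0},
    {"product_id": 102, "price": 20.0},
    {"product_id": 103, "price": 15.0},
]
print([r["price"] for r in lookup_rows(orders, products, "price", "product_id")])
# → [10.0, 20.0, 10.0, 15.0]
```

The printed prices match the price column of the Quick Start output.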
Merge
Combine DataFrames vertically or horizontally.
result = to(df1, from_df=df2, to="@merge", how="vertical")
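The difference between the two merge directions can be sketched on plain column dicts (column name mapped to a list of values). merge_columns is a hypothetical helper, and its validation rules are assumptions of this sketch rather than documented Additory behaviour: a vertical merge requires identical column names, and a horizontal merge requires equal row counts and no clashing column names.

```python
from typing import Any, Dict, List

Columns = Dict[str, List[Any]]


def merge_columns(left: Columns, right: Columns, how: str = "vertical") -> Columns:
    """Hypothetical model of vertical/horizontal merging on column dicts."""
    if how == "vertical":
        # Stack rows: assumed to require the same column names on both sides.
        if set(left) != set(right):
            raise ValueError("vertical merge requires identical columns")
        return {name: left[name] + right[name] for name in left}
    if how == "horizontal":
        # Place columns side by side: assumed to require equal row counts
        # and no duplicate column names.
        n_left = len(next(iter(left.values()), []))
        n_right = len(next(iter(right.values()), []))
        if n_left != n_right:
            raise ValueError("horizontal merge requires equal row counts")
        overlap = set(left) & set(right)
        if overlap:
            raise ValueError(f"duplicate columns: {sorted(overlap)}")
        return {**left, **right}
    raise ValueError(f"unknown merge mode: {how!r}")


a = {"id": [1, 2], "x": [10, 20]}
b = {"id": [3], "x": [30]}
print(merge_columns(a, b, how="vertical"))
# → {'id': [1, 2, 3], 'x': [10, 20, 30]}
```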
Sort
Sort DataFrame by specified columns.
result = to(df, to="@sort", by="column", descending=False)
Summarize
Group and aggregate data.
result = to(df, to="@summarize", against="category",
            aggregations={"sales": "sum", "quantity": "mean"})
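The aggregations argument maps each column to an aggregation name. The sketch below shows the group-then-aggregate semantics in pure Python; summarize_rows and the AGGREGATORS table are hypothetical, and the set of aggregation names Additory actually accepts may differ from the ones listed here.

```python
from collections import defaultdict
from statistics import mean
from typing import Any, Callable, Dict, List

# Hypothetical mapping from aggregation names to Python functions;
# the names Additory actually accepts may differ.
AGGREGATORS: Dict[str, Callable[[List[Any]], Any]] = {
    "sum": sum,
    "mean": mean,
    "min": min,
    "max": max,
    "count": len,
}


def summarize_rows(rows, against, aggregations):
    """Group rows by the `against` column, then aggregate each requested column."""
    groups: Dict[Any, List[dict]] = defaultdict(list)
    for row in rows:
        groups[row[against]].append(row)

    result = []
    for key, members in groups.items():  # groups keep first-seen order
        out = {against: key}
        for col, agg in aggregations.items():
            out[col] = AGGREGATORS[agg]([m[col] for m in members])
        result.append(out)
    return result


sales = [
    {"category": "a", "sales": 10, "quantity": 1},
    {"category": "b", "sales": 5, "quantity": 4},
    {"category": "a", "sales": 30, "quantity": 3},
]
print(summarize_rows(sales, "category", {"sales": "sum", "quantity": "mean"}))
```

Each output row holds one group key plus one aggregated value per requested column, which is the shape a group-by summary produces.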
Performance Benchmarks
| Operation | Rows | Rust Time | Python Time | Speedup |
|---|---|---|---|---|
| Lookup | 1k | 0.020s | 0.045s | 2.3x |
| Lookup | 10k | 0.003s | 0.012s | 4.0x |
| Lookup | 100k | 0.015s | 0.055s | 3.7x |
| Sort | 10k | 0.002s | 0.008s | 4.0x |
| Sort | 100k | 0.020s | 0.080s | 4.0x |
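The Speedup column is the Python time divided by the Rust time (for example, 0.055s / 0.015s ≈ 3.7x). A minimal timing harness in that spirit is sketched below; best_time and speedup are hypothetical helpers, and the warm-up call plus taking the minimum of several runs is a common noise-reduction choice, not necessarily how the published numbers were measured.

```python
import time
from typing import Callable


def best_time(fn: Callable[[], object], repeats: int = 5) -> float:
    """Return the fastest wall-clock time (seconds) over several runs.

    Taking the minimum reduces noise from one-off costs such as imports,
    caches warming up, or other processes competing for the CPU.
    """
    fn()  # warm-up run, excluded from the measurement
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def speedup(python_time: float, rust_time: float) -> float:
    """Speedup as reported in the table: Python time divided by Rust time."""
    return python_time / rust_time


print(round(speedup(0.055, 0.015), 1))  # → 3.7
```

To compare implementations, pass each one to best_time as a zero-argument callable (for example, a lambda wrapping the operation) and feed both results to speedup.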
Pandas Compatibility
The same call works with pandas DataFrames; conversion happens automatically:
import pandas as pd
orders_pd = pd.DataFrame({
    "order_id": [1, 2, 3],
    "product_id": [101, 102, 101]
})
products_pd = pd.DataFrame({
    "product_id": [101, 102],
    "price": [10.0, 20.0]
})
# Automatic conversion and Rust acceleration
result = to(orders_pd, from_df=products_pd, bring="price", against="product_id")
# The result is also a pandas DataFrame
Checking Rust Availability
from additory.functions.to import RUST_AVAILABLE
if RUST_AVAILABLE:
    print("🦀 Rust acceleration enabled!")
    import additory_rust
    print(f"Version: {additory_rust.__version__}")
else:
    print("🐍 Using pure Python implementation")
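A RUST_AVAILABLE flag like the one above is typically produced by the optional-import pattern, which is also what makes the graceful fallback listed under Features possible. The sketch below assumes the wrapper simply tries to import the compiled additory_rust module; load_optional and backend_name are hypothetical helpers, not Additory's internals.

```python
import importlib
from types import ModuleType
from typing import Optional


def load_optional(module_name: str) -> Optional[ModuleType]:
    """Import a module if it is installed, otherwise return None."""
    try:
        return importlib.import_module(module_name)
    except ImportError:  # also covers ModuleNotFoundError
        return None


# Assumption: the compiled extension is importable as `additory_rust`.
_rust = load_optional("additory_rust")
RUST_AVAILABLE = _rust is not None


def backend_name() -> str:
    """Report which implementation calls would be routed to."""
    return "rust" if RUST_AVAILABLE else "python"


print(backend_name())
```

Resolving the import once at module load keeps the per-call dispatch to a single boolean check.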
Direct Rust API (Advanced)
For advanced users who want to bypass the Python wrapper:
import additory_rust
import polars as pl
import io
# Serialize a polars DataFrame to Arrow IPC bytes
def df_to_bytes(df):
    buffer = io.BytesIO()
    df.write_ipc(buffer)
    return buffer.getvalue()
# Deserialize Arrow IPC bytes back into a polars DataFrame
def bytes_to_df(data):
    buffer = io.BytesIO(data)
    return pl.read_ipc(buffer)
# Direct Rust call (reuses orders and products from the Quick Start example)
df_bytes = df_to_bytes(orders)
from_df_bytes = df_to_bytes(products)
result_bytes = additory_rust.to_lookup(
df_bytes, from_df_bytes, ["price"], ["product_id"]
)
result_df = bytes_to_df(result_bytes)
Architecture
Python DataFrame (pandas/polars)
↓
Arrow IPC Serialization
↓
Rust Processing (zero-copy)
↓
Arrow IPC Deserialization
↓
Python DataFrame (original type)
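The first and last steps of this pipeline require knowing which library produced the input, so the result can be converted back to the original type. One way to detect that, sketched below under the assumption that detection looks at the module defining the object's type, avoids importing pandas or polars just to run isinstance checks. detect_backend is a hypothetical helper.

```python
def detect_backend(df: object) -> str:
    """Return "pandas" or "polars" based on where the object's type is defined.

    pandas.DataFrame lives in pandas.core.frame and polars.DataFrame in
    polars.dataframe.frame, so the top-level package name identifies the backend.
    """
    top_level = type(df).__module__.split(".")[0]
    if top_level in ("pandas", "polars"):
        return top_level
    raise TypeError(f"unsupported DataFrame type: {type(df).__name__}")
```

Passing a plain dict, for example, raises TypeError because its type is defined in builtins.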
Error Handling
All Rust errors are converted to appropriate Python exceptions:
try:
    result = to(df, from_df=ref, bring="invalid_col", against="id")
except ValueError as e:
    print(e)
# Output:
# Bring columns not found in reference DataFrame: ['invalid_col'].
# Available columns: ['id', 'price', 'name']
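The validation behind that message can be sketched in pure Python. check_bring_columns is a hypothetical helper that produces a message of the same shape; Additory performs its check on the Rust side, and its exact wording and formatting may differ.

```python
from typing import Iterable, List, Sequence


def check_bring_columns(bring: Sequence[str], available: Iterable[str]) -> None:
    """Raise ValueError listing any requested columns missing from the reference."""
    available_list: List[str] = list(available)
    missing = [col for col in bring if col not in available_list]
    if missing:
        raise ValueError(
            f"Bring columns not found in reference DataFrame: {missing}. "
            f"Available columns: {available_list}"
        )


try:
    check_bring_columns(["invalid_col"], ["id", "price", "name"])
except ValueError as e:
    print(e)
```

Listing the available columns in the message lets users spot typos without inspecting the reference DataFrame separately.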
Development
Building
# Debug build
maturin develop
# Release build (optimized)
maturin develop --release
# Build wheel
maturin build --release
Testing
# Rust unit tests
cargo test
# Python integration tests
python test_phase4_integration.py
# Performance benchmarks
python benchmark_rust_performance.py
Generating Documentation
# Generate Rust docs
cargo doc --open
# View API documentation
cat API_DOCUMENTATION.md
# View usage examples
cat USAGE_EXAMPLES.md
Platform Support
| Platform | Architecture | Status | Wheel Size |
|---|---|---|---|
| Linux | x86_64 | ✅ Built | 14MB |
| Linux | aarch64 | 📝 Documented | - |
| macOS | x86_64 | 📝 Documented | - |
| macOS | aarch64 | 📝 Documented | - |
| Windows | x86_64 | 📝 Documented | - |
See MULTI_PLATFORM_BUILD_GUIDE.md for build instructions.
Troubleshooting
See TROUBLESHOOTING.md for common issues and solutions.
Documentation
- API Documentation - Detailed function reference
- Usage Examples - Real-world examples
- Multi-Platform Build Guide - Building for different platforms
- Troubleshooting Guide - Common issues and solutions
Contributing
Contributions welcome! Please ensure:
- All tests pass (cargo test and Python tests)
- Code is formatted (cargo fmt)
- No clippy warnings (cargo clippy)
- Documentation is updated
License
MIT License - see LICENSE file for details
Version
Current Version: 0.2.0
Last Updated: 2025-02-04
Python Support: 3.8+
Polars Version (Rust crate): 0.44+
Download files
File details
Details for the file additory-0.1.1a5.tar.gz.
File metadata
- Download URL: additory-0.1.1a5.tar.gz
- Upload date:
- Size: 182.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3a85f016f41dded563128d4bc2ea59a7a1d337076dd9af0f924fea2d0fafd5e6 |
| MD5 | 777cc0da70e031b541fadcba05e25e48 |
| BLAKE2b-256 | 46680f9226fde3c156a934acd4fc91dfbe2c6fd4ded7b2b0ed2fafd05912c8f3 |
File details
Details for the file additory-0.1.1a5-cp38-abi3-manylinux_2_34_x86_64.whl.
File metadata
- Download URL: additory-0.1.1a5-cp38-abi3-manylinux_2_34_x86_64.whl
- Upload date:
- Size: 14.9 MB
- Tags: CPython 3.8+, manylinux: glibc 2.34+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ba2d59e7e65d6a24e6af586dbda36edeb0191809e77695bd4d07680e354eb8dd |
| MD5 | fa800ac50c7caa714994d1d7f3a97e1f |
| BLAKE2b-256 | fde4b08f9d4e5c3be7f5a43de3600a380bf376d8888b3c64b4b3838725e62fa2 |