Elegant data operations for DataFrames - add.to(), add.transform(), add.synthetic()
additory
Elegant data operations for DataFrames with Rust-powered performance
Overview
additory provides three simple, powerful functions for DataFrame operations:
- add.to() - Add data FROM external sources (lookup, join, merge)
- add.transform() - Transform data WITHIN DataFrames (filter, calculate, aggregate)
- add.synthetic() - Create or augment with synthetic data
Built with Rust for performance, works seamlessly with pandas and polars.
Installation
# Basic installation (includes polars)
pip install additory
# With pandas support (recommended for pandas users)
# (quotes keep shells such as zsh from treating the brackets as a glob pattern)
pip install "additory[pandas]"
Requirements:
- Python 3.9 or higher
- polars 0.19.0+ (included automatically)
- pandas 1.5.0+ (optional; install with pip install "additory[pandas]")
Note: additory uses polars internally for high-performance operations, and accepts pandas DataFrames through automatic conversion.
Quick Start
import pandas as pd
import additory as add
# Create sample data
customers = pd.DataFrame({
    'id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie']
})
orders = pd.DataFrame({
    'id': [1, 2, 3],
    'total': [100, 200, 150]
})
# Add data from another DataFrame
result = add.to(customers, fetch_from=orders, fetch=['total'], by='id')
# Result: customers with 'total' column added
# Transform data
result = add.transform('@calc', customers, expression='id * 10', as_='customer_code')
# Result: customers with calculated 'customer_code' column
# Generate synthetic data
synthetic = add.synthetic('@new', n=1000, fetch={
    'age': 'normal(35, 10)',
    'salary': 'lognormal(11, 0.5)'
})
# Result: 1000 rows of synthetic data
Works with Polars too! Simply replace import pandas as pd with import polars as pl and use pl.DataFrame() instead of pd.DataFrame().
Features
add.to() - Data Integration
Add columns from external sources with intelligent joining:
# Single column lookup
result = add.to(target, fetch_from=reference, fetch=['age'], by='id')
# Multiple columns
result = add.to(target, fetch_from=reference, fetch=['age', 'city'], by='id')
# Multiple join keys
result = add.to(target, fetch_from=reference, fetch=['amount'], by=('customer_id', 'date'))
# With aggregation
result = add.to(target, fetch_from=reference, fetch=['amount'], by='id',
                strategy={'mode': 'sum'})
Supported modes:
- Lookup (default) - Add columns by joining on keys
- Aggregation - Sum, mean, first, last, concat, etc.
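To make the difference between the two modes concrete, here is a minimal plain-Python sketch of the semantics described above. It is not additory's implementation: the helper names lookup and aggregate_lookup, the list-of-dicts data layout, and the choice to keep the first match when a key repeats are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical reducers mirroring the aggregation modes named above.
REDUCERS = {
    "sum": sum,
    "mean": mean,
    "first": lambda values: values[0],
    "last": lambda values: values[-1],
    "concat": lambda values: ",".join(str(v) for v in values),
}


def lookup(target, reference, column, key):
    """Copy `column` from the first matching reference row onto each target row."""
    index = {}
    for row in reference:
        # Assumption: on duplicate keys, the first reference row wins.
        index.setdefault(row[key], row[column])
    return [{**row, column: index.get(row[key])} for row in target]


def aggregate_lookup(target, reference, column, key, mode):
    """Reduce all matching reference values per key, then attach the result."""
    groups = defaultdict(list)
    for row in reference:
        groups[row[key]].append(row[column])
    reduce_values = REDUCERS[mode]
    return [
        {**row, column: reduce_values(groups[row[key]]) if row[key] in groups else None}
        for row in target
    ]


customers = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}, {"id": 4, "name": "David"}]
orders = [{"id": 1, "amount": 100}, {"id": 1, "amount": 150}, {"id": 2, "amount": 200}]

first_match = lookup(customers, orders, "amount", "id")
totals = aggregate_lookup(customers, orders, "amount", "id", "sum")
```

In this sketch, target rows with no match in the reference data receive None, which plays the role of a missing value in a DataFrame.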
add.transform() - Data Transformation
Transform data with 10+ modes:
# Filter rows
result = add.transform('@filter', df, where='age > 25')
# Calculate new columns
result = add.transform('@calc', df, expression='price * quantity', as_='total')
# Sort data
result = add.transform('@sort', df, by='date', as_='asc')
# Aggregate data
result = add.transform('@aggregate', df, by='category',
                       fetch=['sales'], strategy={'mode': 'sum'})
# One-hot encoding
result = add.transform('@onehot', df, fetch=['category'])
# KNN imputation
result = add.transform('@knn', df, fetch=['age'], strategy={'k': 5})
Supported modes:
- @filter - Filter rows and select columns
- @calc - Calculate new columns from expressions
- @sort - Sort by column(s)
- @aggregate - Group and aggregate
- @transpose - Transpose DataFrame
- @split - Split text columns
- @extract - Extract datetime components
- @onehot - One-hot encoding
- @label - Label encoding
- @harmonize - Unit conversions
- @knn - K-Nearest Neighbors imputation
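For readers unfamiliar with the two encoding modes, the following plain-Python sketch shows what label encoding and one-hot encoding generally produce. It is an illustration of the techniques, not additory's code: the function names, the sorted ordering of codes, and the prefix_value column naming are assumptions.

```python
def label_encode(values):
    """Map each distinct value to an integer code, assigned in sorted order."""
    codes = {value: code for code, value in enumerate(sorted(set(values)))}
    return [codes[value] for value in values]


def one_hot_encode(values, prefix):
    """Return one 0/1 indicator column per distinct value."""
    categories = sorted(set(values))
    return {
        f"{prefix}_{category}": [int(value == category) for value in values]
        for category in categories
    }


category = ["books", "toys", "books", "games"]
labels = label_encode(category)                       # one integer column
indicators = one_hot_encode(category, prefix="category")  # one column per category
```

Label encoding keeps a single column but implies an ordering between categories; one-hot encoding avoids that at the cost of one extra column per distinct value.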
add.synthetic() - Synthetic Data Generation
Create or augment data with statistical distributions:
# Create new synthetic data
result = add.synthetic('@new', n=1000, fetch={
    'age': 'normal(50, 10)',         # Normal distribution
    'salary': 'lognormal(11, 0.5)',  # Lognormal distribution
    'score': 'uniform(0, 100)',      # Uniform distribution
    'status': 'categorical'          # Categorical data
})
# Augment existing data
result = add.synthetic(df, n=500) # Add 500 synthetic rows
# Analyze data quality
analysis = add.synthetic('@analyze', df) # Get statistics
Supported distributions:
- Normal, Lognormal, Uniform, Exponential, Poisson, Binomial, Beta
- Categorical (simple and weighted)
- Sequences, Date/Time ranges
- Patterns (email, phone, UUID, regex)
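As a rough guide to what such column specifications describe, the sketch below draws comparable samples with Python's standard random module. The parameter meanings are assumptions, since they are not spelled out here: normal(mean, std), lognormal(mu, sigma) of the underlying normal, and uniform(low, high). The category labels and weights are made up for illustration, and additory's own generator may differ.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is reproducible
n = 1000

# Assumed readings of the spec strings (see the note above).
age = [rng.gauss(50, 10) for _ in range(n)]               # 'normal(50, 10)'
salary = [rng.lognormvariate(11, 0.5) for _ in range(n)]  # 'lognormal(11, 0.5)'
score = [rng.uniform(0, 100) for _ in range(n)]           # 'uniform(0, 100)'

# Weighted categorical draw; labels and weights are hypothetical.
status = rng.choices(["active", "inactive", "pending"], weights=[0.6, 0.3, 0.1], k=n)

sample_mean_age = sum(age) / n
```

Under these assumptions a lognormal(11, 0.5) column has a median of about exp(11), roughly 60,000, which is why it is a common choice for salary-like values.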
Performance
additory is built with Rust for high performance:
- 3-5x faster than pure Python for transformations
- 5-10x faster for data joining operations
- 10-20x faster for synthetic data generation
Memory use is kept efficient through Arrow IPC serialization and vectorized operations.
DataFrame Support: Works with both pandas and polars DataFrames. Polars is required (installed automatically), and pandas DataFrames are seamlessly converted for high-performance operations.
Documentation
API Reference
add.to()
add.to(fetch_to, fetch_from, fetch, against, position=None, *,
       strategy=None, join_type='lookup', as_type=None)
Parameters:
- fetch_to: Target DataFrame
- fetch_from: Reference DataFrame
- fetch: Column(s) to add (str or list)
- against: Join key(s) (str or tuple)
- position: Column position (optional)
- strategy: Aggregation strategy (optional)
- join_type: Join type ('lookup', 'left', 'inner', 'outer')
- as_type: Output format ('polars', 'pandas', or None)
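When the join keys are given as a tuple, a reference row matches only if every key column agrees. The plain-Python sketch below illustrates that idea; the helper lookup_by_keys and the sample data are hypothetical and are not part of additory.

```python
def lookup_by_keys(target, reference, column, keys):
    """Attach `column` from `reference` where every column in `keys` matches."""
    # Index the reference rows by the tuple of their key values.
    index = {tuple(row[k] for k in keys): row[column] for row in reference}
    return [
        {**row, column: index.get(tuple(row[k] for k in keys))}
        for row in target
    ]


visits = [
    {"customer_id": 1, "date": "2024-01-01"},
    {"customer_id": 1, "date": "2024-01-02"},
    {"customer_id": 2, "date": "2024-01-01"},
]
payments = [
    {"customer_id": 1, "date": "2024-01-01", "amount": 100},
    {"customer_id": 2, "date": "2024-01-01", "amount": 200},
]

result = lookup_by_keys(visits, payments, "amount", ("customer_id", "date"))
```

The second visit shares a customer_id with a payment but not a date, so it receives no amount in this sketch.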
add.transform()
add.transform(mode, df, expression=None, *, where=None, by=None,
              fetch=None, strategy=None, as_=None, fetch_at='end',
              logging=False)
Parameters:
- mode: Transform mode (e.g., '@calc', '@filter', '@sort')
- df: Input DataFrame
- expression: Expression(s) for @calc mode
- where: Filter condition
- by: Grouping/sorting column(s)
- fetch: Column(s) to transform
- strategy: Advanced options
- as_: New column name(s) or sort order
- fetch_at: Position for new columns
- logging: Enable detailed logging
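The strategy parameter carries mode-specific options, such as {'k': 5} for @knn. To show what k controls, here is a minimal plain-Python sketch of K-Nearest Neighbors imputation in general. The function knn_impute, the Euclidean distance metric, and averaging the neighbors' values are assumptions for illustration; additory's own algorithm may differ.

```python
import math


def knn_impute(rows, target, features, k):
    """Fill missing `target` values with the mean of the k nearest complete rows."""
    complete = [row for row in rows if row[target] is not None]
    imputed = []
    for row in rows:
        if row[target] is not None:
            imputed.append(dict(row))
            continue
        # Rank complete rows by Euclidean distance over the feature columns.
        nearest = sorted(
            complete,
            key=lambda other: math.dist(
                [row[f] for f in features], [other[f] for f in features]
            ),
        )[:k]
        estimate = sum(neighbor[target] for neighbor in nearest) / len(nearest)
        imputed.append({**row, target: estimate})
    return imputed


people = [
    {"height": 160, "weight": 55, "age": 30},
    {"height": 162, "weight": 58, "age": 34},
    {"height": 180, "weight": 85, "age": 50},
    {"height": 161, "weight": 56, "age": None},
]
filled = knn_impute(people, target="age", features=["height", "weight"], k=2)
```

A larger k smooths the estimate across more neighbors, while a smaller k follows the closest rows more tightly. The sketch assumes at least one row has a known target value.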
add.synthetic()
add.synthetic(mode_or_df=None, df=None, **kwargs)
Parameters:
- mode_or_df: Mode string ('@new', '@analyze') or DataFrame (for augment)
- df: DataFrame (for @analyze mode)
- n: Number of rows to generate
- fetch: Column specifications (for @new mode)
- strategy: Advanced options
- logging: Enable detailed logging
Examples
Data Integration Example
import pandas as pd
import additory as add
# Customer data
customers = pd.DataFrame({
    'customer_id': [1, 2, 3, 4],
    'name': ['Alice', 'Bob', 'Charlie', 'David']
})
# Order data
orders = pd.DataFrame({
    'customer_id': [1, 1, 2, 3, 3, 3],
    'amount': [100, 150, 200, 50, 75, 125]
})
# Add total order amount per customer
result = add.to(customers, fetch_from=orders,
                fetch=['amount'], by='customer_id',
                strategy={'mode': 'sum'})
print(result)
# customer_id | name | amount
# 1 | Alice | 250
# 2 | Bob | 200
# 3 | Charlie | 250
# 4 | David | NaN
Data Transformation Example
import pandas as pd
import additory as add
# Sales data
sales = pd.DataFrame({
    'date': ['2024-01-01', '2024-01-02', '2024-01-03'],
    'product': ['A', 'B', 'A'],
    'quantity': [10, 15, 20],
    'price': [100, 200, 100]
})
# Calculate total sales
result = add.transform('@calc', sales,
                       expression='quantity * price',
                       as_='total')
# Filter high-value sales
result = add.transform('@filter', result, where='total > 1500')
print(result)
# date | product | quantity | price | total
# 2024-01-02 | B | 15 | 200 | 3000
# 2024-01-03 | A | 20 | 100 | 2000
Synthetic Data Example
import additory as add
# Generate synthetic customer data
customers = add.synthetic('@new', n=10000, fetch={
    'age': 'normal(35, 12)',
    'income': 'lognormal(10.5, 0.5)',
    'credit_score': 'uniform(300, 850)',
    'segment': 'categorical'
})
# Analyze the generated data
analysis = add.synthetic('@analyze', customers)
print(analysis)
# Shows statistics: mean, std, min, max, null count, etc.
Note: Synthetic data is returned as a pandas DataFrame by default. Use as_type='polars' if you prefer polars.
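The kind of per-column summary mentioned in the example (mean, std, min, max, null count) can be sketched in plain Python as follows. This is only an illustration of those statistics: the describe helper is hypothetical, and the exact fields and layout returned by the @analyze mode are not shown in this README.

```python
from statistics import mean, stdev


def describe(values):
    """Summarize a column, ignoring missing (None) entries for the numeric stats."""
    present = [v for v in values if v is not None]
    return {
        "mean": mean(present),
        "std": stdev(present),  # sample standard deviation
        "min": min(present),
        "max": max(present),
        "null_count": len(values) - len(present),
    }


summary = describe([30, 40, None, 50])
```

Checking statistics like these against the requested distribution parameters is a quick way to confirm that generated data looks as intended.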
Development Status
Current Version: 0.1.3a5 (pre-release; under Python's version scheme the a5 suffix marks an alpha build)
Status: Production-ready for core features
Test Coverage:
- 106 Rust tests passing (100%)
- Comprehensive integration tests
- All three functions fully tested
Roadmap:
- ✅ Core functionality (add.to, add.transform, add.synthetic)
- ✅ Rust-powered performance
- ✅ Polars and Pandas support
- ✅ Comprehensive test coverage
- 🔄 Additional transform modes
- 🔄 Enhanced expression parsing
- 🔄 Extended documentation
Contributing
Contributions are welcome! Please feel free to submit issues or pull requests.
Repository: https://github.com/sekarkrishna/additory
License
MIT License - see LICENSE file for details
Author
Krishnamoorthy Sankaran
Email: krishnamoorthy.sankaran@sekrad.org
GitHub: https://github.com/sekarkrishna/additory
Support
- Issues: https://github.com/sekarkrishna/additory/issues
- Documentation: https://github.com/sekarkrishna/additory#readme
Acknowledgments
Built with:
- Rust - Performance and safety
- Polars - Fast DataFrame operations
- PyO3 - Python-Rust bindings
- Maturin - Build system
Made with ❤️ for the data science community