Elegant data operations for DataFrames - add.to(), add.transform(), add.synthetic()

additory

Elegant data operations for DataFrames with Rust-powered performance

Overview

additory provides three simple, powerful functions for DataFrame operations:

add.to() - Add data FROM external sources (lookup, join, merge)
add.transform() - Transform data WITHIN DataFrames (filter, calculate, aggregate)
add.synthetic() - Create or augment with synthetic data

Built with Rust for performance, works seamlessly with pandas and polars.

Installation

# Basic installation (includes polars)
pip install additory

# With pandas support (recommended for pandas users)
pip install additory[pandas]

Requirements:

Python 3.9 or higher
polars 0.19.0+ (included automatically)
pandas 1.5.0+ (optional, install with pip install additory[pandas])

Note: additory uses polars internally for high-performance operations, but seamlessly works with pandas DataFrames through automatic conversion.

Quick Start

import pandas as pd
import additory as add

# Create sample data
customers = pd.DataFrame({
    'id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie']
})

orders = pd.DataFrame({
    'id': [1, 2, 3],
    'total': [100, 200, 150]
})

# Add data from another DataFrame
result = add.to(customers, fetch_from=orders, fetch=['total'], by='id')
# Result: customers with 'total' column added

# Transform data
result = add.transform('@calc', customers, expression='id * 10', as_='customer_code')
# Result: customers with calculated 'customer_code' column

# Generate synthetic data
synthetic = add.synthetic('@new', n=1000, fetch={
    'age': 'normal(35, 10)',
    'salary': 'lognormal(11, 0.5)'
})
# Result: 1000 rows of synthetic data

Works with Polars too! Simply replace import pandas as pd with import polars as pl and use pl.DataFrame() instead of pd.DataFrame().

Features

add.to() - Data Integration

Add columns from external sources with intelligent joining:

# Single column lookup
result = add.to(target, fetch_from=reference, fetch=['age'], by='id')

# Multiple columns
result = add.to(target, fetch_from=reference, fetch=['age', 'city'], by='id')

# Multiple join keys
result = add.to(target, fetch_from=reference, fetch=['amount'], by=('customer_id', 'date'))

# With aggregation
result = add.to(target, fetch_from=reference, fetch=['amount'], by='id',
                strategy={'mode': 'sum'})

Supported modes:

Lookup (default) - Add columns by joining on keys
Aggregation - Sum, mean, first, last, concat, etc.
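For readers who think in pandas terms, lookup mode is close in spirit to a left merge restricted to the fetched columns. The sketch below uses only pandas (`DataFrame.merge`), not additory. It assumes that a lookup keeps every target row and leaves unmatched rows empty, which this README does not spell out. The helper name `lookup` and its parameters are hypothetical.

```python
import pandas as pd

def lookup(target, reference, fetch, by):
    """Hypothetical pandas-only stand-in for a key-based column lookup."""
    keys = [by] if isinstance(by, str) else list(by)
    # Keep only the join keys and the requested columns from the reference.
    subset = reference[keys + list(fetch)]
    # A left merge keeps every target row; unmatched rows get NaN.
    return target.merge(subset, on=keys, how="left")

target = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "date": ["2024-01-01", "2024-01-01", "2024-01-02"],
})
reference = pd.DataFrame({
    "customer_id": [1, 2],
    "date": ["2024-01-01", "2024-01-01"],
    "amount": [100, 200],
    "note": ["a", "b"],
})

# Composite key: both columns must match for a row to pick up 'amount'.
result = lookup(target, reference, fetch=["amount"], by=("customer_id", "date"))
```

Only `amount` is carried over. The `note` column stays behind because it was not requested.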

add.transform() - Data Transformation

Transform data with 10+ modes:

# Filter rows
result = add.transform('@filter', df, where='age > 25')

# Calculate new columns
result = add.transform('@calc', df, expression='price * quantity', as_='total')

# Sort data
result = add.transform('@sort', df, by='date', as_='asc')

# Aggregate data
result = add.transform('@aggregate', df, by='category', 
                       fetch=['sales'], strategy={'mode': 'sum'})

# One-hot encoding
result = add.transform('@onehot', df, fetch=['category'])

# KNN imputation
result = add.transform('@knn', df, fetch=['age'], strategy={'k': 5})

Supported modes:

@filter - Filter rows and select columns
@calc - Calculate new columns from expressions
@sort - Sort by column(s)
@aggregate - Group and aggregate
@transpose - Transpose DataFrame
@split - Split text columns
@extract - Extract datetime components
@onehot - One-hot encoding
@label - Label encoding
@harmonize - Unit conversions
@knn - K-Nearest Neighbors imputation
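To make three of the less self-explanatory modes concrete, here is what one-hot encoding, label encoding, and datetime extraction look like in plain pandas. This is a sketch of the underlying operations only. It does not call additory, and the exact column names and code assignment that additory produces may differ.

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2024-01-15", "2024-02-20", "2024-02-25"],
    "category": ["b", "a", "b"],
})

# One-hot encoding: one indicator column per distinct category value.
onehot = pd.get_dummies(df, columns=["category"], dtype=int)

# Label encoding: map each distinct value to an integer code (sorted order).
labels = df["category"].astype("category").cat.codes

# Datetime extraction: parse the strings, then pull out components.
parsed = pd.to_datetime(df["date"])
parts = pd.DataFrame({"year": parsed.dt.year, "month": parsed.dt.month})
```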

add.synthetic() - Synthetic Data Generation

Create or augment data with statistical distributions:

# Create new synthetic data
result = add.synthetic('@new', n=1000, fetch={
    'age': 'normal(50, 10)',           # Normal distribution
    'salary': 'lognormal(11, 0.5)',    # Lognormal distribution
    'score': 'uniform(0, 100)',        # Uniform distribution
    'status': 'categorical'             # Categorical data
})

# Augment existing data
result = add.synthetic(df, n=500)  # Add 500 synthetic rows

# Analyze data quality
analysis = add.synthetic('@analyze', df)  # Get statistics

Supported distributions:

Normal, Lognormal, Uniform, Exponential, Poisson, Binomial, Beta
Categorical (simple and weighted)
Sequences, Date/Time ranges
Patterns (email, phone, UUID, regex)
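The distribution strings can be read as ordinary random draws. The sketch below uses only Python's standard `random` module to show what such columns contain. It assumes that `normal(35, 10)` means mean 35 and standard deviation 10, and that the lognormal parameters describe the underlying normal distribution. The README does not confirm either reading, and the category values and weights are made up for illustration.

```python
import random

rng = random.Random(42)  # fixed seed so the draws are repeatable
n = 1000

rows = [
    {
        "age": rng.gauss(35, 10),               # normal: mean 35, std dev 10
        "salary": rng.lognormvariate(11, 0.5),  # lognormal: mu 11, sigma 0.5 (log scale)
        "score": rng.uniform(0, 100),           # uniform between 0 and 100
        # Weighted categorical: 'active' is drawn about 80% of the time.
        "status": rng.choices(["active", "inactive"], weights=[0.8, 0.2])[0],
    }
    for _ in range(n)
]

mean_age = sum(row["age"] for row in rows) / n
```

With 1000 rows, `mean_age` lands close to 35, and every `salary` is strictly positive because a lognormal draw is the exponential of a normal draw.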

Performance

additory is built with Rust for high performance:

3-5x faster than pure Python for transformations
5-10x faster for data joining operations
10-20x faster for synthetic data generation

Efficient memory usage with Arrow IPC serialization and vectorized operations.

DataFrame Support: Works with both pandas and polars DataFrames. Polars is required (installed automatically), and pandas DataFrames are seamlessly converted for high-performance operations.
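Speedup figures like the ones above depend on data size and hardware, so it is worth measuring on your own workload. The following is a minimal timing harness built on the standard `timeit` module. The two functions being compared are placeholders, not additory calls; swap in an additory operation and its pandas or pure-Python equivalent. The helper name `speedup` is hypothetical.

```python
import timeit

def speedup(baseline, candidate, repeat=5, number=10):
    """Return how many times faster `candidate` runs than `baseline`."""
    # Take the best of several repeats to reduce noise from other processes.
    t_base = min(timeit.repeat(baseline, repeat=repeat, number=number))
    t_cand = min(timeit.repeat(candidate, repeat=repeat, number=number))
    return t_base / t_cand

data = list(range(10_000))

def python_loop():
    # Placeholder baseline: explicit loop.
    total = 0
    for x in data:
        total += x * 2
    return total

def builtin_sum():
    # Placeholder candidate: same result via the built-in sum().
    return sum(x * 2 for x in data)

ratio = speedup(python_loop, builtin_sum)
print(f"speedup: {ratio:.2f}x")
```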

Documentation

API Reference

add.to()

add.to(fetch_to, fetch_from, fetch, against, position=None, *, 
       strategy=None, join_type='lookup', as_type=None)

Parameters:

fetch_to: Target DataFrame
fetch_from: Reference DataFrame
fetch: Column(s) to add (str or list)
against: Join key(s) (str or tuple)
position: Column position (optional)
strategy: Aggregation strategy (optional)
join_type: Join type ('lookup', 'left', 'inner', 'outer')
as_type: Output format ('polars', 'pandas', or None)
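The 'left', 'inner', and 'outer' values follow familiar join terminology. The pandas sketch below shows which keys survive each kind of join. It assumes additory's join types behave like their pandas namesakes, which this reference does not state, and it does not cover 'lookup'.

```python
import pandas as pd

left = pd.DataFrame({"id": [1, 2, 3], "name": ["Alice", "Bob", "Charlie"]})
right = pd.DataFrame({"id": [2, 3, 4], "total": [200, 150, 75]})

# Run the same merge under each join type and record the surviving keys.
joined = {
    how: left.merge(right, on="id", how=how)
    for how in ("left", "inner", "outer")
}
ids = {how: sorted(frame["id"].tolist()) for how, frame in joined.items()}
# left  keeps ids 1, 2, 3    (all rows of the target)
# inner keeps ids 2, 3       (only keys present on both sides)
# outer keeps ids 1, 2, 3, 4 (keys from either side)
```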

add.transform()

add.transform(mode, df, expression=None, *, where=None, by=None, 
              fetch=None, strategy=None, as_=None, fetch_at='end', 
              logging=False)

Parameters:

mode: Transform mode (e.g., '@calc', '@filter', '@sort')
df: Input DataFrame
expression: Expression(s) for @calc mode
where: Filter condition
by: Grouping/sorting column(s)
fetch: Column(s) to transform
strategy: Advanced options
as_: New column name(s) or sort order
fetch_at: Position for new columns
logging: Enable detailed logging

add.synthetic()

add.synthetic(mode_or_df=None, df=None, **kwargs)

Parameters:

mode_or_df: Mode string ('@new', '@analyze') or DataFrame (for augment)
df: DataFrame (for @analyze mode)
n: Number of rows to generate
fetch: Column specifications (for @new mode)
strategy: Advanced options
logging: Enable detailed logging

Examples

Data Integration Example

import pandas as pd
import additory as add

# Customer data
customers = pd.DataFrame({
    'customer_id': [1, 2, 3, 4],
    'name': ['Alice', 'Bob', 'Charlie', 'David']
})

# Order data
orders = pd.DataFrame({
    'customer_id': [1, 1, 2, 3, 3, 3],
    'amount': [100, 150, 200, 50, 75, 125]
})

# Add total order amount per customer
result = add.to(customers, fetch_from=orders, 
                fetch=['amount'], by='customer_id',
                strategy={'mode': 'sum'})

print(result)
# customer_id | name    | amount
# 1           | Alice   | 250
# 2           | Bob     | 200
# 3           | Charlie | 250
# 4           | David   | NaN
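As a cross-check, the same table can be produced in plain pandas by aggregating first and joining second. This is an equivalent formulation for comparison, not a description of how additory computes the result internally.

```python
import pandas as pd

customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "name": ["Alice", "Bob", "Charlie", "David"],
})
orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3],
    "amount": [100, 150, 200, 50, 75, 125],
})

# Collapse orders to one row per customer before joining.
totals = orders.groupby("customer_id", as_index=False)["amount"].sum()

# Left merge keeps David, who has no orders, with a missing amount.
result = customers.merge(totals, on="customer_id", how="left")
```

Aggregating before the join matters: merging the raw orders would duplicate Alice and Charlie, one row per order.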

Data Transformation Example

import pandas as pd
import additory as add

# Sales data
sales = pd.DataFrame({
    'date': ['2024-01-01', '2024-01-02', '2024-01-03'],
    'product': ['A', 'B', 'A'],
    'quantity': [10, 15, 20],
    'price': [100, 200, 100]
})

# Calculate total sales
result = add.transform('@calc', sales, 
                       expression='quantity * price', 
                       as_='total')

# Filter high-value sales
result = add.transform('@filter', result, where='total > 1500')

print(result)
# date       | product | quantity | price | total
# 2024-01-02 | B       | 15       | 200   | 3000
# 2024-01-03 | A       | 20       | 100   | 2000
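For comparison, the same calculate-then-filter pipeline in plain pandas uses `assign` and `query`. It is shown only to make the semantics of the two steps explicit and does not call additory.

```python
import pandas as pd

sales = pd.DataFrame({
    "date": ["2024-01-01", "2024-01-02", "2024-01-03"],
    "product": ["A", "B", "A"],
    "quantity": [10, 15, 20],
    "price": [100, 200, 100],
})

result = (
    sales
    .assign(total=sales["quantity"] * sales["price"])  # new column: 1000, 3000, 2000
    .query("total > 1500")                             # drops the 2024-01-01 row
)
```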

Synthetic Data Example

import additory as add

# Generate synthetic customer data
customers = add.synthetic('@new', n=10000, fetch={
    'age': 'normal(35, 12)',
    'income': 'lognormal(10.5, 0.5)',
    'credit_score': 'uniform(300, 850)',
    'segment': 'categorical'
})

# Analyze the generated data
analysis = add.synthetic('@analyze', customers)
print(analysis)
# Shows statistics: mean, std, min, max, null count, etc.

Note: Synthetic data is returned as a pandas DataFrame by default. Use as_type='polars' if you prefer polars.
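The statistics named above (mean, std, min, max, null count) can also be computed directly in pandas, which is handy for sanity-checking generated data. The sketch below is a pandas-only approximation. The actual layout of the `@analyze` output is not documented here and may differ.

```python
import pandas as pd

df = pd.DataFrame({
    "age": [25.0, 35.0, None, 45.0],
    "score": [10.0, 20.0, 30.0, 40.0],
})

# Per-column summary; transpose so each input column becomes a row.
summary = df.agg(["mean", "std", "min", "max"]).T

# Count missing values separately, since the aggregations skip them.
summary["null_count"] = df.isna().sum()
```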

Development Status

Current Version: 0.1.3a5 (alpha pre-release)

Status: Production-ready for core features

Test Coverage:

106 Rust tests passing (100%)
Comprehensive integration tests
All three functions fully tested

Roadmap:

✅ Core functionality (add.to, add.transform, add.synthetic)
✅ Rust-powered performance
✅ Polars and Pandas support
✅ Comprehensive test coverage
🔄 Additional transform modes
🔄 Enhanced expression parsing
🔄 Extended documentation

Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.

Repository: https://github.com/sekarkrishna/additory

License

MIT License - see LICENSE file for details

Author

Krishnamoorthy Sankaran
Email: krishnamoorthy.sankaran@sekrad.org
GitHub: https://github.com/sekarkrishna/additory

Support

Issues: https://github.com/sekarkrishna/additory/issues
Documentation: https://github.com/sekarkrishna/additory#readme

Acknowledgments

Built with:

Rust - Performance and safety
Polars - Fast DataFrame operations
PyO3 - Python-Rust bindings
Maturin - Build system

Made with ❤️ for the data science community

Release history

0.1.3a11 pre-release - May 9, 2026
0.1.3a10 pre-release - Mar 12, 2026
0.1.3a9 pre-release - Mar 8, 2026
0.1.3a8 pre-release - Mar 3, 2026
0.1.3a7 pre-release - Feb 13, 2026
0.1.3a6 pre-release - Feb 13, 2026
0.1.3a5 pre-release - Feb 13, 2026 (this version)
0.1.3a4 pre-release - Feb 11, 2026
0.1.3a3 pre-release - Feb 9, 2026
0.1.3a2 pre-release - Feb 9, 2026
0.1.3a1 pre-release - Feb 9, 2026
0.1.2a1 pre-release - Feb 5, 2026
0.1.1a6 pre-release - Feb 4, 2026
0.1.1a5 pre-release - Feb 4, 2026
0.1.1a4 pre-release - Feb 4, 2026
0.1.1a3 pre-release - Feb 4, 2026
0.1.1a2 pre-release - Feb 4, 2026
0.1.1a1 pre-release - Feb 4, 2026
0.1.0a4 pre-release - Jan 28, 2026
0.1.0a3 pre-release - Jan 27, 2026
0.1.0a2 pre-release - Jan 25, 2026
0.1.0a1 pre-release - Jan 25, 2026

Download files

Source Distributions

No source distribution files available for this release.

Built Distribution

additory-0.1.3a5-cp313-cp313-manylinux_2_34_x86_64.whl (11.6 MB)

Uploaded Feb 13, 2026 - CPython 3.13, manylinux: glibc 2.34+ x86-64

File details

Details for the file additory-0.1.3a5-cp313-cp313-manylinux_2_34_x86_64.whl.

File metadata

Download URL: additory-0.1.3a5-cp313-cp313-manylinux_2_34_x86_64.whl
Upload date: Feb 13, 2026
Size: 11.6 MB
Tags: CPython 3.13, manylinux: glibc 2.34+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for additory-0.1.3a5-cp313-cp313-manylinux_2_34_x86_64.whl
Algorithm	Hash digest
SHA256	`1628e5a91d4a88641a019bb111a11c71cbdd044e3b139c9ddeeb85d743f66142`
MD5	`651cfce015e3b5a73bd8be1d627728e8`
BLAKE2b-256	`794bd825d92ab2aea1572a5a2a4168739c835b2bfcc781fb829e40aa1d397006`
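To check a downloaded wheel against the SHA256 digest listed above, the standard `hashlib` module is enough. The helper names `sha256_of` and `verify` are hypothetical, and the file path in the usage note is an assumption about where the wheel was saved.

```python
import hashlib

# SHA256 digest published for additory-0.1.3a5-cp313-cp313-manylinux_2_34_x86_64.whl
EXPECTED = "1628e5a91d4a88641a019bb111a11c71cbdd044e3b139c9ddeeb85d743f66142"

def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path, expected=EXPECTED):
    """True if the file's digest matches the expected hex string."""
    return sha256_of(path) == expected.lower()
```

Usage would look like `verify("additory-0.1.3a5-cp313-cp313-manylinux_2_34_x86_64.whl")`. A `False` result means the file differs from the one that was uploaded.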
