Elegant data operations for DataFrames - add.to(), add.transform(), add.synthetic()

additory

Elegant data operations for DataFrames

A Rust-powered Python library for intuitive data transformations, lookups, and synthetic data generation with Polars and Pandas.

Features

🔗 Intuitive Lookups - Add columns from external sources with simple syntax
⚡ Powerful Transforms - Calculate, filter, sort, aggregate with mode-based operations
🎲 Synthetic Data - Generate realistic test data or augment existing datasets
📊 Lineage Tracking - Track data transformations and view operation history
🔍 Data Scanning - Analyze data quality and inspect DataFrames
🚀 Rust Performance - Built with Rust for blazing-fast operations
🐼 Polars & Pandas - Works seamlessly with both DataFrame libraries
📚 Expression Library - 179 built-in expressions for medical, finance, physics, and more

Installation

pip install additory

Requirements:

Python 3.8+
Polars (required)
Pandas (optional)

Quick Start

import additory as add
import polars as pl

# Add data from external sources
orders = pl.DataFrame({'id': [1, 2], 'customer_id': [101, 102]})
customers = pl.DataFrame({'customer_id': [101, 102], 'name': ['Alice', 'Bob']})
result = add.to(orders, bring_from=customers, bring=['name'], against='customer_id')

# Transform data
df = pl.DataFrame({'x': [1, 2, 3]})
result = add.transform('@calc', df, strategy={'x_squared': 'x ** 2'})

# Generate synthetic data
result = add.synthetic('@new', n=100, strategy={'age': 'normal(40, 10)'})

Core Functions

add.to() - Add Data from External Sources

result = add.to(bring_to, bring_from=reference_df, bring=['column'], against='key',
                lineage=False)

Perfect for lookups and joins. Enable lineage=True to track data sources.
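
Conceptually, a lookup of this kind behaves like a left join on the key column. The sketch below reproduces that idea in plain Python on the Quick Start data. It illustrates the semantics under that assumption and is not additory's implementation; the helper name `bring_columns` is hypothetical.

```python
# Plain-Python illustration of a key-based lookup (left-join semantics assumed).
# `bring_columns` is a hypothetical helper, not part of additory.

def bring_columns(bring_to, bring_from, bring, against):
    """Copy the `bring` columns from `bring_from` rows into `bring_to` rows
    that share the same value in the `against` key column."""
    # Index the reference rows by key; this sketch assumes keys are unique.
    index = {row[against]: row for row in bring_from}
    result = []
    for row in bring_to:
        match = index.get(row[against], {})
        enriched = dict(row)
        for column in bring:
            # Unmatched keys get None, as a left join would produce nulls.
            enriched[column] = match.get(column)
        result.append(enriched)
    return result


orders = [{"id": 1, "customer_id": 101}, {"id": 2, "customer_id": 102}]
customers = [{"customer_id": 101, "name": "Alice"}, {"customer_id": 102, "name": "Bob"}]

result = bring_columns(orders, customers, bring=["name"], against="customer_id")
print(result)
```

When several reference rows share a key, a plain lookup is ambiguous, which is where the aggregation options of the strategy parameter come in.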

add.transform() - Transform Data

result = add.transform(mode, df, lineage=False, **parameters)

Available modes:

@calc - Calculate new columns with expressions
@filter - Filter rows and select columns
@sort - Sort data by columns
@aggregate - Group and aggregate data
@harmonize - Harmonize units (10 sub-modes)
@round - Round numbers (creates NEW columns)
@transpose - Transpose DataFrame
@extract - Extract patterns from text/dates
@onehotencode - One-hot encode categorical columns
@deduce - Fill missing values (7 methods)

add.synthetic() - Synthetic Data

result = add.synthetic(mode, df_or_n, lineage=False, **parameters)

Available modes:

@new - Create synthetic DataFrames from scratch
@augment - Add synthetic rows to existing data

add.scan() - Inspect and Analyze DataFrames

result = add.scan(mode, df)

Available modes:

@analyze / @analyse - Analyze data quality and distributions
@lineage - View lineage tracking reports (requires lineage=True in operations)
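
As a rough picture of what a data-quality scan might report, the sketch below counts rows, missing values, and distinct values per column in plain Python. The report fields and the `analyze` helper are hypothetical; the text above does not specify what `@analyze` returns.

```python
def analyze(rows):
    """Return per-column row, null, and distinct counts for a list of row dicts."""
    report = {}
    columns = rows[0].keys() if rows else []
    for column in columns:
        values = [row[column] for row in rows]
        present = [v for v in values if v is not None]
        report[column] = {
            "rows": len(values),
            "nulls": len(values) - len(present),
            "distinct": len(set(present)),
        }
    return report


rows = [{"age": 25, "city": "NYC"}, {"age": None, "city": "LA"}, {"age": 25, "city": "NYC"}]
report = analyze(rows)
print(report)
```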

Strategy Parameter

The strategy parameter provides fine-grained control over add.to(), add.transform(), and add.synthetic().

add.to() Strategy

Control aggregation, renaming, and positioning for brought columns:

Simple form (aggregation only):

strategy={'amount': 'sum', 'date': 'last'}

Complex form (full control):

strategy={
    'amount': {
        'mode': 'sum',
        'rename': 'total_spent',
        'position': 'after:customer_id'
    }
}
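
One plausible reading of the complex form is that `rename` sets the output column name and `position: 'after:customer_id'` places the column immediately after `customer_id`. The sketch below shows that reading on a list of column names. `place_column` is a hypothetical helper, and only the `after:` prefix from the example is handled.

```python
def place_column(columns, new_column, position):
    """Return a new column order with `new_column` inserted per `position`.

    Only the 'after:<column>' form from the example is handled here.
    """
    kind, _, anchor = position.partition(":")
    if kind != "after" or anchor not in columns:
        raise ValueError(f"unsupported position: {position!r}")
    ordered = list(columns)
    ordered.insert(ordered.index(anchor) + 1, new_column)
    return ordered


spec = {"mode": "sum", "rename": "total_spent", "position": "after:customer_id"}
order = place_column(["customer_id", "name", "city"], spec["rename"], spec["position"])
print(order)  # ['customer_id', 'total_spent', 'name', 'city']
```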

Aggregation modes: first, last, sum, count, average, min, max, concat, concat[sep], most_common, least_common, median, std, variance, unique_count
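
Aggregation matters when several rows in the reference table share a key. The sketch below maps a subset of the mode names onto standard-library functions to show what each would compute for one key. The mapping is an assumption about the semantics, not a description of additory's internals.

```python
import statistics
from collections import Counter

# Assumed meaning of a subset of the aggregation modes, in plain Python.
AGGREGATORS = {
    "first": lambda values: values[0],
    "last": lambda values: values[-1],
    "sum": sum,
    "count": len,
    "average": statistics.mean,
    "min": min,
    "max": max,
    "median": statistics.median,
    "most_common": lambda values: Counter(values).most_common(1)[0][0],
    "unique_count": lambda values: len(set(values)),
}

# Amounts for a single customer that matched three reference rows.
amounts = [100, 150, 100]

summary = {mode: func(amounts) for mode, func in AGGREGATORS.items()}
print(summary)
```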

add.transform() Strategy

Mode-specific configuration:

@calc - Expressions for new columns:

strategy={'total': 'price * quantity', 'discount': 'total * 0.1'}
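
Note that `discount` refers to `total`, which is defined in the same strategy, so the expressions presumably evaluate in order. The sketch below illustrates that ordering in plain Python with a restricted `eval`. It is a toy evaluator under that assumption, and `apply_calc` is a hypothetical name.

```python
def apply_calc(rows, strategy):
    """Evaluate each expression per row, in dictionary order, so later
    expressions can use columns created by earlier ones."""
    result = []
    for row in rows:
        values = dict(row)
        for name, expression in strategy.items():
            # Builtins are disabled and only the row's columns are visible.
            # This keeps the toy tidy; it is not a security boundary.
            values[name] = eval(expression, {"__builtins__": {}}, values)
        result.append(values)
    return result


rows = [{"price": 100, "quantity": 2}, {"price": 200, "quantity": 3}]
strategy = {"total": "price * quantity", "discount": "total * 0.1"}

result = apply_calc(rows, strategy)
print(result)
```

Reversing the two entries would fail in this sketch, because `total` would not exist yet when `discount` is evaluated.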

@sort - Sort order:

strategy={'order': 'desc'}  # or 'asc'

@aggregate - Aggregation functions:

strategy={'amount': 'sum', 'count': 'count'}

@round - Custom naming and positioning:

strategy={
    'price': {'name': 'price_clean', 'position': 'after:price'}
}

@deduce - KNN parameters:

strategy={'k': 5, 'weights': 'distance'}
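
The `k` and `weights` names match the usual k-nearest-neighbours vocabulary: a missing value is estimated from the `k` closest complete rows, and with distance weighting closer rows count more. The sketch below shows that general technique in plain Python with a single feature column. It is not additory's implementation, and `knn_impute` is a hypothetical helper.

```python
def knn_impute(rows, target, feature, k=5, weights="distance"):
    """Fill missing `target` values from the k rows nearest in `feature`."""
    known = [r for r in rows if r[target] is not None]
    filled = []
    for row in rows:
        if row[target] is not None:
            filled.append(dict(row))
            continue
        # Rank complete rows by absolute distance in the feature column.
        nearest = sorted(known, key=lambda r: abs(r[feature] - row[feature]))[:k]
        if weights == "distance":
            # Inverse-distance weights; the epsilon avoids division by zero.
            w = [1.0 / (abs(r[feature] - row[feature]) + 1e-9) for r in nearest]
        else:
            w = [1.0] * len(nearest)
        estimate = sum(wi * r[target] for wi, r in zip(w, nearest)) / sum(w)
        filled.append({**row, target: estimate})
    return filled


rows = [
    {"years": 1, "salary": 40.0},
    {"years": 2, "salary": 50.0},
    {"years": 3, "salary": None},
    {"years": 6, "salary": 80.0},
]

result = knn_impute(rows, target="salary", feature="years", k=2)
print(result[2]["salary"])
```

With `k=2` the two nearest rows are at distances 1 and 2, so the closer salary of 50 gets twice the weight of 40 and the estimate lands nearer to 50 than a plain average would.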

add.synthetic() Strategy

Column generation specifications:

Simple form:

strategy={'id': 'increment', 'age': 'normal(40, 10)'}

Complex form:

strategy={
    'name': {'type': 'choice', 'values': ['Alice', 'Bob', 'Charlie']},
    'age': {'type': 'normal', 'mean': 35, 'std': 10}
}

Generation types: increment, pattern, choice, normal, uniform, lognormal, exponential, poisson, categorical
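
To make the simple form concrete, the sketch below parses specs such as `'normal(40, 10)'` and generates columns with Python's `random` module. It covers only three of the listed types and is an assumption about their meaning, not additory's generator. The fixed seed of 42 mirrors the default mentioned in the changelog, and `generate` is a hypothetical name.

```python
import random
import re


def generate(n, strategy, seed=42):
    """Build `n` rows from a {column: spec} mapping (simple form only)."""
    rng = random.Random(seed)  # Seeded for reproducible output.
    columns = {}
    for name, spec in strategy.items():
        if spec == "increment":
            # Assumption: counting starts at 1.
            columns[name] = list(range(1, n + 1))
            continue
        match = re.fullmatch(r"(\w+)\((.*)\)", spec)
        if match is None:
            raise ValueError(f"unsupported spec: {spec!r}")
        kind = match.group(1)
        args = [float(a) for a in match.group(2).split(",")]
        if kind == "normal":
            columns[name] = [rng.gauss(*args) for _ in range(n)]
        elif kind == "uniform":
            columns[name] = [rng.uniform(*args) for _ in range(n)]
        else:
            raise ValueError(f"unsupported type: {kind!r}")
    # Transpose the column lists into row dictionaries.
    return [dict(zip(columns, values)) for values in zip(*columns.values())]


spec = {"id": "increment", "age": "normal(40, 10)", "score": "uniform(0, 100)"}
rows = generate(100, spec)
print(rows[0])
```

Because the generator is seeded, calling `generate` twice with the same arguments returns identical rows.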

Lineage Tracking

Track data transformations across operations to understand data provenance and transformation history.

Enable Lineage Tracking

import additory as add
import pandas as pd

# Enable lineage in any operation
result = add.to(customers, bring_from=orders, bring=['amount'], 
                against='customer_id', lineage=True)

# Lineage is preserved across operations
result = add.transform('@calc', result, strategy={'total': 'amount * 1.1'},
                       lineage=True)

# View lineage report
lineage_report = add.scan('@lineage', result)
print(lineage_report)

Lineage Features

Operation History - Track all transformations applied to data
Column Sources - See where each column came from
Row Mappings - Track how rows were filtered or aggregated
Session-Only - Lineage is stored in-memory (not persisted to disk)
Mutual Exclusion - Cannot use lineage=True with as_type parameter

Lineage Example

# Multi-step workflow with lineage
customers = pd.DataFrame({'id': [1, 2, 3], 'name': ['Alice', 'Bob', 'Carol']})
orders = pd.DataFrame({'id': [1, 1, 2, 3, 3], 'amount': [100, 150, 200, 175, 125]})

# Step 1: Bring data
df = add.to(customers, bring_from=orders, bring=['amount'], against='id',
            strategy={'amount': 'sum'}, lineage=True)

# Step 2: Calculate
df = add.transform('@calc', df, strategy={'total': 'amount * 1.1'}, lineage=True)

# Step 3: Filter
df = add.transform('@filter', df, where='total > 200', lineage=True)

# View complete lineage
report = add.scan('@lineage', df)
# Shows: 3 operations, column sources, row transformations
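
The arithmetic behind the three steps can be checked by hand. The plain-Python walk-through below reproduces it without additory, assuming the `sum` aggregation groups order amounts by `id`.

```python
from collections import defaultdict

customers = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}, {"id": 3, "name": "Carol"}]
orders = [(1, 100), (1, 150), (2, 200), (3, 175), (3, 125)]

# Step 1: sum order amounts per customer id.
amount_by_id = defaultdict(int)
for customer_id, amount in orders:
    amount_by_id[customer_id] += amount
rows = [{**c, "amount": amount_by_id[c["id"]]} for c in customers]

# Step 2: total = amount * 1.1, rounded to cents to hide float noise.
for row in rows:
    row["total"] = round(row["amount"] * 1.1, 2)

# Step 3: keep rows where total > 200.
kept = [row for row in rows if row["total"] > 200]

print([(row["name"], row["amount"], row["total"]) for row in kept])
```

The summed amounts are 250, 200, and 300, giving totals of 275, 220, and 330, so all three customers pass the `total > 200` filter.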

Important Notes

Lineage is session-only by design (follows "no file I/O" philosophy)
Lineage metadata is lost when DataFrames are saved with native methods
Cannot use lineage=True with as_type parameter (metadata would be lost during conversion)
Lineage overhead is minimal (<3ms per operation)
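
The session-only behaviour can be pictured as metadata held alongside the data in memory. The sketch below is a toy model of that idea, with a hypothetical `Tracked` wrapper that records one entry per operation. It says nothing about how additory stores lineage internally.

```python
from dataclasses import dataclass, field


@dataclass
class Tracked:
    """Rows plus an in-memory operation log (hypothetical toy model)."""
    rows: list
    lineage: list = field(default_factory=list)

    def apply(self, name, func):
        """Run `func` on the rows and append a log entry for the step."""
        new_rows = func(self.rows)
        entry = {"operation": name, "rows_in": len(self.rows), "rows_out": len(new_rows)}
        return Tracked(new_rows, self.lineage + [entry])


data = Tracked([{"total": 275.0}, {"total": 180.0}, {"total": 330.0}])
data = data.apply("@filter", lambda rows: [r for r in rows if r["total"] > 200])
data = data.apply("@sort", lambda rows: sorted(rows, key=lambda r: r["total"], reverse=True))

for entry in data.lineage:
    print(entry)

# Writing only data.rows to a file would drop the log, which is why
# lineage kept this way does not survive a save and reload.
```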

Documentation

📚 Complete documentation is available in the /docs directory:

API Reference - Complete function signatures and API documentation
  - Quick Reference - Fast lookup guide
  - Reference Manual - Comprehensive API docs
  - Function Signatures - All signatures with lineage support
User Guides - Step-by-step tutorials and concepts
  - Migration Guide - Upgrading from older versions
  - Lineage User Story - Understanding lineage tracking
  - Deduce Explained - Missing value imputation guide
Examples - 20+ Quarto notebooks with runnable examples
  - add.to() examples (5 notebooks)
  - add.transform() examples (5 notebooks)
  - add.synthetic() examples (4 notebooks)
  - add.scan() examples (3 notebooks)
  - Lineage tracking examples (2 notebooks)
  - Troubleshooting Guide

See docs/README.md for the complete documentation index.

Examples

Lookup Example

import additory as add
import polars as pl

# Orders with customer IDs
orders = pl.DataFrame({
    'order_id': [1, 2, 3],
    'customer_id': [101, 102, 101],
    'amount': [100, 200, 150]
})

# Customer reference data
customers = pl.DataFrame({
    'customer_id': [101, 102],
    'name': ['Alice', 'Bob'],
    'city': ['NYC', 'LA']
})

# Add customer info to orders
result = add.to(orders, bring_from=customers, bring=['name', 'city'], against='customer_id')

Transform Example

# Calculate with expressions
df = pl.DataFrame({'price': [100, 200, 300], 'quantity': [2, 3, 1]})
result = add.transform('@calc', df, strategy={'total': 'price * quantity'})

# Filter data
result = add.transform('@filter', df, where='price > 150')

# Sort data
result = add.transform('@sort', df, by='price', strategy={'order': 'desc'})

# Aggregate data
df = pl.DataFrame({'category': ['A', 'B', 'A'], 'value': [10, 20, 30]})
result = add.transform('@aggregate', df, by='category', strategy={'value': 'sum'})

# Round numbers (creates NEW columns)
df = pl.DataFrame({'price': [10.567, 20.123, 30.999]})
result = add.transform('@round:2', df, columns='price')  # Creates price_round

# Fill missing values
df = pl.DataFrame({'age': [25, None, 35, None, 45]})
result = add.transform('@deduce', df, columns='age', method='mean')
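
The `@round:2` call above leaves `price` untouched and adds a rounded copy. A plain-Python picture of that behaviour follows, assuming the part after the colon is the number of decimals and the new column is named `<column>_round`, as the comment indicates. `round_into_new_column` is a hypothetical helper.

```python
def round_into_new_column(rows, column, mode):
    """Parse a mode such as '@round:2' and add `<column>_round` to each row."""
    name, _, decimals = mode.lstrip("@").partition(":")
    if name != "round":
        raise ValueError(f"unsupported mode: {mode!r}")
    digits = int(decimals) if decimals else 0
    # The original column is kept; the rounded value goes into a new one.
    return [{**row, f"{column}_round": round(row[column], digits)} for row in rows]


rows = [{"price": 10.567}, {"price": 20.123}, {"price": 30.999}]
result = round_into_new_column(rows, "price", "@round:2")
print(result)
```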

Synthetic Data Example

# Create synthetic data
result = add.synthetic('@new', n=1000, strategy={
    'age': 'normal(40, 10)',
    'salary': 'normal(75000, 15000)',
    'score': 'uniform(0, 100)'
})

# Augment existing data
df = pl.DataFrame({'x': [1, 2, 3], 'y': [4, 5, 6]})
result = add.synthetic('@augment', df, n=100)

# Analyze data quality (@analyze is an add.scan() mode)
result = add.scan('@analyze', df)

Version

Current version: 0.1.3 (Stable Alpha)

What's New in v0.1.3

✅ Lineage Tracking - Track data transformations with lineage=True parameter
✅ add.scan() Function - Unified interface for @analyze and @lineage modes
✅ ~95% Rust Implementation - Optimized code distribution for performance
✅ Mutual Exclusion Validation - Clear error messages for lineage + as_type
✅ Helper Functions - Internal utilities for lineage tracking
✅ Bug Fixes - Fixed add.to() parameter mapping bug
✅ Code Cleanup - Removed orphan files and dead code
✅ 341/341 Tests Passing - 100% test pass rate

Development

Building from Source

# Clone the repository
git clone https://github.com/YOUR_USERNAME/additory.git
cd additory

# Install Rust (if not already installed)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Build the package
cd rust-core
export PYO3_USE_ABI3_FORWARD_COMPATIBILITY=1
maturin build --release

# Install locally
pip install target/wheels/*.whl

Running Tests

# Run comprehensive test suite
python test_all_modes_comprehensive.py

# Run specific tests
pytest tests/

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT License - see LICENSE file for details

Changelog

v0.1.3 (March 9, 2026)

Lineage Tracking - Track data transformations across operations
add.scan() Function - Unified scanning interface (@analyze, @lineage)
~95% Rust Implementation - Optimized Python/Rust code distribution
Bug Fixes - Fixed add.to() parameter mapping, cleaned up code
341/341 Tests Passing - Full test suite passes

v0.1.3a9 (March 4, 2026)

Updated API signatures for natural language (bring_to, bring_from, bring)
Lists everywhere instead of tuples
@round creates NEW columns (philosophy compliant)
@deduce mode for missing value imputation
@extract merged with datetime parsing
Removed add.set() and add.deduce() functions
Default seed=42 for reproducibility
100% philosophy compliance

v0.1.3a3 (February 9, 2026)

Made pandas optional
Added cross-platform build scripts
Fixed pandas import issues
100% test pass rate

v0.1.3a2 (February 9, 2026)

Added banker's rounding (@bankers_round mode)
Expanded expression library to 179 expressions
Fixed mode detection issues
Fixed power operator (**) support

v0.1.3a1 (February 2026)

Initial alpha release
Rust core with PyO3 bindings
Three-function API (to, transform, synthetic)

Support

For issues, questions, or contributions, please visit:

GitHub Issues: [Coming Soon]
Documentation: [Coming Soon]

Credits

Built with:

Made with ❤️ for the data science community

Release history

0.1.3a11 (pre-release) - May 9, 2026
0.1.3a10 (pre-release) - Mar 12, 2026 (this version)
0.1.3a9 (pre-release) - Mar 8, 2026
0.1.3a8 (pre-release) - Mar 3, 2026
0.1.3a7 (pre-release) - Feb 13, 2026
0.1.3a6 (pre-release) - Feb 13, 2026
0.1.3a5 (pre-release) - Feb 13, 2026
0.1.3a4 (pre-release) - Feb 11, 2026
0.1.3a3 (pre-release) - Feb 9, 2026
0.1.3a2 (pre-release) - Feb 9, 2026
0.1.3a1 (pre-release) - Feb 9, 2026
0.1.2a1 (pre-release) - Feb 5, 2026
0.1.1a6 (pre-release) - Feb 4, 2026
0.1.1a5 (pre-release) - Feb 4, 2026
0.1.1a4 (pre-release) - Feb 4, 2026
0.1.1a3 (pre-release) - Feb 4, 2026
0.1.1a2 (pre-release) - Feb 4, 2026
0.1.1a1 (pre-release) - Feb 4, 2026
0.1.0a4 (pre-release) - Jan 28, 2026
0.1.0a3 (pre-release) - Jan 27, 2026
0.1.0a2 (pre-release) - Jan 25, 2026
0.1.0a1 (pre-release) - Jan 25, 2026

Download files

Source Distributions

No source distribution files available for this release.

Built Distribution

additory-0.1.3a10-cp313-cp313-manylinux_2_39_x86_64.whl (11.9 MB)

Uploaded Mar 12, 2026, CPython 3.13, manylinux: glibc 2.39+ x86-64

File details

Details for the file additory-0.1.3a10-cp313-cp313-manylinux_2_39_x86_64.whl.

File metadata

Download URL: additory-0.1.3a10-cp313-cp313-manylinux_2_39_x86_64.whl
Upload date: Mar 12, 2026
Size: 11.9 MB
Tags: CPython 3.13, manylinux: glibc 2.39+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for additory-0.1.3a10-cp313-cp313-manylinux_2_39_x86_64.whl
Algorithm	Hash digest
SHA256	`b3e75f89641c992863ffff94dd00cef8ea98a485192d8e73b4fc20355c2ca5ca`
MD5	`d1c9764674b6b75f122698ed688649f6`
BLAKE2b-256	`e3ea60e8990c106e33b086dba2858972f8ddfd9c10a21d2b2f29f8b02a5bb2d7`
