Additory

A semantic, extensible dataframe transformation engine with expressions, lookup, synthetic data, and sample-data support.

Author: Krishnamoorthy Sankaran

🛠️ Requirements

Python: 3.9+
Core dependencies: pandas, polars, numpy, scipy
Optional: cuDF (for GPU support)
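Before installing, it can help to confirm that the interpreter and dependencies listed above are present. The following is a minimal sketch using only the standard library; the helper names `python_is_supported` and `module_available` are hypothetical and not part of Additory.

```python
import importlib.util
import sys

# Minimum interpreter version stated in the requirements above.
MIN_PYTHON = (3, 9)


def python_is_supported(version_info=sys.version_info):
    """Return True when the interpreter meets the stated 3.9+ requirement."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def module_available(name):
    """Return True when a top-level module can be located without importing it."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print("Python OK:", python_is_supported())
    # cudf is optional and only needed for GPU support.
    for mod in ("pandas", "polars", "numpy", "scipy", "cudf"):
        print(f"{mod}: {'found' if module_available(mod) else 'missing'}")
```

A missing `cudf` entry only means GPU acceleration is unavailable; the core dependencies are the first four names.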

📦 Installation

pip install additory==0.1.0a1

Optional GPU support:

pip install additory[gpu]==0.1.0a1  # Includes cuDF for GPU acceleration

Development installation:

pip install additory[dev]==0.1.0a1  # Includes testing and development tools

🎯 Core Functions

Function	Purpose	Example
`add.to()`	Lookup/join operations	`add.to(df1, from_df=df2, bring='col', against='key')`
`add.augment()`	Generate additional data	`add.augment(df, n_rows=1000)`
`add.synth()`	Synthetic data from schemas	`add.synth("schema.toml", rows=5000)`
`add.scan()`	Data profiling & analysis	`add.scan(df, preset="full")`

🧬 Available Expressions

Additory includes 12 built-in health and fitness expressions, including:

add.bmi() - Body Mass Index
add.bsa() - Body Surface Area
add.bmr() - Basal Metabolic Rate
add.waist_hip_ratio() - Waist-to-Hip Ratio
add.body_fat_percentage() - Body Fat Percentage
add.ideal_body_weight() - Ideal Body Weight
add.blood_pressure_category() - BP Classification
add.cholesterol_ratio() - Cholesterol Ratio
add.age_category() - Age Classification
add.fitness_score() - Overall Fitness Score

import pandas as pd
import additory as add

# Health calculations
patients = pd.DataFrame({
    'weight_kg': [70, 80, 65],  # Weight in kilograms
    'height_m': [1.75, 1.80, 1.60],  # Height in meters
    'age': [25, 35, 45],
    'gender': ['M', 'F', 'M']
})

patients_bmi = add.bmi(patients)
patients_bsa = add.bsa(patients)
fitness_scores = add.fitness_score(patients)

# Chain multiple expressions
result = add.fitness_score(add.bmr(add.bmi(patients)))
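For reference, the standard Body Mass Index formula is weight in kilograms divided by the square of height in meters. The sketch below applies it in plain Python to the same sample values; it illustrates the formula only, and the name of the column that `add.bmi()` adds and any rounding it applies are not stated here, so treat those as unknown.

```python
def bmi(weight_kg, height_m):
    """Body Mass Index: weight (kg) divided by height (m) squared."""
    if height_m <= 0:
        raise ValueError("height_m must be positive")
    return weight_kg / height_m ** 2


# Same sample values as the patients DataFrame above.
patients = [
    {"weight_kg": 70, "height_m": 1.75},
    {"weight_kg": 80, "height_m": 1.80},
    {"weight_kg": 65, "height_m": 1.60},
]

bmi_values = [round(bmi(p["weight_kg"], p["height_m"]), 1) for p in patients]
print(bmi_values)  # [22.9, 24.7, 25.4]
```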

🔧 DataFrame Support

Additory works seamlessly with multiple DataFrame libraries:

pandas - Full support
polars - Full support
cuDF - GPU acceleration support

import polars as pl
import additory as add

# Works with polars
df_polars = pl.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
result = add.augment(df_polars, n_rows=100)

# Automatic type detection and conversion

✨ Key Features

🔧 Utilities

add.to() - Data Lookup & Joins

Simplified syntax for bringing columns from one dataframe to another.

# Simple lookup
orders_with_prices = add.to(
    orders, 
    from_df=products, 
    bring='price', 
    against='product_id'
)

# Multiple columns and keys
enriched = add.to(
    orders,
    from_df=products,
    bring=['price', 'category'],
    against=['product_id', 'region']
)
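To make the lookup semantics concrete, here is a minimal sketch of a left-join style lookup over lists of dicts in plain Python. It is an illustration of the idea, not Additory's implementation: the function name `lookup`, the "first match wins" rule, and filling unmatched rows with `None` are all assumptions.

```python
def lookup(rows, from_rows, bring, against):
    """Copy the `bring` columns from `from_rows` onto `rows`, matching on `against`.

    `bring` and `against` may each be one column name or a list of names.
    Every input row is kept; rows without a match get None for the new columns.
    """
    bring = [bring] if isinstance(bring, str) else list(bring)
    against = [against] if isinstance(against, str) else list(against)

    # Index the source rows by their key tuple; the first occurrence wins.
    index = {}
    for src in from_rows:
        index.setdefault(tuple(src[k] for k in against), src)

    enriched = []
    for row in rows:
        match = index.get(tuple(row[k] for k in against))
        extra = {col: (match[col] if match else None) for col in bring}
        enriched.append({**row, **extra})
    return enriched


orders = [
    {"order_id": 1, "product_id": "A", "region": "North"},
    {"order_id": 2, "product_id": "Z", "region": "South"},
]
products = [
    {"product_id": "A", "region": "North", "price": 9.5, "category": "toys"},
]

single = lookup(orders, products, bring="price", against="product_id")
multi = lookup(orders, products, bring=["price", "category"],
               against=["product_id", "region"])
print(single)
print(multi)
```

Keeping every input row, as a left join does, is what makes this a lookup rather than a filter: the result always has the same number of rows as `orders`.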

add.onehotencoding() - Categorical Encoding

Convert categorical columns to one-hot encoded format.

# One-hot encoding (single column)
encoded = add.onehotencoding(df, 'category')
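As a reminder of what one-hot encoding produces, the sketch below encodes a single column over a list of dicts in plain Python. The function name `one_hot` and the `<column>_<value>` naming of the new columns are assumptions for illustration; the column names that `add.onehotencoding()` emits are not documented here.

```python
def one_hot(rows, column):
    """Replace `column` with one 0/1 indicator column per distinct value."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Keep every other column unchanged.
        new_row = {k: v for k, v in row.items() if k != column}
        for cat in categories:
            new_row[f"{column}_{cat}"] = 1 if row[column] == cat else 0
        encoded.append(new_row)
    return encoded


items = [
    {"id": 1, "category": "toys"},
    {"id": 2, "category": "books"},
    {"id": 3, "category": "toys"},
]
encoded_items = one_hot(items, "category")
print(encoded_items[0])  # {'id': 1, 'category_books': 0, 'category_toys': 1}
```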

add.harmonize_units() - Unit Standardization

Standardize units across your dataset.

# Unit harmonization
standardized = add.harmonize_units(
    df, 
    value_column='temperature', 
    unit_column='unit',
    target_unit='C'
)
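For temperature data, harmonizing units amounts to converting each value through a common base unit. The following is a rough sketch limited to Celsius, Fahrenheit, and Kelvin; the function name `harmonize_temperature` and the unit codes `C`, `F`, and `K` are assumptions, and the set of units `add.harmonize_units()` actually supports is not stated here.

```python
# Conversions into and out of Celsius, used as the common base unit.
_TO_CELSIUS = {
    "C": lambda v: v,
    "F": lambda v: (v - 32) * 5 / 9,
    "K": lambda v: v - 273.15,
}
_FROM_CELSIUS = {
    "C": lambda c: c,
    "F": lambda c: c * 9 / 5 + 32,
    "K": lambda c: c + 273.15,
}


def harmonize_temperature(rows, value_column, unit_column, target_unit):
    """Return new rows with every value converted to `target_unit`."""
    converted = []
    for row in rows:
        celsius = _TO_CELSIUS[row[unit_column]](row[value_column])
        converted.append({
            **row,
            value_column: _FROM_CELSIUS[target_unit](celsius),
            unit_column: target_unit,
        })
    return converted


readings = [
    {"temperature": 212.0, "unit": "F"},
    {"temperature": 273.15, "unit": "K"},
    {"temperature": 25.0, "unit": "C"},
]
standardized_readings = harmonize_temperature(
    readings, value_column="temperature", unit_column="unit", target_unit="C"
)
print(standardized_readings)
```

An unknown unit code raises `KeyError` in this sketch, which makes bad input visible instead of passing it through unconverted.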

🧮 Expressions

Pre-built calculations for health, fitness, and common metrics. Simple examples:

import pandas as pd
import additory as add

# Create patient data with correct column names
patients = pd.DataFrame({
    'weight_kg': [70, 80, 65],  # Weight in kilograms
    'height_m': [1.75, 1.80, 1.60],  # Height in meters
    'age': [25, 35, 45],
    'gender': ['M', 'F', 'M']
})

# Calculate BMI
patients_with_bmi = add.bmi(patients)

# Calculate Body Surface Area
patients_with_bsa = add.bsa(patients)

# Chain multiple expressions
result = add.fitness_score(add.bmr(add.bmi(patients)))
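Chaining works when each expression takes a table and returns it with one more column, so the output of one call is valid input to the next. The sketch below shows that pattern in plain Python. The helper names `with_bmi` and `with_bsa` are hypothetical, and the Mosteller formula is only one of several published Body Surface Area formulas; which one `add.bsa()` uses is not stated here.

```python
import math


def with_bmi(rows):
    """Return new rows with a 'bmi' column (kg / m^2)."""
    return [{**r, "bmi": r["weight_kg"] / r["height_m"] ** 2} for r in rows]


def with_bsa(rows):
    """Return new rows with a 'bsa' column in m^2 (Mosteller formula, assumed)."""
    return [
        {**r, "bsa": math.sqrt(r["height_m"] * 100 * r["weight_kg"] / 3600)}
        for r in rows
    ]


patients = [
    {"weight_kg": 70, "height_m": 1.75},
    {"weight_kg": 80, "height_m": 1.80},
]

# Each step returns new rows, so calls nest without mutating the input.
chained = with_bsa(with_bmi(patients))
print(sorted(chained[0]))  # ['bmi', 'bsa', 'height_m', 'weight_kg']
```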

🔄 Augment and Synthetic Data

Augment generates more data similar to your existing dataset, while Synthetic creates entirely new datasets from schema definitions.

Key Differences:

Augment: Learns patterns from existing data to create similar rows
Synthetic: Uses predefined schemas to generate structured data

# Augment existing data (learns from patterns)
more_customers = add.augment(customers, n_rows=1000)

# Create data from scratch with strategies
new_data = add.augment("@new", n_rows=500, strategy={
    'id': 'increment:start=1',
    'name': 'choice:[John,Jane,Bob]',
    'age': 'range:18-65'
})

# Generate from schema file (structured approach)
customers = add.synth("customer_schema.toml", rows=10000)
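The strategy strings above read as a small `kind:argument` mini-language. As a sketch of how such strings could be interpreted, the plain-Python code below parses the three forms shown and generates rows from them. This is an assumption-laden illustration, not Additory's parser: the function names, the inclusive treatment of `range` bounds, and the uniform sampling are all guesses.

```python
import itertools
import random


def make_generator(spec, rng):
    """Turn one strategy string into an endless iterator of values."""
    kind, _, arg = spec.partition(":")
    if kind == "increment":
        # "increment:start=1" counts upward from the given start value.
        start = int(arg.partition("=")[2]) if arg else 1
        return itertools.count(start)
    if kind == "choice":
        # "choice:[John,Jane,Bob]" samples uniformly from the listed options.
        options = [item.strip() for item in arg.strip("[]").split(",")]
        return (rng.choice(options) for _ in itertools.count())
    if kind == "range":
        # "range:18-65" draws integers with both bounds included (assumed).
        low, high = (int(part) for part in arg.split("-", 1))
        return (rng.randint(low, high) for _ in itertools.count())
    raise ValueError(f"unknown strategy: {spec!r}")


def generate(strategy, n_rows, seed=0):
    """Build n_rows dicts, one value per column, from a strategy mapping."""
    rng = random.Random(seed)  # seeded so repeated runs give the same rows
    generators = {col: make_generator(spec, rng) for col, spec in strategy.items()}
    return [{col: next(gen) for col, gen in generators.items()} for _ in range(n_rows)]


sample_rows = generate(
    {"id": "increment:start=1", "name": "choice:[John,Jane,Bob]", "age": "range:18-65"},
    n_rows=5,
)
for row in sample_rows:
    print(row)
```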

🧪 Examples

E-commerce Data Pipeline

import pandas as pd
import additory as add

# Start with small customer sample
customers = pd.DataFrame({
    'customer_id': [1, 2, 3],
    'age': [25, 35, 45],
    'region': ['North', 'South', 'East']
})

# Generate more customers
customers = add.augment(customers, n_rows=10000)

# Add customer tiers
tiers = pd.DataFrame({
    'customer_id': range(1, 4),  # Match original IDs
    'tier': ['Gold', 'Silver', 'Bronze']
})

# Use pipeline approach
result = (customers
    .pipe(add.to, from_df=tiers, bring='tier', against='customer_id')
    .pipe(add.scan, preset="quick"))

print(result.summary())

Healthcare Data Analysis

import additory as add

# Create patient data from scratch
strategy = {
    'patient_id': 'increment:start=1',
    'age': 'range:18-80',
    'weight_kg': 'range:50-120',  # Weight in kg
    'height_cm': 'range:150-200'  # Height in cm
}

patients = add.augment("@new", n_rows=1000, strategy=strategy)

# Convert height to meters for expressions
patients['height_m'] = patients['height_cm'] / 100

# Calculate health metrics using pipeline
result = (patients
    .pipe(add.bmi)
    .pipe(add.scan, preset="correlations"))

print(result.correlations)
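The contents of `result.correlations` are not described here. As background, a correlation report over numeric columns is commonly built on the Pearson coefficient, sketched below in plain Python; whether the `correlations` preset uses Pearson or another measure is an assumption.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# With height held fixed, BMI is proportional to weight, so r is 1.
weights = [50, 70, 90, 110]
bmis = [w / 1.75 ** 2 for w in weights]
print(round(pearson(weights, bmis), 6))  # 1.0
```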

📚 Documentation

Function Documentation - Detailed guides for each function
Expressions Guide - Complete expressions reference

📄 License

MIT License - see LICENSE file for details.

📞 Support

Issues: GitHub Issues
Documentation: Full Documentation

🗺️ v0.1.1 (February 2026)

Enhanced documentation and tutorials
Performance optimizations
Additional expressions
Advanced synthetic data patterns

Made with ❤️ for data scientists, analysts, and developers who love working with data.

Release history

0.1.3a11 pre-release - May 9, 2026
0.1.3a10 pre-release - Mar 12, 2026
0.1.3a9 pre-release - Mar 8, 2026
0.1.3a8 pre-release - Mar 3, 2026
0.1.3a7 pre-release - Feb 13, 2026
0.1.3a6 pre-release - Feb 13, 2026
0.1.3a5 pre-release - Feb 13, 2026
0.1.3a4 pre-release - Feb 11, 2026
0.1.3a3 pre-release - Feb 9, 2026
0.1.3a2 pre-release - Feb 9, 2026
0.1.3a1 pre-release - Feb 9, 2026
0.1.2a1 pre-release - Feb 5, 2026
0.1.1a6 pre-release - Feb 4, 2026
0.1.1a5 pre-release - Feb 4, 2026
0.1.1a4 pre-release - Feb 4, 2026
0.1.1a3 pre-release - Feb 4, 2026
0.1.1a2 pre-release - Feb 4, 2026
0.1.1a1 pre-release - Feb 4, 2026
0.1.0a4 pre-release - Jan 28, 2026
0.1.0a3 pre-release - Jan 27, 2026
0.1.0a2 pre-release - Jan 25, 2026
0.1.0a1 pre-release - Jan 25, 2026 (this version)

Download files

Source Distribution

additory-0.1.0a1.tar.gz (231.6 kB)

Uploaded Jan 25, 2026 Source

Built Distribution

additory-0.1.0a1-py3-none-any.whl (232.0 kB)

Uploaded Jan 25, 2026 Python 3

File details

Details for the file additory-0.1.0a1.tar.gz.

File metadata

Download URL: additory-0.1.0a1.tar.gz
Upload date: Jan 25, 2026
Size: 231.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for additory-0.1.0a1.tar.gz
Algorithm	Hash digest
SHA256	`0b694142721d2fd61e9c91b678c96c7664ba04507b5503e5e00fda364c980218`
MD5	`96e9828cfc024baa229d150f148bc25a`
BLAKE2b-256	`f0d7e0591d1b5a62af660672d9a528e03d69eb985588221f4b15a6acc7d138b7`

File details

Details for the file additory-0.1.0a1-py3-none-any.whl.

File metadata

Download URL: additory-0.1.0a1-py3-none-any.whl
Upload date: Jan 25, 2026
Size: 232.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for additory-0.1.0a1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9f6fc0b6896f2dc6c10ea9ad1203843bf37b0e2d1589ec0bc841a5dd72938bd8`
MD5	`d06a89f11b993b6e98044bd7cfc457d0`
BLAKE2b-256	`d23159d885c9fd47091052a3a7b6566932b82e143f54282a0d0fb84b0bdca264`
