Additory

A semantic, extensible dataframe transformation engine with expressions, lookup, and augmentation support.

Author: Krishnamoorthy Sankaran

🛠️ Requirements

Python: 3.9+
Core dependencies: pandas, polars, numpy, scipy
Optional: cuDF (for GPU support)

📦 Installation

pip install additory==0.1.0a4

Optional GPU support:

pip install "additory[gpu]==0.1.0a4"  # Includes cuDF for GPU acceleration; quotes keep shells such as zsh from expanding the brackets

Development installation:

pip install "additory[dev]==0.1.0a4"  # Includes testing and development tools; quotes keep shells such as zsh from expanding the brackets

🎯 Core Functions

Function	Purpose	Example
`add.to()`	Lookup/join operations	`add.to(df1, from_df=df2, bring='col', against='key')`
`add.synthetic()`	Generate additional data	`add.synthetic(df, n_rows=1000)`
`add.deduce()`	Text-based label deduction	`add.deduce(df, from_column='text', to_column='label')`
`add.scan()`	Data profiling & analysis	`add.scan(df, preset="full")`

🧬 Available Expressions

Additory includes 12 built-in health and fitness expressions. Ten of them are listed here:

add.bmi() - Body Mass Index
add.bsa() - Body Surface Area
add.bmr() - Basal Metabolic Rate
add.waist_hip_ratio() - Waist-to-Hip Ratio
add.body_fat_percentage() - Body Fat Percentage
add.ideal_body_weight() - Ideal Body Weight
add.blood_pressure_category() - BP Classification
add.cholesterol_ratio() - Cholesterol Ratio
add.age_category() - Age Classification
add.fitness_score() - Overall Fitness Score

import pandas as pd
import additory as add

# Health calculations
patients = pd.DataFrame({
    'weight_kg': [70, 80, 65],  # Weight in kilograms
    'height_m': [1.75, 1.80, 1.60],  # Height in meters
    'age': [25, 35, 45],
    'gender': ['M', 'F', 'M']
})

patients_bmi = add.bmi(patients)
patients_bsa = add.bsa(patients)
fitness_scores = add.fitness_score(patients)

# Chain multiple expressions
result = add.fitness_score(add.bmr(add.bmi(patients)))
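For reference, BMI is weight in kilograms divided by the square of height in meters. The plain-Python sketch below applies that formula to the sample patients above. It illustrates the arithmetic only and is not additory's implementation; the helper name `bmi_value` and the rounding to one decimal place are assumptions made for this example.

```python
# Illustrative only: the standard BMI formula, independent of additory.
# `bmi_value` is a hypothetical helper name, not part of the library.

def bmi_value(weight_kg: float, height_m: float) -> float:
    """Return BMI = weight (kg) / height (m) squared."""
    if height_m <= 0:
        raise ValueError("height_m must be positive")
    return weight_kg / height_m ** 2

weights_kg = [70, 80, 65]
heights_m = [1.75, 1.80, 1.60]

# Rounding to one decimal is a presentation choice for this sketch.
bmi_values = [round(bmi_value(w, h), 1) for w, h in zip(weights_kg, heights_m)]
print(bmi_values)  # [22.9, 24.7, 25.4]
```

The column name and precision that `add.bmi()` writes to the DataFrame are not stated here, so compare against the library's own output before relying on exact values.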

🔧 DataFrame Support

Additory accepts DataFrames from multiple libraries:

pandas - Full support
polars - Full support
cuDF - GPU acceleration support

import polars as pl
import additory as add

# Works with polars; the DataFrame type is detected and converted automatically
df_polars = pl.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
result = add.synthetic(df_polars, n_rows=100)
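One common way to support several DataFrame libraries is to inspect the top-level module of an object's type and dispatch on it. The sketch below shows that idea in plain Python. `detect_backend` is a hypothetical helper, and this README does not state how additory performs its own detection.

```python
# Illustrative only: dispatching on the module that defines an object's type.
# `detect_backend` is a hypothetical name; additory's internals may differ.

KNOWN_BACKENDS = ("pandas", "polars", "cudf")

def detect_backend(obj: object) -> str:
    """Return the top-level package name if it is a known DataFrame backend."""
    top_level = type(obj).__module__.split(".")[0]
    if top_level in KNOWN_BACKENDS:
        return top_level
    raise TypeError(f"unsupported DataFrame type: {type(obj).__name__}")

# Stand-in class so the sketch runs without polars installed.
class FakePolarsFrame:
    pass

FakePolarsFrame.__module__ = "polars.dataframe.frame"

print(detect_backend(FakePolarsFrame()))  # polars
```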

✨ Key Features

🔧 Utilities

add.to() - Data Lookup & Joins

Simplified syntax for bringing columns from one dataframe to another.

# Simple lookup
orders_with_prices = add.to(
    orders, 
    from_df=products, 
    bring='price', 
    against='product_id'
)

# Multiple columns and keys
enriched = add.to(
    orders,
    from_df=products,
    bring=['price', 'category'],
    against=['product_id', 'region']
)
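Conceptually, a lookup of this kind behaves like a left join: every row of the target is kept, and the requested columns are copied from a matching row of the source. The plain-Python sketch below illustrates that behavior, assuming that unmatched keys yield `None` and that the first matching source row wins. `lookup` is a hypothetical helper, and this README does not specify how `add.to()` handles missing or duplicate keys.

```python
# Illustrative only: left-join style lookup over lists of dicts.
# `lookup` is a hypothetical helper, not additory's implementation.

def lookup(rows, from_rows, bring, against):
    """Copy the `bring` columns from `from_rows` into `rows`, matching on `against`."""
    bring = [bring] if isinstance(bring, str) else list(bring)
    against = [against] if isinstance(against, str) else list(against)

    # Index the source by key; the first occurrence of a key wins (assumption).
    index = {}
    for src in from_rows:
        key = tuple(src[k] for k in against)
        index.setdefault(key, src)

    result = []
    for row in rows:
        match = index.get(tuple(row[k] for k in against))
        extra = {col: (match[col] if match else None) for col in bring}
        result.append({**row, **extra})
    return result

orders = [
    {"order_id": 1, "product_id": "A"},
    {"order_id": 2, "product_id": "B"},
    {"order_id": 3, "product_id": "Z"},  # no matching product
]
products = [
    {"product_id": "A", "price": 9.5},
    {"product_id": "B", "price": 20.0},
]

orders_with_prices = lookup(orders, products, bring="price", against="product_id")
for row in orders_with_prices:
    print(row)
```

The third order has no matching product, so its `price` is `None` in this sketch while the row itself is preserved.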

add.onehotencoding() - Categorical Encoding

Convert categorical columns to one-hot encoded format.

# One-hot encoding (single column)
encoded = add.onehotencoding(df, 'category')
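One-hot encoding replaces a categorical column with one 0/1 indicator column per distinct value. The plain-Python sketch below shows the transformation on a small list of records. `one_hot` and the `<column>_<value>` naming scheme are assumptions for illustration; the names that `add.onehotencoding()` produces are not documented here.

```python
# Illustrative only: one-hot encoding a single column of a list of dicts.
# `one_hot` and the "<column>_<value>" naming scheme are assumptions.

def one_hot(rows, column):
    """Replace `column` with one 0/1 indicator column per distinct value."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for cat in categories:
            new_row[f"{column}_{cat}"] = 1 if row[column] == cat else 0
        encoded.append(new_row)
    return encoded

items = [
    {"sku": 1, "category": "toys"},
    {"sku": 2, "category": "books"},
    {"sku": 3, "category": "toys"},
]

encoded_items = one_hot(items, "category")
print(encoded_items[0])  # {'sku': 1, 'category_books': 0, 'category_toys': 1}
```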

add.harmonize_units() - Unit Standardization

Standardize units across your dataset.

# Unit harmonization
standardized = add.harmonize_units(
    df, 
    value_column='temperature', 
    unit_column='unit',
    target_unit='C'
)
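Harmonizing a mixed-unit column amounts to converting each value according to its row's unit and then rewriting the unit to the target. The sketch below does this for temperatures in plain Python. `to_celsius` and the unit codes `"C"`, `"F"`, and `"K"` are assumptions for illustration; the set of units that `add.harmonize_units()` recognizes is not listed in this README.

```python
# Illustrative only: converting mixed temperature units to Celsius.
# `to_celsius` and the unit codes "C", "F", "K" are assumptions.

TO_CELSIUS = {
    "C": lambda v: v,
    "F": lambda v: (v - 32) * 5 / 9,
    "K": lambda v: v - 273.15,
}

def to_celsius(value: float, unit: str) -> float:
    """Convert `value` from `unit` to degrees Celsius."""
    try:
        convert = TO_CELSIUS[unit]
    except KeyError:
        raise ValueError(f"unknown temperature unit: {unit!r}") from None
    return convert(value)

readings = [
    {"temperature": 25.0, "unit": "C"},
    {"temperature": 212.0, "unit": "F"},
    {"temperature": 273.15, "unit": "K"},
]

standardized = [
    {"temperature": to_celsius(r["temperature"], r["unit"]), "unit": "C"}
    for r in readings
]
print(standardized)
```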

🧮 Expressions

Pre-built calculations for health, fitness, and common metrics. Simple examples:

# Create patient data using the column names shown here (weight_kg, height_m)
patients = pd.DataFrame({
    'weight_kg': [70, 80, 65],  # Weight in kilograms
    'height_m': [1.75, 1.80, 1.60],  # Height in meters
    'age': [25, 35, 45],
    'gender': ['M', 'F', 'M']
})

# Calculate BMI
patients_with_bmi = add.bmi(patients)

# Calculate Body Surface Area
patients_with_bsa = add.bsa(patients)

# Chain multiple expressions
result = add.fitness_score(add.bmr(add.bmi(patients)))

🔄 Synthetic Data Generation

add.synthetic() generates additional rows that resemble your existing dataset, or builds a new dataset from scratch using inline strategies.

# Extend existing data (learns from patterns)
more_customers = add.synthetic(customers, n_rows=1000)

# Create data from scratch with strategies
new_data = add.synthetic("@new", n_rows=500, strategy={
    'id': 'increment:start=1',
    'name': 'choice:[John,Jane,Bob]',
    'age': 'range:18-65'
})
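To make the strategy strings above more concrete, the sketch below gives one possible reading of them in plain Python: `increment` counts up from `start`, `choice` picks from the bracketed options, and `range` draws an integer between the two bounds. This is an interpretation, not additory's parser. `make_generator`, `generate_rows`, the inclusive bounds, and the `seed` parameter are all assumptions.

```python
# Illustrative only: one possible reading of the strategy strings above.
# `make_generator` and `generate_rows` are hypothetical; additory's actual
# parsing rules and random behavior are not documented in this README.
import random

def make_generator(spec: str, rng: random.Random):
    """Turn a 'kind:args' strategy string into a function of the row index."""
    kind, _, args = spec.partition(":")
    if kind == "increment":
        start = int(args.split("=", 1)[1])       # "start=1" -> 1
        return lambda i: start + i
    if kind == "choice":
        options = args.strip("[]").split(",")    # "[John,Jane,Bob]" -> list
        return lambda i: rng.choice(options)
    if kind == "range":
        low, high = (int(part) for part in args.split("-", 1))  # "18-65"
        return lambda i: rng.randint(low, high)  # inclusive bounds (assumption)
    raise ValueError(f"unknown strategy: {spec!r}")

def generate_rows(n_rows: int, strategy: dict, seed: int = 0):
    """Build `n_rows` dicts, one value per column, from strategy strings."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    generators = {col: make_generator(spec, rng) for col, spec in strategy.items()}
    return [{col: gen(i) for col, gen in generators.items()} for i in range(n_rows)]

rows = generate_rows(5, {
    "id": "increment:start=1",
    "name": "choice:[John,Jane,Bob]",
    "age": "range:18-65",
})
for row in rows:
    print(row)
```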

🤖 Text-Based Label Deduction

add.deduce() fills in missing labels by learning from your existing labeled examples. It is pure Python, uses no LLMs, and works offline.

# Deduce missing labels from text
tickets = pd.DataFrame({
    "ticket_text": ["Cannot log in", "Billing question", "App crashes", "Need invoice"],
    "category": ["Technical", "Billing", None, None]
})

# Automatically fill in missing categories
result = add.deduce(tickets, from_column="ticket_text", to_column="category")

# Use multiple columns for better accuracy
# (df stands for any DataFrame with "title", "description", and "category" columns)
result = add.deduce(
    df,
    from_column=["title", "description"],
    to_column="category"
)
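This README does not say which algorithm `add.deduce()` uses. As a rough mental model only, the sketch below fills each missing label with the label of the most similar labeled text, using Jaccard similarity over lowercase word sets. `deduce_labels` and the sample texts are hypothetical, and the library's real method may differ substantially.

```python
# Illustrative only: nearest-neighbor label deduction by word overlap.
# This is NOT additory's algorithm; `deduce_labels` is a hypothetical helper.

def tokens(text: str) -> set:
    """Split text into a set of lowercase words."""
    return set(text.lower().split())

def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def deduce_labels(texts, labels):
    """Fill each None label with the label of the most similar labeled text."""
    labeled = [(tokens(t), lab) for t, lab in zip(texts, labels) if lab is not None]
    filled = []
    for text, label in zip(texts, labels):
        if label is None and labeled:
            words = tokens(text)
            _, label = max(labeled, key=lambda pair: jaccard(words, pair[0]))
        filled.append(label)
    return filled

texts = [
    "cannot log in to app",
    "question about billing invoice",
    "app crashes on log in",
    "need a copy of my invoice",
]
labels = ["Technical", "Billing", None, None]

print(deduce_labels(texts, labels))
# ['Technical', 'Billing', 'Technical', 'Billing']
```

Word overlap fails when unlabeled texts share no vocabulary with the labeled ones, which is one reason to supply more labeled examples or more text columns.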

🧪 Examples

E-commerce Data Pipeline

import pandas as pd
import additory as add

# Start with small customer sample
customers = pd.DataFrame({
    'customer_id': [1, 2, 3],
    'age': [25, 35, 45],
    'region': ['North', 'South', 'East']
})

# Generate more customers
customers = add.synthetic(customers, n_rows=10000)

# Add customer tiers
tiers = pd.DataFrame({
    'customer_id': range(1, 4),  # Match original IDs
    'tier': ['Gold', 'Silver', 'Bronze']
})

# Use pipeline approach
result = (customers
    .pipe(add.to, from_df=tiers, bring='tier', against='customer_id')
    .pipe(add.scan, preset="quick"))

print(result.summary())

Healthcare Data Analysis

import additory as add

# Create patient data from scratch
strategy = {
    'patient_id': 'increment:start=1',
    'age': 'range:18-80',
    'weight_kg': 'range:50-120',  # Weight in kg
    'height_cm': 'range:150-200'  # Height in cm
}

patients = add.synthetic("@new", n_rows=1000, strategy=strategy)

# Convert height to meters for expressions
patients['height_m'] = patients['height_cm'] / 100

# Calculate health metrics using pipeline
result = (patients
    .pipe(add.bmi)
    .pipe(add.scan, preset="correlations"))

print(result.correlations)

📚 Documentation

Function Documentation - Detailed guides for each function
Expressions Guide - Complete expressions reference

📄 License

MIT License - see LICENSE file for details.

📞 Support

Issues: GitHub Issues
Documentation: Full Documentation

🗺️ v0.1.1 (January 2026)

Enhanced documentation and tutorials
Performance optimizations
Additional expressions
Advanced synthetic data patterns

Made with ❤️ for data scientists, analysts, and developers who love working with data.
