A semantic, extensible dataframe transformation engine with expressions, lookup, and synthetic data generation support.

Additory

A semantic, extensible dataframe transformation engine with expressions, lookup, and augmentation support.

Python 3.9+ | License: MIT

Author: Krishnamoorthy Sankaran

🛠️ Requirements

  • Python: 3.9+
  • Core dependencies: pandas, polars, numpy, scipy
  • Optional: cuDF (for GPU support)

📦 Installation

pip install additory==0.1.0a4

Optional GPU support:

pip install additory[gpu]==0.1.0a4  # Includes cuDF for GPU acceleration

Development installation:

pip install additory[dev]==0.1.0a4  # Includes testing and development tools

🎯 Core Functions

Function        | Purpose                    | Example
add.to()        | Lookup/join operations     | add.to(df1, from_df=df2, bring='col', against='key')
add.synthetic() | Generate additional data   | add.synthetic(df, n_rows=1000)
add.deduce()    | Text-based label deduction | add.deduce(df, from_column='text', to_column='label')
add.scan()      | Data profiling & analysis  | add.scan(df, preset="full")

🧬 Available Expressions

Additory includes 12 built-in health and fitness expressions, including:

  • add.bmi() - Body Mass Index
  • add.bsa() - Body Surface Area
  • add.bmr() - Basal Metabolic Rate
  • add.waist_hip_ratio() - Waist-to-Hip Ratio
  • add.body_fat_percentage() - Body Fat Percentage
  • add.ideal_body_weight() - Ideal Body Weight
  • add.blood_pressure_category() - BP Classification
  • add.cholesterol_ratio() - Cholesterol Ratio
  • add.age_category() - Age Classification
  • add.fitness_score() - Overall Fitness Score

import pandas as pd
import additory as add

# Health calculations
patients = pd.DataFrame({
    'weight_kg': [70, 80, 65],  # Weight in kilograms
    'height_m': [1.75, 1.80, 1.60],  # Height in meters
    'age': [25, 35, 45],
    'gender': ['M', 'F', 'M']
})

patients_bmi = add.bmi(patients)
patients_bsa = add.bsa(patients)
fitness_scores = add.fitness_score(patients)

# Chain multiple expressions
result = add.fitness_score(add.bmr(add.bmi(patients)))

🔧 DataFrame Support

Additory works seamlessly with multiple DataFrame libraries:

  • pandas - Full support
  • polars - Full support
  • cuDF - GPU acceleration support

import polars as pl
import additory as add

# Works with polars
df_polars = pl.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
result = add.synthetic(df_polars, n_rows=100)

# Automatic type detection and conversion
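
One common way for a library to support several DataFrame backends is to inspect the object's type before dispatching. The sketch below only illustrates that idea. It is an assumption, not Additory's code, and detect_backend is a hypothetical name.

```python
import pandas as pd

SUPPORTED_BACKENDS = {"pandas", "polars", "cudf"}

def detect_backend(df):
    """Hypothetical helper: infer the DataFrame library from the object's module path."""
    # A pandas DataFrame's class lives in "pandas.core.frame"; the first
    # path component names the library that owns the class.
    root = type(df).__module__.split(".")[0]
    if root not in SUPPORTED_BACKENDS:
        raise TypeError(f"unsupported DataFrame type: {type(df).__name__}")
    return root

backend = detect_backend(pd.DataFrame({"a": [1, 2, 3]}))
```

Checking the module name instead of using isinstance means the optional libraries (polars, cuDF) do not have to be imported just to run the check.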

✨ Key Features

🔧 Utilities

add.to() - Data Lookup & Joins

Simplified syntax for bringing columns from one dataframe to another.

# Simple lookup
orders_with_prices = add.to(
    orders, 
    from_df=products, 
    bring='price', 
    against='product_id'
)

# Multiple columns and keys
enriched = add.to(
    orders,
    from_df=products,
    bring=['price', 'category'],
    against=['product_id', 'region']
)
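
For readers who think in pandas terms, a lookup like this can be approximated with a left join that carries over only the requested columns. The sketch below is plain pandas, not Additory's implementation. The lookup helper, the left-join semantics, and the sample data are all assumptions made for illustration.

```python
import pandas as pd

orders = pd.DataFrame({
    "order_id": [101, 102, 103],
    "product_id": [1, 2, 4],
})
products = pd.DataFrame({
    "product_id": [1, 2, 3],
    "price": [9.99, 19.50, 5.00],
    "category": ["toys", "books", "games"],
})

def lookup(df, from_df, bring, against):
    """Hypothetical helper: left-join the `bring` columns from `from_df` on the `against` keys."""
    bring = [bring] if isinstance(bring, str) else list(bring)
    against = [against] if isinstance(against, str) else list(against)
    # Keep only the keys and requested columns, and drop duplicate keys
    # so the join cannot multiply rows on the left side.
    right = from_df[against + bring].drop_duplicates(subset=against)
    return df.merge(right, on=against, how="left")

orders_with_prices = lookup(orders, products, bring="price", against="product_id")
```

In this sketch, an order whose product_id has no match (4 here) keeps its row and gets a missing price.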

add.onehotencoding() - Categorical Encoding

Convert categorical columns to one-hot encoded format.

# One-hot encoding (single column)
encoded = add.onehotencoding(df, 'category')
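
To show what one-hot encoding produces, here is the same idea in plain pandas using pd.get_dummies. The output column names and the integer dtype are pandas conventions and may differ from what add.onehotencoding() returns.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3], "category": ["a", "b", "a"]})

# Each distinct value of "category" becomes its own 0/1 indicator column.
encoded = pd.get_dummies(df, columns=["category"], dtype=int)
```

The original category column is replaced by category_a and category_b, with exactly one of them set to 1 in each row.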

add.harmonize_units() - Unit Standardization

Standardize units across your dataset.

# Unit harmonization
standardized = add.harmonize_units(
    df, 
    value_column='temperature', 
    unit_column='unit',
    target_unit='C'
)
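
Conceptually, harmonizing a value/unit column pair means converting each value according to its row's unit and then rewriting the unit. The following is a minimal pandas sketch of that idea for temperatures. The converter table and the to_celsius helper are illustrative assumptions and say nothing about which units Additory supports.

```python
import pandas as pd

df = pd.DataFrame({
    "temperature": [212.0, 25.0, 273.15],
    "unit": ["F", "C", "K"],
})

# Converters from each source unit to Celsius (illustrative only).
TO_CELSIUS = {
    "C": lambda v: v,
    "F": lambda v: (v - 32.0) * 5.0 / 9.0,
    "K": lambda v: v - 273.15,
}

def to_celsius(frame, value_column, unit_column):
    """Hypothetical helper: convert every row to Celsius based on its unit column."""
    out = frame.copy()
    out[value_column] = [
        TO_CELSIUS[unit](value)
        for value, unit in zip(frame[value_column], frame[unit_column])
    ]
    out[unit_column] = "C"
    return out

standardized = to_celsius(df, "temperature", "unit")
```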

🧮 Expressions

Pre-built calculations for health, fitness, and common metrics. Simple examples:

import pandas as pd
import additory as add

# Create patient data using the column names the expressions expect
patients = pd.DataFrame({
    'weight_kg': [70, 80, 65],  # Weight in kilograms
    'height_m': [1.75, 1.80, 1.60],  # Height in meters
    'age': [25, 35, 45],
    'gender': ['M', 'F', 'M']
})

# Calculate BMI
patients_with_bmi = add.bmi(patients)

# Calculate Body Surface Area
patients_with_bsa = add.bsa(patients)

# Chain multiple expressions
result = add.fitness_score(add.bmr(add.bmi(patients)))
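
As a reference point, BMI is conventionally defined as weight in kilograms divided by the square of height in meters. The sketch below computes it by hand in pandas for the same patients. The output column name bmi is an assumption and may not match the column add.bmi() creates.

```python
import pandas as pd

patients = pd.DataFrame({
    "weight_kg": [70, 80, 65],
    "height_m": [1.75, 1.80, 1.60],
})

# BMI = weight (kg) / height (m) squared
patients["bmi"] = patients["weight_kg"] / patients["height_m"] ** 2
```

For these three patients the values round to 22.9, 24.7, and 25.4.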

🔄 Synthetic Data Generation

add.synthetic() either extends an existing dataset with additional rows that resemble it, or builds a new dataset from scratch using inline strategies.

# Extend existing data (learns from patterns)
more_customers = add.synthetic(customers, n_rows=1000)

# Create data from scratch with strategies
new_data = add.synthetic("@new", n_rows=500, strategy={
    'id': 'increment:start=1',
    'name': 'choice:[John,Jane,Bob]',
    'age': 'range:18-65'
})
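
The strategy strings above follow a kind:arguments pattern. To make that pattern concrete, here is a small pure-Python sketch that parses the three strings shown and generates rows from them. It is a hypothetical reading of the syntax, not Additory's parser. The names make_generator and generate, and the inclusive upper bound for range, are assumptions.

```python
import itertools
import random

def make_generator(spec, rng):
    """Hypothetical parser for 'kind:args' strategy strings like those above."""
    kind, _, args = spec.partition(":")
    if kind == "increment":
        # 'increment:start=1' -> 1, 2, 3, ...
        start = int(args.split("=", 1)[1])
        counter = itertools.count(start)
        return lambda: next(counter)
    if kind == "choice":
        # 'choice:[John,Jane,Bob]' -> one of the listed values
        options = [item.strip() for item in args.strip("[]").split(",")]
        return lambda: rng.choice(options)
    if kind == "range":
        # 'range:18-65' -> integer between 18 and 65, both ends assumed inclusive
        low, high = (int(part) for part in args.split("-"))
        return lambda: rng.randint(low, high)
    raise ValueError(f"unknown strategy: {spec!r}")

def generate(strategy, n_rows, seed=0):
    """Build n_rows dictionaries, one value per column per row."""
    rng = random.Random(seed)
    generators = {col: make_generator(spec, rng) for col, spec in strategy.items()}
    return [{col: gen() for col, gen in generators.items()} for _ in range(n_rows)]

rows = generate(
    {"id": "increment:start=1", "name": "choice:[John,Jane,Bob]", "age": "range:18-65"},
    n_rows=5,
)
```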

🤖 Text-Based Label Deduction

add.deduce() fills in missing labels by learning from the rows that are already labeled. Pure Python, no LLMs, offline-first.

import pandas as pd
import additory as add

# Deduce missing labels from text
tickets = pd.DataFrame({
    "ticket_text": ["Cannot log in", "Billing question", "App crashes", "Need invoice"],
    "category": ["Technical", "Billing", None, None]
})

# Automatically fill in missing categories
result = add.deduce(tickets, from_column="ticket_text", to_column="category")

# Use multiple text columns for better accuracy
# (df is any DataFrame with "title", "description", and "category" columns)
result = add.deduce(
    df,
    from_column=["title", "description"],
    to_column="category"
)
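
The README does not say which algorithm add.deduce() uses. As a rough illustration of how labels can be inferred from text without any model downloads, the sketch below copies the label of the most similar labeled example, using word overlap (Jaccard similarity) as the measure. The technique, the function names, and the sample tickets are all assumptions.

```python
def tokens(text):
    """Lowercase a text and split it into a set of words."""
    return set(text.lower().split())

def jaccard(a, b):
    """Share of words two token sets have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def deduce_labels(texts, labels):
    """Fill each missing label with the label of the most similar labeled text."""
    labeled = [(tokens(t), l) for t, l in zip(texts, labels) if l is not None]
    filled = []
    for text, label in zip(texts, labels):
        if label is None:
            query = tokens(text)
            _, label = max(labeled, key=lambda pair: jaccard(query, pair[0]))
        filled.append(label)
    return filled

texts = [
    "Cannot log in to my account",
    "Question about my billing statement",
    "Log in page keeps failing",
    "Billing charge looks wrong",
]
labels = ["Technical", "Billing", None, None]

filled = deduce_labels(texts, labels)
```

An approach like this depends on shared vocabulary: a text that shares no words with any labeled example ties at zero similarity and simply receives the first labeled example's label.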

🧪 Examples

E-commerce Data Pipeline

import pandas as pd
import additory as add

# Start with small customer sample
customers = pd.DataFrame({
    'customer_id': [1, 2, 3],
    'age': [25, 35, 45],
    'region': ['North', 'South', 'East']
})

# Generate more customers
customers = add.synthetic(customers, n_rows=10000)

# Add customer tiers
tiers = pd.DataFrame({
    'customer_id': range(1, 4),  # Match original IDs
    'tier': ['Gold', 'Silver', 'Bronze']
})

# Use pipeline approach
result = (customers
    .pipe(add.to, from_df=tiers, bring='tier', against='customer_id')
    .pipe(add.scan, preset="quick"))

print(result.summary())
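
The pipeline above relies on pandas' DataFrame.pipe, which calls the given function with the DataFrame as its first argument and forwards the remaining keyword arguments. The sketch below shows that mechanism with a stand-in function. add_tier is hypothetical, and its left join and the "Unrated" default are assumptions, not add.to() behavior.

```python
import pandas as pd

def add_tier(df, tiers, default="Unrated"):
    """Hypothetical stand-in for a lookup step: left-join tiers, fill unmatched rows."""
    merged = df.merge(tiers, on="customer_id", how="left")
    merged["tier"] = merged["tier"].fillna(default)
    return merged

customers = pd.DataFrame({"customer_id": [1, 2, 3, 4]})
tiers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "tier": ["Gold", "Silver", "Bronze"],
})

# df.pipe(f, **kwargs) is equivalent to f(df, **kwargs)
piped = customers.pipe(add_tier, tiers=tiers)
direct = add_tier(customers, tiers=tiers)
assert piped.equals(direct)
```

Note that in the e-commerce example the tiers table only covers customer IDs 1 to 3, so under left-join semantics any generated customers with other IDs would have no matching tier.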

Healthcare Data Analysis

import additory as add

# Create patient data from scratch
strategy = {
    'patient_id': 'increment:start=1',
    'age': 'range:18-80',
    'weight_kg': 'range:50-120',  # Weight in kg
    'height_cm': 'range:150-200'  # Height in cm
}

patients = add.synthetic("@new", n_rows=1000, strategy=strategy)

# Convert height to meters for expressions
patients['height_m'] = patients['height_cm'] / 100

# Calculate health metrics using pipeline
result = (patients
    .pipe(add.bmi)
    .pipe(add.scan, preset="correlations"))

print(result.correlations)

📄 License

MIT License - see LICENSE file for details.

🗺️ Roadmap: v0.1.1 (January 2026)

  • Enhanced documentation and tutorials
  • Performance optimizations
  • Additional expressions
  • Advanced synthetic data patterns

Made with ❤️ for data scientists, analysts, and developers who love working with data.
