A semantic, extensible dataframe transformation engine with expressions, lookup, and synthetic data generation support.

Additory

A semantic, extensible dataframe transformation engine with expressions, lookup, and augmentation support.

Python 3.9+ | License: MIT

Author: Krishnamoorthy Sankaran

🛠️ Requirements

  • Python: 3.9+
  • Core dependencies: pandas, polars, numpy, scipy
  • Optional: cuDF (for GPU support)

📦 Installation

pip install additory==0.1.0a2

Optional GPU support:

pip install "additory[gpu]==0.1.0a2"  # Includes cuDF for GPU acceleration; quotes keep shells such as zsh from expanding the brackets

Development installation:

pip install "additory[dev]==0.1.0a2"  # Includes testing and development tools

🎯 Core Functions

| Function | Purpose | Example |
| --- | --- | --- |
| add.to() | Lookup/join operations | add.to(df1, from_df=df2, bring='col', against='key') |
| add.augment() | Generate additional data | add.augment(df, n_rows=1000) |
| add.scan() | Data profiling & analysis | add.scan(df, preset="full") |

🧬 Available Expressions

Additory includes 12 built-in health and fitness expressions, including:

  • add.bmi() - Body Mass Index
  • add.bsa() - Body Surface Area
  • add.bmr() - Basal Metabolic Rate
  • add.waist_hip_ratio() - Waist-to-Hip Ratio
  • add.body_fat_percentage() - Body Fat Percentage
  • add.ideal_body_weight() - Ideal Body Weight
  • add.blood_pressure_category() - BP Classification
  • add.cholesterol_ratio() - Cholesterol Ratio
  • add.age_category() - Age Classification
  • add.fitness_score() - Overall Fitness Score

import pandas as pd
import additory as add

# Health calculations
patients = pd.DataFrame({
    'weight_kg': [70, 80, 65],  # Weight in kilograms
    'height_m': [1.75, 1.80, 1.60],  # Height in meters
    'age': [25, 35, 45],
    'gender': ['M', 'F', 'M']
})

patients_bmi = add.bmi(patients)
patients_bsa = add.bsa(patients)
fitness_scores = add.fitness_score(patients)

# Chain multiple expressions
result = add.fitness_score(add.bmr(add.bmi(patients)))
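
For reference, BMI is weight in kilograms divided by the square of height in meters. The sketch below computes it with plain pandas on the same sample data. It is not Additory's implementation, and the output column name `bmi` is an assumption made for illustration.

```python
import pandas as pd

patients = pd.DataFrame({
    "weight_kg": [70, 80, 65],       # Weight in kilograms
    "height_m": [1.75, 1.80, 1.60],  # Height in meters
})

# BMI = weight (kg) / height (m) squared.
# The column name "bmi" is illustrative; add.bmi() may name its output differently.
with_bmi = patients.assign(bmi=patients["weight_kg"] / patients["height_m"] ** 2)

print(with_bmi.round(2))
```

This is why the expressions expect `weight_kg` and `height_m` columns: a height given in centimeters has to be converted to meters first.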

🔧 DataFrame Support

Additory works seamlessly with multiple DataFrame libraries:

  • pandas - Full support
  • polars - Full support
  • cuDF - GPU acceleration support

import polars as pl
import additory as add

# Works with polars
df_polars = pl.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
result = add.augment(df_polars, n_rows=100)

# Automatic type detection and conversion

✨ Key Features

🔧 Utilities

add.to() - Data Lookup & Joins

Simplified syntax for bringing columns from one DataFrame to another.

# Simple lookup
orders_with_prices = add.to(
    orders, 
    from_df=products, 
    bring='price', 
    against='product_id'
)

# Multiple columns and keys
enriched = add.to(
    orders,
    from_df=products,
    bring=['price', 'category'],
    against=['product_id', 'region']
)
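
To make the lookup concrete, here is roughly what the simple case looks like when written by hand in pandas. This is a sketch under the assumption that a lookup behaves like a left join that keeps every row of the target frame; whether `add.to()` does exactly this is not confirmed here, and the sample frames are made up.

```python
import pandas as pd

orders = pd.DataFrame({"order_id": [1, 2, 3], "product_id": [10, 20, 30]})
products = pd.DataFrame({
    "product_id": [10, 20],
    "price": [9.99, 24.50],
    "stock": [5, 0],
})

# Keep only the key and the column being brought over, then left-join
# so that every order is preserved even when no product matches.
orders_with_prices = orders.merge(
    products[["product_id", "price"]],
    on="product_id",
    how="left",
)

print(orders_with_prices)
```

In this sketch the order for product 30 has no match, so its `price` is missing (NaN) rather than the row being dropped.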

add.onehotencoding() - Categorical Encoding

Convert categorical columns to one-hot encoded format.

# One-hot encoding (single column)
encoded = add.onehotencoding(df, 'category')
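
For readers unfamiliar with one-hot encoding, the following pandas sketch shows the shape of the result: one indicator column per distinct category value. It uses `pd.get_dummies`, not Additory, so the column names that `add.onehotencoding()` produces may differ.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3], "category": ["A", "B", "A"]})

# Replace "category" with one 0/1 column per distinct value.
encoded = pd.get_dummies(df, columns=["category"], dtype=int)

print(encoded)
```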

add.harmonize_units() - Unit Standardization

Standardize units across your dataset.

# Unit harmonization
standardized = add.harmonize_units(
    df, 
    value_column='temperature', 
    unit_column='unit',
    target_unit='C'
)
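
As an illustration of what unit harmonization involves, the sketch below converts a mixed-unit temperature column to Celsius with plain pandas. The helper `to_celsius` and its conversion table are hypothetical names written for this example; they are not part of Additory, and the set of units the library supports is not stated here.

```python
import pandas as pd

# Conversion functions into Celsius, keyed by the source unit label.
TO_CELSIUS = {
    "C": lambda v: v,
    "F": lambda v: (v - 32) * 5 / 9,
    "K": lambda v: v - 273.15,
}

def to_celsius(df, value_column, unit_column):
    """Return a copy of df with every value converted to Celsius."""
    out = df.copy()
    out[value_column] = [
        TO_CELSIUS[unit](value)
        for value, unit in zip(df[value_column], df[unit_column])
    ]
    out[unit_column] = "C"
    return out

readings = pd.DataFrame({
    "temperature": [100.0, 212.0, 273.15],
    "unit": ["C", "F", "K"],
})
standardized = to_celsius(readings, "temperature", "unit")

print(standardized)
```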

🧮 Expressions

Pre-built calculations for health, fitness, and common metrics. Simple examples:

# Create patient data with correct column names
patients = pd.DataFrame({
    'weight_kg': [70, 80, 65],  # Weight in kilograms
    'height_m': [1.75, 1.80, 1.60],  # Height in meters
    'age': [25, 35, 45],
    'gender': ['M', 'F', 'M']
})

# Calculate BMI
patients_with_bmi = add.bmi(patients)

# Calculate Body Surface Area
patients_with_bsa = add.bsa(patients)

# Chain multiple expressions
result = add.fitness_score(add.bmr(add.bmi(patients)))

🔄 Augment Data Generation

add.augment() generates additional rows that resemble an existing dataset, or creates a dataset from scratch using inline strategies.

# Augment existing data (learns from patterns)
more_customers = add.augment(customers, n_rows=1000)

# Create data from scratch with strategies
new_data = add.augment("@new", n_rows=500, strategy={
    'id': 'increment:start=1',
    'name': 'choice:[John,Jane,Bob]',
    'age': 'range:18-65'
})
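
The strategy strings above form a small `kind:argument` mini-language. The standard-library sketch below shows one way such strings could be interpreted; it is not Additory's parser. The function names `make_generator` and `generate_rows` are hypothetical, and treating `range:18-65` as inclusive on both ends is an assumption.

```python
import random

def make_generator(spec, rng):
    """Turn a 'kind:argument' strategy string into a function of the row index."""
    kind, _, arg = spec.partition(":")
    if kind == "increment":
        start = int(arg.partition("=")[2])  # "start=1" -> 1
        return lambda i: start + i
    if kind == "choice":
        options = [o.strip() for o in arg.strip("[]").split(",")]
        return lambda i: rng.choice(options)
    if kind == "range":
        low, high = (int(x) for x in arg.split("-"))  # non-negative bounds only
        return lambda i: rng.randint(low, high)       # inclusive on both ends
    raise ValueError(f"unknown strategy: {spec!r}")

def generate_rows(strategy, n_rows, seed=0):
    """Build n_rows dictionaries, one value per column per row."""
    rng = random.Random(seed)
    generators = {col: make_generator(spec, rng) for col, spec in strategy.items()}
    return [{col: gen(i) for col, gen in generators.items()} for i in range(n_rows)]

rows = generate_rows(
    {
        "id": "increment:start=1",
        "name": "choice:[John,Jane,Bob]",
        "age": "range:18-65",
    },
    n_rows=5,
)

for row in rows:
    print(row)
```

Each column gets its own generator, so new strategy kinds can be added by extending `make_generator` without touching the row-building loop.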

🧪 Examples

E-commerce Data Pipeline

import pandas as pd
import additory as add

# Start with small customer sample
customers = pd.DataFrame({
    'customer_id': [1, 2, 3],
    'age': [25, 35, 45],
    'region': ['North', 'South', 'East']
})

# Generate more customers
customers = add.augment(customers, n_rows=10000)

# Add customer tiers
tiers = pd.DataFrame({
    'customer_id': range(1, 4),  # Match original IDs
    'tier': ['Gold', 'Silver', 'Bronze']
})

# Use pipeline approach
result = (customers
    .pipe(add.to, from_df=tiers, bring='tier', against='customer_id')
    .pipe(add.scan, preset="quick"))

print(result.summary())

Healthcare Data Analysis

# Create patient data from scratch
strategy = {
    'patient_id': 'increment:start=1',
    'age': 'range:18-80',
    'weight_kg': 'range:50-120',  # Weight in kg
    'height_cm': 'range:150-200'  # Height in cm
}

patients = add.augment("@new", n_rows=1000, strategy=strategy)

# Convert height to meters for expressions
patients['height_m'] = patients['height_cm'] / 100

# Calculate health metrics using pipeline
result = (patients
    .pipe(add.bmi)
    .pipe(add.scan, preset="correlations"))

print(result.correlations)

📚 Documentation

📄 License

MIT License - see LICENSE file for details.

📞 Support

🗺️ v0.1.1 (February 2025)

  • Enhanced documentation and tutorials
  • Performance optimizations
  • Additional expressions
  • Advanced synthetic data patterns

Made with ❤️ for data scientists, analysts, and developers who love working with data.
