A semantic, extensible dataframe transformation engine with expressions, lookup, and synthetic data generation support.
Additory
A semantic, extensible dataframe transformation engine with expressions, lookup, and augmentation support.
Author: Krishnamoorthy Sankaran
🛠️ Requirements
- Python: 3.9+
- Core dependencies: pandas, polars, numpy, scipy
- Optional: cuDF (for GPU support)
📦 Installation
pip install additory==0.1.0a2
Optional GPU support:
pip install additory[gpu]==0.1.0a2 # Includes cuDF for GPU acceleration
Development installation:
pip install additory[dev]==0.1.0a2 # Includes testing and development tools
🎯 Core Functions
| Function | Purpose | Example |
|---|---|---|
| add.to() | Lookup/join operations | add.to(df1, from_df=df2, bring='col', against='key') |
| add.augment() | Generate additional data | add.augment(df, n_rows=1000) |
| add.scan() | Data profiling & analysis | add.scan(df, preset="full") |
🧬 Available Expressions
Additory includes 12 built-in health and fitness expressions, including:
- add.bmi() - Body Mass Index
- add.bsa() - Body Surface Area
- add.bmr() - Basal Metabolic Rate
- add.waist_hip_ratio() - Waist-to-Hip Ratio
- add.body_fat_percentage() - Body Fat Percentage
- add.ideal_body_weight() - Ideal Body Weight
- add.blood_pressure_category() - BP Classification
- add.cholesterol_ratio() - Cholesterol Ratio
- add.age_category() - Age Classification
- add.fitness_score() - Overall Fitness Score
import pandas as pd
import additory as add
# Health calculations
patients = pd.DataFrame({
'weight_kg': [70, 80, 65], # Weight in kilograms
'height_m': [1.75, 1.80, 1.60], # Height in meters
'age': [25, 35, 45],
'gender': ['M', 'F', 'M']
})
patients_bmi = add.bmi(patients)
patients_bsa = add.bsa(patients)
fitness_scores = add.fitness_score(patients)
# Chain multiple expressions
result = add.fitness_score(add.bmr(add.bmi(patients)))
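For reference, BMI is conventionally weight in kilograms divided by the square of height in meters. The sketch below computes that formula directly with pandas on the same sample data. It illustrates the arithmetic only and is not Additory's implementation; the helper name bmi_sketch and the output column name bmi are assumptions made for this example.

```python
import pandas as pd


def bmi_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a 'bmi' column (kg / m^2).

    Hypothetical helper for illustration; not part of Additory.
    """
    out = df.copy()  # leave the caller's frame untouched
    out["bmi"] = out["weight_kg"] / out["height_m"] ** 2
    return out


patients = pd.DataFrame({
    "weight_kg": [70, 80, 65],       # weight in kilograms
    "height_m": [1.75, 1.80, 1.60],  # height in meters
})

patients_bmi = bmi_sketch(patients)
print(patients_bmi.round(2))
```

Because the helper takes a DataFrame and returns a DataFrame, it composes with other functions of the same shape, which is the property the chained call above relies on.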
🔧 DataFrame Support
Additory accepts DataFrames from multiple libraries:
- pandas - Full support
- polars - Full support
- cuDF - GPU acceleration support
import polars as pl
import additory as add
# Works with polars
df_polars = pl.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
result = add.augment(df_polars, n_rows=100)
# Automatic type detection and conversion
✨ Key Features
🔧 Utilities
add.to() - Data Lookup & Joins
Simplified syntax for bringing columns from one dataframe to another.
# Simple lookup
orders_with_prices = add.to(
orders,
from_df=products,
bring='price',
against='product_id'
)
# Multiple columns and keys
enriched = add.to(
orders,
from_df=products,
bring=['price', 'category'],
against=['product_id', 'region']
)
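In plain pandas, a lookup of this kind is usually written as a left merge. The sketch below assumes that add.to() behaves roughly this way: every row of the left frame is kept and the requested columns are brought over from the right frame by key. The sample data is made up, and details of add.to() such as duplicate-key or missing-key handling are not confirmed here.

```python
import pandas as pd

orders = pd.DataFrame({
    "order_id": [101, 102, 103],
    "product_id": [1, 2, 9],  # product 9 has no match in products
})
products = pd.DataFrame({
    "product_id": [1, 2, 3],
    "price": [9.99, 19.50, 5.00],
    "category": ["toys", "books", "games"],
})

# Keep only the key and the column being brought over, then left-merge.
# validate="many_to_one" raises if product_id is duplicated in products,
# which would otherwise silently multiply order rows.
orders_with_prices = orders.merge(
    products[["product_id", "price"]],
    on="product_id",
    how="left",
    validate="many_to_one",
)
print(orders_with_prices)
```

In this pandas version, an order whose key has no match keeps its row and gets a missing value for price.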
add.onehotencoding() - Categorical Encoding
Convert categorical columns to one-hot encoded format.
# One-hot encoding (single column)
encoded = add.onehotencoding(df, 'category')
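To show what one-hot encoding produces, here is the same idea in plain pandas using pd.get_dummies. This is a comparison sketch with made-up data; the column names and dtypes that add.onehotencoding() itself returns may differ.

```python
import pandas as pd

df = pd.DataFrame({
    "item": ["a", "b", "c"],
    "category": ["toys", "books", "toys"],
})

# One indicator column per distinct value of 'category';
# the original 'category' column is replaced by the indicators.
encoded = pd.get_dummies(df, columns=["category"], dtype=int)
print(encoded)
```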
add.harmonize_units() - Unit Standardization
Standardize units across your dataset.
# Unit harmonization
standardized = add.harmonize_units(
df,
value_column='temperature',
unit_column='unit',
target_unit='C'
)
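The idea behind unit harmonization can be sketched in plain pandas: read each value together with its unit, convert it to the target unit, and rewrite the unit column. The helper below is hypothetical and handles only temperatures in C, F, and K converted to Celsius; it is not how add.harmonize_units() is implemented, and the set of units Additory supports is not stated in this document.

```python
import pandas as pd

# Conversion functions from each source unit to Celsius.
TO_CELSIUS = {
    "C": lambda v: v,
    "F": lambda v: (v - 32) * 5 / 9,
    "K": lambda v: v - 273.15,
}


def to_celsius_sketch(df, value_column, unit_column):
    """Return a copy of df with all temperatures expressed in Celsius.

    Hypothetical helper for illustration; raises KeyError on unknown units.
    """
    out = df.copy()
    out[value_column] = [
        TO_CELSIUS[unit](value)
        for value, unit in zip(out[value_column], out[unit_column])
    ]
    out[unit_column] = "C"
    return out


readings = pd.DataFrame({
    "temperature": [20.0, 68.0, 293.15],
    "unit": ["C", "F", "K"],
})
standardized = to_celsius_sketch(readings, "temperature", "unit")
print(standardized)
```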
🧮 Expressions
Pre-built calculations for health, fitness, and common metrics. Simple examples:
# Create patient data using the column names the expressions expect
patients = pd.DataFrame({
'weight_kg': [70, 80, 65], # Weight in kilograms
'height_m': [1.75, 1.80, 1.60], # Height in meters
'age': [25, 35, 45],
'gender': ['M', 'F', 'M']
})
# Calculate BMI
patients_with_bmi = add.bmi(patients)
# Calculate Body Surface Area
patients_with_bsa = add.bsa(patients)
# Chain multiple expressions
result = add.fitness_score(add.bmr(add.bmi(patients)))
🔄 Augment Data Generation
add.augment() generates additional rows, either by learning patterns from an existing dataset or from scratch using inline strategy strings.
# Augment existing data (learns from patterns)
more_customers = add.augment(customers, n_rows=1000)
# Create data from scratch with strategies
new_data = add.augment("@new", n_rows=500, strategy={
'id': 'increment:start=1',
'name': 'choice:[John,Jane,Bob]',
'age': 'range:18-65'
})
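One way to read the strategy strings above is as a small "kind:argument" mini-language. The sketch below parses the three forms shown (increment, choice, range) and generates rows with the standard library only. It is an illustrative interpretation, not Additory's parser: the function names are hypothetical, and treating range as inclusive integers is an assumption.

```python
import itertools
import random


def make_generator(spec, rng):
    """Turn one strategy string into a zero-argument value generator."""
    kind, _, arg = spec.partition(":")
    if kind == "increment":
        # 'increment:start=1' -> 1, 2, 3, ...
        counter = itertools.count(int(arg.split("=", 1)[1]))
        return lambda: next(counter)
    if kind == "choice":
        # 'choice:[John,Jane,Bob]' -> a random pick from the listed options
        options = arg.strip("[]").split(",")
        return lambda: rng.choice(options)
    if kind == "range":
        # 'range:18-65' -> random integer, both ends inclusive (assumed);
        # this simple split does not handle negative bounds
        low, high = (int(part) for part in arg.split("-"))
        return lambda: rng.randint(low, high)
    raise ValueError(f"unknown strategy: {spec!r}")


def generate_rows(strategy, n_rows, seed=0):
    """Build n_rows dicts, one value per column, from a strategy mapping."""
    rng = random.Random(seed)  # seeded for reproducible output
    generators = {col: make_generator(spec, rng) for col, spec in strategy.items()}
    return [{col: gen() for col, gen in generators.items()} for _ in range(n_rows)]


rows = generate_rows(
    {
        "id": "increment:start=1",
        "name": "choice:[John,Jane,Bob]",
        "age": "range:18-65",
    },
    n_rows=5,
)
for row in rows:
    print(row)
```

The resulting list of dicts can be passed to pd.DataFrame(rows) if a DataFrame is needed.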
🧪 Examples
E-commerce Data Pipeline
import pandas as pd
import additory as add
# Start with small customer sample
customers = pd.DataFrame({
'customer_id': [1, 2, 3],
'age': [25, 35, 45],
'region': ['North', 'South', 'East']
})
# Generate more customers
customers = add.augment(customers, n_rows=10000)
# Add customer tiers
tiers = pd.DataFrame({
'customer_id': range(1, 4), # Match original IDs
'tier': ['Gold', 'Silver', 'Bronze']
})
# Use pipeline approach
result = (customers
.pipe(add.to, from_df=tiers, bring='tier', against='customer_id')
.pipe(add.scan, preset="quick"))
print(result.summary())
Healthcare Data Analysis
# Create patient data from scratch
strategy = {
'patient_id': 'increment:start=1',
'age': 'range:18-80',
'weight_kg': 'range:50-120', # Weight in kg
'height_cm': 'range:150-200' # Height in cm
}
patients = add.augment("@new", n_rows=1000, strategy=strategy)
# Convert height to meters for expressions
patients['height_m'] = patients['height_cm'] / 100
# Calculate health metrics using pipeline
result = (patients
.pipe(add.bmi)
.pipe(add.scan, preset="correlations"))
print(result.correlations)
📚 Documentation
- Function Documentation - Detailed guides for each function
- Expressions Guide - Complete expressions reference
📄 License
MIT License - see LICENSE file for details.
📞 Support
- Issues: GitHub Issues
- Documentation: Full Documentation
🗺️ v0.1.1 (February 2025)
- Enhanced documentation and tutorials
- Performance optimizations
- Additional expressions
- Advanced synthetic data patterns
Made with ❤️ for data scientists, analysts, and developers who love working with data.