Elegant data operations for DataFrames

Additory v0.1.3a9

Overview

Additory is a data transformation library that provides a unified API for common data operations with support for both Polars and Pandas DataFrames.

Note: This is an alpha release (v0.1.3a9) with new scanning and lineage tracking capabilities.

Key Features

🔄 Flexible - Works seamlessly with both Polars and Pandas
🎯 Type-safe - Strong typing with clear, actionable error messages
🧪 Tested - Comprehensive test coverage
📚 Documented - Complete API documentation and usage examples
🚀 Rust-powered - Rust acceleration for performance
📝 Natural Language - English-like parameter names (bring_to, bring_from, bring)
📋 Lists Everywhere - Use lists for multiple values, not tuples
🔍 Data Scanning - Statistical profiling and lineage tracking with add.scan()
📊 Lineage Tracking - Optional operation tracking for debugging and auditing

Installation

pip install additory==0.1.3a9

Requirements

Python 3.8+
Polars >= 0.19.0
NumPy >= 1.20.0

Optional Dependencies

# For development (the quotes stop shells such as zsh from expanding the brackets)
pip install "additory[dev]"

Quick Start

import additory as add
import polars as pl

# Add columns from external sources
orders = pl.DataFrame({'order_id': [1, 2], 'customer_id': [101, 102],
                       'price': [9.99, 24.50], 'quantity': [2, 1]})
customers = pl.DataFrame({'customer_id': [101, 102], 'name': ['Alice', 'Bob']})
result = add.to(orders, bring_from=customers, bring=['name'], against='customer_id')

# Transform data
df = pl.DataFrame({'price': [10.567, 20.123, 30.999]})
result = add.transform('@round:2', df, columns='price')  # Creates price_round

# Fill missing values
df = pl.DataFrame({'age': [25, None, 35, None, 45]})
result = add.transform('@deduce', df, columns='age', strategy={'method': 'mean'})

# Generate synthetic data
result = add.synthetic('@new', n=100, strategy={'age': 'normal(40, 10)'}, seed=42)

# Analyze data with statistical profiling
stats = add.scan('@analyze', df)

# Track lineage for debugging
result = add.to(orders, bring_from=customers, bring=['name'], against='customer_id', lineage=True)
result = add.transform('@calc', result, strategy={'total': 'price * quantity'}, lineage=True)
lineage_report = add.scan('@lineage', result)

print(result)

Features (v0.1.3a9)

1. add.to() - Bring Columns from External Sources

Bring columns from one DataFrame to another based on matching keys:

orders = pl.DataFrame({'order_id': [1, 2], 'customer_id': [101, 102]})
customers = pl.DataFrame({'customer_id': [101, 102], 'name': ['Alice', 'Bob']})

# Basic lookup
result = add.to(orders, bring_from=customers, bring=['name'], against='customer_id')

# Multiple columns (use lists!); assumes customers also has an 'email' column
result = add.to(orders, bring_from=customers, bring=['name', 'email'], against='customer_id')

# With aggregation; assumes orders also has an 'amount' column
result = add.to(customers, bring_from=orders, bring='amount', against='customer_id',
                strategy={'amount': 'sum'})

# With lineage tracking
result = add.to(orders, bring_from=customers, bring=['name'], against='customer_id', lineage=True)
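
As a rough mental model of what a key-based lookup does, the sketch below reproduces the idea in plain Python over lists of row dictionaries. It is not Additory's implementation: the helper name lookup, the first-match-wins rule, and filling unmatched keys with None are all assumptions made for illustration.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

def lookup(bring_to: List[Row], bring_from: List[Row],
           bring: List[str], against: str) -> List[Row]:
    """Copy the `bring` columns onto each target row whose key matches."""
    # Index the source rows by key; the first match wins (an assumption).
    index: Dict[Any, Row] = {}
    for row in bring_from:
        index.setdefault(row[against], row)

    result = []
    for row in bring_to:
        match = index.get(row[against], {})
        # Unmatched keys yield None; the target row count never changes.
        result.append({**row, **{col: match.get(col) for col in bring}})
    return result

orders = [{'order_id': 1, 'customer_id': 101},
          {'order_id': 2, 'customer_id': 102},
          {'order_id': 3, 'customer_id': 999}]
customers = [{'customer_id': 101, 'name': 'Alice'},
             {'customer_id': 102, 'name': 'Bob'}]

joined = lookup(orders, customers, bring=['name'], against='customer_id')
print(joined)
```

The point of the sketch is that a lookup keeps every row of the target and only adds columns, which is what distinguishes it from an inner join.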

2. add.transform() - Transform DataFrames

Transform data using 10 modes:

# @calc - Calculate new columns
result = add.transform('@calc', df, strategy={'total': 'price * quantity'})

# @filter - Filter rows
result = add.transform('@filter', df, where='age > 18')

# @sort - Sort rows
result = add.transform('@sort', df, by='date', strategy={'order': 'desc'})

# @aggregate - Group and aggregate
result = add.transform('@aggregate', df, by='category', strategy={'amount': 'sum'})

# @round - Round numbers (creates NEW columns)
result = add.transform('@round:2', df, columns='price')  # Creates price_round

# @deduce - Fill missing values
result = add.transform('@deduce', df, columns='age', strategy={'method': 'mean'})

# @extract - Extract patterns
result = add.transform('@extract', df, columns='date', strategy={'date': 'dd-MM-yyyy'})

# @onehotencode - One-hot encode
result = add.transform('@onehotencode', df, columns='category')

# @harmonize - Harmonize units
result = add.transform('@harmonize:weight', df)  # Creates weight_kg

# @transpose - Transpose DataFrame
result = add.transform('@transpose', df)

# With lineage tracking
result = add.transform('@calc', df, strategy={'total': 'price * quantity'}, lineage=True)

3. add.synthetic() - Generate Synthetic Data

Generate synthetic data using 3 modes:

# @new - Create new DataFrame
result = add.synthetic('@new', n=1000, strategy={
    'age': 'normal(40, 10)',
    'salary': 'normal(75000, 15000)'
}, seed=42)

# @augment - Add synthetic rows
result = add.synthetic('@augment', df, n=100, seed=42)

# @analyze / @analyse - Analyze data (DEPRECATED - use add.scan('@analyze') instead)
result = add.synthetic('@analyze', df)  # Emits deprecation warning

4. add.scan() - Data Scanning and Lineage Tracking (NEW!)

Scan DataFrames for statistical profiling and lineage tracking:

# @analyze - Statistical profiling
stats = add.scan('@analyze', df)
# Returns: count, missing, unique, mean, std, min, max, quartiles for each column

# Focus on specific aspects
outliers = add.scan('@analyze', df, focus='outliers')
correlations = add.scan('@analyze', df, focus='correlations')
distributions = add.scan('@analyze', df, focus='distributions')

# @lineage - Track operation history (the @calc step assumes orders has price and quantity columns)
result = add.to(orders, bring_from=customers, bring=['name'], against='customer_id', lineage=True)
result = add.transform('@calc', result, strategy={'total': 'price * quantity'}, lineage=True)
lineage_report = add.scan('@lineage', result)
# Shows: operation sequence, row count changes, column sources, data quality warnings

# Focus on specific lineage aspects
null_analysis = add.scan('@lineage', result, focus='nulls')
excluded_rows = add.scan('@lineage', result, focus='excluded')
source_analysis = add.scan('@lineage', result, focus='source:customers')

# Cell-level tracing
cell_trace = add.scan('@lineage', result, trace=[2, 5])  # Trace column 2, row 5
# Shows: complete transformation history for that specific cell

# Filter lineage output
lineage = add.scan('@lineage', result, columns=['total', 'price'])  # Only these columns
lineage = add.scan('@lineage', result, rows='first:10')  # Only first 10 rows
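
To show what the per-column statistics listed for @analyze amount to, here is a small sketch using only the Python standard library. The function name profile, the output keys, and the use of the sample standard deviation are assumptions; Additory's actual output format may differ.

```python
import statistics
from typing import Dict, List, Optional

def profile(values: List[Optional[float]]) -> Dict[str, float]:
    """Summarise one column: counts, spread, and quartiles."""
    present = [v for v in values if v is not None]
    # quantiles(n=4) returns the three cut points between the quartiles.
    q1, q2, q3 = statistics.quantiles(present, n=4)
    return {
        'count': len(values),
        'missing': len(values) - len(present),
        'unique': len(set(present)),
        'mean': statistics.mean(present),
        'std': statistics.stdev(present),  # sample std (an assumption)
        'min': min(present),
        'max': max(present),
        'q1': q1, 'median': q2, 'q3': q3,
    }

stats = profile([25, None, 35, None, 45])
print(stats)
```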

What's New in v0.1.3a9

✅ New Features

add.scan() Function: New fourth core function for data scanning and lineage tracking
- @analyze mode: Statistical profiling with focus modes (outliers, correlations, distributions)
- @lineage mode: Operation history tracking with focus modes (nulls, excluded, source)
- Cell-level tracing: Trace individual cell transformations through the pipeline
- Filtering support: Filter lineage output by columns, rows, or conditions
Lineage Tracking: Optional operation tracking across all core functions
- Add lineage=True to add.to(), add.transform(), add.synthetic()
- Tracks operation sequence, row count changes, column sources
- Identifies data quality issues (nulls, excluded rows)
- Dependency tracking for calculated columns
- Performance optimized: <15% execution overhead, <25% memory overhead
Deprecation: add.synthetic('@analyze') now emits deprecation warning
- Use add.scan('@analyze') instead for statistical profiling
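
The kind of information lineage tracking is described as recording (operation sequence, row count changes, new columns) can be pictured with a small plain-Python sketch. The LineageLog class and its field names are hypothetical and exist only for this illustration; they say nothing about how Additory stores lineage internally.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]

@dataclass
class LineageLog:
    """Records one entry per operation applied to a list of rows."""
    steps: List[Dict[str, Any]] = field(default_factory=list)

    def run(self, name: str, op: Callable[[List[Row]], List[Row]],
            rows: List[Row]) -> List[Row]:
        out = op(rows)
        before_cols = set(rows[0]) if rows else set()
        after_cols = set(out[0]) if out else set()
        self.steps.append({
            'operation': name,
            'rows_before': len(rows),
            'rows_after': len(out),
            'new_columns': sorted(after_cols - before_cols),
        })
        return out

def add_total(rows: List[Row]) -> List[Row]:
    return [{**r, 'total': r['price'] * r['quantity']} for r in rows]

def keep_positive(rows: List[Row]) -> List[Row]:
    return [r for r in rows if r['total'] > 0]

log = LineageLog()
data = [{'price': 100, 'quantity': 2}, {'price': 5, 'quantity': 0}]
data = log.run('calc total', add_total, data)
data = log.run('filter total > 0', keep_positive, data)
for step in log.steps:
    print(step)
```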

✅ Previous Changes

Natural Language Parameters: bring_to, bring_from, bring (not fetch)
Lists Everywhere: Use lists for multiple values, not tuples
@round Creates NEW Columns: Original columns are left unchanged, in line with the library's No Deletion principle
@deduce Mode: Moved from add.deduce() to add.transform('@deduce')
Removed Functions: add.set() and add.deduce() no longer exist
Default Seed: seed=42 for reproducible synthetic data
@extract Merged: @datetime functionality merged into @extract

Strategy Parameter Structure

The strategy parameter provides fine-grained control over operations in add.to(), add.transform(), and add.synthetic(). add.scan() does not take a strategy.

add.to() Strategy

Control aggregation, renaming, and positioning for brought columns.

Simple Form (Aggregation Only)

strategy={'col': 'mode'}

Example:

strategy={
    'amount': 'sum',
    'date': 'last'
}

Complex Form (Full Control)

strategy={
    'col': {
        'mode': 'aggregation_mode',
        'rename': 'new_column_name',
        'position': 'position_spec'
    }
}

Example:

strategy={
    'amount': {
        'mode': 'sum',
        'rename': 'total_spent',
        'position': 'after:customer_id'
    },
    'date': {
        'mode': 'last',
        'rename': 'last_order'
    }
}
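
One way to think about the two forms is that the simple form is shorthand for a complex form with only the mode set. The sketch below normalises either form into the complex shape. The function name normalize_strategy and the defaults (keep the original column name, no explicit position) are assumptions, not documented Additory behaviour.

```python
from typing import Any, Dict, Union

Spec = Union[str, Dict[str, Any]]

def normalize_strategy(strategy: Dict[str, Spec]) -> Dict[str, Dict[str, Any]]:
    """Expand simple-form entries so every column has mode/rename/position."""
    normalized = {}
    for column, spec in strategy.items():
        if isinstance(spec, str):
            spec = {'mode': spec}  # simple form: aggregation mode only
        normalized[column] = {
            'mode': spec.get('mode'),
            'rename': spec.get('rename', column),  # assumed default
            'position': spec.get('position'),      # assumed default
        }
    return normalized

mixed = normalize_strategy({
    'amount': {'mode': 'sum', 'rename': 'total_spent',
               'position': 'after:customer_id'},
    'date': 'last',
})
print(mixed)
```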

Aggregation Modes (15)

first - First value
last - Last value
sum - Sum of values
count - Count of values
average - Average of values
min - Minimum value
max - Maximum value
concat - Concatenate values (comma-separated)
concat[sep] - Concatenate with custom separator (e.g., concat[;])
most_common - Most common value
least_common - Least common value
median - Median value
std - Standard deviation
variance - Variance
unique_count - Count of unique values
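
For readers who want to see what a few of these modes compute, the following sketch implements a subset in plain Python, including the concat[sep] syntax. The aggregate helper is hypothetical and only mirrors the one-line descriptions above; edge cases such as ties in most_common may be handled differently by Additory.

```python
import statistics
from collections import Counter
from typing import Any, List

def aggregate(values: List[Any], mode: str) -> Any:
    """Apply one aggregation mode to the values of a single group."""
    if mode == 'concat':
        return ','.join(str(v) for v in values)
    if mode.startswith('concat[') and mode.endswith(']'):
        sep = mode[len('concat['):-1]  # text between the brackets
        return sep.join(str(v) for v in values)
    simple = {
        'first': lambda v: v[0],
        'last': lambda v: v[-1],
        'sum': sum,
        'count': len,
        'average': statistics.mean,
        'min': min,
        'max': max,
        'median': statistics.median,
        'most_common': lambda v: Counter(v).most_common(1)[0][0],
        'unique_count': lambda v: len(set(v)),
    }
    if mode not in simple:
        raise ValueError(f'Unsupported aggregation mode: {mode!r}')
    return simple[mode](values)

amounts = [10, 20, 20]
print(aggregate(amounts, 'sum'), aggregate(amounts, 'most_common'))
print(aggregate(['a', 'b'], 'concat[;]'))
```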

add.transform() Strategy

Mode-specific configuration options.

@calc Mode

Expressions for calculating new columns:

strategy={'new_column': 'expression'}

Example:

strategy={
    'total': 'price * quantity',
    'discount': 'total * 0.1',
    'final': 'total - discount'
}
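
The example relies on later expressions being able to use columns created by earlier ones (discount uses total). A minimal sketch of that ordering, evaluated for a single row, is shown below. The calc_row helper is hypothetical, and the use of eval() is purely illustrative; it is not a claim about how Additory parses expressions and should never be applied to untrusted input.

```python
from typing import Any, Dict

def calc_row(row: Dict[str, Any], strategy: Dict[str, str]) -> Dict[str, Any]:
    """Evaluate expressions in order; each result is visible to the next."""
    out = dict(row)
    for new_column, expression in strategy.items():
        # Builtins are disabled and only the row's columns are in scope.
        out[new_column] = eval(expression, {'__builtins__': {}}, out)
    return out

row = calc_row({'price': 100, 'quantity': 2}, {
    'total': 'price * quantity',
    'discount': 'total * 0.1',
    'final': 'total - discount',
})
print(row)
```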

@sort Mode

Sort order specification:

strategy={'order': 'asc' | 'desc'}

Example:

strategy={'order': 'desc'}

@aggregate Mode

Aggregation functions per column:

strategy={'column': 'function'}

Example:

strategy={
    'amount': 'sum',
    'count': 'count',
    'price': 'average'
}

@round Mode

Custom naming and positioning for rounded columns:

strategy={
    'column': {
        'name': 'new_column_name',
        'position': 'position_spec'
    }
}

Example:

strategy={
    'price': {
        'name': 'price_clean',
        'position': 'after:price'
    },
    'tax': {
        'name': 'tax_clean'
    }
}
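
The naming behaviour can be sketched in plain Python: the rounded values go into a new column, named column_round by default or the custom name when one is given, and the source column is untouched. The round_column helper below is hypothetical and uses Python's built-in round(), which may differ from Additory's rounding rule on exact halfway values.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]

def round_column(rows: List[Row], column: str, decimals: int,
                 name: Optional[str] = None) -> List[Row]:
    """Add a rounded copy of `column`; the original column is kept."""
    target = name or f'{column}_round'
    return [{**row, target: round(row[column], decimals)} for row in rows]

rows = [{'price': 10.567}, {'price': 20.123}]
rounded = round_column(rows, 'price', 2)
renamed = round_column(rows, 'price', 2, name='price_clean')
print(rounded)
print(renamed)
```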

@deduce Mode

KNN imputation parameters:

strategy={'k': int, 'weights': 'uniform' | 'distance'}

Example:

strategy={'k': 5, 'weights': 'distance'}

add.synthetic() Strategy

Column generation specifications.

Simple Form

strategy={'column': 'strategy_type'}

Example:

strategy={
    'id': 'increment',
    'age': 'normal(40, 10)',
    'subject_id': 'pattern:subj{increment:3}'
}

Complex Form

strategy={
    'column': {
        'type': 'strategy_type',
        'param1': value1,
        'param2': value2
    }
}

Example:

strategy={
    'name': {
        'type': 'choice',
        'values': ['Alice', 'Bob', 'Charlie']
    },
    'age': {
        'type': 'normal',
        'mean': 35,
        'std': 10
    }
}

Generation Types

Deterministic:

increment - Sequential numbers (1, 2, 3, ...)
increment:start - Start from specific number (e.g., increment:100)
increment:start:step - Custom start and step (e.g., increment:100:5)
pattern:text{increment:padding} - Pattern with leading zeros (e.g., pattern:subj{increment:3})
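
The deterministic spec strings can be read as a tiny grammar. The sketch below parses them and produces values, assuming that counters start at 1 unless a start is given and that the padding number is a zero-padded width. The generate function is hypothetical and is not part of Additory.

```python
import re
from typing import List, Union

def generate(spec: str, n: int) -> List[Union[int, str]]:
    """Produce n values for an increment or pattern spec string."""
    pattern = re.fullmatch(r'pattern:(.*)\{increment:(\d+)\}', spec)
    if pattern:
        prefix, width = pattern.group(1), int(pattern.group(2))
        # Zero-pad the counter to `width` digits, e.g. subj001.
        return [f'{prefix}{i:0{width}d}' for i in range(1, n + 1)]

    parts = spec.split(':')
    if parts[0] != 'increment' or len(parts) > 3:
        raise ValueError(f'Unsupported spec: {spec!r}')
    start = int(parts[1]) if len(parts) > 1 else 1
    step = int(parts[2]) if len(parts) > 2 else 1
    return [start + i * step for i in range(n)]

print(generate('increment', 3))
print(generate('increment:100:5', 3))
print(generate('pattern:subj{increment:3}', 2))
```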

Random:

choice - Random choice from list
normal - Normal distribution (mean, std)
uniform - Uniform distribution (min, max)
lognormal - Log-normal distribution
exponential - Exponential distribution (lambda)
poisson - Poisson distribution (lambda)
categorical - Categorical distribution (probabilities)
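
Why a fixed seed makes synthetic data reproducible can be shown with the standard library's random module. The sample function and its parameter names are assumptions for illustration; the values it produces will not match Additory's output for the same seed.

```python
import random
from typing import Any, List

def sample(spec_type: str, n: int, seed: int = 42, **params: Any) -> List[Any]:
    """Draw n values from one distribution using a private, seeded generator."""
    rng = random.Random(seed)  # same seed, same sequence
    if spec_type == 'normal':
        return [rng.gauss(params['mean'], params['std']) for _ in range(n)]
    if spec_type == 'uniform':
        return [rng.uniform(params['min'], params['max']) for _ in range(n)]
    if spec_type == 'choice':
        return [rng.choice(params['values']) for _ in range(n)]
    raise ValueError(f'Unsupported type: {spec_type!r}')

ages_a = sample('normal', 5, mean=40, std=10)
ages_b = sample('normal', 5, mean=40, std=10)
print(ages_a == ages_b)  # identical because both runs use seed=42
```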

Special:

linked_list - Linked list structure

Documentation

Complete Documentation

API Documentation - Complete API reference
Usage Examples - Real-world usage examples
CHANGELOG - Version history and changes

Additional Resources

Performance Benchmarks - Detailed performance analysis
Error Handling - Error handling guide
Integration Tests - Test coverage details

Examples

add.to() - Lookups and Joins

import additory as add
import polars as pl

orders = pl.DataFrame({
    'order_id': [1, 2, 3],
    'customer_id': [101, 102, 101]
})

customers = pl.DataFrame({
    'customer_id': [101, 102],
    'name': ['Alice', 'Bob']
})

# Basic lookup
result = add.to(orders, bring_from=customers, bring=['name'], against='customer_id')

# With aggregation
result = add.to(customers, bring_from=orders, bring='order_id', against='customer_id',
                strategy={'order_id': 'count'})

add.transform() - Transformations

# Calculate new columns
df = pl.DataFrame({'price': [100, 200], 'quantity': [2, 3]})
result = add.transform('@calc', df, strategy={'total': 'price * quantity'})

# Round numbers (creates NEW columns)
df = pl.DataFrame({'price': [10.567, 20.123]})
result = add.transform('@round:2', df, columns='price')  # Creates price_round

# Fill missing values
df = pl.DataFrame({'age': [25, None, 35, None, 45]})
result = add.transform('@deduce', df, columns='age', strategy={'method': 'mean'})

# KNN imputation (assumes df also has a 'salary' column)
result = add.transform('@deduce', df, columns=['age', 'salary'],
                       strategy={'method': 'knn', 'k': 3})

add.synthetic() - Synthetic Data

# Create new DataFrame
result = add.synthetic('@new', n=1000, strategy={
    'age': 'normal(40, 10)',
    'salary': 'normal(75000, 15000)'
}, seed=42)

# Augment existing data
result = add.synthetic('@augment', df, n=100, seed=42)

API Reference

add.to()

Bring columns from one DataFrame to another.

Signature:

def to(
    bring_to,                                    # DataFrame to bring columns to
    bring_from,                                  # DataFrame to bring columns from
    bring: Union[str, List[str]],                # Column(s) to bring
    against: Union[str, List[str]],              # Key(s) to match against
    position: Optional[Union[str, int]] = None,  # Where to place columns
    *,
    strategy: Optional[Dict[str, Union[str, Dict[str, Any]]]] = None,
    join_type: str = 'lookup',
    logging: Union[bool, str] = 'default',
    as_type: Optional[Literal['pandas', 'polars']] = None,
    lineage: bool = False                        # Enable lineage tracking
) -> DataFrame

Parameters:

bring_to (DataFrame): Target DataFrame
bring_from (DataFrame): Source DataFrame
bring (str | list): Column(s) to bring
against (str | list): Key column(s) to match
position (str | int): Where to place columns ('start', 'end', 'after:col', 'before:col', or int)
strategy (dict): Column-level control (aggregation, rename, position)
join_type (str): Join type ('lookup', 'left', 'inner', 'outer')
logging (bool | str): Logging level (False, True, 'default')
as_type (str): Output format (None, 'pandas', 'polars')
lineage (bool): Enable lineage tracking (default: False)

Returns:

DataFrame: DataFrame with new columns added
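
The position values listed above ('start', 'end', 'after:col', 'before:col', or an integer) can be understood as resolving to an insertion index in the column order. The sketch below shows one plausible resolution; place_columns is a hypothetical helper, and treating None the same as 'end' is an assumption.

```python
from typing import List, Union

def place_columns(existing: List[str], new: List[str],
                  position: Union[str, int, None] = None) -> List[str]:
    """Return the column order after inserting `new` at `position`."""
    if position is None or position == 'end':
        index = len(existing)
    elif position == 'start':
        index = 0
    elif isinstance(position, int):
        index = position
    else:
        where, _, anchor = position.partition(':')
        if where not in ('after', 'before') or anchor not in existing:
            raise ValueError(f'Invalid position: {position!r}')
        index = existing.index(anchor) + (1 if where == 'after' else 0)
    return existing[:index] + new + existing[index:]

cols = ['order_id', 'customer_id']
print(place_columns(cols, ['name'], 'after:order_id'))
print(place_columns(cols, ['name'], 'start'))
```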

add.transform()

Transform data within a DataFrame.

Signature:

def transform(
    mode: str,                                   # Transform mode
    df,                                          # DataFrame to transform
    columns: Optional[Union[str, List[str]]] = None,
    *,
    where: Optional[str] = None,
    by: Optional[Union[str, List[str]]] = None,
    position: Union[str, int] = 'end',
    strategy: Optional[Dict[str, Any]] = None,
    logging: Union[bool, str] = 'default',
    as_type: Optional[Literal['pandas', 'polars']] = None,
    lineage: bool = False                        # Enable lineage tracking
) -> DataFrame

Parameters:

mode (str): Transform mode ('@calc', '@filter', '@sort', '@aggregate', '@harmonize', '@round', '@transpose', '@extract', '@onehotencode', '@deduce')
df (DataFrame): Input DataFrame
columns (str | list): Column(s) to operate on
where (str): Filter condition (for @filter)
by (str | list): Group/sort columns
position (str | int): Where to place new columns
strategy (dict): Mode-specific options
logging (bool | str): Logging level
as_type (str): Output format
lineage (bool): Enable lineage tracking (default: False)

Returns:

DataFrame: Transformed DataFrame

add.synthetic()

Generate synthetic data.

Signature:

def synthetic(
    mode: str,                                   # Synthetic mode
    df: Optional[DataFrame] = None,              # DataFrame (for @augment/@analyze)
    n: Optional[int] = None,                     # Number of rows
    *,
    strategy: Optional[Dict[str, Any]] = None,   # Column generation strategies
    seed: int = 42,                              # Random seed
    logging: Union[bool, str] = 'default',
    as_type: Optional[Literal['pandas', 'polars']] = None,
    lineage: bool = False                        # Enable lineage tracking
) -> DataFrame

Parameters:

mode (str): Synthetic mode ('@new', '@augment', '@analyze'/'@analyse')
df (DataFrame): Input DataFrame (for @augment/@analyze)
n (int): Number of rows to generate
strategy (dict): Column generation strategies
seed (int): Random seed (default: 42)
logging (bool | str): Logging level
as_type (str): Output format
lineage (bool): Enable lineage tracking (default: False)

Returns:

DataFrame: Generated or augmented DataFrame

add.scan() (NEW!)

Scan DataFrames for statistical profiling and lineage tracking.

Signature:

def scan(
    mode: str,                                   # Scan mode
    df,                                          # DataFrame to scan
    *,
    columns: Optional[Union[str, List[str]]] = None,  # Column filter
    where: Optional[str] = None,                 # Row filter condition
    rows: Optional[str] = None,                  # Row range (first:N, last:N, M-N)
    trace: Optional[List[int]] = None,           # Cell trace [col_idx, row_idx]
    focus: Optional[str] = None,                 # Focus mode
    as_type: Optional[Literal['dataframe', 'dict', 'text']] = 'text'
) -> Union[DataFrame, Dict, str]

Parameters:

mode (str): Scan mode ('@analyze' or '@lineage')
df (DataFrame): Input DataFrame
columns (str | list): Filter output to specific columns
where (str): Filter rows by condition
rows (str): Row range specification ('first:10', 'last:5', '10-20')
trace (list): Cell coordinates for tracing [column_index, row_index]
focus (str): Focus mode for detailed analysis
- For @analyze: 'outliers', 'correlations', 'distributions'
- For @lineage: 'nulls', 'excluded', 'source:name'
as_type (str): Output format ('text', 'dataframe', 'dict')

Returns:

str | DataFrame | dict: Scan results in requested format

Examples:

# Statistical profiling
stats = add.scan('@analyze', df)
outliers = add.scan('@analyze', df, focus='outliers')

# Lineage tracking
lineage = add.scan('@lineage', df)  # Requires lineage=True in operations
null_analysis = add.scan('@lineage', df, focus='nulls')
cell_history = add.scan('@lineage', df, trace=[2, 5])
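
The rows parameter accepts specs such as 'first:10', 'last:5', and '10-20'. A minimal parser for those three shapes might look like the sketch below. select_rows is hypothetical, and reading 'M-N' as zero-based and inclusive of both ends is an assumption the source does not confirm.

```python
import re
from typing import Any, List, Sequence

def select_rows(rows: Sequence[Any], spec: str) -> List[Any]:
    """Pick rows according to a first:N, last:N, or M-N spec."""
    kind, _, count = spec.partition(':')
    if kind == 'first' and count.isdigit():
        return list(rows[:int(count)])
    if kind == 'last' and count.isdigit():
        n = int(count)
        return list(rows[-n:]) if n else []
    match = re.fullmatch(r'(\d+)-(\d+)', spec)
    if match:
        start, stop = int(match.group(1)), int(match.group(2))
        return list(rows[start:stop + 1])  # inclusive range (assumed)
    raise ValueError(f'Unsupported rows spec: {spec!r}')

data = list(range(30))
print(select_rows(data, 'first:10'))
print(select_rows(data, 'last:5'))
```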

Transform Modes

@calc - Calculate New Columns

Calculate new columns from expressions.

Example:

result = add.transform('@calc', df, strategy={
    'total': 'price * quantity',
    'discount': 'total * 0.1'
})

@filter - Filter Rows

Filter rows based on conditions.

Example:

result = add.transform('@filter', df, where='age > 18 AND status == "active"')

@sort - Sort DataFrame

Sort DataFrame by columns.

Example:

result = add.transform('@sort', df, by='date', strategy={'order': 'desc'})

@aggregate - Group and Aggregate

Group by columns and aggregate.

Example:

result = add.transform('@aggregate', df, by='category', strategy={'amount': 'sum'})

@harmonize - Harmonize Units

Harmonize units across columns (10 sub-modes).

Example:

result = add.transform('@harmonize:weight', df)  # Creates weight_kg

@round - Round Numbers

Round numbers (creates NEW columns).

Example:

result = add.transform('@round:2', df, columns='price')  # Creates price_round

@transpose - Transpose DataFrame

Transpose DataFrame.

Example:

result = add.transform('@transpose', df)

@extract - Extract Patterns

Extract patterns from text or dates.

Example:

result = add.transform('@extract', df, columns='date', strategy={'date': 'dd-MM-yyyy'})

@onehotencode - One-Hot Encode

One-hot encode categorical columns.

Example:

result = add.transform('@onehotencode', df, columns='category')

@deduce - Fill Missing Values

Fill missing values using 7 methods.

Methods: auto, mean, median, mode, forward, backward, knn

Example:

# Mean imputation
result = add.transform('@deduce', df, columns='age', strategy={'method': 'mean'})

# KNN imputation
result = add.transform('@deduce', df, columns=['age', 'salary'],
                       strategy={'method': 'knn', 'k': 3})
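
To make the simpler methods concrete, the sketch below fills missing values in a single list using mean, median, mode, or forward fill. fill_missing is a hypothetical helper built on the standard library; it does not cover the auto, backward, or knn methods and is not Additory's implementation.

```python
import statistics
from typing import List, Optional

def fill_missing(values: List[Optional[float]],
                 method: str = 'mean') -> List[Optional[float]]:
    """Replace None entries using a simple imputation method."""
    present = [v for v in values if v is not None]
    if method in ('mean', 'median', 'mode'):
        fill = getattr(statistics, method)(present)
        return [fill if v is None else v for v in values]
    if method == 'forward':
        filled, last = [], None
        for v in values:
            last = v if v is not None else last  # carry the last seen value
            filled.append(last)
        return filled
    raise ValueError(f'Unsupported method: {method!r}')

ages = [25, None, 35, None, 45]
print(fill_missing(ages, 'mean'))
print(fill_missing(ages, 'forward'))
```

Note that forward fill leaves a leading missing value as None, because there is no earlier value to carry forward.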

Error Handling

Additory provides clear, actionable error messages:

try:
    # Using tuple instead of list
    result = add.to(orders, bring_from=customers, bring=['name'], 
                    against=('customer_id', 'date'))
except TypeError as e:
    print(e)
    # Parameter 'against' must be a list, not tuple.
    # Use ['customer_id', 'date'] instead of ('customer_id', 'date')

try:
    # Column not found
    result = add.transform('@calc', df, strategy={'result': 'nonexistent + 5'})
except RuntimeError as e:
    print(e)
    # Column 'nonexistent' not found in DataFrame
    # Available columns: a, b, c

All errors include:

Clear description of what went wrong
Contextual information (available options, etc.)
Actionable suggestions for fixing the problem
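
An error message with these three parts can be produced by a small validation helper. The sketch below mirrors the tuple-versus-list message shown earlier; require_list is a hypothetical name, and the exact wording Additory emits may differ.

```python
from typing import Any, List

def require_list(name: str, value: Any) -> List[str]:
    """Accept a string or list; reject tuples with a suggested fix."""
    if isinstance(value, tuple):
        raise TypeError(
            f"Parameter '{name}' must be a list, not tuple.\n"
            f"Use {list(value)!r} instead of {value!r}"
        )
    # A single column name is wrapped so callers always get a list.
    return [value] if isinstance(value, str) else list(value)

try:
    require_list('against', ('customer_id', 'date'))
except TypeError as exc:
    message = str(exc)
    print(message)
```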

Migration from v0.1.3a5

If you're upgrading from v0.1.3a5, here are the key changes:

Parameter Renames

# OLD (v0.1.3a5)
add.to(orders, fetch_from=customers, fetch=['name'], by='customer_id')

# NEW (v0.1.3a9)
add.to(orders, bring_from=customers, bring=['name'], against='customer_id')

Lists Instead of Tuples

# OLD (v0.1.3a5)
add.to(orders, fetch_from=customers, fetch=['name'], by=('id', 'date'))

# NEW (v0.1.3a9)
add.to(orders, bring_from=customers, bring=['name'], against=['id', 'date'])

Removed Functions

# OLD (v0.1.3a5)
add.set(logging=True)
add.deduce(df, 'age', method='mean')

# NEW (v0.1.3a9)
# add.set() removed - use logging parameter per function
add.to(..., logging=True)
add.transform(..., logging=True)

# add.deduce() moved to transform mode; the method now goes in strategy
add.transform('@deduce', df, columns='age', strategy={'method': 'mean'})

@round Creates NEW Columns

# OLD (v0.1.3a5)
# @round modified columns in-place

# NEW (v0.1.3a9)
# @round creates NEW columns
result = add.transform('@round:2', df, columns='price')
# Creates: price_round (original price column unchanged)

Development

Running Tests

# Integration tests
cd python-specific
pytest tests/test_integration.py -v

# Rust tests
cd rust-core
cargo test --all

# Benchmarks
cd python-specific
python benchmarks/benchmark_integration.py

Building from Source

# Build Rust module
cd rust-core
cargo build --release

# Install Python package
cd python-specific
pip install -e .

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

License

See LICENSE file for details.

Support

For issues or questions:

Check API Documentation
Review Usage Examples
Contact development team

Changelog

See CHANGELOG.md for version history and changes.

Acknowledgments

Built with:

Rust - Systems programming language
PyO3 - Rust bindings for Python
Polars - Fast DataFrame library
Pandas - Data analysis library

Status: Alpha Release
Version: 0.1.3a9
Date: March 8, 2026
