
Fast Deep Feature Synthesis for tabular data

Project description

FastDFS - Deep Feature Synthesis for Tabular Data

FastDFS is a Python library for automated feature engineering using Deep Feature Synthesis (DFS). It augments target dataframes with rich features derived from relational database structures, making it easy to create powerful features for machine learning without manual feature engineering.

Core Concept

FastDFS treats feature engineering as a table augmentation process: given any target dataframe and a relational database (RDB) containing related tables, it automatically generates new features by aggregating information across relationships.

# Your target dataframe (what you want to predict on)
target_df = pd.DataFrame({
    "user_id": [1, 2, 3],
    "item_id": [100, 200, 300], 
    "interaction_time": ["2024-01-01", "2024-01-02", "2024-01-03"]
})

# Your relational database (context for feature generation)
rdb = fastdfs.load_rdb("ecommerce_data/")  # Contains user, item, interaction tables

# Generate features automatically
enriched_df = fastdfs.compute_dfs_features(
    rdb=rdb,
    target_dataframe=target_df,
    key_mappings={"user_id": "user.user_id", "item_id": "item.item_id"},
    cutoff_time_column="interaction_time"
)
# Result: Original columns + 50+ new features like user_avg_rating, item_count_purchases, etc.

Installation

pip install fastdfs

Or for development:

git clone https://github.com/dglai/fastdfs.git
cd fastdfs
pip install -e .

Quick Start

1. Prepare Your Data

FastDFS provides multiple ways to prepare your relational data.

Option A: Create from DataFrames (Recommended)

You can create an RDB directly from pandas DataFrames. FastDFS will automatically infer the schema.

import fastdfs
import pandas as pd

# 1. Define your tables
users_df = pd.DataFrame(...)
items_df = pd.DataFrame(...)
interactions_df = pd.DataFrame(...)

# 2. Create RDB with relationships
rdb = fastdfs.create_rdb(
    name="ecommerce",
    tables={
        "user": users_df,
        "item": items_df,
        "interaction": interactions_df
    },
    primary_keys={
        "user": "user_id",
        "item": "item_id"
    },
    foreign_keys=[
        ("interaction", "user_id", "user", "user_id"),
        ("interaction", "item_id", "item", "item_id")
    ],
    time_columns={
        "interaction": "timestamp"
    }
)

# 3. Save for later use
rdb.save("ecommerce_rdb/")

# 4. Load it back
rdb = fastdfs.load_rdb("ecommerce_rdb/")
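For concreteness, the elided DataFrames above could look like the toy tables below (illustrative data only, not part of the library). The trailing asserts sanity-check that every foreign key in `interaction` references an existing primary key before the RDB is built:

```python
import pandas as pd

# Hypothetical toy tables matching the schema above (illustration only).
users_df = pd.DataFrame({"user_id": [1, 2], "age": [34, 28]})
items_df = pd.DataFrame({"item_id": [10, 20], "price": [9.99, 24.50]})
interactions_df = pd.DataFrame({
    "user_id": [1, 1, 2],
    "item_id": [10, 20, 10],
    "rating": [5, 3, 4],
    "timestamp": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
})

# Check referential integrity: every interaction row should point at an
# existing user and item, or downstream joins will silently drop rows.
assert interactions_df["user_id"].isin(users_df["user_id"]).all()
assert interactions_df["item_id"].isin(items_df["item_id"]).all()
```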

Option B: Adapt Existing Datasets

FastDFS includes adapters for popular relational dataset benchmarks.

RelBench

from fastdfs.adapter.relbench import RelBenchAdapter

# Load and convert RelBench dataset
adapter = RelBenchAdapter("rel-stack")
rdb = adapter.load()
rdb.save("rel-stack-rdb/")

DBInfer

from fastdfs.adapter.dbinfer import DBInferAdapter

# Load and convert DBInfer dataset
adapter = DBInferAdapter("diginetica")
rdb = adapter.load()
rdb.save("diginetica-rdb/")

Option C: Load from Relational Database

FastDFS supports loading data directly from SQL databases (SQLite, MySQL, PostgreSQL, DuckDB).

from fastdfs.adapter.sqlite import SQLiteAdapter
# from fastdfs.adapter.mysql import MySQLAdapter
# from fastdfs.adapter.postgres import PostgreSQLAdapter

# Connect to database
adapter = SQLiteAdapter(
    "ecommerce.db",
    time_columns={"orders": "created_at"},   # optional: the time column for each table
    type_hints={"users": {"age": "float"}}   # optional: override an inferred column dtype
)

# Or for MySQL/PostgreSQL:
# adapter = MySQLAdapter("mysql+pymysql://user:pass@host/db")

rdb = adapter.load()
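To try the adapter locally, a database with the shape assumed above (a `users` table plus an `orders` table with a `created_at` time column) can be created with Python's standard `sqlite3` module. The schema here is hypothetical, chosen only to match the example arguments:

```python
import sqlite3

# Build a tiny SQLite database shaped like the adapter example expects.
conn = sqlite3.connect(":memory:")  # use "ecommerce.db" for a file on disk
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, age REAL);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users(user_id),
        created_at TEXT
    );
    INSERT INTO users VALUES (1, 34.0), (2, 28.0);
    INSERT INTO orders VALUES (100, 1, '2024-01-01'), (101, 1, '2024-01-02');
""")
n_orders = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```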

2. Generate Features

import fastdfs
import pandas as pd

# Prepare your rdb from the methods above
rdb = ...

# Create or load your target dataframe
target_df = pd.DataFrame({
    "user_id": [1, 2, 3],
    "item_id": [10, 20, 30],
    "prediction_time": ["2024-01-01", "2024-01-02", "2024-01-03"]
})

# Generate features
features = fastdfs.compute_dfs_features(
    rdb=rdb,
    target_dataframe=target_df, 
    key_mappings={
        "user_id": "user.user_id",
        "item_id": "item.item_id"  
    },
    cutoff_time_column="prediction_time",
    config_overrides={"max_depth": 2}
)

print(f"Original columns: {len(target_df.columns)}")
print(f"With features: {len(features.columns)}")
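Cutoff columns are usually easier to work with as real datetimes than as strings. A small pandas sketch of that preparation step (the library's exact dtype expectations aren't documented here, so treat this as general hygiene rather than a FastDFS requirement):

```python
import pandas as pd

target_df = pd.DataFrame({
    "user_id": [1, 2, 3],
    "item_id": [10, 20, 30],
    "prediction_time": ["2024-01-01", "2024-01-02", "2024-01-03"],
})

# Parse the cutoff column into real datetimes so that "before the cutoff"
# comparisons during feature computation are well-defined.
target_df["prediction_time"] = pd.to_datetime(target_df["prediction_time"])
```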

3. Advanced Usage with Transforms

# Apply preprocessing transforms before feature generation
from fastdfs.transform import RDBTransformWrapper, RDBTransformPipeline, HandleDummyTable, FeaturizeDatetime

pipeline = fastdfs.DFSPipeline(
    transform_pipeline=RDBTransformPipeline([
        HandleDummyTable(),
        RDBTransformWrapper(FeaturizeDatetime(features=["year", "month", "hour"]))
    ]),
    dfs_config=fastdfs.DFSConfig(max_depth=3, engine="dfs2sql")
)

features = pipeline.run(
    rdb=rdb,
    target_dataframe=target_df,
    key_mappings=key_mappings,
    cutoff_time_column="prediction_time"
)
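Conceptually, a datetime featurizer like `FeaturizeDatetime` expands a timestamp into separate numeric parts. A minimal pure-Python sketch of that idea (not the library's implementation):

```python
from datetime import datetime

def featurize_datetime(ts: datetime, features=("year", "month", "hour")):
    """Expand one timestamp into the requested numeric parts."""
    return {name: getattr(ts, name) for name in features}

# 2024-01-02 15:30 -> separate year/month/hour columns.
parts = featurize_datetime(datetime(2024, 1, 2, 15, 30))
```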

Key Features

  • Table-Centric Design: Augment any dataframe, not just predefined datasets
  • Multiple DFS Engines: Choose between Featuretools (pandas) or DFS2SQL (high-performance)
  • Temporal Consistency: Built-in cutoff time support prevents data leakage
  • Flexible Key Mapping: Connect target data to RDB with simple column mappings
  • Transform Pipeline: Composable preprocessing transforms for data cleaning
  • Type Safety: Full type hints and runtime validation
  • Minimal Dependencies: Focused, lightweight package
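The Temporal Consistency bullet can be illustrated with a pure-Python sketch (hypothetical data): a feature computed for a target row may only aggregate events that happened strictly before that row's cutoff time, so information from the future never leaks into training data.

```python
from datetime import date

# Hypothetical events in a related table, keyed by user.
events = [
    {"user_id": 1, "time": date(2024, 1, 1), "rating": 5},
    {"user_id": 1, "time": date(2024, 1, 3), "rating": 1},
]

def avg_rating_before(user_id, cutoff):
    """Aggregate only events strictly before the cutoff time."""
    vals = [e["rating"] for e in events
            if e["user_id"] == user_id and e["time"] < cutoff]
    return sum(vals) / len(vals) if vals else None

# A row with cutoff Jan 2 sees only the Jan 1 event.
feat = avg_rating_before(1, date(2024, 1, 2))
```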

Engine Comparison

| Feature      | Featuretools        | DFS2SQL                  |
|--------------|---------------------|--------------------------|
| Performance  | Good for small data | Excellent for large data |
| Memory usage | High (pandas)       | Low (SQL-based)          |
| Primitives   | Rich set            | Core primitives          |
| Backend      | Pandas              | DuckDB                   |
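To make the comparison concrete: a SQL-based engine computes aggregation features as set-based queries pushed down to the database, rather than as in-memory pandas group-bys. The sketch below uses the standard-library `sqlite3` for portability (DFS2SQL itself targets DuckDB), and the query shape is an assumption for illustration, not the engine's actual generated SQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE interaction (user_id INTEGER, rating REAL);
    INSERT INTO interaction VALUES (1, 5.0), (1, 3.0), (2, 4.0);
""")

# One query produces a whole column of aggregation features at once,
# without materializing intermediate DataFrames in memory.
rows = conn.execute("""
    SELECT user_id,
           AVG(rating) AS user_avg_rating,
           COUNT(*)    AS user_count_interactions
    FROM interaction
    GROUP BY user_id
    ORDER BY user_id
""").fetchall()
```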


Why FastDFS?

Before FastDFS (manual feature engineering):

# Manual aggregations for each feature
user_avg_rating = interactions.groupby('user_id')['rating'].mean()
user_total_purchases = interactions.groupby('user_id').size()
item_avg_rating = interactions.groupby('item_id')['rating'].mean()
# ... dozens more features ...

With FastDFS (automated):

# Automatic generation of 50+ features
features = fastdfs.compute_dfs_features(rdb, target_df, key_mappings)

FastDFS automatically discovers relationships in your data and generates meaningful aggregation features, saving substantial manual feature engineering effort.

Contributing

We welcome contributions! See our development logs for project history and architecture decisions.

License

Apache-2.0 License

Download files

Download the file for your platform.

Source Distribution

fastdfs-0.2.0.tar.gz (86.6 kB)

Uploaded Source

Built Distribution


fastdfs-0.2.0-py3-none-any.whl (76.6 kB)

Uploaded Python 3

File details

Details for the file fastdfs-0.2.0.tar.gz.

File metadata

  • Download URL: fastdfs-0.2.0.tar.gz
  • Size: 86.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.13

File hashes

Hashes for fastdfs-0.2.0.tar.gz:

  • SHA256: 43b0a417526814ce582289e7418139b1e9fcff40d34745f2aca4405b0b11a984
  • MD5: d979a73853e5573b7ef6caf888fd3dfd
  • BLAKE2b-256: 099f02573aeb930995219bc2e84f14ad83e48cd5d8817c3ac701de0ba1fe8202


File details

Details for the file fastdfs-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: fastdfs-0.2.0-py3-none-any.whl
  • Size: 76.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.13

File hashes

Hashes for fastdfs-0.2.0-py3-none-any.whl:

  • SHA256: c73b40759e7795967fd63822c99c005f252039c982b51123aa05e8487052e195
  • MD5: 0ab49693c4977d8cc89671654f98790e
  • BLAKE2b-256: 75b3c56207b779ba44ee98563b25a62278cc9068a72bda17c13d0d6a19fbb5c0

