
DFT - Data Flow Tools

Requires Python 3.9+.

A flexible ETL pipeline framework for data analysts and engineers: build, orchestrate, and monitor data pipelines with YAML configurations.

✨ Key Features

  • 🔧 Component-Based: Modular sources, processors, and endpoints
  • 🔌 Plugin System: Add custom components directly to your project
  • 📋 YAML Configuration: Simple, readable pipeline definitions
  • 🔗 Dependency Management: Automatic pipeline ordering and validation
  • 📊 Interactive Documentation: Web-based pipeline exploration
  • 💾 Database Support: PostgreSQL, MySQL, ClickHouse with upsert capabilities
  • 🔄 Incremental Processing: Smart data loading with state management
  • ⏱️ Microbatch Processing: Time-based data windows with lookback support
  • ⚙️ Data Validation: Built-in quality checks and constraints
  • 🎯 Analyst-Friendly: Rich CLI tools and component discovery

🚀 Quick Start

1. Installation

Option A: Install from PyPI (Recommended)

# Install directly from PyPI
pip install dft-pipeline

Option B: Install from Source (For Development)

# Clone repository
git clone <repository-url>
cd dft

# Install package with dependencies
pip install -e .

2. Create Project

# Initialize new project
dft init my_analytics_project
cd my_analytics_project

3. Explore Examples


# View interactive documentation
dft docs --serve
# Opens at http://localhost:8080

# Discover available components
dft components list

# Run a simple pipeline (uses sample data)
dft run --select simple_csv_example

# Try the custom components example  
dft run --select custom_example_pipeline

📦 Built-in Components

DFT includes pre-built components for common data operations:

  • Sources: CSV, PostgreSQL, MySQL, ClickHouse, Google Play, JSON
  • Processors: Data validation, anomaly detection
  • Endpoints: CSV, PostgreSQL, MySQL, ClickHouse, JSON output

Explore them from the CLI:

# Discover available components
dft components list

# Get component details and examples
dft components describe postgresql --format yaml

# Interactive component browser
dft docs --serve

📋 Pipeline Configuration

Basic Pipeline

pipeline_name: simple_etl
description: Extract, validate, and load user data

connections:
  analytics_db:
    type: postgresql
    host: analytics.company.com
    database: warehouse
    user: analyst
    password: "${POSTGRES_PASSWORD}"

steps:
  - id: load_user_data
    type: source
    source_type: csv
    config:
      file_path: "data/users.csv"

  - id: validate_users
    type: processor
    processor_type: validator
    depends_on: [load_user_data]
    config:
      required_columns: [id, email, created_at]

  - id: save_clean_users
    type: endpoint
    endpoint_type: postgresql
    connection: analytics_db
    depends_on: [validate_users]
    config:
      table: users_clean
      mode: replace
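
Saved under pipelines/, this pipeline can then be run by its pipeline_name:

# Run the basic pipeline above
dft run --select simple_etl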

Advanced Features

  • Pipeline Dependencies: depends_on: [other_pipeline]
  • Variables: {{ var("date") }} and {{ env_var("API_KEY") }}
  • Named Connections: Reusable database configurations
  • Tags: Organize pipelines with tags: [daily, analytics]
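
A minimal sketch combining these features (the pipeline names, the query config key, and the table names are illustrative assumptions, not documented API):

pipeline_name: customer_analytics
description: Daily customer rollup
tags: [daily, analytics]
depends_on: [ingestion]

steps:
  - id: load_orders
    type: source
    source_type: postgresql
    connection: analytics_db
    config:
      query: "SELECT * FROM orders WHERE order_date = '{{ var('date') }}'"

  - id: save_rollup
    type: endpoint
    endpoint_type: csv
    depends_on: [load_orders]
    config:
      file_path: "output/rollup_{{ var('date') }}.csv"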

🔄 Pipeline Execution

# Run all pipelines
dft run

# Run specific pipeline
dft run --select customer_analytics

# Run by tags
dft run --select tag:daily

# Run with dependencies (dbt-style)
dft run --select +customer_analytics  # Include upstream dependencies
dft run --select customer_analytics+  # Include downstream dependencies

# Override variables
dft run --select analytics --vars date=2024-01-15,min_amount=5.00

📊 Documentation & Monitoring

# Interactive web documentation
dft docs --serve

# Validate pipeline configurations
dft validate

# Check pipeline dependencies
dft deps

# List available components
dft components list

๐Ÿ” Advanced Features

  • Environment Configuration: export DFT_ENV=prod
  • State Management: Automatic incremental processing with {{ state.get() }}
  • Custom Components: Plugin system for extending functionality
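
For example, an incremental load can use the stored state as a high-water mark. A hedged sketch (the query config key and the events table are assumptions):

steps:
  - id: load_new_events
    type: source
    source_type: postgresql
    connection: analytics_db
    config:
      query: "SELECT * FROM events WHERE created_at > '{{ state.get() }}'"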

Custom Components

Add custom sources, processors, and endpoints in your project's dft/ directory:

# dft/sources/api_source.py
from dft.core.base import DataSource
from dft.core.data_packet import DataPacket

class ApiSource(DataSource):
    def extract(self, variables=None) -> DataPacket:
        data = ...  # your API extraction logic: fetch and parse records here
        return DataPacket(data=data, metadata={})

    def test_connection(self) -> bool:
        return True

Use the component in pipelines with snake_case naming:

steps:
  - id: fetch_data
    type: source
    source_type: api  # Uses ApiSource class
    config:
      api_url: "https://api.example.com/data"
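
Processors follow the same pattern. A minimal sketch, assuming a DataProcessor base class whose process method mirrors DataSource.extract (the base-class and method names here are assumptions, not confirmed API):

# dft/processors/dedupe.py (hypothetical)
from dft.core.base import DataProcessor  # assumed base class
from dft.core.data_packet import DataPacket

class DedupeProcessor(DataProcessor):
    def process(self, packet: DataPacket, variables=None) -> DataPacket:
        # Drop duplicate rows while preserving order
        # (assumes packet.data is a list of dicts with hashable values)
        seen, unique = set(), []
        for row in packet.data:
            key = tuple(sorted(row.items()))
            if key not in seen:
                seen.add(key)
                unique.append(row)
        return DataPacket(data=unique, metadata=packet.metadata)

Following the same snake_case convention as ApiSource → api, this would be referenced as processor_type: dedupe.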

See Custom Components Guide for detailed examples.

๐Ÿ“ Project Structure

my_project/
├── dft_project.yml          # Project configuration
├── .env                     # Environment variables
├── pipelines/               # Pipeline definitions
│   ├── ingestion.yml
│   ├── analytics.yml
│   └── reporting.yml
├── dft/                     # Custom components (auto-created)
│   ├── sources/             # Custom data sources
│   ├── processors/          # Custom data processors
│   └── endpoints/           # Custom data endpoints
├── data/                    # Input data files
├── output/                  # Generated outputs
└── .dft/                    # DFT metadata and logs
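
The .env file supplies the values referenced through ${...} interpolation and env_var() (a sketch with placeholder values; the exact loading behavior is assumed):

# .env (placeholder values)
POSTGRES_PASSWORD=change_me
API_KEY=change_me
DFT_ENV=dev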

🚀 Get Started

# Install DFT
pip install dft-pipeline

# Create new project
dft init my_project && cd my_project

# Explore components and documentation
dft docs --serve

# Run example pipeline
dft run --select custom_example_pipeline

📄 License

MIT License - see LICENSE file for details.
