Join Agent

LLM-driven intelligent data joining and relationship analysis agent. The JoinAgent uses large language models (LLMs) to analyze table structures, suggest optimal join strategies, and validate the quality of joins between datasets.

🌟 Features

  • Analyze table structures and sample data to identify potential join keys.
  • Suggest optimal join strategies with reasoning and confidence scores.
  • Validate join schema compatibility and data overlap.
  • Supports multiple operations:
      • golden_dataset – Identify join keys and build join order across multiple tables to create a golden dataset.
      • manual_data_prep – Determine join keys and join type between two tables for manual data preparation.
  • Integrates with SFN Blueprint’s AI handler for LLM-powered reasoning.
  • Returns structured join plans including validated join types and overlap percentages.
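
To make the "overlap percentage" idea concrete, here is a minimal sketch of one way such a figure could be computed for a candidate join key. The function name and the exact definition (share of distinct non-null left-side key values that also appear on the right) are assumptions for illustration; the agent's own calculation may differ.

```python
def key_overlap_percentage(left_keys, right_keys):
    """Hypothetical helper: percentage of distinct non-null left key values
    that also occur among the right key values."""
    left = {k for k in left_keys if k is not None}
    right = {k for k in right_keys if k is not None}
    if not left:
        # No usable keys on the left side, so report zero overlap.
        return 0.0
    return 100.0 * len(left & right) / len(left)


# Illustrative sample data, not taken from the project.
customers = [101, 102, 103, 104]
orders = [101, 101, 103, 999, None]
overlap = key_overlap_percentage(customers, orders)
```

A low value for a suggested key pair is a hint that the join would drop many rows (inner join) or produce many unmatched rows (left join).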

📦 Installation

Prerequisites

  • Python 3.11+
  • Git
  • uv package manager (recommended)

Setup

  1. Clone the repository

    git clone https://github.com/stepfnAI/join_agent.git
    cd join_agent/
    git checkout dev
    
  2. Set up the virtual environment and install dependencies

    python3.11 -m venv venv
    source venv/bin/activate
    pip install -r requirements.txt
    
  3. Clone and install the blueprint dependency

    cd ..
    git clone https://github.com/stepfnAI/sfn_blueprint.git
    cd sfn_blueprint
    git switch dev
    uv pip install -e .
    cd ../join_agent
    
  4. Set up environment variables

    # Optional: Configure LLM provider (default: openai)
    export LLM_PROVIDER="your_llm_provider"
    
    # Optional: Configure LLM model (default: gpt-4.1-mini)
    export LLM_MODEL="your_llm_model"
    
    # Required: Your LLM API key
    export LLM_API_KEY="your_llm_api_key"
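
The variables above can be read from Python as in the following sketch. It only illustrates the documented defaults (openai and gpt-4.1-mini) and the required API key; LLMSettings and load_llm_settings are hypothetical names, and how the package itself reads these variables is not shown here.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMSettings:
    """Hypothetical container for the three documented variables."""
    provider: str
    model: str
    api_key: str


def load_llm_settings(env=None):
    """Read LLM settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    api_key = env.get("LLM_API_KEY")
    if not api_key:
        # LLM_API_KEY is the only required variable.
        raise RuntimeError("LLM_API_KEY is required but not set")
    return LLMSettings(
        provider=env.get("LLM_PROVIDER", "openai"),
        model=env.get("LLM_MODEL", "gpt-4.1-mini"),
        api_key=api_key,
    )
```

Passing a plain dict instead of os.environ makes the function easy to exercise in tests without touching the real environment.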
    

🚀 Quick Start

Basic Usage

usage.py demonstrates how to run the agent with some sample data.

🧪 Testing

    pip install pytest
    PYTHONPATH=src pytest -s tests/test_golden_dataset.py
    PYTHONPATH=src pytest -s tests/test_manual_data_prep.py

Development Setup

To set up the development environment:

  1. Create virtual environment: python3.11 -m venv .venv
  2. Activate virtual environment: source .venv/bin/activate
  3. Install dependencies: pip install -r requirements.txt
  4. Install sfn_blueprint in development mode: uv pip install -e . (run inside the sfn_blueprint checkout)

Usage:

The agent detects join keys across two or more datasets, depending on the operation:

  • operation = "golden_dataset": supports joins across multiple tables.
  • operation = "manual_data_prep": supports a join between exactly two tables.
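
The table-count rule for each operation can be expressed as a small pre-flight check. This is a sketch only: check_table_count is a hypothetical helper, not part of the join_agent API, and the agent may enforce the rule differently.

```python
SUPPORTED_OPERATIONS = ("golden_dataset", "manual_data_prep")


def check_table_count(operation, table_names):
    """Hypothetical check: raise ValueError if the number of tables
    does not fit the requested operation."""
    if operation not in SUPPORTED_OPERATIONS:
        raise ValueError(f"Unknown operation: {operation!r}")
    count = len(table_names)
    if operation == "manual_data_prep" and count != 2:
        raise ValueError("manual_data_prep joins exactly two tables")
    if operation == "golden_dataset" and count < 2:
        raise ValueError("golden_dataset needs at least two tables")
    return True
```

For example, check_table_count("golden_dataset", ["customers", "orders", "payments"]) passes, while the same three tables with "manual_data_prep" raise a ValueError.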

📝 Prompt Management

All LLM prompts used by the JoinAgent are centralized in src/join_agent/constants.py for easy review and maintenance.

Prompt Types

There are two kinds of prompts, one per operation:

  • Golden_dataset_op_prompt: Template for analyzing join potential between multiple datasets based purely on column metadata
  • Manual_data_prep_prompt: Template for analyzing join potential between two tables, considering column metadata, group-by fields, and the primary table
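
As an illustration of the pattern, the sketch below keeps one template per operation in a single module and fills it in at call time. The template text, the EXAMPLE_ constant names, and build_prompt are all hypothetical; the real prompts live in src/join_agent/constants.py and are not reproduced here.

```python
# Hypothetical templates; the wording is illustrative only.
EXAMPLE_GOLDEN_DATASET_PROMPT = (
    "Suggest join keys and a join order for these tables.\n"
    "Column metadata:\n{column_metadata}"
)

EXAMPLE_MANUAL_DATA_PREP_PROMPT = (
    "Suggest join keys and a join type for two tables.\n"
    "Primary table: {primary_table}\n"
    "Group-by fields: {groupby_fields}\n"
    "Column metadata:\n{column_metadata}"
)

PROMPTS_BY_OPERATION = {
    "golden_dataset": EXAMPLE_GOLDEN_DATASET_PROMPT,
    "manual_data_prep": EXAMPLE_MANUAL_DATA_PREP_PROMPT,
}


def build_prompt(operation, **fields):
    """Pick the template for the operation and fill in its placeholders.
    Raises KeyError for an unknown operation or a missing field."""
    template = PROMPTS_BY_OPERATION[operation]
    return template.format(**fields)


prompt = build_prompt(
    "manual_data_prep",
    primary_table="orders",
    groupby_fields="customer_id",
    column_metadata="orders: order_id, customer_id\ncustomers: customer_id, name",
)
```

Keeping the templates as plain constants means a prompt change shows up as a small, reviewable diff that leaves the calling code untouched.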

Benefits

  • Easy Review: All prompts in one location for prompt engineering
  • Version Control: Track prompt changes alongside code changes
  • Maintainability: Update prompts without touching business logic
  • Consistency: Standardized prompt formatting across the agent

📚 Documentation

For detailed documentation, visit: https://join-agent.readthedocs.io

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
