
LLM-driven intelligent join key suggestion agent


Join Agent

LLM-driven intelligent data joining and relationship analysis agent. The JoinAgent uses large language models (LLMs) to analyze table structures, suggest optimal join strategies, and validate the quality of joins between datasets.

🌟 Features

  • Analyze table structures and sample data to identify potential join keys.
  • Suggest optimal join strategies with reasoning and confidence scores.
  • Validate join schema compatibility and data overlap.
  • Supports multiple operations:
    • golden_dataset – Identify join keys and build join order across multiple tables to create a golden dataset.
    • manual_data_prep – Determine join keys and join type between two tables for manual data preparation.
  • Integrates with SFN Blueprint's AI handler for LLM-powered reasoning.
  • Returns structured join plans including validated join types and overlap percentages.
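One way to picture the data-overlap check is as the share of distinct key values on one side of a join that also appear on the other side. The sketch below assumes that definition. The helper `key_overlap_pct` is hypothetical, and the JoinAgent's own overlap calculation may differ (for example, in how it treats nulls or which side it measures from).

```python
def key_overlap_pct(left_keys, right_keys):
    """Percentage of distinct non-null left-side key values that also occur on the right.

    Hypothetical helper for illustration; the JoinAgent may compute overlap differently.
    """
    # Deduplicate and drop nulls so repeated or missing keys do not skew the ratio
    left = {k for k in left_keys if k is not None}
    right = {k for k in right_keys if k is not None}
    if not left:
        return 0.0
    matched = left & right
    return round(100.0 * len(matched) / len(left), 2)


# Example: customer IDs in an orders table vs. IDs in a customers table
orders_customer_id = [1, 2, 2, 3, 4, None]
customers_id = [1, 2, 3, 5]
print(key_overlap_pct(orders_customer_id, customers_id))  # 75.0
```

A low percentage here would suggest that an inner join would drop many rows, which is the kind of signal that the agent's validation step is described as reporting.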

📦 Installation

Prerequisites

  • Python 3.11+
  • Git
  • uv – A fast Python package and environment manager.
    • For a quick setup on macOS/Linux, you can use:
      curl -LsSf https://astral.sh/uv/install.sh | sh
      

Setup

  1. Clone the repository

    git clone https://github.com/stepfnAI/join_agent.git
    cd join_agent/
    git checkout review
    
  2. Set up the virtual environment and install dependencies: uv sync creates a .venv folder in the current directory and installs all required packages, including the dev extras. The second command activates the environment.

    uv sync --extra dev
    source .venv/bin/activate
    
  3. Clone and install the sfn_blueprint dependency: The agent requires the sfn_blueprint library. The following commands clone it into a sibling directory and install it in editable mode.

    cd ..
    git clone https://github.com/stepfnAI/sfn_blueprint.git
    cd sfn_blueprint
    git switch dev
    uv pip install -e .
    cd ../join_agent
    
  4. Set up environment variables

    # Optional: Configure LLM provider (default: openai)
    export LLM_PROVIDER="your_llm_provider"
    
    # Optional: Configure LLM model (default: gpt-4.1-mini)
    export LLM_MODEL="your_llm_model"
    
    # Required: your LLM API key. Use the variable that matches LLM_PROVIDER (e.g. OPENAI_API_KEY for openai, ANTHROPIC_API_KEY for anthropic)
    export OPENAI_API_KEY="your_llm_api_key"
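To make the defaults above concrete, here is a minimal sketch of how a caller might resolve the provider, model, and API key from the environment. The helper `resolve_llm_config` and the provider-to-variable mapping are assumptions for illustration. The actual lookup happens inside sfn_blueprint and may behave differently.

```python
import os

# Assumed mapping from provider name to the API-key variable it reads.
# Only openai and anthropic are mentioned in this README; extend as needed.
API_KEY_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


def resolve_llm_config(env=None):
    """Return (provider, model, api_key) using the defaults documented above.

    Hypothetical helper: the real configuration logic lives in sfn_blueprint.
    """
    env = os.environ if env is None else env
    provider = env.get("LLM_PROVIDER", "openai").lower()
    model = env.get("LLM_MODEL", "gpt-4.1-mini")
    key_var = API_KEY_VARS.get(provider)
    if key_var is None:
        raise ValueError(f"Unknown LLM provider: {provider!r}")
    api_key = env.get(key_var)
    if not api_key:
        raise RuntimeError(f"{key_var} is not set for provider {provider!r}")
    return provider, model, api_key


# Passing a plain dict instead of os.environ keeps the example side-effect free
provider, model, _ = resolve_llm_config({"OPENAI_API_KEY": "sk-test"})
print(provider, model)  # openai gpt-4.1-mini
```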
    

🚀 Quick Start

Basic Usage

The agent detects join keys across two or more datasets. The golden_dataset operation supports joining multiple tables, while the manual_data_prep operation supports joining exactly two tables.

Run the examples from the repository root:

python examples/goldendataset_usage.py
python examples/manualdataprep_usage.py
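The table-count limits described above can be expressed as a small guard. This is a hypothetical sketch, not code from the package: `check_table_count` and `TABLE_LIMITS` are illustrative names, and the agent may enforce these limits elsewhere or in a different way.

```python
# Hypothetical guard mirroring the documented limits:
# golden_dataset accepts two or more tables, manual_data_prep exactly two.
TABLE_LIMITS = {
    "golden_dataset": (2, None),   # (minimum, maximum); None means no upper bound
    "manual_data_prep": (2, 2),
}


def check_table_count(operation, table_names):
    """Raise ValueError if the number of tables does not fit the operation."""
    if operation not in TABLE_LIMITS:
        raise ValueError(f"Unsupported operation: {operation!r}")
    low, high = TABLE_LIMITS[operation]
    n = len(table_names)
    if n < low or (high is not None and n > high):
        limit = f"exactly {low}" if low == high else f"at least {low}"
        raise ValueError(f"{operation} needs {limit} tables, got {n}")
    return n


check_table_count("golden_dataset", ["orders", "customers", "products"])  # passes
check_table_count("manual_data_prep", ["orders", "customers"])            # passes
```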

🧪 Testing

pytest -s tests/test_joinagent.py

📝 Prompt Management

All LLM prompts used by the JoinAgent are centralized in src/join_agent/constants.py for easy review and maintenance.

Prompt Types

There are two kinds of prompts, one for each operation:

  • Golden_dataset_op_prompt: Template for analyzing join potential between multiple datasets purely based on column metadata
  • Manual_data_prep_prompt: Template for analyzing join potential between two datasets, considering column metadata, group-by fields, and the primary table
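A minimal sketch of how operation-specific templates might be selected and filled, using the standard library's string.Template. The template text, the PROMPTS dictionary, and the field names are hypothetical placeholders. The real prompts in src/join_agent/constants.py are longer and may use a different formatting mechanism.

```python
from string import Template

# Hypothetical placeholder templates keyed by operation; not the package's real prompts.
PROMPTS = {
    "golden_dataset": Template(
        "Suggest join keys and a join order for these tables.\n"
        "Column metadata:\n$metadata"
    ),
    "manual_data_prep": Template(
        "Suggest join keys and a join type for these two tables.\n"
        "Column metadata:\n$metadata\n"
        "Group-by fields: $groupby_fields\n"
        "Primary table: $primary_table"
    ),
}


def build_prompt(operation, **fields):
    """Fill the template for an operation; substitute() raises KeyError on a missing field."""
    return PROMPTS[operation].substitute(**fields)


prompt = build_prompt(
    "manual_data_prep",
    metadata="orders(order_id, customer_id)\ncustomers(id, region)",
    groupby_fields="region",
    primary_table="orders",
)
print(prompt)
```

Keeping the templates in one dictionary is what makes the "all prompts in one location" benefit listed below practical: changing a prompt does not touch the code that selects or fills it.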

Benefits

  • Easy Review: All prompts in one location for prompt engineering
  • Version Control: Track prompt changes alongside code changes
  • Maintainability: Update prompts without touching business logic
  • Consistency: Standardized prompt formatting across the agent

🏗️ Architecture

The Join Agent is built with a modular architecture:

  • Core Components:

    • agent.py: Base agent implementation
    • models.py: Data models and schemas
    • constants.py: LLM prompt templates
    • config.py: model configurations
  • Dependencies:

    • sfn-blueprint: Core framework and utilities
    • pydantic: Data validation
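models.py holds the agent's data models, validated with pydantic. To show roughly what a structured join plan could contain (join keys, join type, confidence, overlap percentage, reasoning), the sketch below uses stdlib dataclasses so it runs without extra dependencies. Every class name, field name, and the set of valid join types here is an assumption, not the package's actual schema.

```python
from dataclasses import dataclass, field

# Assumed set of join types; the real schema may allow others.
VALID_JOIN_TYPES = {"inner", "left", "right", "outer"}


@dataclass
class JoinStep:
    """One hypothetical join between two tables, with basic range checks."""
    left_table: str
    right_table: str
    left_key: str
    right_key: str
    join_type: str
    confidence: float     # assumed range 0.0 to 1.0
    overlap_pct: float    # assumed range 0 to 100
    reasoning: str = ""

    def __post_init__(self):
        if self.join_type not in VALID_JOIN_TYPES:
            raise ValueError(f"Invalid join type: {self.join_type!r}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")
        if not 0.0 <= self.overlap_pct <= 100.0:
            raise ValueError("overlap_pct must be between 0 and 100")


@dataclass
class JoinPlan:
    """Hypothetical ordered list of join steps for one operation."""
    operation: str
    steps: list[JoinStep] = field(default_factory=list)

    def tables(self):
        """Tables in the order they first appear in the plan."""
        ordered = []
        for step in self.steps:
            for name in (step.left_table, step.right_table):
                if name not in ordered:
                    ordered.append(name)
        return ordered


plan = JoinPlan("golden_dataset", [
    JoinStep("orders", "customers", "customer_id", "id", "left", 0.92, 75.0),
    JoinStep("orders", "products", "product_id", "id", "inner", 0.88, 98.5),
])
print(plan.tables())  # ['orders', 'customers', 'products']
```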

📚 Documentation

For detailed documentation, visit: https://join-agent.readthedocs.io

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

📧 Contact



Download files


Source Distribution

join_agent-0.1.3.tar.gz (14.9 kB)


Built Distribution


join_agent-0.1.3-py3-none-any.whl (12.2 kB)


File details

Details for the file join_agent-0.1.3.tar.gz.

File metadata

  • Download URL: join_agent-0.1.3.tar.gz
  • Size: 14.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.7

File hashes

Hashes for join_agent-0.1.3.tar.gz
  • SHA256: 15745f8ab6f984e961572953f6bea8e7a4ca4575395e8595bf1eb47f5d89e35b
  • MD5: 5660b498935d308b29df75f58636a935
  • BLAKE2b-256: 9121545c0e63bf3ee9361a0f252bbeb175a860d0cc80100487eab9544ddecbab


File details

Details for the file join_agent-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: join_agent-0.1.3-py3-none-any.whl
  • Size: 12.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.7

File hashes

Hashes for join_agent-0.1.3-py3-none-any.whl
  • SHA256: 405181f82ea6f3c4a9cecd03f0af0da463fd3870dcf53c61a7bc9a08c1df124f
  • MD5: 09cdd37832744b3cd5485fc9cf5647b3
  • BLAKE2b-256: 7d7b161248ea69603e5adb8a11e5ec1376e1460efb5f3a39d1e1db9b3319ea84

