LLM-driven intelligent join key suggestion agent
Join Agent
LLM-driven intelligent data joining and relationship analysis agent. The JoinAgent uses large language models (LLMs) to analyze table structures, suggest optimal join strategies, and validate the quality of joins between datasets.
🌟 Features
- Analyze table structures and sample data to identify potential join keys.
- Suggest optimal join strategies with reasoning and confidence scores.
- Validate join schema compatibility and data overlap.
- Supports multiple operations:
  - golden_dataset – Identify join keys and build the join order across multiple tables to create a golden dataset.
  - manual_data_prep – Determine join keys and join type between two tables for manual data preparation.
- Integrates with SFN Blueprint's AI handler for LLM-powered reasoning.
- Returns structured join plans including validated join types and overlap percentages.
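To make the "data overlap" idea concrete, here is a minimal sketch of how an overlap percentage between two candidate key columns could be computed. It is not the JoinAgent implementation: the function name `key_overlap_pct`, the sample rows, and the definition used (share of distinct non-null left-side key values that also appear on the right) are all assumptions for illustration.

```python
from typing import Any, Iterable, Mapping


def key_overlap_pct(
    left_rows: Iterable[Mapping[str, Any]],
    right_rows: Iterable[Mapping[str, Any]],
    left_key: str,
    right_key: str,
) -> float:
    """Percentage of distinct non-null left key values that also occur on the right."""
    left_values = {row[left_key] for row in left_rows if row.get(left_key) is not None}
    right_values = {row[right_key] for row in right_rows if row.get(right_key) is not None}
    if not left_values:
        return 0.0
    return 100.0 * len(left_values & right_values) / len(left_values)


# Hypothetical sample data: two tables whose key columns have different names.
customers = [{"customer_id": 1}, {"customer_id": 2}, {"customer_id": 3}, {"customer_id": 4}]
orders = [{"cust_id": 2}, {"cust_id": 3}, {"cust_id": 3}, {"cust_id": 9}]

overlap = key_overlap_pct(customers, orders, "customer_id", "cust_id")
print(f"{overlap:.1f}%")  # 2 of the 4 distinct customer ids appear in orders
```

A low percentage for a suggested key pair is a hint that the join would drop many rows, which is the kind of signal a validation step can report alongside the suggested join type.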
📦 Installation
Prerequisites
- Python 3.11+
- Git
- uv – A fast Python package and environment manager.
- For a quick setup on macOS/Linux, you can use:
curl -LsSf https://astral.sh/uv/install.sh | sh
Setup
- Clone the repository

    git clone https://github.com/stepfnAI/join_agent.git
    cd join_agent/
    git checkout review

- Set up the virtual environment and install dependencies. The first command creates a .venv folder in the current directory and installs all required packages; the second activates the environment.

    uv sync --extra dev
    source .venv/bin/activate

- Clone and install the sfn_blueprint dependency. The agent requires the sfn_blueprint library. The following commands clone it into a sibling directory and install it in editable mode.

    cd ..
    git clone https://github.com/stepfnAI/sfn_blueprint.git
    cd sfn_blueprint
    git switch dev
    uv pip install -e .
    cd ../join_agent

- Set up environment variables

    # Optional: Configure LLM provider (default: openai)
    export LLM_PROVIDER="your_llm_provider"
    # Optional: Configure LLM model (default: gpt-4.1-mini)
    export LLM_MODEL="your_llm_model"
    # Required: Your LLM API key. Export the variable that matches your provider:
    # OPENAI_API_KEY for openai, ANTHROPIC_API_KEY for anthropic.
    export OPENAI_API_KEY="your_llm_api_key"
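The sketch below shows one way the environment variables above could be resolved at startup, applying the documented defaults (openai, gpt-4.1-mini). It is an illustration only, not the agent's own configuration code: `resolve_llm_settings` and `API_KEY_VARS` are hypothetical names, and the `<PROVIDER>_API_KEY` naming rule for other providers is an assumption.

```python
import os

# Assumed mapping from provider name to the environment variable holding its key.
API_KEY_VARS = {"openai": "OPENAI_API_KEY", "anthropic": "ANTHROPIC_API_KEY"}


def resolve_llm_settings(env=os.environ):
    """Return (provider, model, api_key_var), falling back to the documented defaults."""
    provider = env.get("LLM_PROVIDER", "openai").lower()
    model = env.get("LLM_MODEL", "gpt-4.1-mini")
    # Assumption: unknown providers follow the <PROVIDER>_API_KEY naming pattern.
    key_var = API_KEY_VARS.get(provider, f"{provider.upper()}_API_KEY")
    if not env.get(key_var):
        raise RuntimeError(f"{key_var} must be set for provider '{provider}'")
    return provider, model, key_var


# Passing a plain dict instead of os.environ makes the behaviour easy to check.
print(resolve_llm_settings({"OPENAI_API_KEY": "your_llm_api_key"}))
```

Failing early with a clear message when the key is missing is usually friendlier than letting the first LLM call fail with an authentication error.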
🚀 Quick Start
Basic Usage
The agent detects join keys across two or more datasets. With operation = "golden_dataset" it supports joins across multiple tables; with operation = "manual_data_prep" it supports a join between exactly two tables.
From the repository root, run:
python examples/goldendataset_usage.py
python examples/manualdataprep_usage.py
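The example scripts show the agent's real API. To illustrate only the difference in scope between the two operations, the sketch below joins plain lists of dict rows: a single two-table join, and an ordered chain of joins across three tables. It does not call JoinAgent; `inner_join`, the table contents, and the `join_order` structure are hypothetical.

```python
from functools import reduce


def inner_join(left, right, left_key, right_key):
    """Inner-join two lists of dict rows on left_key == right_key."""
    index = {}
    for rrow in right:
        index.setdefault(rrow[right_key], []).append(rrow)
    return [{**lrow, **rrow} for lrow in left for rrow in index.get(lrow[left_key], [])]


customers = [{"customer_id": 1, "name": "Ada"}, {"customer_id": 2, "name": "Lin"}]
orders = [{"order_id": 10, "customer_id": 1}, {"order_id": 11, "customer_id": 2}]
payments = [{"order_id": 10, "amount": 25.0}]

# Two-table case (the scope of manual_data_prep): one key pair, one join.
two_table = inner_join(customers, orders, "customer_id", "customer_id")

# Multi-table case (the scope of golden_dataset): an ordered list of joins applied in sequence.
join_order = [(orders, "customer_id", "customer_id"), (payments, "order_id", "order_id")]
golden = reduce(
    lambda acc, step: inner_join(acc, step[0], step[1], step[2]),
    join_order,
    customers,
)

print(len(two_table), golden)
```

The order matters in the multi-table case: `payments` can only be joined once `order_id` is available, which is why a join order has to be decided in addition to the key pairs.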
🧪 Testing
pytest -s tests/test_joinagent.py
📝 Prompt Management
All LLM prompts used by the JoinAgent are centralized in src/join_agent/constants.py for easy review and maintenance.
Prompt Types
There are two kinds of prompts, one per operation:
- Golden_dataset_op_prompt: Template for analyzing join potential between multiple datasets based purely on column metadata.
- Manual_data_prep_prompt: Template for analyzing join potential between two datasets, considering column metadata, group-by fields, and the primary table.
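As a rough picture of what a centralized prompt template looks like, the following sketch keeps the prompt text in a module-level constant and fills it in from column metadata. The template wording, `EXAMPLE_PROMPT`, and `render_prompt` are hypothetical; the actual prompts are the ones defined in src/join_agent/constants.py.

```python
from string import Template

# Hypothetical template text, kept separate from the code that uses it.
EXAMPLE_PROMPT = Template(
    "You are given column metadata for these tables:\n$tables_metadata\n"
    "Suggest join keys and a join type, with a confidence score."
)


def render_prompt(tables):
    """Format a {table_name: [column names]} mapping into the template."""
    metadata = "\n".join(f"- {name}: {', '.join(cols)}" for name, cols in tables.items())
    return EXAMPLE_PROMPT.substitute(tables_metadata=metadata)


print(render_prompt({"customers": ["customer_id", "name"], "orders": ["order_id", "customer_id"]}))
```

Because the wording lives in one constant, it can be edited and reviewed without touching the function that renders it.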
Benefits
- Easy Review: All prompts in one location for prompt engineering
- Version Control: Track prompt changes alongside code changes
- Maintainability: Update prompts without touching business logic
- Consistency: Standardized prompt formatting across the agent
🏗️ Architecture
The JoinAgent is built with a modular architecture:
- Core Components:
  - agent.py: Base agent implementation
  - models.py: Data models and schemas
  - constants.py: Prompts
  - config.py: Model configurations
- Dependencies:
  - sfn-blueprint: Core framework and utilities
  - pydantic: Data validation
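To give a feel for the kind of schema a data-model module might hold, here is a small sketch of a validated join suggestion record. It uses a standard-library dataclass as a stand-in so it runs without extra installs; the project itself lists pydantic for validation. The class name `JoinSuggestion`, its fields, and the allowed join types are assumptions, not the contents of models.py.

```python
from dataclasses import dataclass

# Assumed set of join types for this illustration.
VALID_JOIN_TYPES = {"inner", "left", "right", "outer"}


@dataclass(frozen=True)
class JoinSuggestion:
    left_table: str
    right_table: str
    left_key: str
    right_key: str
    join_type: str
    confidence: float   # expected range: 0.0 to 1.0
    overlap_pct: float  # expected range: 0.0 to 100.0
    reasoning: str = ""

    def __post_init__(self):
        # Reject values an LLM might return that fall outside the schema.
        if self.join_type not in VALID_JOIN_TYPES:
            raise ValueError(f"unsupported join type: {self.join_type!r}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")
        if not 0.0 <= self.overlap_pct <= 100.0:
            raise ValueError("overlap_pct must be between 0 and 100")


suggestion = JoinSuggestion("customers", "orders", "customer_id", "cust_id", "left", 0.9, 50.0)
print(suggestion)
```

Validating at construction time means a malformed LLM response is caught where it enters the system rather than later, when the join is executed.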
📚 Documentation
For detailed documentation, visit: https://join-agent.readthedocs.io
🤝 Contributing
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
📧 Contact
- Email: team@stepfunction.ai
- GitHub: https://github.com/stepfnAI/join_agent
Download files
File details
Details for the file join_agent-0.1.3.tar.gz.
File metadata
- Download URL: join_agent-0.1.3.tar.gz
- Upload date:
- Size: 14.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 15745f8ab6f984e961572953f6bea8e7a4ca4575395e8595bf1eb47f5d89e35b |
| MD5 | 5660b498935d308b29df75f58636a935 |
| BLAKE2b-256 | 9121545c0e63bf3ee9361a0f252bbeb175a860d0cc80100487eab9544ddecbab |
File details
Details for the file join_agent-0.1.3-py3-none-any.whl.
File metadata
- Download URL: join_agent-0.1.3-py3-none-any.whl
- Upload date:
- Size: 12.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 405181f82ea6f3c4a9cecd03f0af0da463fd3870dcf53c61a7bc9a08c1df124f |
| MD5 | 09cdd37832744b3cd5485fc9cf5647b3 |
| BLAKE2b-256 | 7d7b161248ea69603e5adb8a11e5ec1376e1460efb5f3a39d1e1db9b3319ea84 |
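If you download one of these files by hand, its SHA256 digest can be checked against the published value. The sketch below uses only Python's standard `hashlib`; the helper names `sha256_of` and `matches_published` are hypothetical, and the local file path is whatever you saved the download as.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_published("join_agent-0.1.3-py3-none-any.whl", "405181f82ea6f3c4a9cecd03f0af0da463fd3870dcf53c61a7bc9a08c1df124f")` should return True for an intact copy of the wheel.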