ml_approach_suggestion_agent

An AI-powered agent that analyzes a dataset and use case to recommend the most appropriate machine learning methodology.

Description

This agent takes a detailed description of a business domain, a specific use case, and information about the dataset—including column descriptions, insights, and target variable details—to suggest the best ML approach. It uses a large language model to:

  1. Analyze the relationship between the use case and the target variable.
  2. Evaluate the characteristics of the data (especially the target column).
  3. Recommend the most suitable methodology from a predefined list: Classification, Regression, Forecasting, Clustering, or No-ML.
  4. Provide a clear justification for its recommendation.

This helps data scientists and analysts quickly and confidently choose the right path for their modeling efforts, saving time and reducing the risk of starting with an incorrect approach.
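The agent itself delegates this decision to an LLM. As a rough illustration of the kind of reasoning involved in steps 2 and 3, the sketch below maps a few target-column characteristics onto the predefined labels with plain rules. The function name, its parameters, and the threshold of 20 distinct values are all assumptions made for this example and are not part of the package. The sketch never returns No-ML, which depends on business context that simple rules cannot capture.

```python
from typing import Optional

# The five labels come from the predefined list described above.
METHODOLOGIES = ("Classification", "Regression", "Forecasting", "Clustering", "No-ML")

# Arbitrary cut-off chosen for this illustration only.
MAX_CLASSES = 20


def rough_methodology_guess(
    has_target: bool,
    target_is_numeric: bool = False,
    distinct_values: Optional[int] = None,
    target_is_time_series: bool = False,
) -> str:
    """Rule-of-thumb guess; a hypothetical helper, not the agent's logic."""
    if not has_target:
        # No label to learn from, so only unsupervised grouping applies.
        return "Clustering"
    if distinct_values is not None and distinct_values <= MAX_CLASSES:
        # A small set of outcomes (e.g. a 0/1 flag) suggests classification.
        return "Classification"
    if target_is_numeric and target_is_time_series:
        return "Forecasting"
    if target_is_numeric:
        return "Regression"
    # Non-numeric target with many or unknown distinct values.
    return "Classification"


# A binary integer flag such as a delinquency indicator.
print(rough_methodology_guess(True, target_is_numeric=True, distinct_values=2))
```

Rules like these cannot weigh the use case wording or the domain description, which is the gap the LLM-based analysis is meant to fill.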

Key Features

  • Intelligent Use Case Analysis: Leverages an LLM to understand the core objective of the business problem.
  • Target-Aware Recommendation: Places special emphasis on the nature of the target variable to guide its decision.
  • Context-Driven Suggestions: Considers the entire data context, including domain and column descriptions, to make an informed choice.
  • Accelerates Model Planning: Provides a justified starting point for ML projects, helping align the problem with the proposed solution.

Installation

Prerequisites

  • uv – A fast Python package and environment manager.
    • For a quick setup on macOS/Linux, you can use:
      curl -LsSf https://astral.sh/uv/install.sh | sh
      
  • Git

Steps

  1. Clone the ml_approach_suggestion_agent repository:

    git clone https://github.com/stepfnAI/ml_approach_suggestion_agent.git
    cd ml_approach_suggestion_agent
    git switch dev
    
  2. Create a virtual environment and install dependencies: The first command creates a .venv folder in the current directory and installs all required packages; the second activates the environment.

    uv sync --extra dev
    source .venv/bin/activate
    
  3. Clone and install the sfn_blueprint dependency: The agent requires the sfn_blueprint library. The following commands clone it into a sibling directory and install it in editable mode.

    cd ../
    git clone https://github.com/stepfnAI/sfn_blueprint.git
    cd sfn_blueprint
    git switch dev
    uv pip install -e .
    cd ../ml_approach_suggestion_agent
    

Configuration

You can configure the agent by creating a .env file in the project root or by exporting environment variables in your shell. Settings loaded via export will override those in a .env file.
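The precedence rule can be pictured with a small standard-library sketch. It is not the package's actual loader: parse_dotenv and resolve_settings are hypothetical names, and the parser handles only plain KEY=VALUE lines (no inline comments or multi-line values).

```python
import os
from typing import Dict, Mapping, Optional


def parse_dotenv(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def _is_agent_setting(name: str) -> bool:
    """Only the variables documented for this agent are considered."""
    return name == "OPENAI_API_KEY" or name.startswith("METHODOLOGY_")


def resolve_settings(
    dotenv_text: str, environ: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    """Merge .env values with exported variables; exported values win."""
    env = os.environ if environ is None else environ
    merged = parse_dotenv(dotenv_text)
    merged.update({k: v for k, v in env.items() if _is_agent_setting(k)})
    return merged


dotenv_text = 'METHODOLOGY_AI_MODEL="gpt-4o-mini"\nMETHODOLOGY_TEMPERATURE=0.1\n'
exported = {"METHODOLOGY_AI_MODEL": "gpt-4o"}
print(resolve_settings(dotenv_text, exported))
```

Here the exported model name replaces the one from the .env text, while the temperature, which was not exported, keeps its .env value.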

Available Settings

Environment Variable     | Description                               | Default
OPENAI_API_KEY           | (Required) Your OpenAI API key.           | None
METHODOLOGY_AI_PROVIDER  | AI provider for methodology suggestions.  | openai
METHODOLOGY_AI_MODEL     | AI model for methodology suggestions.     | gpt-4o
METHODOLOGY_TEMPERATURE  | AI model temperature (e.g., 0.0 to 0.5).  | 0.3
METHODOLOGY_MAX_TOKENS   | Maximum tokens for the AI response.       | 4000
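To make the defaults and value types concrete, here is a minimal sketch of how these variables could be read into a typed object. The variable names and defaults are taken from the settings listed above; the MethodologySettings class and load_settings function are hypothetical and may not match how the package reads its configuration internally.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class MethodologySettings:
    """Hypothetical container mirroring the documented settings."""

    openai_api_key: str
    provider: str = "openai"
    model: str = "gpt-4o"
    temperature: float = 0.3
    max_tokens: int = 4000


def load_settings(environ: Optional[Mapping[str, str]] = None) -> MethodologySettings:
    """Read settings from a mapping (os.environ by default), applying defaults."""
    env = os.environ if environ is None else environ
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        # The only setting without a usable default.
        raise ValueError("OPENAI_API_KEY is required")
    return MethodologySettings(
        openai_api_key=api_key,
        provider=env.get("METHODOLOGY_AI_PROVIDER", "openai"),
        model=env.get("METHODOLOGY_AI_MODEL", "gpt-4o"),
        # Environment values are strings, so numeric settings are converted.
        temperature=float(env.get("METHODOLOGY_TEMPERATURE", "0.3")),
        max_tokens=int(env.get("METHODOLOGY_MAX_TOKENS", "4000")),
    )


settings = load_settings({"OPENAI_API_KEY": "sk-your-api-key-here"})
print(settings.model, settings.temperature, settings.max_tokens)
```

Passing a plain dictionary instead of os.environ keeps the sketch easy to exercise in tests without touching the real environment.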

Method 1: Using a .env File (Recommended)

Create a .env file in the root directory to store API keys and project-wide defaults.

Example .env file:

# .env

# --- Required Settings ---
OPENAI_API_KEY="sk-your-api-key-here"

# --- Optional Overrides ---
# Use a different model
METHODOLOGY_AI_MODEL="gpt-4o-mini"

# Use a lower temperature for more deterministic responses
METHODOLOGY_TEMPERATURE=0.1

Method 2: Using export Commands

Use export in your terminal for temporary settings or in CI/CD environments.

Example export commands:

# Set the environment variables for the current terminal session
export OPENAI_API_KEY="sk-your-api-key-here"
export METHODOLOGY_AI_MODEL="gpt-4o-mini"

Testing

To run the test suite, use the following command from the root of the project directory:

pytest -s

Usage

Running the Example Script

To see a quick demonstration, run the provided example script. It executes the agent with predefined data and prints the recommended methodology.

python examples/basic_usage.py

Using as a Library

Integrate the MLApproachDecisionAgent directly into your Python applications to get methodology recommendations programmatically.

import logging
from ml_approach_suggestion_agent.agent import MLApproachDecisionAgent

# Configure logging
logging.basicConfig(level=logging.INFO)

# 1. Define the domain, use case, and data context
domain_name = "Mortgage Loan Servicing"
domain_description = "Managing mortgage loans from post-origination to payoff, including payment collection, escrow management, and compliance for domestic and international loans."
use_case = "To predict the likelihood of a borrower becoming delinquent on their mortgage payment within the next 60 days using their demographic and financial data to enable proactive intervention."

column_descriptions = {
    "CreditScore": "Borrower's credit score from credit bureau sources",
    "EmploymentStatus": "Current employment status (e.g., employed, self-employed, unemployed)",
    # ... other column descriptions
}

column_insights = {
  "table_info": { "row_count": 50000 },
  "table_columns_info": {
    "CreditScore": { "data_type": "Int64", "min_max_value": [350, 850] },
    "EmploymentStatus": { "data_type": "string", "distinct_count": 5 },
    # ... other column insights
  }
}

target_column_name = "IsDelinquent"
target_column_insights = {
    "Target Column Description": "A binary categorical flag indicating if the borrower has missed one or more mortgage payments in the last 60 days.",
    "Data Type": "Integer (or Boolean)",
    "Value Distribution": {
      "0 (Not Delinquent)": "92%",
      "1 (Delinquent)": "8%"
    }
}

# 2. Prepare the task data payload
task_data = {
    "domain_name": domain_name,
    "domain_description": domain_description,
    "use_case": use_case,
    "column_descriptions": column_descriptions,
    "column_insights": column_insights,
    "target_column_name": target_column_name,
    "target_column_insights": target_column_insights
}

# 3. Initialize and execute the agent
agent = MLApproachDecisionAgent()
result = agent(task_data)

# 4. Print the suggested methodology
if result["success"]:
    print("Successfully suggested an approach:")
    print(result["result"]["approach"].model_dump_json(indent=4))
    print(f"Cost summary: {result['result']['cost_summary']}")
else:
    print("Failed to suggest an approach.")
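Because every call goes to an LLM, it can be worth checking the payload locally before invoking the agent. The sketch below assumes that all seven keys used in the example above are required; the README does not state which keys are mandatory, and missing_task_data_keys is a hypothetical helper, not part of the package.

```python
from typing import Any, List, Mapping

# Keys used in the task_data payload of the example above.
# Treating all of them as required is an assumption.
EXPECTED_KEYS = (
    "domain_name",
    "domain_description",
    "use_case",
    "column_descriptions",
    "column_insights",
    "target_column_name",
    "target_column_insights",
)


def missing_task_data_keys(task_data: Mapping[str, Any]) -> List[str]:
    """Return the expected keys that are absent or set to None."""
    return [key for key in EXPECTED_KEYS if task_data.get(key) is None]


incomplete = {"domain_name": "Mortgage Loan Servicing", "use_case": "Predict delinquency"}
print(missing_task_data_keys(incomplete))
```

An empty list means the payload has the same shape as the example; a non-empty list names the fields to fill in before calling the agent.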

Example Output

The agent returns a JSON object containing the recommended methodology and a detailed explanation for the choice.

(Note: The actual output may vary slightly based on the LLM's response.)

{
    "recommended": "Classification",
    "description": "The goal is to predict the likelihood of a borrower becoming delinquent on their mortgage payment within the next 60 days. This is a binary outcome (delinquent or not delinquent), making classification the appropriate methodology. The target variable is categorical, and the available demographic and financial data can be used as features to train a classification model."
}
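Since the LLM's response can vary, downstream code may want to confirm that the recommendation is one of the five predefined labels before acting on it. The following sketch parses JSON shaped like the example output above; parse_recommendation is a hypothetical helper, and the field names are taken from that example rather than from a published schema.

```python
import json
from typing import Tuple

# The predefined methodology labels listed in the Description section.
ALLOWED_METHODOLOGIES = frozenset(
    {"Classification", "Regression", "Forecasting", "Clustering", "No-ML"}
)


def parse_recommendation(raw_json: str) -> Tuple[str, str]:
    """Return (recommended, description), rejecting unknown labels."""
    payload = json.loads(raw_json)
    recommended = payload.get("recommended")
    if recommended not in ALLOWED_METHODOLOGIES:
        raise ValueError(f"Unexpected methodology: {recommended!r}")
    return recommended, payload.get("description", "")


raw = '{"recommended": "Classification", "description": "Binary outcome."}'
label, reason = parse_recommendation(raw)
print(label)
```

In the library example, the string passed to this helper would be the output of model_dump_json() on the returned approach object.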
