ml_approach_suggestion_agent

An AI-powered agent that analyzes a dataset and use case to recommend the most appropriate machine learning methodology.

Description

This agent takes a detailed description of a business domain, a specific use case, and information about the dataset—including column descriptions, insights, and target variable details—to suggest the best ML approach. It uses a large language model to:

  1. Analyze the relationship between the use case and the target variable.
  2. Evaluate the characteristics of the data (especially the target column).
  3. Recommend the most suitable methodology from a predefined list: Binary Classification, Multiclass Classification, Regression, Timeseries Regression, Timeseries Binary Classification, Timeseries Multiclass Classification, Recommendation, Clustering, Anomaly Detection, Forecasting, or Not Applicable.
  4. Provide a clear justification for its recommendation.

This helps data scientists and analysts quickly and confidently choose the right path for their modeling efforts, saving time and reducing the risk of starting with an incorrect approach.
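
The predefined list in step 3 can be modelled as an enumeration, so that downstream code can check a returned label against it. This is a minimal sketch: the names `Methodology` and `is_known_methodology` are hypothetical and are not claimed to be part of the package's API.

```python
from enum import Enum


class Methodology(str, Enum):
    """Labels copied from the predefined list above (hypothetical type)."""

    BINARY_CLASSIFICATION = "Binary Classification"
    MULTICLASS_CLASSIFICATION = "Multiclass Classification"
    REGRESSION = "Regression"
    TIMESERIES_REGRESSION = "Timeseries Regression"
    TIMESERIES_BINARY_CLASSIFICATION = "Timeseries Binary Classification"
    TIMESERIES_MULTICLASS_CLASSIFICATION = "Timeseries Multiclass Classification"
    RECOMMENDATION = "Recommendation"
    CLUSTERING = "Clustering"
    ANOMALY_DETECTION = "Anomaly Detection"
    FORECASTING = "Forecasting"
    NOT_APPLICABLE = "Not Applicable"


def is_known_methodology(label: str) -> bool:
    """Return True if the label exactly matches one of the predefined methodologies."""
    return label in {member.value for member in Methodology}
```

An exact-match check like this is deliberately strict; a caller that wants to tolerate looser labels would need its own normalization step.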

Key Features

  • Intelligent Use Case Analysis: Leverages an LLM to understand the core objective of the business problem.
  • Target-Aware Recommendation: Places special emphasis on the nature of the target variable to guide its decision.
  • Context-Driven Suggestions: Considers the entire data context, including domain and column descriptions, to make an informed choice.
  • Accelerates Model Planning: Provides a validated starting point for ML projects, ensuring alignment between the problem and the proposed solution.

Installation

Prerequisites

  • uv – A fast Python package and environment manager.
    • For a quick setup on macOS/Linux, you can use:
      curl -LsSf https://astral.sh/uv/install.sh | sh
      
  • Git

Steps

  1. Clone the ml_approach_suggestion_agent repository:

    git clone https://github.com/stepfnAI/ml_approach_suggestion_agent.git
    cd ml_approach_suggestion_agent
    git switch main
    
  2. Create a virtual environment and install dependencies. The first command creates a .venv folder in the current directory and installs all required packages; the second activates the environment.

    uv sync --extra dev
    source .venv/bin/activate
    

Configuration

You can configure the agent by creating a .env file in the project root or by exporting environment variables in your shell. Settings loaded via export will override those in a .env file.
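
The precedence rule above (exported variables win over .env values) can be illustrated with a small standard-library sketch. How the package actually loads its settings is not shown in this document, so `parse_dotenv` and `resolve_settings` are hypothetical helpers that only demonstrate the stated behavior.

```python
import os
from typing import Dict, Mapping, Optional


def parse_dotenv(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes, as in OPENAI_API_KEY="sk-..."
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_settings(
    dotenv_text: str, environ: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    """Merge .env values with exported variables; exported variables win."""
    env = os.environ if environ is None else environ
    merged = parse_dotenv(dotenv_text)
    for key, value in env.items():
        # Only consider the variables this agent documents (assumed prefix rule).
        if key == "OPENAI_API_KEY" or key.startswith("METHODOLOGY_"):
            merged[key] = value
    return merged
```

Passing the environment as an argument keeps the function easy to test without modifying the real process environment.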

Available Settings

Environment Variable | Description | Default
--- | --- | ---
OPENAI_API_KEY | (Required) Your OpenAI API key. | None
METHODOLOGY_AI_PROVIDER | AI provider for methodology suggestions. | openai
METHODOLOGY_AI_MODEL | AI model for methodology suggestions. | gpt-4o
METHODOLOGY_TEMPERATURE | AI model temperature (e.g., 0.0 to 0.5). | 0.3
METHODOLOGY_MAX_TOKENS | Maximum tokens for the AI response. | 4000
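
As an illustration of how these variables and defaults might be read into typed values, here is a minimal sketch. The `MethodologySettings` class and `load_settings` function are hypothetical and exist only for this example; the variable names and default values are taken from the table above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class MethodologySettings:
    """Hypothetical container mirroring the settings table; not the package's own class."""

    openai_api_key: Optional[str]
    ai_provider: str
    ai_model: str
    temperature: float
    max_tokens: int


def load_settings(environ: Optional[Mapping[str, str]] = None) -> MethodologySettings:
    """Read the documented variables, falling back to the documented defaults."""
    env = os.environ if environ is None else environ
    return MethodologySettings(
        openai_api_key=env.get("OPENAI_API_KEY"),  # required; None if unset
        ai_provider=env.get("METHODOLOGY_AI_PROVIDER", "openai"),
        ai_model=env.get("METHODOLOGY_AI_MODEL", "gpt-4o"),
        temperature=float(env.get("METHODOLOGY_TEMPERATURE", "0.3")),
        max_tokens=int(env.get("METHODOLOGY_MAX_TOKENS", "4000")),
    )
```

Converting the temperature and token limit at load time surfaces a malformed value (for example a non-numeric temperature) as an immediate `ValueError` rather than a later API failure.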

Method 1: Using a .env File (Recommended)

Create a .env file in the root directory to store API keys and project-wide defaults.

Example .env file:

# .env

# --- Required Settings ---
OPENAI_API_KEY="sk-your-api-key-here"

# --- Optional Overrides ---
# Use a different model
METHODOLOGY_AI_MODEL="gpt-4o-mini"

# Use a lower temperature for more deterministic responses
METHODOLOGY_TEMPERATURE=0.1

Method 2: Using export Commands

Use export in your terminal for temporary settings or in CI/CD environments.

Example export commands:

# Set the environment variables for the current terminal session
export OPENAI_API_KEY="sk-your-api-key-here"
export METHODOLOGY_AI_MODEL="gpt-4o-mini"

Testing

To run the test suite, use the following command from the root of the project directory:

pytest -s

Usage

Running the Example Script

To see a quick demonstration, run the provided example script. This will execute the agent with pre-defined data and print the recommended methodology.

python examples/basic_usage.py

Using as a Library

Integrate the MLApproachDecisionAgent directly into your Python applications to get methodology recommendations programmatically.

import logging
from ml_approach_suggestion_agent.agent import MLApproachDecisionAgent

# Configure logging
logging.basicConfig(level=logging.INFO)

# 1. Define the domain, use case, and data context
domain_name = "Mortgage Loan Servicing"
domain_description = "Managing mortgage loans from post-origination to payoff, including payment collection, escrow management, and compliance for domestic and international loans."
use_case = "To predict the likelihood of a borrower becoming delinquent on their mortgage payment within the next 60 days using their demographic and financial data to enable proactive intervention."

column_descriptions = {
    "CreditScore": "Borrower's credit score from credit bureau sources",
    "EmploymentStatus": "Current employment status (e.g., employed, self-employed, unemployed)",
    # ... other column descriptions
}

column_insights = {
    "table_info": {"row_count": 50000},
    "table_columns_info": {
        "CreditScore": {"data_type": "Int64", "min_max_value": [350, 850]},
        "EmploymentStatus": {"data_type": "string", "distinct_count": 5},
        # ... other column insights
    }
}

target_column_name = "IsDelinquent"
target_column_insights = {
    "Target Column Description": "A binary categorical flag indicating if the borrower has missed one or more mortgage payments in the last 60 days.",
    "Data Type": "Integer (or Boolean)",
    "Value Distribution": {
        "0 (Not Delinquent)": "92%",
        "1 (Delinquent)": "8%"
    }
}

# 2. Prepare the task data payload
task_data = {
    "domain_name": domain_name,
    "domain_description": domain_description,
    "use_case": use_case,
    "column_descriptions": column_descriptions,
    "column_insights": column_insights,
    "target_column_name": target_column_name,
    "target_column_insights": target_column_insights
}

# 3. Initialize and execute the agent
agent = MLApproachDecisionAgent()
result = agent(task_data)

# 4. Print the suggested methodology
if result["success"]:
    print("Successfully suggested an approach:")
    print(result["result"]["approach"].model_dump_json(indent=4))
    print(f"Cost summary: {result['result']['cost_summary']}")
else:
    print("Failed to suggest an approach.")
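
Because every call to the agent goes to an LLM, it can be worth checking the payload locally first. The sketch below assumes that all seven keys used in the example above are expected to be present and non-empty; that requirement, and the helper name `missing_task_fields`, are assumptions rather than documented behavior of the package.

```python
from typing import Any, Dict, List

# Keys taken from the task_data payload in the example above.
EXPECTED_TASK_FIELDS = (
    "domain_name",
    "domain_description",
    "use_case",
    "column_descriptions",
    "column_insights",
    "target_column_name",
    "target_column_insights",
)


def missing_task_fields(task_data: Dict[str, Any]) -> List[str]:
    """Return the expected keys that are absent or empty in the payload."""
    return [field for field in EXPECTED_TASK_FIELDS if not task_data.get(field)]
```

A caller could run `missing_task_fields(task_data)` before invoking the agent and stop early if the returned list is not empty.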

Example Output

The agent returns a JSON object containing the recommended methodology and a detailed explanation for the choice.

(Note: The actual output may vary slightly based on the LLM's response.)

{
    "recommended": "Classification",
    "description": "The goal is to predict the likelihood of a borrower becoming delinquent on their mortgage payment within the next 60 days. This is a binary outcome (delinquent or not delinquent), making classification the appropriate methodology. The target variable is categorical, and the available demographic and financial data can be used as features to train a classification model."
}
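
If the JSON shown above is stored or passed between services as a string, it can be read back with the standard library. This sketch assumes only the two keys visible in the example output, "recommended" and "description"; `read_recommendation` is a hypothetical helper.

```python
import json
from typing import Tuple


def read_recommendation(raw_json: str) -> Tuple[str, str]:
    """Return (recommended, description) from the agent's JSON output string."""
    data = json.loads(raw_json)
    return data["recommended"], data["description"]
```

A missing key raises `KeyError`, which makes an unexpected response shape visible immediately instead of silently returning an empty value.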
