Autonomous ML agent that finds the best model for any dataset automatically

ModelScout 🤖

Intelligent ML Model Recommendation System

ModelScout is an automated machine learning tool that analyzes your dataset and recommends the best-fitting ML models. It uses Auto-sklearn to search over models and hyperparameters within a configurable time budget and reports the best-performing model it finds for your data.

🎯 Features

  • Automated Problem Detection: Automatically detects classification, regression, or clustering tasks
  • Smart Model Selection: Uses Auto-sklearn to find the best models for your data
  • Comprehensive Analysis: Provides detailed dataset analysis and insights
  • Multiple Formats: Generates reports in text, JSON, and table formats
  • REST API: Flask-based REST API for easy integration
  • Support for All ML Tasks: Classification, Regression, Time-series, and more

📋 Project Structure

ModelScout/
├── agent/                   # Core ML engine
│   ├── data_analyzer.py     # Dataset analysis module
│   ├── model_selector.py    # Model recommendation using Auto-sklearn
│   ├── reporter.py          # Report generation
│   ├── orchestrator.py      # Main pipeline orchestrator
│   └── __init__.py
├── api/                     # REST API
│   ├── main.py              # Flask API endpoints
│   └── __init__.py
├── data/                    # Sample datasets
├── models/                  # Trained models storage
├── outputs/                 # Generated reports
├── requirements.txt         # Python dependencies
├── demo.py                  # Demo script with examples
└── README.md

🚀 Quick Start

1. Installation

# Clone or navigate to the project directory
cd ModelScout

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

2. Basic Usage

from agent.orchestrator import ModelScout

# Initialize
scout = ModelScout(auto_train_time=300)

# Run complete pipeline
result = scout.run_full_pipeline(
    data_path='your_data.csv',
    target='target_column',
    report_path='outputs/report.txt'
)

# Access results
print(result['recommendations']['best_model_name'])
print(result['recommendations']['test_score'])
print(result['report'])

3. Step-by-Step Usage

from agent.orchestrator import ModelScout

scout = ModelScout()

# Load data
df = scout.load_data('data.csv')

# Analyze data
analysis = scout.analyze_data(df, target='label')
print(f"Problem Type: {analysis['target_analysis']['type']}")

# Get recommendations
recommendations = scout.recommend_models(df, 'label')
print(f"Best Model: {recommendations['best_model_name']}")
print(f"Test Score: {recommendations['test_score']}")

# Generate report
report = scout.generate_report(output_format='text', output_path='report.txt')

🔧 API Endpoints

Health Check

GET /health

Analyze Dataset

POST /api/analyze
Content-Type: application/json

{
    "file_path": "path/to/data.csv",
    "target": "target_column"
}

Get Recommendations

POST /api/recommend
Content-Type: application/json

{
    "file_path": "path/to/data.csv",
    "target": "target_column",
    "time_limit": 300
}

Generate Report

POST /api/report
Content-Type: application/json

{
    "file_path": "path/to/data.csv",
    "target": "target_column",
    "format": "text"
}

Full Pipeline

POST /api/pipeline
Content-Type: application/json

{
    "file_path": "path/to/data.csv",
    "target": "target_column",
    "time_limit": 300
}
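
Calling the pipeline endpoint from Python might look like the sketch below. It uses only the standard library. The base URL (Flask's default local port), the helper names build_pipeline_request and send, and the assumption that the server answers with a JSON body are illustrative and not taken from the ModelScout source.

```python
import json
import urllib.request

# Assumption: the Flask app runs locally on Flask's default port.
BASE_URL = "http://localhost:5000"


def build_pipeline_request(file_path, target, time_limit=300, base_url=BASE_URL):
    """Build (but do not send) a JSON POST request for /api/pipeline."""
    payload = {"file_path": file_path, "target": target, "time_limit": time_limit}
    return urllib.request.Request(
        url=f"{base_url}/api/pipeline",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=600):
    """Send the request and decode the response, assuming it is JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


req = build_pipeline_request("path/to/data.csv", "target_column")
print(req.get_method(), req.full_url)
# result = send(req)  # needs a running ModelScout API server
```

The client timeout is set above the 300-second time_limit on purpose, since the request presumably blocks until the model search finishes.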

🎮 Run Demo

python demo.py

The demo script:

  1. Creates sample datasets (Iris, Breast Cancer, Regression)
  2. Runs ModelScout on each dataset
  3. Generates comparison reports
  4. Demonstrates both classification and regression

📊 What ModelScout Analyzes

Data Characteristics

  • Dataset size and shape
  • Missing values and data quality
  • Feature types and counts
  • Memory usage

Target Variable

  • Problem type (Classification/Regression)
  • Class distribution (for classification)
  • Value range (for regression)
  • Class imbalance ratio
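
As a rough illustration of the target checks listed above, the sketch below guesses the problem type and computes a class imbalance ratio (largest class count divided by smallest). The function names and the max_classes threshold are assumptions made for this example; ModelScout's actual detection rule may differ.

```python
from collections import Counter


def detect_problem_type(values, max_classes=20):
    """Heuristic guess at the task type from the target values.

    Non-numeric targets, or integer-valued targets with few distinct
    values, are treated as classification; everything else as regression.
    """
    present = [v for v in values if v is not None]
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    if not numeric:
        return "CLASSIFICATION"
    integer_valued = all(float(v).is_integer() for v in present)
    if integer_valued and len(set(present)) <= max_classes:
        return "CLASSIFICATION"
    return "REGRESSION"


def class_imbalance_ratio(values):
    """Return the largest class count divided by the smallest."""
    counts = Counter(v for v in values if v is not None)
    if not counts:
        raise ValueError("target column has no non-missing values")
    return max(counts.values()) / min(counts.values())


# An Iris-like target: three classes with 50 rows each.
target = [0] * 50 + [1] * 50 + [2] * 50
print(detect_problem_type(target))
print(f"{class_imbalance_ratio(target):.2f}:1")
```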

Feature Statistics

  • Numeric: mean, std, min, max, missing count
  • Categorical: unique values, missing count
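
The per-feature statistics can be sketched with the standard library alone, as below. The function names and the returned dictionary keys are hypothetical. In the project itself these numbers presumably come from pandas, whose std() also defaults to the sample standard deviation used here.

```python
import math
import statistics


def _is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def numeric_summary(column):
    """Mean, sample std, min, max and missing count for a numeric column."""
    present = [v for v in column if not _is_missing(v)]
    missing = len(column) - len(present)
    if not present:
        return {"mean": None, "std": None, "min": None, "max": None, "missing": missing}
    return {
        "mean": statistics.fmean(present),
        # stdev needs at least two points; report 0.0 for a single value.
        "std": statistics.stdev(present) if len(present) > 1 else 0.0,
        "min": min(present),
        "max": max(present),
        "missing": missing,
    }


def categorical_summary(column):
    """Distinct value count and missing count for a categorical column."""
    present = [v for v in column if not _is_missing(v)]
    return {"unique": len(set(present)), "missing": len(column) - len(present)}


print(numeric_summary([1.0, 2.0, 3.0, None]))
print(categorical_summary(["a", "b", "a", None]))
```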

🤖 How It Works

  1. Data Loading: Supports CSV, Excel, JSON formats
  2. Analysis: Comprehensive dataset profiling
  3. Problem Detection: Auto-detects ML task type
  4. Model Search: Auto-sklearn searches optimal models
  5. Evaluation: Train/test split and performance metrics
  6. Reporting: Generates detailed recommendations
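
Step 1 can be pictured as a dispatch on the file extension. The sketch below is one possible way to do this, not ModelScout's actual loader. READERS, reader_name_for, and load_table are hypothetical names, while read_csv, read_excel, and read_json are real pandas functions.

```python
from pathlib import Path

# Maps file suffixes to the name of the pandas reader that handles them.
READERS = {
    ".csv": "read_csv",
    ".xlsx": "read_excel",
    ".xls": "read_excel",
    ".json": "read_json",
}


def reader_name_for(path):
    """Return the pandas reader name for a path, based on its suffix."""
    suffix = Path(path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"Unsupported file format: {path}")
    return READERS[suffix]


def load_table(path):
    """Load a dataset into a DataFrame using the matching pandas reader."""
    import pandas as pd  # imported lazily; only needed when a file is loaded

    return getattr(pd, reader_name_for(path))(path)


print(reader_name_for("data/iris.csv"))
```

Failing early on an unknown extension keeps unsupported inputs from reaching the model search step.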

📦 Dependencies

  • pandas: Data manipulation
  • scikit-learn: ML algorithms
  • auto-sklearn: Automated ML model selection
  • numpy: Numerical computing
  • matplotlib/seaborn: Visualization
  • flask: REST API
  • xgboost, lightgbm, catboost: Advanced models
  • imbalanced-learn: Class imbalance handling

🔍 Example Output

======================================================================
  ___  ___           _      _    ____  ___  _   _ ___
 |  \/  |          | |    | |  / ___ \/ _ \| | | |_  |
 | .  . | ___    __| | ___| | / /   \/ /_\ \ | | | / /
 | |\/| |/ _ \  / _` |/ _ \ | \ \   |  _  | | | |/ /
 | |  | | (_) || (_| |  __/ |  \ \__| | | | |_| / /
 |_|  |_|\___/  \__,_|\___|_|   \___/_| |_|\___/___/

======================================================================

DATA OVERVIEW
======================================================================
Dataset Shape: (150, 5) (rows, columns)
Memory Usage: 0.00 MB
Missing Values: 0 (0.00%)
Numeric Features: 4
Categorical Features: 0

TARGET VARIABLE ANALYSIS
----------------------------------------------------------------------
Problem Type: CLASSIFICATION
Unique Values: 3
Missing Values: 0
Class Imbalance Ratio: 1.00:1
Class Distribution:
  0: 50 (33.3%)
  1: 50 (33.3%)
  2: 50 (33.3%)

MODEL RECOMMENDATIONS
======================================================================
Best Model: RandomForestClassifier
Problem Type: CLASSIFICATION
Train Score: 1.0000
Test Score: 0.9333
Data Shape Used: (150, 4)
Number of Classes: 3

======================================================================
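
The same recommendations can be rendered as text or JSON. The sketch below shows the idea with values copied from the example output above. Only the best_model_name and test_score keys appear in the usage examples; the other keys and the render_text and render_json helpers are hypothetical.

```python
import json

# Values from the example output; 'problem_type' and 'train_score'
# are hypothetical key names chosen for this sketch.
recommendations = {
    "best_model_name": "RandomForestClassifier",
    "problem_type": "CLASSIFICATION",
    "train_score": 1.0,
    "test_score": 0.9333,
}


def render_json(rec):
    """Serialize the recommendations for machine consumption."""
    return json.dumps(rec, indent=2, sort_keys=True)


def render_text(rec):
    """Format the recommendations as 'Label: value' lines."""
    lines = []
    for key, value in rec.items():
        label = key.replace("_", " ").title()
        shown = f"{value:.4f}" if isinstance(value, float) else str(value)
        lines.append(f"{label}: {shown}")
    return "\n".join(lines)


print(render_text(recommendations))
print(render_json(recommendations))
```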

🛠️ Configuration

You can customize behavior by modifying parameters:

scout = ModelScout(
    auto_train_time=600  # Increase for more thorough search (seconds)
)

📝 License

This project is for educational and portfolio purposes.

🤝 Contributing

Feel free to extend ModelScout with:

  • Additional models
  • More data preprocessing options
  • Visualization enhancements
  • Performance optimizations

📞 Support

For issues or questions, refer to the demo.py script for usage examples.


Happy Model Scouting! 🎯
