
Autonomous ML agent that finds the best model for any dataset automatically


ModelScout 🤖

Intelligent ML Model Recommendation System

An automated machine learning tool that analyzes your dataset and recommends the best-fitting ML models. ModelScout uses Auto-sklearn to search a large space of candidate models and hyperparameters and to identify the models that perform best on your specific data.

🎯 Features

  • Automated Problem Detection: Automatically detects classification, regression, or clustering tasks
  • Smart Model Selection: Uses Auto-sklearn to find the best models for your data
  • Comprehensive Analysis: Provides detailed dataset analysis and insights
  • Multiple Formats: Generates reports in text, JSON, and table formats
  • REST API: Flask-based REST API for easy integration
  • Support for All ML Tasks: Classification, Regression, Time-series, and more

📋 Project Structure

ModelScout/
├── agent/                   # Core ML engine
│   ├── data_analyzer.py     # Dataset analysis module
│   ├── model_selector.py    # Model recommendation using Auto-sklearn
│   ├── reporter.py          # Report generation
│   ├── orchestrator.py      # Main pipeline orchestrator
│   └── __init__.py
├── api/                     # REST API
│   ├── main.py              # Flask API endpoints
│   └── __init__.py
├── data/                    # Sample datasets
├── models/                  # Trained models storage
├── outputs/                 # Generated reports
├── requirements.txt         # Python dependencies
├── demo.py                  # Demo script with examples
└── README.md

🚀 Quick Start

1. Installation

# Clone or navigate to the project directory
cd ModelScout

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

2. Basic Usage

from agent.orchestrator import ModelScout

# Initialize with a 300-second model-search budget (auto_train_time is in seconds)
scout = ModelScout(auto_train_time=300)

# Run complete pipeline
result = scout.run_full_pipeline(
    data_path='your_data.csv',
    target='target_column',
    report_path='outputs/report.txt'
)

# Access results
print(result['recommendations']['best_model_name'])
print(result['recommendations']['test_score'])
print(result['report'])

3. Step-by-Step Usage

from agent.orchestrator import ModelScout
import pandas as pd

scout = ModelScout()

# Load data
df = scout.load_data('data.csv')

# Analyze data
analysis = scout.analyze_data(df, target='label')
print(f"Problem Type: {analysis['target_analysis']['type']}")

# Get recommendations
recommendations = scout.recommend_models(df, 'label')
print(f"Best Model: {recommendations['best_model_name']}")
print(f"Test Score: {recommendations['test_score']}")

# Generate report
report = scout.generate_report(output_format='text', output_path='report.txt')

🔧 API Endpoints

Health Check

GET /health

Analyze Dataset

POST /api/analyze
Content-Type: application/json

{
    "file_path": "path/to/data.csv",
    "target": "target_column"
}

Get Recommendations

POST /api/recommend
Content-Type: application/json

{
    "file_path": "path/to/data.csv",
    "target": "target_column",
    "time_limit": 300
}

Generate Report

POST /api/report
Content-Type: application/json

{
    "file_path": "path/to/data.csv",
    "target": "target_column",
    "format": "text"
}

Full Pipeline

POST /api/pipeline
Content-Type: application/json

{
    "file_path": "path/to/data.csv",
    "target": "target_column",
    "time_limit": 300
}
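A client for these endpoints needs nothing beyond the Python standard library. The sketch below assumes the Flask app in api/main.py is served at Flask's default address, http://localhost:5000, and that every /api/* endpoint replies with JSON; this README documents neither the host nor the response schema, and build_request and call_endpoint are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: the API from api/main.py runs on Flask's default host and port.
BASE_URL = "http://localhost:5000"


def build_request(endpoint, payload):
    """Build a JSON POST request for an endpoint such as "/api/recommend"."""
    return urllib.request.Request(
        BASE_URL + endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_endpoint(endpoint, payload, timeout=None):
    """Send the request and decode the (assumed JSON) response body.

    timeout=None disables the socket timeout; urlopen raises
    urllib.error.HTTPError for 4xx/5xx responses and URLError if the
    server cannot be reached.
    """
    with urllib.request.urlopen(build_request(endpoint, payload), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, call_endpoint("/api/pipeline", {"file_path": "path/to/data.csv", "target": "target_column", "time_limit": 300}) sends the same body as the Full Pipeline request above. No timeout is set by default because a pipeline run may take as long as its time_limit search budget.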

🎮 Run Demo

python demo.py

The demo script:

  1. Creates sample datasets (Iris, Breast Cancer, and a regression dataset)
  2. Runs ModelScout on each dataset
  3. Generates comparison reports
  4. Demonstrates both classification and regression

📊 What ModelScout Analyzes

Data Characteristics

  • Dataset size and shape
  • Missing values and data quality
  • Feature types and counts
  • Memory usage
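The overview figures listed above (shape, missing-value count and percentage, numeric versus categorical feature counts) can be approximated in a few lines. This is a minimal sketch, not ModelScout's data_analyzer.py: it assumes a dataset held as a dict mapping column name to a list of values, with None marking a missing cell, and the names data_overview and _is_numeric are hypothetical. Memory usage is left out because it depends on pandas internals.

```python
def _is_numeric(values):
    """True if the column has at least one value and every non-missing value is an int or float."""
    present = [v for v in values if v is not None]
    # bool is a subclass of int, so it is excluded explicitly.
    return bool(present) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )


def data_overview(columns, target=None):
    """Summarize shape, missing cells and feature types of a column-oriented dataset.

    columns: dict mapping column name -> list of values (None = missing).
    target:  optional name of the target column, excluded from feature counts.
    An all-missing column is counted as categorical.
    """
    n_cols = len(columns)
    n_rows = len(next(iter(columns.values()), []))
    if any(len(values) != n_rows for values in columns.values()):
        raise ValueError("all columns must have the same length")

    total_cells = n_rows * n_cols
    missing = sum(v is None for values in columns.values() for v in values)

    features = {name: values for name, values in columns.items() if name != target}
    n_numeric = sum(_is_numeric(values) for values in features.values())

    return {
        "shape": (n_rows, n_cols),
        "missing": missing,
        "missing_pct": 100.0 * missing / total_cells if total_cells else 0.0,
        "numeric_features": n_numeric,
        "categorical_features": len(features) - n_numeric,
    }
```

In the example output below, the Iris data reports Dataset Shape (150, 5) but Numeric Features: 4, which suggests the target column is excluded from the feature counts; the sketch mirrors that through its target argument. With pandas, the equivalent totals come from df.shape and df.isna().sum().sum().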

Target Variable

  • Problem type (Classification/Regression)
  • Class distribution (for classification)
  • Value range (for regression)
  • Class imbalance ratio
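For classification targets, the class distribution and imbalance ratio can be computed as below. This README does not define the ratio, so the sketch assumes it is the largest class count divided by the smallest, which is consistent with the 1.00:1 shown for the balanced Iris data in the example output; target_summary is a hypothetical helper name.

```python
from collections import Counter


def target_summary(labels):
    """Class counts, percentages and imbalance ratio for a classification target.

    labels: a list (or other re-iterable sequence); None marks a missing label.
    The imbalance ratio is majority-class count / minority-class count
    (an assumed definition).
    """
    counts = Counter(label for label in labels if label is not None)
    if not counts:
        raise ValueError("target has no non-missing values")
    total = sum(counts.values())
    # Sorting by label assumes the labels are mutually comparable (all ints, all strings, ...).
    distribution = {
        label: (n, 100.0 * n / total) for label, n in sorted(counts.items())
    }
    return {
        "unique_values": len(counts),
        "missing": sum(label is None for label in labels),
        "distribution": distribution,
        "imbalance_ratio": max(counts.values()) / min(counts.values()),
    }


summary = target_summary([0] * 50 + [1] * 50 + [2] * 50)
print(f"Class Imbalance Ratio: {summary['imbalance_ratio']:.2f}:1")  # Class Imbalance Ratio: 1.00:1
```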

Feature Statistics

  • Numeric: mean, std, min, max, missing count
  • Categorical: unique values, missing count
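The per-feature statistics above map onto a small function. This illustrative sketch uses only the standard library (feature_stats is a hypothetical name, and None stands in for a missing value); ModelScout itself presumably relies on pandas, where Series.describe() and Series.isna().sum() provide the same figures.

```python
import math
import statistics


def feature_stats(values):
    """Per-column statistics for one feature.

    Numeric columns (every non-missing value is an int or float) get mean,
    sample standard deviation, min, max and a missing count; anything else is
    treated as categorical and gets a distinct-value count and a missing count.
    """
    present = [v for v in values if v is not None]
    missing = len(values) - len(present)
    is_numeric = bool(present) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    if is_numeric:
        return {
            "type": "numeric",
            "mean": statistics.mean(present),
            # Sample std (n - 1 denominator), matching pandas' Series.std() default;
            # undefined (NaN) for a single observation.
            "std": statistics.stdev(present) if len(present) > 1 else math.nan,
            "min": min(present),
            "max": max(present),
            "missing": missing,
        }
    return {"type": "categorical", "unique": len(set(present)), "missing": missing}
```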

🤖 How It Works

  1. Data Loading: Supports CSV, Excel, JSON formats
  2. Analysis: Comprehensive dataset profiling
  3. Problem Detection: Auto-detects ML task type
  4. Model Search: Auto-sklearn searches for the best-performing models
  5. Evaluation: Train/test split and performance metrics
  6. Reporting: Generates detailed recommendations
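Step 3 (problem detection) is not specified further in this README. One common heuristic is sketched below as an assumption, not as ModelScout's actual rule: no target column suggests clustering, non-numeric or boolean labels suggest classification, and whole-number targets with only a few distinct values are also treated as classification; anything else is regression. detect_problem_type and its max_classes threshold of 20 are hypothetical.

```python
def detect_problem_type(target, max_classes=20):
    """Guess the ML task for a target column (a heuristic, assumed for illustration).

    target: list of label values (None = missing), or None when there is no target.
    Returns "clustering", "classification" or "regression".
    """
    if target is None:
        # No target column: only unsupervised learning applies.
        return "clustering"
    present = [v for v in target if v is not None]
    if not present:
        raise ValueError("target column has no non-missing values")
    # Strings, booleans and other non-numeric labels are categorical.
    if any(isinstance(v, bool) or not isinstance(v, (int, float)) for v in present):
        return "classification"
    # Whole-number targets with few distinct values look like class labels.
    if all(float(v).is_integer() for v in present) and len(set(present)) <= max_classes:
        return "classification"
    return "regression"
```

The threshold is the fragile part: an integer count target (say, number of purchases) with few distinct values would be read as classification, which is why an explicit override of the detected type is worth having.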

📦 Dependencies

  • pandas: Data manipulation
  • scikit-learn: ML algorithms
  • auto-sklearn: Automated ML model selection
  • numpy: Numerical computing
  • matplotlib/seaborn: Visualization
  • flask: REST API
  • xgboost, lightgbm, catboost: Gradient-boosting models
  • imbalanced-learn: Class imbalance handling

🔍 Example Output

======================================================================
  ___  ___           _      _    ____  ___  _   _ ___
 |  \/  |          | |    | |  / ___ \/ _ \| | | |_  |
 | .  . | ___    __| | ___| | / /   \/ /_\ \ | | | / /
 | |\/| |/ _ \  / _` |/ _ \ | \ \   |  _  | | | |/ /
 | |  | | (_) || (_| |  __/ |  \ \__| | | | |_| / /
 |_|  |_|\___/  \__,_|\___|_|   \___/_| |_|\___/___/

======================================================================

DATA OVERVIEW
======================================================================
Dataset Shape: (150, 5) (rows, columns)
Memory Usage: 0.00 MB
Missing Values: 0 (0.00%)
Numeric Features: 4
Categorical Features: 0

TARGET VARIABLE ANALYSIS
----------------------------------------------------------------------
Problem Type: CLASSIFICATION
Unique Values: 3
Missing Values: 0
Class Imbalance Ratio: 1.00:1
Class Distribution:
  0: 50 (33.3%)
  1: 50 (33.3%)
  2: 50 (33.3%)

MODEL RECOMMENDATIONS
======================================================================
Best Model: RandomForestClassifier
Problem Type: CLASSIFICATION
Train Score: 1.0000
Test Score: 0.9333
Data Shape Used: (150, 4)
Number of Classes: 3

======================================================================

๐Ÿ› ๏ธ Configuration

You can customize ModelScout's behavior through its constructor parameters:

scout = ModelScout(
    auto_train_time=600  # Increase for more thorough search (seconds)
)
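The README does not say how auto_train_time reaches Auto-sklearn. A plausible mapping, assumed here rather than taken from ModelScout's source, is Auto-sklearn's time_left_for_this_task (the total search budget in seconds), with per_run_time_limit capping each candidate model fit; Auto-sklearn documents one tenth of time_left_for_this_task as the default for the latter. automl_kwargs and make_classifier are hypothetical helpers.

```python
def automl_kwargs(auto_train_time=300, per_run_fraction=0.1):
    """Turn a total search budget (seconds) into Auto-sklearn time-limit arguments.

    Hypothetical mapping: auto_train_time -> time_left_for_this_task, and each
    individual model fit is capped at per_run_fraction of that budget.
    """
    if auto_train_time <= 0:
        raise ValueError("auto_train_time must be a positive number of seconds")
    total = int(auto_train_time)
    return {
        "time_left_for_this_task": total,
        # round() guards against float artifacts such as 59.999...; at least 1 second.
        "per_run_time_limit": max(1, round(total * per_run_fraction)),
    }


def make_classifier(auto_train_time=300):
    """Create an Auto-sklearn classifier with the budget above (requires auto-sklearn)."""
    # Imported lazily so automl_kwargs() works even where auto-sklearn is not installed.
    from autosklearn.classification import AutoSklearnClassifier

    return AutoSklearnClassifier(**automl_kwargs(auto_train_time))
```

Under this mapping, auto_train_time=600 gives a 600-second search with each candidate fit capped at 60 seconds.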

📝 License

This project is for educational and portfolio purposes.

🤝 Contributing

Feel free to extend ModelScout with:

  • Additional models
  • More data preprocessing options
  • Visualization enhancements
  • Performance optimizations

📞 Support

For issues or questions, refer to the demo.py script for usage examples.


Happy Model Scouting! 🎯

Download files


Source Distribution

modelscout_ai-0.1.0.tar.gz (53.4 kB)

Uploaded Source

Built Distribution


modelscout_ai-0.1.0-py3-none-any.whl (62.0 kB)

Uploaded Python 3

File details

Details for the file modelscout_ai-0.1.0.tar.gz.

File metadata

  • Download URL: modelscout_ai-0.1.0.tar.gz
  • Size: 53.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.5

File hashes

Hashes for modelscout_ai-0.1.0.tar.gz
Algorithm Hash digest
SHA256 8b0bca551bc84cf6d2b16fefd7dc3093721701a6b9aa42eb88e8b6a01ce79015
MD5 efb971cf452b72b07387a429b232f780
BLAKE2b-256 ac0f529f652c8fe7b1def8f9900142e114a07fbb78cb213b424788870d683a8f


File details

Details for the file modelscout_ai-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: modelscout_ai-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 62.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.5

File hashes

Hashes for modelscout_ai-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 24fa49745b97b14045f94e7219aa0d8f2d86725b00082f85cc2ba896f45abcec
MD5 efa7ade3bc69b067b049cd150f346ca6
BLAKE2b-256 55cf5d9d359c8c47572d737c54278a42fdb38567bdc40f9fe60bea90263bbb79

