Autonomous ML agent that finds the best model for any dataset automatically

ModelScout 🤖

Intelligent ML Model Recommendation System

An automated machine learning tool that analyzes your dataset and recommends the best-fitting ML models. ModelScout uses Auto-sklearn to search the model and hyperparameter space and identify the best-performing models for your data.

🎯 Features

Automated Problem Detection: Automatically detects classification, regression, or clustering tasks
Smart Model Selection: Uses Auto-sklearn to find the best models for your data
Comprehensive Analysis: Provides detailed dataset analysis and insights
Multiple Formats: Generates reports in text, JSON, and table formats
REST API: Flask-based REST API for easy integration
Broad Task Coverage: Classification, regression, time-series, and more
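
As a rough illustration of the "Multiple Formats" feature, the sketch below renders a recommendations dictionary as text, JSON, or a simple table. The function `render_report` and its layout choices are hypothetical and are not taken from ModelScout's `reporter.py`; only the keys `best_model_name` and `test_score` come from the usage examples in this README.

```python
import json


def render_report(recommendations, output_format="text"):
    """Render a recommendations dict as text, JSON, or a two-column table.

    Hypothetical helper for illustration; not part of ModelScout's API.
    """
    if output_format == "json":
        return json.dumps(recommendations, indent=2, sort_keys=True)
    if output_format == "text":
        return "\n".join(f"{key}: {value}" for key, value in recommendations.items())
    if output_format == "table":
        # Pad keys to a common width so the separator column lines up.
        width = max(len(str(key)) for key in recommendations)
        return "\n".join(
            f"{str(key).ljust(width)} | {value}"
            for key, value in recommendations.items()
        )
    raise ValueError(f"Unsupported format: {output_format!r}")


example = {"best_model_name": "RandomForestClassifier", "test_score": 0.9333}
print(render_report(example, "table"))
```

Keeping the rendering separate from the model search means the same result dictionary can feed a saved report file and an API response.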

📋 Project Structure

ModelScout/
├── agent/                    # Core ML engine
│   ├── data_analyzer.py     # Dataset analysis module
│   ├── model_selector.py    # Model recommendation using Auto-sklearn
│   ├── reporter.py          # Report generation
│   ├── orchestrator.py      # Main pipeline orchestrator
│   └── __init__.py
├── api/                      # REST API
│   ├── main.py              # Flask API endpoints
│   └── __init__.py
├── data/                     # Sample datasets
├── models/                   # Trained models storage
├── outputs/                  # Generated reports
├── requirements.txt         # Python dependencies
├── demo.py                  # Demo script with examples
└── README.md

🚀 Quick Start

1. Installation

# Clone or navigate to the project directory
cd ModelScout

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

2. Basic Usage

from agent.orchestrator import ModelScout

# Initialize
scout = ModelScout(auto_train_time=300)

# Run complete pipeline
result = scout.run_full_pipeline(
    data_path='your_data.csv',
    target='target_column',
    report_path='outputs/report.txt'
)

# Access results
print(result['recommendations']['best_model_name'])
print(result['recommendations']['test_score'])
print(result['report'])

3. Step-by-Step Usage

from agent.orchestrator import ModelScout

scout = ModelScout()

# Load data
df = scout.load_data('data.csv')

# Analyze data
analysis = scout.analyze_data(df, target='label')
print(f"Problem Type: {analysis['target_analysis']['type']}")

# Get recommendations
recommendations = scout.recommend_models(df, 'label')
print(f"Best Model: {recommendations['best_model_name']}")
print(f"Test Score: {recommendations['test_score']}")

# Generate report
report = scout.generate_report(output_format='text', output_path='report.txt')

🔧 API Endpoints

Health Check

GET /health

Analyze Dataset

POST /api/analyze
Content-Type: application/json

{
    "file_path": "path/to/data.csv",
    "target": "target_column"
}

Get Recommendations

POST /api/recommend
Content-Type: application/json

{
    "file_path": "path/to/data.csv",
    "target": "target_column",
    "time_limit": 300
}

Generate Report

POST /api/report
Content-Type: application/json

{
    "file_path": "path/to/data.csv",
    "target": "target_column",
    "format": "text"
}

Full Pipeline

POST /api/pipeline
Content-Type: application/json

{
    "file_path": "path/to/data.csv",
    "target": "target_column",
    "time_limit": 300
}
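
The endpoints above can be called from any HTTP client. The sketch below builds a `POST /api/pipeline` request with the standard library only. The base URL `http://localhost:5000` is an assumption (Flask's default development port) and `build_request` is a hypothetical helper; the response format is not documented here, so the example stops short of parsing one.

```python
import json
import urllib.request

# Assumption: the Flask app is running locally on Flask's default port.
BASE_URL = "http://localhost:5000"


def build_request(endpoint, payload, base_url=BASE_URL):
    """Build a JSON POST request for one of the ModelScout API endpoints."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}{endpoint}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request(
    "/api/pipeline",
    {"file_path": "path/to/data.csv", "target": "target_column", "time_limit": 300},
)

# To actually send it (requires the API server to be running):
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

The same helper works for `/api/analyze`, `/api/recommend`, and `/api/report` by changing the endpoint and payload.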

🎮 Run Demo

python demo.py

The demo script:

Creates sample datasets (Iris, Breast Cancer, Regression)
Runs ModelScout on each dataset
Generates comparison reports
Demonstrates both classification and regression

📊 What ModelScout Analyzes

Data Characteristics

Dataset size and shape
Missing values and data quality
Feature types and counts
Memory usage

Target Variable

Problem type (Classification/Regression)
Class distribution (for classification)
Value range (for regression)
Class imbalance ratio
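
One way to picture the target analysis is the sketch below. It is not ModelScout's actual logic: the `max_classes` threshold, the rule "numeric with many distinct values means regression", and the definition of the imbalance ratio as largest class count divided by smallest class count are all assumptions made for illustration.

```python
from collections import Counter


def analyze_target(values, max_classes=20):
    """Classify a target column and summarize it (illustrative heuristic only)."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("Target column has no non-missing values")
    missing = len(values) - len(present)
    counts = Counter(present)
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    # Assumption: a numeric target with many distinct values is a regression target.
    if is_numeric and len(counts) > max_classes:
        return {
            "type": "regression",
            "min": min(present),
            "max": max(present),
            "missing": missing,
        }
    return {
        "type": "classification",
        "unique_values": len(counts),
        # Assumption: imbalance ratio = largest class count / smallest class count.
        "imbalance_ratio": max(counts.values()) / min(counts.values()),
        "class_distribution": dict(counts),
        "missing": missing,
    }


# Three balanced classes of 50 rows each, like the Iris example output.
labels = [0] * 50 + [1] * 50 + [2] * 50
summary = analyze_target(labels)
print(summary["type"], summary["imbalance_ratio"])
```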

Feature Statistics

Numeric: mean, std, min, max, missing count
Categorical: unique values, missing count
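
The per-feature statistics listed above can be sketched without any dependencies. The helpers `numeric_stats` and `categorical_stats` are hypothetical names, not functions from `data_analyzer.py`, and the sketch assumes missing values are represented as `None` and that each numeric column has at least one non-missing value.

```python
import statistics


def numeric_stats(values):
    """Mean, std, min, max, and missing count for a numeric column."""
    present = [v for v in values if v is not None]
    return {
        "mean": statistics.mean(present),
        # Sample standard deviation; a single value has no spread to estimate.
        "std": statistics.stdev(present) if len(present) > 1 else 0.0,
        "min": min(present),
        "max": max(present),
        "missing": len(values) - len(present),
    }


def categorical_stats(values):
    """Unique-value count and missing count for a categorical column."""
    present = [v for v in values if v is not None]
    return {"unique": len(set(present)), "missing": len(values) - len(present)}


print(numeric_stats([1.0, 2.0, 3.0, None]))
print(categorical_stats(["red", "blue", "red", None]))
```

The sample standard deviation used here matches the default of pandas' `std()`, so the numbers should line up with a pandas-based profile of the same column.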

🤖 How It Works

Data Loading: Supports CSV, Excel, and JSON formats
Analysis: Comprehensive dataset profiling
Problem Detection: Auto-detects ML task type
Model Search: Auto-sklearn searches for optimal models
Evaluation: Train/test split and performance metrics
Reporting: Generates detailed recommendations
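
The data loading step, for instance, could dispatch on the file extension. This is a minimal sketch under that assumption and does not show how `scout.load_data` is actually implemented; `READERS`, `reader_name_for`, and `load_table` are hypothetical names. The pandas functions `read_csv`, `read_excel`, and `read_json` are real.

```python
from pathlib import Path

# Map file extensions to the name of the pandas reader that handles them.
READERS = {
    ".csv": "read_csv",
    ".xlsx": "read_excel",
    ".xls": "read_excel",
    ".json": "read_json",
}


def reader_name_for(path):
    """Return the pandas reader name for a file, based on its extension."""
    suffix = Path(path).suffix.lower()
    try:
        return READERS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}") from None


def load_table(path):
    """Load a CSV, Excel, or JSON file into a pandas DataFrame."""
    import pandas as pd  # imported lazily so the dispatch logic works without pandas

    return getattr(pd, reader_name_for(path))(path)


print(reader_name_for("data/iris.csv"))
```

Rejecting unknown extensions early gives a clearer error than letting a reader fail partway through parsing.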

📦 Dependencies

pandas: Data manipulation
scikit-learn: ML algorithms
auto-sklearn: Automated ML model selection
numpy: Numerical computing
matplotlib/seaborn: Visualization
flask: REST API
xgboost, lightgbm, catboost: Advanced models
imbalanced-learn: Class imbalance handling

🔍 Example Output

======================================================================
  ___  ___           _      _    ____  ___  _   _ ___
 |  \/  |          | |    | |  / ___ \/ _ \| | | |_  |
 | .  . | ___    __| | ___| | / /   \/ /_\ \ | | | / /
 | |\/| |/ _ \  / _` |/ _ \ | \ \   |  _  | | | |/ /
 | |  | | (_) || (_| |  __/ |  \ \__| | | | |_| / /
 |_|  |_|\___/  \__,_|\___|_|   \___/_| |_|\___/___/

======================================================================

DATA OVERVIEW
======================================================================
Dataset Shape: (150, 5) (rows, columns)
Memory Usage: 0.00 MB
Missing Values: 0 (0.00%)
Numeric Features: 4
Categorical Features: 0

TARGET VARIABLE ANALYSIS
----------------------------------------------------------------------
Problem Type: CLASSIFICATION
Unique Values: 3
Missing Values: 0
Class Imbalance Ratio: 1.00:1
Class Distribution:
  0: 50 (33.3%)
  1: 50 (33.3%)
  2: 50 (33.3%)

MODEL RECOMMENDATIONS
======================================================================
Best Model: RandomForestClassifier
Problem Type: CLASSIFICATION
Train Score: 1.0000
Test Score: 0.9333
Data Shape Used: (150, 4)
Number of Classes: 3

======================================================================

🛠️ Configuration

Customize the search by passing parameters to the constructor:

scout = ModelScout(
    auto_train_time=600  # Increase for more thorough search (seconds)
)

📝 License

This project is for educational and portfolio purposes.

🤝 Contributing

Feel free to extend ModelScout with:

Additional models
More data preprocessing options
Visualization enhancements
Performance optimizations

📞 Support

For issues or questions, refer to the demo.py script for usage examples.

Happy Model Scouting! 🎯

Release history

0.1.2 (Jun 12, 2026)
0.1.0 (Jun 12, 2026, this version)

Download files

Source Distribution

modelscout_ai-0.1.0.tar.gz (53.4 kB view details)

Uploaded Jun 12, 2026 Source

Built Distribution

modelscout_ai-0.1.0-py3-none-any.whl (62.0 kB view details)

Uploaded Jun 12, 2026 Python 3

File details

Details for the file modelscout_ai-0.1.0.tar.gz.

File metadata

Download URL: modelscout_ai-0.1.0.tar.gz
Upload date: Jun 12, 2026
Size: 53.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.5

File hashes

Hashes for modelscout_ai-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`8b0bca551bc84cf6d2b16fefd7dc3093721701a6b9aa42eb88e8b6a01ce79015`
MD5	`efb971cf452b72b07387a429b232f780`
BLAKE2b-256	`ac0f529f652c8fe7b1def8f9900142e114a07fbb78cb213b424788870d683a8f`

File details

Details for the file modelscout_ai-0.1.0-py3-none-any.whl.

File metadata

Download URL: modelscout_ai-0.1.0-py3-none-any.whl
Upload date: Jun 12, 2026
Size: 62.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.5

File hashes

Hashes for modelscout_ai-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`24fa49745b97b14045f94e7219aa0d8f2d86725b00082f85cc2ba896f45abcec`
MD5	`efa7ade3bc69b067b049cd150f346ca6`
BLAKE2b-256	`55cf5d9d359c8c47572d737c54278a42fdb38567bdc40f9fe60bea90263bbb79`

modelscout-ai 0.1.0
