Autonomous ML agent that finds the best model for any dataset automatically
ModelScout
Intelligent ML Model Recommendation System
An automated machine learning tool that analyzes your dataset and recommends the best-fitting ML model. ModelScout uses Auto-sklearn to search over candidate models and hyperparameters within a time budget, then reports the best-performing model for your data.
Features
- Automated Problem Detection: Detects from the target column whether the task is classification or regression
- Smart Model Selection: Uses Auto-sklearn to find the best models for your data
- Comprehensive Analysis: Provides detailed dataset analysis and insights
- Multiple Formats: Generates reports in text, JSON, and table formats
- REST API: Flask-based REST API for easy integration
- Supported Tasks: Classification and regression, the problem types that Auto-sklearn handles
Project Structure
ModelScout/
├── agent/                 # Core ML engine
│   ├── data_analyzer.py   # Dataset analysis module
│   ├── model_selector.py  # Model recommendation using Auto-sklearn
│   ├── reporter.py        # Report generation
│   ├── orchestrator.py    # Main pipeline orchestrator
│   └── __init__.py
├── api/                   # REST API
│   ├── main.py            # Flask API endpoints
│   └── __init__.py
├── data/                  # Sample datasets
├── models/                # Trained models storage
├── outputs/               # Generated reports
├── requirements.txt       # Python dependencies
├── demo.py                # Demo script with examples
└── README.md
Quick Start
1. Installation
# Clone or navigate to the project directory
cd ModelScout
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
2. Basic Usage
from agent.orchestrator import ModelScout
# Initialize
scout = ModelScout(auto_train_time=300)
# Run complete pipeline
result = scout.run_full_pipeline(
    data_path='your_data.csv',
    target='target_column',
    report_path='outputs/report.txt'
)
# Access results
print(result['recommendations']['best_model_name'])
print(result['recommendations']['test_score'])
print(result['report'])
3. Step-by-Step Usage
from agent.orchestrator import ModelScout

# Initialize with the default search time
scout = ModelScout()
# Load data
df = scout.load_data('data.csv')
# Analyze data
analysis = scout.analyze_data(df, target='label')
print(f"Problem Type: {analysis['target_analysis']['type']}")
# Get recommendations
recommendations = scout.recommend_models(df, 'label')
print(f"Best Model: {recommendations['best_model_name']}")
print(f"Test Score: {recommendations['test_score']}")
# Generate report
report = scout.generate_report(output_format='text', output_path='report.txt')
API Endpoints
Health Check
GET /health
Analyze Dataset
POST /api/analyze
Content-Type: application/json
{
"file_path": "path/to/data.csv",
"target": "target_column"
}
Get Recommendations
POST /api/recommend
Content-Type: application/json
{
"file_path": "path/to/data.csv",
"target": "target_column",
"time_limit": 300
}
Generate Report
POST /api/report
Content-Type: application/json
{
"file_path": "path/to/data.csv",
"target": "target_column",
"format": "text"
}
Full Pipeline
POST /api/pipeline
Content-Type: application/json
{
"file_path": "path/to/data.csv",
"target": "target_column",
"time_limit": 300
}
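The endpoints above can be called from any HTTP client. The following is a minimal sketch using only the Python standard library. It assumes the Flask app is running at `http://localhost:5000` (Flask's default development address) and that responses are JSON; neither is stated above, and the helper names `build_request` and `call_api` are hypothetical.

```python
import json
import urllib.request

# Assumption: the API is served at Flask's default development address.
BASE_URL = "http://localhost:5000"


def build_request(endpoint, payload):
    """Build a JSON POST request for one of the endpoints listed above."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=BASE_URL + endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_api(endpoint, payload, timeout=600):
    """Send the request and decode the response body, assuming it is JSON."""
    request = build_request(endpoint, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `call_api("/api/pipeline", {"file_path": "path/to/data.csv", "target": "target_column", "time_limit": 300})` would submit the full-pipeline request shown above. The client timeout is set longer than `time_limit` because the model search may run for the whole budget before the server responds.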
Run Demo
python demo.py
The demo script:
- Creates sample datasets (Iris, Breast Cancer, Regression)
- Runs ModelScout on each dataset
- Generates comparison reports
- Demonstrates both classification and regression
What ModelScout Analyzes
Data Characteristics
- Dataset size and shape
- Missing values and data quality
- Feature types and counts
- Memory usage
Target Variable
- Problem type (Classification/Regression)
- Class distribution (for classification)
- Value range (for regression)
- Class imbalance ratio (for classification)
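To make the class distribution and imbalance ratio concrete, here is a small standard-library sketch. It assumes the ratio is defined as the largest class count divided by the smallest; ModelScout's actual definition is not stated here, and the function names are hypothetical.

```python
from collections import Counter


def class_distribution(labels):
    """Return {label: (count, percent)} for a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return {label: (n, 100.0 * n / total) for label, n in sorted(counts.items())}


def imbalance_ratio(labels):
    """Assumed definition: most frequent class count / least frequent class count."""
    counts = Counter(labels).values()
    return max(counts) / min(counts)


# A balanced three-class target with 50 rows per class, as in the Iris dataset.
labels = [0] * 50 + [1] * 50 + [2] * 50
for label, (n, pct) in class_distribution(labels).items():
    print(f"{label}: {n} ({pct:.1f}%)")
print(f"Class Imbalance Ratio: {imbalance_ratio(labels):.2f}:1")
```

A ratio of 1.00:1 means every class is equally represented. A target with 90 rows of one class and 10 of another would give 9.00:1 under this definition.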
Feature Statistics
- Numeric: mean, std, min, max, missing count
- Categorical: unique values, missing count
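The per-feature statistics listed above can be sketched as follows. This is an illustration only, using the standard library rather than ModelScout's own code; it assumes that `None` and empty strings count as missing, and the function names are hypothetical.

```python
import statistics


def is_missing(value):
    """Assumption for this sketch: None and empty strings are missing values."""
    return value is None or value == ""


def numeric_summary(values):
    """Mean, sample standard deviation, min, max, and missing count."""
    present = [float(v) for v in values if not is_missing(v)]
    return {
        "mean": statistics.mean(present),
        "std": statistics.stdev(present) if len(present) > 1 else 0.0,
        "min": min(present),
        "max": max(present),
        "missing": len(values) - len(present),
    }


def categorical_summary(values):
    """Number of distinct non-missing values and missing count."""
    present = [v for v in values if not is_missing(v)]
    return {"unique": len(set(present)), "missing": len(values) - len(present)}


print(numeric_summary([1.0, 2.0, 3.0, None, 4.0]))
print(categorical_summary(["red", "blue", "", "red"]))
```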
How It Works
- Data Loading: Reads CSV, Excel, or JSON files
- Analysis: Profiles the dataset (size, missing values, feature types)
- Problem Detection: Infers the ML task type from the target column
- Model Search: Auto-sklearn searches candidate models and hyperparameters within the time limit
- Evaluation: Splits the data into train and test sets and reports a score on each
- Reporting: Generates a report with the recommended model
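The problem detection and evaluation steps above can be illustrated with a short sketch. The heuristic below (non-numeric targets, or integer targets with at most 20 distinct values, are treated as classification) and the 80/20 split are assumptions chosen for illustration, not ModelScout's documented behavior; the function names and the `max_classes` threshold are hypothetical.

```python
import random


def detect_problem_type(target_values, max_classes=20):
    """Guess the task type from the target column (hypothetical heuristic)."""
    unique = set(target_values)
    all_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in unique
    )
    if not all_numeric:
        return "classification"  # string or boolean labels
    all_integral = all(float(v).is_integer() for v in unique)
    if all_integral and len(unique) <= max_classes:
        return "classification"  # a small set of integer codes
    return "regression"


def train_test_split_indices(n_rows, test_fraction=0.2, seed=42):
    """Shuffle row indices reproducibly and split them into train and test."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_rows * test_fraction))
    return indices[n_test:], indices[:n_test]


print(detect_problem_type([0, 1, 2] * 50))
print(detect_problem_type([0.5, 1.7, 3.2]))
train_idx, test_idx = train_test_split_indices(150)
print(len(train_idx), len(test_idx))
```

Holding out a test set that the search never sees is what makes the reported test score a fairer estimate than the train score.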
Dependencies
- pandas: Data manipulation
- scikit-learn: ML algorithms
- auto-sklearn: Automated ML model selection
- numpy: Numerical computing
- matplotlib/seaborn: Visualization
- flask: REST API
- xgboost, lightgbm, catboost: Advanced models
- imbalanced-learn: Class imbalance handling
Example Output
======================================================================
(ASCII-art banner)
======================================================================
DATA OVERVIEW
======================================================================
Dataset Shape: (150, 5) (rows, columns)
Memory Usage: 0.00 MB
Missing Values: 0 (0.00%)
Numeric Features: 4
Categorical Features: 0
TARGET VARIABLE ANALYSIS
----------------------------------------------------------------------
Problem Type: CLASSIFICATION
Unique Values: 3
Missing Values: 0
Class Imbalance Ratio: 1.00:1
Class Distribution:
  0: 50 (33.3%)
  1: 50 (33.3%)
  2: 50 (33.3%)
MODEL RECOMMENDATIONS
======================================================================
Best Model: RandomForestClassifier
Problem Type: CLASSIFICATION
Train Score: 1.0000
Test Score: 0.9333
Data Shape Used: (150, 4)
Number of Classes: 3
======================================================================
Configuration
Customize the search by passing parameters to the constructor:
scout = ModelScout(
    auto_train_time=600  # Search time budget in seconds; increase for a more thorough search
)
License
This project is for educational and portfolio purposes.
Contributing
Feel free to extend ModelScout with:
- Additional models
- More data preprocessing options
- Visualization enhancements
- Performance optimizations
Support
For issues or questions, refer to the demo.py script for usage examples.
Happy Model Scouting!
File details
Details for the file modelscout_ai-0.1.2.tar.gz.
File metadata
- Download URL: modelscout_ai-0.1.2.tar.gz
- Upload date:
- Size: 51.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a4d4122af79c30bd5af8ea8e18440548e4f788d3732bf160d662c251fbfe856d |
| MD5 | 115ba909c066879842c6db6ea597a4be |
| BLAKE2b-256 | d643e8b9b9d1a7d4f7df59373c2b055285899006c78f77097b75d174d995fb13 |
File details
Details for the file modelscout_ai-0.1.2-py3-none-any.whl.
File metadata
- Download URL: modelscout_ai-0.1.2-py3-none-any.whl
- Upload date:
- Size: 59.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a0ecb897716cab5625840523a992eae353ed6a79504f7248171b0fcc37072782 |
| MD5 | 129f3a233baf550d2fb93817fa6b9846 |
| BLAKE2b-256 | d1d37e64de3bcc9696f3b4db823567e1c66574d640178c18f0423c4e6de9cce4 |