One-line AutoML: from idea to trained model using Hugging Face + AutoGluon

🚀 AutoHF

AutoHF is an autonomous machine learning pipeline that takes a natural language description of a task (e.g., "sentiment analysis"), searches Hugging Face for matching datasets, ranks them by quality signals, and trains a model on the top-ranked dataset with AutoGluon.

✨ Features

🔍 Intent-to-Task: Automatically detects ML task types (classification, regression, etc.) and keywords from natural language.
📦 Autonomous Dataset Discovery: Searches the Hugging Face Hub for relevant datasets using multi-strategy search.
🏆 Intelligent Ranking: Ranks datasets based on quality signals like downloads, likes, and metadata completeness.
🏋️ Automated Training: Leverages AutoGluon to train high-quality models with minimal configuration.
🧬 Agentic Architecture: Inspired by patterns from AutoGen, LangGraph, and OpenHands for robust state management and collaboration.
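
As a rough illustration of the Intent-to-Task step, keyword-based detection with a fuzzy fallback could look like the sketch below. The keyword table, the `detect_task` function, and the returned dictionary are hypothetical and are not AutoHF's actual `TaskAgent` API; the fuzzy matching relies on `difflib` from the standard library.

```python
import difflib

# Hypothetical keyword table; AutoHF's real mapping may differ.
TASK_KEYWORDS = {
    "classification": ["sentiment", "spam", "classify", "classification", "detection"],
    "regression": ["price", "forecast", "regression", "predict"],
}


def detect_task(description: str) -> dict:
    """Guess a task type from free text and return the matched keywords."""
    words = description.lower().split()
    best_task, best_hits = None, []
    for task, keywords in TASK_KEYWORDS.items():
        hits = [w for w in words if w in keywords]
        if len(hits) > len(best_hits):
            best_task, best_hits = task, hits
    if best_task is None:
        # Fuzzy fallback: tolerate small typos such as "sentimant".
        for task, keywords in TASK_KEYWORDS.items():
            for w in words:
                close = difflib.get_close_matches(w, keywords, n=1, cutoff=0.8)
                if close:
                    return {"task_type": task, "keywords": close}
    return {"task_type": best_task, "keywords": best_hits}
```

When no keyword matches even approximately, the sketch returns `task_type=None` so the caller can decide how to proceed.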

🛠️ Internal Workflow

The following diagram shows how AutoHF orchestrates the pipeline from user input to a trained model:

graph TD
    User([User Input: 'sentiment analysis']) --> CLI[CLI / Python API]
    CLI --> Orchestrator[AutoHF Orchestrator]
    
    subgraph "Autonomous Pipeline (LangGraph-inspired States)"
        Orchestrator --> State1[Detecting Task]
        State1 --> TaskAgent[TaskAgent: Detects task type & keywords]
        
        TaskAgent --> State2[Searching Datasets]
        State2 --> DatasetAgent[DatasetAgent: Searches HF Hub]
        
        DatasetAgent --> State3[Ranking Datasets]
        State3 --> Ranker[DatasetRanker: Ranks by quality signals]
        
        Ranker --> State4[Loading & Profiling]
        State4 --> Loader[DatasetAgent: Loads best candidate & profiles]
        
        Loader --> State5[Training]
        State5 --> Trainer[AutoGluonTrainer: Trains & Optimizes]
    end
    
    Trainer --> State6[Completed]
    State6 --> Result[TrainResult: Model + Metrics]
    Result --> User

🏗️ Project Structure

AutoHF follows a modular, layered architecture organized into five core packages:

autohf/
├── __init__.py                 # Public API exports (AutoHF, AutoHFConfig, TrainResult, etc.)
├── cli/
│   ├── __init__.py
│   └── main.py                 # Typer CLI: train, search, info subcommands
├── core/
│   ├── __init__.py
│   ├── config.py               # Data models, presets, enums (PipelineState, TrainResult, DatasetCandidate...)
│   └── autohf.py               # AutoHF orchestrator — central state-machine coordinator
├── agents/
│   ├── __init__.py
│   ├── task_agent.py           # Intent-to-task detection (keyword / OpenAI router)
│   ├── dataset_agent.py        # Dataset discovery, loading, profiling (3-strategy HF Hub search)
│   └── model_agent.py          # Model search agent (Phase 2 preparation)
├── ranking/
│   ├── __init__.py
│   ├── dataset_ranker.py       # Keyword-based composite scoring (default)
│   ├── semantic_ranker.py      # Vector + Cross-Encoder semantic ranking (optional dep)
│   └── model_ranker.py         # Model ranking stub (Phase 2)
└── training/
    ├── __init__.py
    └── autogluon_trainer.py    # AutoGluon TabularPredictor wrapper (fit, eval, predict)

tests/
├── test_config.py              # Config defaults & preset validation
└── test_task_agent.py          # Keyword detection, fuzzy fallback, history

pyproject.toml                  # Build config, dependencies, CLI entry point, lint/test settings
README.md                       # This file

Module Responsibilities

Package	Responsibility	Key Classes / Functions
`core`	Configuration, data models, orchestration	`AutoHFConfig`, `PipelineState`, `AutoHF`
`agents`	External interaction — task detection, dataset/model discovery	`TaskAgent`, `DatasetAgent`, `ModelAgent`
`ranking`	Relevance & quality scoring for datasets and models	`DatasetRanker`, `SemanticRanker`, `rank_models`
`training`	Model training, evaluation, and inference	`train_model`, `load_predictor`, `predict`
`cli`	User-facing command-line interface	`train`, `search`, `info`

🏗️ Architecture & Patterns

AutoHF borrows four patterns from existing agent and AutoML tooling:

State Management: Uses a typed state machine (PipelineState) inspired by LangGraph to track progress and handle transitions through the pipeline.
Agent Collaboration: Employs specialized agents (TaskAgent, DatasetAgent, ModelAgent) similar to AutoGen to separate concerns and enable independent extensibility.
Autonomous Execution: Implements retry logic and multi-strategy discovery patterns found in OpenHands for resilient dataset sourcing.
Tabular Power: Uses AutoGluon as the underlying engine for robust, automated model selection and hyperparameter tuning.
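
The retry and multi-strategy discovery pattern can be sketched in a few lines. This is a minimal illustration, not AutoHF's `DatasetAgent`: the `discover` function, the `Strategy` type, and the retry parameters are all assumptions. Each strategy is any callable that maps keywords to dataset ids.

```python
import time
from typing import Callable, Iterable

# A strategy takes keywords and returns dataset ids; all names here are illustrative.
Strategy = Callable[[list[str]], Iterable[str]]


def discover(keywords: list[str], strategies: list[Strategy],
             retries: int = 2, delay: float = 0.0) -> list[str]:
    """Run each strategy in order, retrying failures, and merge unique results."""
    found: list[str] = []
    seen: set[str] = set()
    for strategy in strategies:
        for attempt in range(retries + 1):
            try:
                results = list(strategy(keywords))
            except Exception:
                if attempt == retries:
                    break  # give up on this strategy, move to the next one
                time.sleep(delay)
                continue
            for dataset_id in results:
                if dataset_id not in seen:
                    seen.add(dataset_id)
                    found.append(dataset_id)
            break
    return found
```

A strategy that keeps failing is skipped rather than aborting the search, so one unavailable search path does not block the others.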

Pipeline State Machine

The following diagram shows how AutoHF orchestrates the pipeline from user input to a trained model, including retry logic and ranking selection:

graph TD
    User([User Input: 'sentiment analysis']) --> CLI[CLI / Python API]
    CLI --> Orchestrator[AutoHF Orchestrator]
    
    subgraph "Autonomous Pipeline (LangGraph-inspired States)"
        Orchestrator --> State1[IDLE]
        State1 --> State2[DETECTING_TASK]
        State2 --> TaskAgent[TaskAgent: keyword / OpenAI router]
        TaskAgent --> State3[SEARCHING_DATASETS]
        State3 --> DatasetAgent[DatasetAgent: 3-strategy HF Hub search]
        DatasetAgent --> State4[RANKING_DATASETS]
        State4 --> RankerDecision{Ranker?}
        RankerDecision -->|default| DatasetRanker[DatasetRanker: keyword composite scoring]
        RankerDecision -->|search extras| SemanticRanker[SemanticRanker: vector + Cross-Encoder]
        DatasetRanker --> State5[LOADING_DATASET]
        SemanticRanker --> State5
        State5 --> LoadRetry{Load OK?}
        LoadRetry -->|No| DatasetAgent
        LoadRetry -->|Yes| State6[PROFILING_DATASET]
        State6 --> Profile[profile_dataset: stats + samples]
        Profile --> State7[TRAINING]
        State7 --> Trainer[AutoGluonTrainer: TabularPredictor.fit]
    end
    
    Trainer --> State8[EVALUATING]
    State8 --> State9[COMPLETED]
    State9 --> Result[TrainResult: model + metrics + paths]
    Result --> User
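
The state names in the diagram map naturally onto a Python `Enum` with an explicit transition table. The sketch below is an assumption about how such a typed state machine might be written, not the contents of `core/config.py`. In particular, `PipelineTracker` is a hypothetical helper, and the retry edge from `LOADING_DATASET` back to `SEARCHING_DATASETS` is one reading of the "Load OK? No" branch.

```python
from enum import Enum, auto


class PipelineState(Enum):
    IDLE = auto()
    DETECTING_TASK = auto()
    SEARCHING_DATASETS = auto()
    RANKING_DATASETS = auto()
    LOADING_DATASET = auto()
    PROFILING_DATASET = auto()
    TRAINING = auto()
    EVALUATING = auto()
    COMPLETED = auto()


S = PipelineState
# Allowed transitions as read off the diagram. The retry edge from
# LOADING_DATASET back to SEARCHING_DATASETS is an interpretation.
TRANSITIONS = {
    S.IDLE: {S.DETECTING_TASK},
    S.DETECTING_TASK: {S.SEARCHING_DATASETS},
    S.SEARCHING_DATASETS: {S.RANKING_DATASETS},
    S.RANKING_DATASETS: {S.LOADING_DATASET},
    S.LOADING_DATASET: {S.PROFILING_DATASET, S.SEARCHING_DATASETS},
    S.PROFILING_DATASET: {S.TRAINING},
    S.TRAINING: {S.EVALUATING},
    S.EVALUATING: {S.COMPLETED},
    S.COMPLETED: set(),
}


class PipelineTracker:
    """Hypothetical helper that enforces the transition table."""

    def __init__(self) -> None:
        self.state = S.IDLE
        self.history = [S.IDLE]

    def advance(self, target: PipelineState) -> None:
        if target not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {target.name}")
        self.state = target
        self.history.append(target)
```

Keeping the transitions in one table makes illegal jumps (for example, from searching straight to training) fail loudly instead of silently corrupting the run.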

Class / Module Dependency Diagram

classDiagram
    class AutoHF {
        -config: AutoHFConfig
        -task_agent: TaskAgent
        -dataset_agent: DatasetAgent
        +train(task_description) TrainResult
        +search(task_description) list[DatasetCandidate]
    }
    
    class AutoHFConfig {
        +preset: Preset
        +time_limit: int
        +max_rows: int
        +problem_type: ProblemType
    }
    
    class TaskAgent {
        +detect_task(description) TaskInfo
        +list_supported_tasks()
    }
    
    class DatasetAgent {
        +find_datasets(task_type, keywords) list[DatasetCandidate]
        +load(dataset_id, config) DataFrame + cols
        +profile_dataset(df, text_col, label_col) DatasetProfile
    }
    
    class DatasetRanker {
        +rank_datasets(candidates, keywords) list[DatasetCandidate]
    }
    
    class SemanticRanker {
        +rank(candidates, problem_statement, keywords) list[DatasetCandidate]
    }
    
    class AutoGluonTrainer {
        +train_model(df, config, label) TrainResult
        +load_predictor(path) TabularPredictor
        +predict(predictor, df) Series
    }
    
    class TrainResult {
        +best_model_name: str
        +metrics: dict
        +model_path: str
        +leaderboard: DataFrame
    }
    
    class DatasetCandidate {
        +id: str
        +description: str
        +downloads: int
        +likes: int
        +tags: list[str]
        +score: float
    }

    CLI --> AutoHF : Uses
    AutoHF --> AutoHFConfig : Configures
    AutoHF --> TaskAgent : Orchestrates
    AutoHF --> DatasetAgent : Orchestrates
    AutoHF --> DatasetRanker : Uses
    AutoHF --> SemanticRanker : Uses [optional]
    AutoHF --> AutoGluonTrainer : Triggers
    AutoHF --> TrainResult : Returns
    DatasetAgent --> DatasetCandidate : Produces
    DatasetRanker --> DatasetCandidate : Ranks
    SemanticRanker --> DatasetCandidate : Ranks

Installation

# Basic installation
pip install autohf

# With training support (recommended)
pip install "autohf[train]"

CLI Usage

Train a model with a single command:

# Quick prototype
autohf train "sentiment analysis"

# Higher quality training
autohf train "spam detection" --preset high_quality

# Just search for datasets
autohf search "question answering" --models

Python API

from autohf import AutoHF

# Initialize and train
hf = AutoHF.from_preset("medium_quality")
result = hf.train("customer review classification")

# Access results
print(f"Best model: {result.best_model_name}")
print(f"Accuracy: {result.metrics.get('accuracy')}")  # metric keys may vary with the task type
print(f"Model saved at: {result.model_path}")

📋 Presets

AutoHF provides several presets inspired by AutoGluon to balance speed and quality:

Preset	Time Limit	Focus
`quick_prototype`	60s	Fast iteration, small datasets
`medium_quality`	300s	Default - Good balance of speed/quality
`high_quality`	600s	Better results, longer training
`best_quality`	3600s	Maximum performance
`optimize_for_deployment`	300s	Small model size, fast inference
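
The table above can be read as a lookup from preset name to time limit. The sketch below shows that idea with a hypothetical `SketchConfig` class; it stands in for `AutoHFConfig`, whose real fields and `from_preset` behavior are not documented here.

```python
from dataclasses import dataclass

# Time limits in seconds, copied from the presets table.
PRESET_TIME_LIMITS = {
    "quick_prototype": 60,
    "medium_quality": 300,
    "high_quality": 600,
    "best_quality": 3600,
    "optimize_for_deployment": 300,
}


@dataclass
class SketchConfig:
    """Hypothetical stand-in for AutoHFConfig; only two fields are modelled."""
    preset: str = "medium_quality"
    time_limit: int = 300

    @classmethod
    def from_preset(cls, name: str) -> "SketchConfig":
        if name not in PRESET_TIME_LIMITS:
            raise ValueError(f"unknown preset {name!r}; choose from {sorted(PRESET_TIME_LIMITS)}")
        return cls(preset=name, time_limit=PRESET_TIME_LIMITS[name])
```

Rejecting unknown names early avoids silently falling back to a default budget when a preset is misspelled.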

🗺️ Project Roadmap

Here is the planned development roadmap for AutoHF. Contributions and suggestions are welcome!

Phase 1: Core Pipeline (Completed / In Progress)

Intent-to-Task detection with keyword extraction
Autonomous Hugging Face dataset search with multi-strategy discovery
Intelligent dataset ranking (downloads, likes, metadata)
AutoGluon-based automated training integration
CLI and Python API entry points
Configuration presets (quick/medium/high/best quality)
Agentic architecture with TaskAgent, DatasetAgent, and DatasetRanker

Phase 2: Enhanced Model Hub

Support for custom model fine-tuning (beyond AutoGluon tabular models)
Integration with Hugging Face Model Hub for downloading pre-trained models
Multi-modal support (image, audio, text classification)
Model versioning and experiment tracking

Phase 3: Advanced Dataset Management

Dataset quality validation (missing values, class imbalance detection)
Automatic dataset cleaning and preprocessing recommendations
Train/validation/test split optimization
Dataset caching and local mirror support

Phase 4: Deployment & Serving

Model export to ONNX, TorchScript, and CoreML formats
REST API serving with FastAPI
Docker containerization for easy deployment
Batch prediction pipelines

Phase 5: Observability & Collaboration

Training metrics dashboard
Pipeline execution logs and audit trails
Team collaboration features (shared datasets, model registry)
CI/CD integration for model retraining

Phase 6: Enterprise Features

Private Hugging Face Hub / AWS S3 / Azure Blob Storage support
Role-based access control (RBAC)
Scalable distributed training support
Compliance and governance tooling

📜 License

MIT License. See LICENSE for details.

🤖 Auto-Push Scripts

AutoHF includes scripts for automated git pushing:

PowerShell (Windows)

.\git-auto-push.ps1 "Your commit message"
.\git-auto-push.ps1 "Your commit message" -Push:$false  # Skip push

Batch (Windows)

git-auto-push.bat "Your commit message"
git-auto-push.bat "Your commit message" nopush  # Skip push

Shell/Bash (Linux/macOS/WSL)

./git-auto-push.sh "Your commit message"
./git-auto-push.sh "Your commit message" nopush  # Skip push

These scripts automatically:

Stage all changes (git add -A)
Check for changes
Commit with your message
Push to the remote repository
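
For readers who prefer Python over the shell scripts, the same four steps can be approximated with `subprocess`. This is a sketch of the described behavior, not the contents of the shipped scripts; `auto_push` and its `run` parameter are hypothetical, and the injectable runner exists only to make the function easy to test.

```python
import subprocess
from typing import Callable, Sequence


def _run(cmd: Sequence[str]) -> str:
    """Run a command, raise on a non-zero exit, and return its stdout."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def auto_push(message: str, push: bool = True,
              run: Callable[[Sequence[str]], str] = _run) -> bool:
    """Stage, commit, and optionally push; returns False if nothing changed."""
    run(["git", "add", "-A"])
    if not run(["git", "status", "--porcelain"]).strip():
        return False  # working tree is clean, nothing to commit
    run(["git", "commit", "-m", message])
    if push:
        run(["git", "push"])
    return True
```

Calling `auto_push("Your commit message", push=False)` mirrors the `nopush` option of the scripts.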

Release history

1.0.0: Jun 9, 2026
0.1.0: Jun 7, 2026 (this version)

Download files

Source Distribution

autohf-0.1.0.tar.gz (39.3 kB), uploaded Jun 7, 2026

Built Distribution

autohf-0.1.0-py3-none-any.whl (40.0 kB), uploaded Jun 7, 2026, Python 3

File details

Details for the file autohf-0.1.0.tar.gz.

File metadata

Download URL: autohf-0.1.0.tar.gz
Upload date: Jun 7, 2026
Size: 39.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for autohf-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`e84a0cdd74f13a069468b216286e0475ded8fad7db5efa6d46980bdfb89e5c3e`
MD5	`414ca8ee981524cff3f0ed483fc4fcb0`
BLAKE2b-256	`481821c996c7101692c7b37d8aa06f9b353b17038de31d21528f9894a69565ba`
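
A downloaded archive can be checked against the SHA256 digest listed above with the standard library. The helper names `sha256_of` and `matches` are illustrative; only `hashlib` is assumed.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with an expected hex string, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches("autohf-0.1.0.tar.gz", "<SHA256 value from the table>")` should return `True` for an intact download.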

Provenance

The following attestation bundles were made for autohf-0.1.0.tar.gz:

Publisher: publish.yml on teambugbusters00/automl-pipeine

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: autohf-0.1.0.tar.gz
- Subject digest: e84a0cdd74f13a069468b216286e0475ded8fad7db5efa6d46980bdfb89e5c3e
- Sigstore transparency entry: 1751617443
- Sigstore integration time: Jun 7, 2026
Source repository:
- Permalink: teambugbusters00/automl-pipeine@c01cdba46971ef60e8c93811a308d445120d7578
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/teambugbusters00
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@c01cdba46971ef60e8c93811a308d445120d7578
- Trigger Event: push

File details

Details for the file autohf-0.1.0-py3-none-any.whl.

File metadata

Download URL: autohf-0.1.0-py3-none-any.whl
Upload date: Jun 7, 2026
Size: 40.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for autohf-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`069fabf4933a19fb590fdff0bed00c3ee42e4e26807e001490d035d0dd5f537d`
MD5	`ed7916652e6f1d93551acb8996e7b18e`
BLAKE2b-256	`762dbb039e0df1fd2701ba48ae0e75be76e73da4238129dff74e3dbb0755fe69`

Provenance

The following attestation bundles were made for autohf-0.1.0-py3-none-any.whl:

Publisher: publish.yml on teambugbusters00/automl-pipeine

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: autohf-0.1.0-py3-none-any.whl
- Subject digest: 069fabf4933a19fb590fdff0bed00c3ee42e4e26807e001490d035d0dd5f537d
- Sigstore transparency entry: 1751617686
- Sigstore integration time: Jun 7, 2026
Source repository:
- Permalink: teambugbusters00/automl-pipeine@c01cdba46971ef60e8c93811a308d445120d7578
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/teambugbusters00
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@c01cdba46971ef60e8c93811a308d445120d7578
- Trigger Event: push
