
AutoThink

Throw any data, get a working model.


One-click AutoML for tabular data.
Auto-detects task type • Engineers features • Trains LightGBM + XGBoost + CatBoost • Optimizes blend weights
All in a single function call.


Quickstart

pip install autothink

import pandas as pd
from autothink import fit

df = pd.read_csv("train.csv")
model = fit(df, target="price")
predictions = model.predict(pd.read_csv("test.csv"))

That's it. Three lines.


How It Works

Your DataFrame
     |
     v
+--------------------+     +-------------------------+     +---------------------+
| Task Detection     | --> | Intelligent             | --> | Adaptive Feature    |
| binary / multiclass|     | Preprocessing           |     | Engineering         |
| / regression       |     | missing values, encode, |     | learns thresholds & |
|                    |     | scale                   |     | interactions        |
+--------------------+     +-------------------------+     +---------------------+
                                                                    |
                                                                    v
+--------------------+     +-------------------------+     +---------------------+
| Verification       | <-- | Blend Optimization      | <-- | Ensemble Training   |
| fold stability,    |     | scipy-optimized weights |     | LightGBM + XGBoost  |
| leakage check      |     | + Platt calibration      |     | + CatBoost (K-fold) |
+--------------------+     +-------------------------+     +---------------------+
     |
     v
  model.predict(test_df)
Task detection: determines binary, multiclass, or regression from the target column
Data validation: checks for leakage, class imbalance, and quality issues
Preprocessing: handles missing values, one-hot or target-encodes categoricals, scales numerics
Feature engineering: learns optimal split thresholds and feature interactions from the data
Ensemble training: trains LightGBM, XGBoost, and CatBoost with adaptive hyperparameters
Blend optimization: finds optimal ensemble weights via scipy on out-of-fold predictions
Calibration: applies Platt scaling for well-calibrated probabilities
Verification: runs post-training diagnostics (fold variance, leakage, feature importance)
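The blend-optimization step above can be sketched roughly as follows. This is a minimal illustration with synthetic out-of-fold predictions, not AutoThink's actual code; the three columns stand in for LightGBM, XGBoost, and CatBoost OOF probabilities.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(42)

# Toy out-of-fold probabilities from three base models (rows = samples).
y_true = rng.integers(0, 2, size=500)
oof = np.column_stack([
    np.clip(y_true * 0.7 + rng.normal(0.15, 0.20, 500), 0.01, 0.99),
    np.clip(y_true * 0.5 + rng.normal(0.25, 0.30, 500), 0.01, 0.99),
    np.clip(y_true * 0.6 + rng.normal(0.20, 0.25, 500), 0.01, 0.99),
])

def neg_log_loss(weights):
    # Blend the per-model probabilities, then score with log loss.
    p = np.clip(oof @ weights, 1e-6, 1 - 1e-6)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

result = minimize(
    neg_log_loss,
    x0=np.full(3, 1 / 3),                      # start from a uniform blend
    bounds=[(0, 1)] * 3,
    constraints={"type": "eq", "fun": lambda w: w.sum() - 1},
)
weights = result.x
```

With bounds and an equality constraint, scipy falls back to SLSQP, which keeps the weights non-negative and summing to one while minimizing the out-of-fold loss.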

Installation

From PyPI:

pip install autothink

From source:

git clone https://github.com/ranausmanai/autothink.git
cd autothink
pip install -e .

With optional extras:

pip install autothink[dev]   # pytest
pip install autothink[api]   # FastAPI serving
pip install autothink[onnx]  # ONNX export

API Reference

fit(df, target, **kwargs)

One-line AutoML. Returns a fitted AutoThinkV4 instance.

df (DataFrame, required): training data (features + target)
target (str, required): name of the target column
time_budget (int, default 600): maximum training time in seconds
verbose (bool, default True): log progress to the console

AutoThinkV4

from autothink import AutoThinkV4

model = AutoThinkV4(time_budget=300, verbose=True)
model.fit(df, target_col="price")
preds = model.predict(test_df)

Attributes after fitting:

model.cv_score: mean cross-validation score
model.cv_std: standard deviation of the CV scores
model.task_info: detected task type, metric, and class info
model.verification_report: post-training diagnostics
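The cv_score and cv_std attributes summarize per-fold validation scores. As a minimal sketch of how such statistics arise from K-fold scoring, using scikit-learn and a stand-in classifier (illustrative only, not AutoThink's internals):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=400, random_state=0)

# Score one model per fold; the mean and std of the fold scores play
# the role of model.cv_score and model.cv_std after fitting.
scores = []
for train_idx, val_idx in StratifiedKFold(n_splits=5, shuffle=True, random_state=0).split(X, y):
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    scores.append(roc_auc_score(y[val_idx], clf.predict_proba(X[val_idx])[:, 1]))

cv_score, cv_std = float(np.mean(scores)), float(np.std(scores))
```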

Logging

AutoThink uses Python's logging module. No handlers are attached on import, so the library stays silent until you configure logging.

import autothink
autothink.setup_logging()  # Enable INFO-level output to stderr

Or simply use verbose=True (the default), which configures a console handler automatically.
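A handler setup along these lines is a plausible sketch of what setup_logging does; the logger name "autothink" and the format string are assumptions for illustration, not the library's confirmed implementation:

```python
import logging
import sys

def setup_logging(level=logging.INFO):
    # Attach a stderr handler to the library's logger namespace
    # (the name "autothink" is assumed here).
    logger = logging.getLogger("autothink")
    handler = logging.StreamHandler(sys.stderr)
    handler.setFormatter(logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(level)
    return logger

logger = setup_logging()
logger.info("logging configured")
```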


Benchmarks

AutoThink V4 is competitive with FLAML and AutoGluon on standard tabular tasks:

Dataset               AutoThink V4   FLAML    AutoGluon
Heart Disease (AUC)   0.918          0.912    0.920
Loan Default (AUC)    0.874          0.869    0.871
House Price (RMSE)    30,241         31,102   29,876

60-second time budget, single 80/20 split, seed=42. Lower RMSE is better.


Examples

See the examples/ directory:

quickstart.py: minimal 15-line fit/predict on sklearn data
kaggle_competition.py: full Kaggle pipeline with CLI and submission output
benchmark.py: compares AutoThink against FLAML

Project Structure

autothink/
  __init__.py            # Public API: fit(), setup_logging()
  core/
    autothink_v4.py      # Main engine (TaskDetector, IntelligentEnsemble, AutoThinkV4)
    autothink_v3.py      # V3 engine (Kaggle-optimized)
    autothink_v2.py      # V2 engine (meta-learning)
    preprocessing.py     # IntelligentPreprocessor, FeatureEngineer
    feature_engineering_general.py  # Adaptive, data-driven feature engineering
    validation.py        # DataValidator, LeakageDetector
    meta_learning.py     # MetaLearningDB, dataset fingerprinting
    production.py        # ModelExporter, ModelCard, DriftDetector, APIGenerator
    advanced.py          # CausalAutoML, ExplanationEngine, SmartEnsemble
    kaggle_beast.py      # Competition-grade ensemble mode
    kaggle_fast.py       # Fast Kaggle mode
tests/                   # 25 tests (pytest)
examples/                # Quickstart, Kaggle, benchmark

Contributing

Contributions are welcome! Please open an issue or submit a PR.

# Development setup
git clone https://github.com/ranausmanai/autothink.git
cd autothink
pip install -e ".[dev]"
pytest tests/

License

Apache 2.0 — see LICENSE.


Built with scikit-learn, LightGBM, XGBoost, and CatBoost.
