Simple AutoML wrapper for tabular data. One-liner API for classification and regression.

anyml

AutoML for tabular data — classify, regress, and forecast in 3 lines of code.

anyml is a dead-simple AutoML library for tabular data. Point it at a CSV or a DataFrame and a target column, and it will auto-detect column types, build a preprocessing pipeline, train and compare several models via cross-validation, and return the best one. It uses scikit-learn under the hood for the core models and optionally integrates XGBoost and LightGBM for stronger gradient-boosted baselines.

Built by Viet-Anh Nguyen at NRL.ai.

Why anyml?

  • One-liner API — anyml.classify(df, target="y") trains and benchmarks models automatically
  • Plugin architecture — Register custom estimators or preprocessing steps
  • Local-first — Runs entirely on CPU, no API calls
  • Minimal core deps — scikit-learn, pandas, numpy; XGBoost/LightGBM are optional
  • Production-ready — Serializable pipelines, prediction intervals, feature importance

Installation

pip install anyml

For optional boosting libraries:

pip install anyml[xgboost]     # XGBoost estimator
pip install anyml[lightgbm]    # LightGBM estimator
pip install anyml[all]         # everything
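
Because the boosting libraries are optional extras, calling code may want to check which ones are importable before asking for them. The following is a minimal sketch using only the standard library; the helper name available_backends is hypothetical and not part of anyml.

```python
from importlib.util import find_spec


def available_backends(candidates=("xgboost", "lightgbm")):
    """Return the subset of `candidates` that can be imported here.

    Hypothetical helper (not part of anyml): it only checks whether a
    top-level module can be located, without actually importing it.
    """
    return [name for name in candidates if find_spec(name) is not None]


if __name__ == "__main__":
    print("Optional backends found:", available_backends())
```

An empty list means only the scikit-learn models can be used in this environment.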

Python 3.8+ supported (tested on 3.8, 3.9, 3.10, 3.11, 3.12, 3.13)

Quick Start

import anyml
import pandas as pd

df = pd.read_csv("titanic.csv")

# 1. AutoML classification (auto-preprocesses, cross-validates the available models, returns the best)
model = anyml.classify(df, target="Survived")
print(model.best_name, model.best_score)    # e.g. "RandomForestClassifier" 0.83

# 2. Predict on new data
preds = model.predict(df.drop(columns=["Survived"]))

# 3. Regression is the same one-liner
price_model = anyml.regress(pd.read_csv("houses.csv"), target="price")
print(price_model.metrics)                  # rmse, mae, r2

# 4. Time-series forecasting
sales = pd.read_csv("sales.csv", parse_dates=["date"])
forecast = anyml.forecast(sales, time_col="date", target="sales", horizon=30)

Models & Methods

anyml trains and compares multiple models, then picks the best via cross-validation.

Classification models:

  • LogisticRegression (sklearn) — fast linear baseline
  • RandomForestClassifier (sklearn) — robust tree ensemble
  • XGBClassifier (optional via [xgboost]) — gradient boosting
  • LGBMClassifier (optional via [lightgbm]) — fast gradient boosting

Regression models:

  • LinearRegression (sklearn)
  • RandomForestRegressor (sklearn)
  • XGBRegressor (optional)
  • LGBMRegressor (optional)

Auto-preprocessing pipeline:

  1. Type detection — numeric, categorical, datetime columns auto-identified
  2. Missing value imputation — median for numeric, mode for categorical
  3. Encoding — OneHotEncoder for low-cardinality categoricals, OrdinalEncoder for high-cardinality
  4. Scaling — StandardScaler for numeric features
  5. Datetime features — extracts year, month, day, dayofweek, hour from datetime columns
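
The five steps above map closely onto standard scikit-learn building blocks. The sketch below is one way to assemble them, not anyml's actual implementation: the function names, the MAX_ONEHOT_LEVELS cutoff of 10, and the choice to treat every non-numeric column as categorical are all assumptions made for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder, StandardScaler

# Assumed cardinality cutoff; the threshold anyml uses is not stated in this README.
MAX_ONEHOT_LEVELS = 10


def expand_datetimes(df):
    """Replace each datetime column with year/month/day/dayofweek/hour columns."""
    out = df.copy()
    for col in out.select_dtypes(include=["datetime"]).columns:
        for part in ("year", "month", "day", "dayofweek", "hour"):
            out[f"{col}_{part}"] = getattr(out[col].dt, part)
        out = out.drop(columns=[col])
    return out


def build_preprocessor(df):
    """Build a ColumnTransformer from the dtypes of `df`.

    Assumes datetime columns were already expanded and the target column
    has been dropped. Every non-numeric column is treated as categorical.
    """
    numeric = df.select_dtypes(include=["number"]).columns.tolist()
    categorical = [c for c in df.columns if c not in numeric]
    low = [c for c in categorical if df[c].nunique() <= MAX_ONEHOT_LEVELS]
    high = [c for c in categorical if c not in low]

    numeric_steps = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    low_steps = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ])
    high_steps = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        # Categories unseen during fit are mapped to -1 at transform time.
        ("encode", OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)),
    ])
    return ColumnTransformer([
        ("num", numeric_steps, numeric),
        ("cat_low", low_steps, low),
        ("cat_high", high_steps, high),
    ])
```

Calling build_preprocessor(expand_datetimes(features)).fit_transform(...) on the expanded frame yields a numeric matrix that any of the listed estimators can consume.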

Model selection — 5-fold cross-validation by default (set with the cv argument), stratified for classification; the best model is picked by F1 (classification) or R² (regression).
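
The selection step can be approximated with scikit-learn's cross_val_score. This is a sketch under stated assumptions rather than anyml's code: pick_best is a hypothetical helper, the data is synthetic, and scoring="f1" is the binary F1 score (the README does not say which averaging is used for multiclass targets). For regression, the same loop would use KFold and scoring="r2".

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score


def pick_best(candidates, X, y, cv=5, scoring="f1"):
    """Cross-validate each candidate; return (best_name, best_score, all_scores)."""
    splitter = StratifiedKFold(n_splits=cv, shuffle=True, random_state=0)
    scores = {
        name: cross_val_score(est, X, y, cv=splitter, scoring=scoring).mean()
        for name, est in candidates.items()
    }
    best_name = max(scores, key=scores.get)
    return best_name, scores[best_name], scores


# Synthetic binary classification data, for illustration only.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
candidates = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "RandomForestClassifier": RandomForestClassifier(n_estimators=100, random_state=0),
}
best_name, best_score, scores = pick_best(candidates, X, y)
print(best_name, round(best_score, 3))
```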

Feature importance — Native (tree models), coefficient-based (linear), or permutation importance fallback.
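
A fallback chain of this kind can be written by probing the fitted estimator for the attributes scikit-learn exposes. The sketch below is illustrative only: the function is hypothetical, and averaging absolute coefficients across classes is an assumption about how linear importances might be summarized.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier


def feature_importance(model, X, y):
    """Return (method, importances) for a fitted estimator."""
    if hasattr(model, "feature_importances_"):   # tree ensembles
        return "native", np.asarray(model.feature_importances_)
    if hasattr(model, "coef_"):                  # linear models
        coef = np.abs(np.atleast_2d(model.coef_))
        return "coefficient", coef.mean(axis=0)
    # Fallback: score drop when each feature is shuffled.
    result = permutation_importance(model, X, y, n_repeats=5, random_state=0)
    return "permutation", result.importances_mean


X, y = make_classification(n_samples=200, n_features=6, random_state=0)
results = {}
for est in (
    RandomForestClassifier(n_estimators=50, random_state=0),
    LogisticRegression(max_iter=1000),
    KNeighborsClassifier(),
):
    results[type(est).__name__] = feature_importance(est.fit(X, y), X, y)

for name, (method, values) in results.items():
    print(name, method, np.round(values, 3))
```

The three methods produce values on different scales, so rankings are comparable within one model but not across models.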

Time series forecasting — moving average, exponential smoothing, or linear trend (auto-selected by in-sample RMSE).
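
To make the selection rule concrete, here is a small pure-Python sketch of the three methods and the RMSE comparison. It is not anyml's implementation: the window of 3, the smoothing factor alpha=0.5, the flat forecasts for the first two methods, and scoring all methods on the same index range are assumptions chosen to keep the example short.

```python
import math


def _rmse(actual, fitted):
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, fitted)) / len(actual))


def moving_average(y, horizon, window=3):
    # One-step-ahead fit: predict y[t] from the mean of the previous `window` points.
    fitted = [sum(y[t - window:t]) / window for t in range(window, len(y))]
    return fitted, [sum(y[-window:]) / window] * horizon


def exp_smoothing(y, horizon, window=3, alpha=0.5):
    level, fitted = y[0], []
    for t in range(1, len(y)):
        if t >= window:
            fitted.append(level)  # level built from y[:t] is the prediction for y[t]
        level = alpha * y[t] + (1 - alpha) * level
    return fitted, [level] * horizon


def linear_trend(y, horizon, window=3):
    # Ordinary least squares of y on the time index 0..n-1.
    n = len(y)
    t_mean, y_mean = (n - 1) / 2, sum(y) / n
    slope = (sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
             / sum((t - t_mean) ** 2 for t in range(n)))
    intercept = y_mean - slope * t_mean
    fitted = [intercept + slope * t for t in range(window, n)]
    return fitted, [intercept + slope * (n + h) for h in range(horizon)]


def forecast(y, horizon, window=3):
    """Return (best_method_name, forecast) chosen by lowest in-sample RMSE."""
    methods = {
        "moving_average": moving_average,
        "exp_smoothing": exp_smoothing,
        "linear_trend": linear_trend,
    }
    results = {}
    for name, fn in methods.items():
        fitted, future = fn(y, horizon, window)
        results[name] = (_rmse(y[window:], fitted), future)
    best = min(results, key=lambda k: results[k][0])
    return best, results[best][1]


history = [2.0 * t + 1 for t in range(20)]  # synthetic, perfectly linear series
best, future = forecast(history, horizon=3)
print(best, future)
```

In-sample error tends to favor whichever method fits the history most closely, so a holdout window is a common alternative when the series is long enough.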

API Reference

Function                                          Purpose
anyml.classify(df, target, models=None, cv=5)     AutoML classification
anyml.regress(df, target, models=None, cv=5)      AutoML regression
anyml.forecast(df, time_col, target, horizon)     Time-series forecasting
anyml.profile(df)                                 Quick data profile before training
model.predict(df)                                 Inference on new data
model.predict_proba(df)                           Probabilities (classification)
model.feature_importance()                        Ranked feature importances
model.save(path) / anyml.load(path)               Persistence
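
The .pkl file names used elsewhere in this README suggest pickle-based persistence, although the on-disk format is not documented here. As a rough illustration of what a save/load round trip for a fitted pipeline involves, the sketch below uses the standard pickle module with a plain scikit-learn pipeline; save_model and load_model are hypothetical stand-ins, not anyml functions.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def save_model(model, path):
    """Serialize a fitted estimator to `path` (hypothetical helper)."""
    with open(path, "wb") as fh:
        pickle.dump(model, fh)


def load_model(path):
    """Load an estimator written by save_model. Only unpickle trusted files."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


X, y = make_classification(n_samples=100, n_features=5, random_state=0)
pipeline = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model.pkl"
    save_model(pipeline, path)
    restored = load_model(path)

# The restored pipeline should reproduce the original predictions exactly.
same = bool((restored.predict(X) == pipeline.predict(X)).all())
print("round trip ok:", same)
```

Pickled models are generally only safe to load with the same library versions that wrote them, and unpickling untrusted files can execute arbitrary code.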

CLI Usage

anyml classify titanic.csv --target Survived --out titanic.pkl
anyml regress houses.csv --target price
anyml forecast sales.csv --time-col date --target sales --horizon 30
anyml predict titanic.pkl new_passengers.csv --out preds.csv
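
The four commands share a simple shape: a subcommand, one or two positional files, and a few flags. For readers wrapping similar functionality in their own scripts, the sketch below parses the same command lines with argparse. It mirrors only the flags shown above and is not anyml's CLI code; which flags are required versus optional is an assumption.

```python
import argparse


def build_parser():
    """Build a parser that accepts the four command shapes shown above."""
    parser = argparse.ArgumentParser(prog="anyml")
    sub = parser.add_subparsers(dest="command", required=True)

    for name in ("classify", "regress"):
        p = sub.add_parser(name)
        p.add_argument("csv")
        p.add_argument("--target", required=True)
        p.add_argument("--out")  # assumed optional, as in the regress example

    p = sub.add_parser("forecast")
    p.add_argument("csv")
    p.add_argument("--time-col", required=True)
    p.add_argument("--target", required=True)
    p.add_argument("--horizon", type=int, required=True)

    p = sub.add_parser("predict")
    p.add_argument("model")
    p.add_argument("csv")
    p.add_argument("--out")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```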

Examples

Classification with feature importance

import anyml
import pandas as pd

df = pd.read_csv("churn.csv")
model = anyml.classify(df, target="churn", cv=5)

print(f"Best model: {model.best_name} ({model.best_score:.3f})")
for feat, imp in model.feature_importance().head(10).items():
    print(f"  {feat}: {imp:.3f}")

model.save("churn.pkl")

Constrain which models are tried

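import anyml
import pandas as pd

# Placeholder path: any table with a target column named "y"
df = pd.read_csv("data.csv")

# Only try XGBoost and LightGBM (requires anyml[all])
model = anyml.classify(df, target="y", models=["xgboost", "lightgbm"])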

Forecast next 90 days

import anyml
import pandas as pd

sales = pd.read_csv("sales.csv", parse_dates=["date"])
fc = anyml.forecast(sales, time_col="date", target="revenue", horizon=90)

fc.plot()            # matplotlib plot of history + forecast
fc.to_csv("forecast.csv")

License

MIT (c) Viet-Anh Nguyen
