Simple AutoML wrapper for tabular data. One-liner API for classification and regression.

anyml

AutoML for tabular data — classify, regress, and forecast in 3 lines of code.

anyml is a dead-simple AutoML library for tabular data. Point it at a CSV or a DataFrame and a target column, and it will auto-detect column types, build a preprocessing pipeline, train and compare several models via cross-validation, and return the best one. It uses scikit-learn under the hood for the core models and optionally integrates XGBoost and LightGBM for stronger gradient-boosted baselines.

Built by Viet-Anh Nguyen at NRL.ai.

Why anyml?

  • One-liner API — anyml.classify(df, target="y") trains and benchmarks models automatically
  • Plugin architecture — Register custom estimators or preprocessing steps
  • Local-first — Runs entirely on CPU, no API calls
  • Minimal core deps — scikit-learn, pandas, numpy; XGBoost/LightGBM are optional
  • Production-ready — Serializable pipelines, prediction intervals, feature importance

Installation

pip install anyml

For optional boosting libraries:

pip install anyml[xgboost]     # XGBoost estimator
pip install anyml[lightgbm]    # LightGBM estimator
pip install anyml[all]         # everything

Python 3.8+ supported (tested on 3.8, 3.9, 3.10, 3.11, 3.12, 3.13)
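
To see which optional backends are importable in the current environment, a small standard-library check like the one below can help. This is only a sketch: the helper name optional_backends is hypothetical and not part of anyml, and it assumes the extras install the xgboost and lightgbm import packages.

```python
from importlib.util import find_spec


def optional_backends():
    """Report whether the optional boosting packages can be imported.

    Hypothetical helper for illustration; not part of anyml.
    """
    # find_spec returns None when a top-level package is not installed.
    return {name: find_spec(name) is not None for name in ("xgboost", "lightgbm")}


if __name__ == "__main__":
    for name, installed in optional_backends().items():
        print(f"{name}: {'installed' if installed else 'missing'}")
```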

Quick Start

import anyml
import pandas as pd

df = pd.read_csv("titanic.csv")

# 1. AutoML classification (auto-preprocesses, trains 3-5 models, returns the best)
model = anyml.classify(df, target="Survived")
print(model.best_name, model.best_score)    # e.g. "RandomForestClassifier" 0.83

# 2. Predict on new data
preds = model.predict(df.drop(columns=["Survived"]))

# 3. Regression is the same one-liner
price_model = anyml.regress(pd.read_csv("houses.csv"), target="price")
print(price_model.metrics)                  # rmse, mae, r2

# 4. Time-series forecasting
sales = pd.read_csv("sales.csv", parse_dates=["date"])
forecast = anyml.forecast(sales, time_col="date", target="sales", horizon=30)

Models & Methods

anyml trains and compares multiple models, then picks the best via cross-validation.

Classification models:

  • LogisticRegression (sklearn) — fast linear baseline
  • RandomForestClassifier (sklearn) — robust tree ensemble
  • XGBClassifier (optional via [xgboost]) — gradient boosting
  • LGBMClassifier (optional via [lightgbm]) — fast gradient boosting

Regression models:

  • LinearRegression (sklearn)
  • RandomForestRegressor (sklearn)
  • XGBRegressor (optional)
  • LGBMRegressor (optional)

Auto-preprocessing pipeline:

  1. Type detection — numeric, categorical, datetime columns auto-identified
  2. Missing value imputation — median for numeric, mode for categorical
  3. Encoding — OneHotEncoder for low-cardinality categoricals, OrdinalEncoder for high-cardinality
  4. Scaling — StandardScaler for numeric features
  5. Datetime features — extracts year, month, day, dayofweek, hour from datetime columns
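
Steps 2 and 5 above can be illustrated with plain Python. The sketch below is not anyml's implementation; the function names impute and datetime_features are hypothetical, and missing values are assumed to be represented as None.

```python
from datetime import datetime
from statistics import median, mode


def impute(values, kind):
    """Fill None entries: median for numeric columns, mode for categorical ones."""
    present = [v for v in values if v is not None]
    fill = median(present) if kind == "numeric" else mode(present)
    return [fill if v is None else v for v in values]


def datetime_features(ts):
    """Expand one timestamp into the five parts listed in step 5."""
    return {
        "year": ts.year,
        "month": ts.month,
        "day": ts.day,
        "dayofweek": ts.weekday(),  # Monday == 0, the same convention pandas uses
        "hour": ts.hour,
    }


ages = impute([22, None, 38, 26], "numeric")          # median of 22, 26, 38 is 26
ports = impute(["S", "C", None, "S"], "categorical")  # most frequent value is "S"
parts = datetime_features(datetime(2024, 3, 15, 9, 30))
```

In a scikit-learn pipeline the same ideas are usually expressed with SimpleImputer and a ColumnTransformer rather than hand-written loops.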

Model selection — 5-fold cross-validation by default (cv=5), stratified for classification; picks the best model by F1 (classification) or R² (regression).
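
To show what stratification means here, the sketch below deals sample indices into folds class by class, so each fold keeps roughly the same class ratio as the full dataset. It is a simplified illustration (no shuffling, hypothetical function name stratified_folds) and not anyml's code; scikit-learn provides StratifiedKFold for real use.

```python
from collections import defaultdict


def stratified_folds(labels, k=5):
    """Assign each sample index to one of k folds, keeping class ratios similar."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        # Deal each class's samples round-robin so every fold gets a fair share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


labels = [0] * 15 + [1] * 5  # imbalanced toy target: 75% class 0, 25% class 1
folds = stratified_folds(labels, k=5)

for test_idx in folds:
    held_out = set(test_idx)
    # A real run would fit on train_idx and score on test_idx at this point.
    train_idx = [i for i in range(len(labels)) if i not in held_out]
```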

Feature importance — Native (tree models), coefficient-based (linear), or permutation importance fallback.
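
The permutation fallback can be sketched as follows: shuffle one column at a time and measure how much the score drops. This is a minimal illustration under stated assumptions (accuracy as the score, a single shuffle per column, hypothetical function names), not anyml's implementation.

```python
import random


def accuracy(predict, rows, targets):
    """Fraction of rows for which the model's prediction matches the target."""
    return sum(predict(r) == t for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(predict, rows, targets, n_features, seed=0):
    """Score drop after shuffling each column; a larger drop means more important."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, targets)
    importances = []
    for col in range(n_features):
        shuffled_col = [r[col] for r in rows]
        rng.shuffle(shuffled_col)
        # Rebuild every row with only this one column replaced.
        permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled_col)]
        importances.append(baseline - accuracy(predict, permuted, targets))
    return importances


def predict(row):
    """Toy model that only looks at feature 0."""
    return int(row[0] >= 0.5)


rows = [[i / 10, (i * 7) % 10] for i in range(10)]
targets = [predict(r) for r in rows]
scores = permutation_importance(predict, rows, targets, n_features=2)
```

Because the toy model ignores feature 1, shuffling that column cannot change any prediction and its importance is exactly zero.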

Time series forecasting — moving average, exponential smoothing, or linear trend (auto-selected by in-sample RMSE).
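
The selection rule can be sketched in plain Python: fit each candidate, compute its in-sample RMSE, and forecast with the winner. The window of 3, the smoothing factor of 0.5, and all function names are assumptions made for this illustration, not anyml's actual settings. As a simplification, the moving average and exponential smoothing are scored on one-step-ahead predictions while the linear trend is scored on its fitted line.

```python
from math import sqrt


def rmse(actual, fitted):
    return sqrt(sum((a - f) ** 2 for a, f in zip(actual, fitted)) / len(actual))


def moving_average_fit(y, window=3):
    # One-step-ahead fitted values: mean of the previous `window` points.
    fitted = [sum(y[i - window:i]) / window for i in range(window, len(y))]
    last = sum(y[-window:]) / window
    return y[window:], fitted, lambda h: [last] * h


def exp_smoothing_fit(y, alpha=0.5):
    level, fitted = y[0], []
    for value in y[1:]:
        fitted.append(level)  # prediction made before seeing `value`
        level = alpha * value + (1 - alpha) * level
    return y[1:], fitted, lambda h: [level] * h


def linear_trend_fit(y):
    # Ordinary least squares of y against the time index 0..n-1.
    n = len(y)
    x_mean, y_mean = (n - 1) / 2, sum(y) / n
    slope = sum((x - x_mean) * (v - y_mean) for x, v in enumerate(y)) / sum(
        (x - x_mean) ** 2 for x in range(n)
    )
    intercept = y_mean - slope * x_mean
    fitted = [intercept + slope * x for x in range(n)]
    return y, fitted, lambda h: [intercept + slope * (n + i) for i in range(h)]


def auto_forecast(y, horizon):
    """Pick the candidate with the lowest in-sample RMSE and forecast with it."""
    candidates = {
        "moving_average": moving_average_fit(y),
        "exp_smoothing": exp_smoothing_fit(y),
        "linear_trend": linear_trend_fit(y),
    }
    scores = {name: rmse(actual, fitted) for name, (actual, fitted, _) in candidates.items()}
    best = min(scores, key=scores.get)
    return best, candidates[best][2](horizon)


best, forecast = auto_forecast([10, 12, 14, 16, 18, 20], horizon=3)
```

On this perfectly linear toy series the linear trend fits with zero error, so it is selected and the forecast continues the line.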

API Reference

Function                                          Purpose
anyml.classify(df, target, models=None, cv=5)     AutoML classification
anyml.regress(df, target, models=None, cv=5)      AutoML regression
anyml.forecast(df, time_col, target, horizon)     Time-series forecasting
anyml.profile(df)                                 Quick data profile before training
model.predict(df)                                 Inference on new data
model.predict_proba(df)                           Probabilities (classification)
model.feature_importance()                        Ranked feature importances
model.save(path) / anyml.load(path)               Persistence

CLI Usage

anyml classify titanic.csv --target Survived --out titanic.pkl
anyml regress houses.csv --target price
anyml forecast sales.csv --time-col date --target sales --horizon 30
anyml predict titanic.pkl new_passengers.csv --out preds.csv

Examples

Classification with feature importance

import anyml
import pandas as pd

df = pd.read_csv("churn.csv")
model = anyml.classify(df, target="churn", cv=5)

print(f"Best model: {model.best_name} ({model.best_score:.3f})")
for feat, imp in model.feature_importance().head(10).items():
    print(f"  {feat}: {imp:.3f}")

model.save("churn.pkl")

Constrain which models are tried

import anyml

# `df` is a pandas DataFrame with a "y" target column, loaded as in the earlier examples.
# Only try XGBoost and LightGBM (requires anyml[all])
model = anyml.classify(df, target="y", models=["xgboost", "lightgbm"])

Forecast next 90 days

import anyml
import pandas as pd

sales = pd.read_csv("sales.csv", parse_dates=["date"])
fc = anyml.forecast(sales, time_col="date", target="revenue", horizon=90)

fc.plot()            # matplotlib plot of history + forecast
fc.to_csv("forecast.csv")

License

MIT (c) Viet-Anh Nguyen

Source Distribution

anyml-0.2.3.tar.gz (30.0 kB)

Built Distribution

anyml-0.2.3-py3-none-any.whl (24.8 kB)
