Simple AutoML wrapper for tabular data. One-liner API for classification and regression.
anyml
AutoML for tabular data — classify, regress, and forecast in 3 lines of code.
anyml is a dead-simple AutoML library for tabular data. Point it at a CSV or a DataFrame and a target column, and it will auto-detect column types, build a preprocessing pipeline, train and compare several models via cross-validation, and return the best one. It uses scikit-learn under the hood for the core models and optionally integrates XGBoost and LightGBM for stronger gradient-boosted baselines.
Built by Viet-Anh Nguyen at NRL.ai.
Why anyml?
- One-liner API — `anyml.classify(df, target="y")` trains and benchmarks models automatically
- Plugin architecture — Register custom estimators or preprocessing steps
- Local-first — Runs entirely on CPU, no API calls
- Minimal core deps — `scikit-learn`, `pandas`, `numpy`; XGBoost/LightGBM are optional
- Production-ready — Serializable pipelines, prediction intervals, feature importance
Installation
pip install anyml
For optional boosting libraries:
pip install anyml[xgboost] # XGBoost estimator
pip install anyml[lightgbm] # LightGBM estimator
pip install anyml[all] # everything
Python 3.8+ supported (tested on 3.8, 3.9, 3.10, 3.11, 3.12, 3.13)
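To see which optional boosting backends are importable in the current environment, a small check like the one below can help. It is a sketch that uses only the standard library, and it assumes the `[xgboost]` and `[lightgbm]` extras install modules named `xgboost` and `lightgbm`; the helper name `available_backends` is hypothetical and not part of anyml.

```python
from importlib.util import find_spec

# Assumption: each extra installs a top-level module with the same name.
OPTIONAL_BACKENDS = {"xgboost": "xgboost", "lightgbm": "lightgbm"}


def available_backends():
    """Map each optional extra to True if its module can be imported."""
    return {
        extra: find_spec(module) is not None
        for extra, module in OPTIONAL_BACKENDS.items()
    }


status = available_backends()
for extra, installed in status.items():
    hint = "installed" if installed else f"missing, try: pip install anyml[{extra}]"
    print(f"{extra}: {hint}")
```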
Quick Start
import anyml
import pandas as pd
df = pd.read_csv("titanic.csv")
# 1. AutoML classification (auto-preprocesses, trains 3-5 models, returns the best)
model = anyml.classify(df, target="Survived")
print(model.best_name, model.best_score) # e.g. "RandomForestClassifier" 0.83
# 2. Predict on new data
preds = model.predict(df.drop(columns=["Survived"]))
# 3. Regression is the same one-liner
price_model = anyml.regress(pd.read_csv("houses.csv"), target="price")
print(price_model.metrics) # rmse, mae, r2
# 4. Time-series forecasting
sales = pd.read_csv("sales.csv", parse_dates=["date"])
forecast = anyml.forecast(sales, time_col="date", target="sales", horizon=30)
Models & Methods
anyml trains and compares multiple models, then picks the best via cross-validation.
Classification models:
- `LogisticRegression` (sklearn) — fast linear baseline
- `RandomForestClassifier` (sklearn) — robust tree ensemble
- `XGBoostClassifier` (optional via `[xgboost]`) — gradient boosting
- `LGBMClassifier` (optional via `[lightgbm]`) — fast gradient boosting
Regression models:
- `LinearRegression` (sklearn)
- `RandomForestRegressor` (sklearn)
- `XGBoostRegressor` (optional)
- `LGBMRegressor` (optional)
Auto-preprocessing pipeline:
- Type detection — numeric, categorical, datetime columns auto-identified
- Missing value imputation — median for numeric, mode for categorical
- Encoding — OneHotEncoder for low-cardinality categoricals, OrdinalEncoder for high-cardinality
- Scaling — StandardScaler for numeric features
- Datetime features — extracts year, month, day, dayofweek, hour from datetime columns
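The preprocessing steps above can be approximated with plain scikit-learn and pandas. This is a minimal sketch and not anyml's actual implementation: the helper names `expand_datetimes` and `build_preprocessor` and the `MAX_ONEHOT_LEVELS` cutoff of 10 are assumptions made for illustration, since the cardinality threshold is not documented here.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_numeric_dtype
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder, StandardScaler

MAX_ONEHOT_LEVELS = 10  # hypothetical cutoff between "low" and "high" cardinality


def expand_datetimes(X):
    """Replace each datetime column with year/month/day/dayofweek/hour columns."""
    X = X.copy()
    for col in [c for c in X.columns if is_datetime64_any_dtype(X[c])]:
        for part in ("year", "month", "day", "dayofweek", "hour"):
            X[f"{col}_{part}"] = getattr(X[col].dt, part)
        X = X.drop(columns=col)
    return X


def build_preprocessor(X):
    """Build a ColumnTransformer from the detected column types of X."""
    numeric = [c for c in X.columns if is_numeric_dtype(X[c])]
    categorical = [c for c in X.columns if c not in numeric]
    low_card = [c for c in categorical if X[c].nunique() <= MAX_ONEHOT_LEVELS]
    high_card = [c for c in categorical if c not in low_card]

    numeric_steps = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    onehot_steps = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ])
    ordinal_steps = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)),
    ])
    transformers = [
        ("numeric", numeric_steps, numeric),
        ("low_card", onehot_steps, low_card),
        ("high_card", ordinal_steps, high_card),
    ]
    # Keep only the groups that have columns; sparse_threshold=0 forces a dense array.
    transformers = [t for t in transformers if t[2]]
    return ColumnTransformer(transformers, sparse_threshold=0)


df = pd.DataFrame({
    "age": [22.0, np.nan, 35.0, 58.0],
    "city": ["Hanoi", "Hue", np.nan, "Hanoi"],
    "signup": pd.to_datetime(["2024-01-05", "2024-02-10", "2024-03-15", "2024-04-20"]),
})
features = expand_datetimes(df)
Xt = build_preprocessor(features).fit_transform(features)
print(Xt.shape)
```

Datetime columns are expanded before the transformer is built so that the derived parts are imputed and scaled like any other numeric feature.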
Model selection — Stratified 5-fold cross-validation, picks best by F1 (classification) or R² (regression).
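The selection step for classification can be sketched with scikit-learn's cross-validation utilities. The example below is illustrative rather than anyml's own code: it uses synthetic data and assumes binary F1 (`scoring="f1"`), because the README does not say which F1 variant is used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic binary classification data, used only for illustration.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

candidates = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "RandomForestClassifier": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Stratified folds keep the class ratio roughly equal in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Mean F1 across the 5 folds for each candidate.
scores = {
    name: cross_val_score(estimator, X, y, cv=cv, scoring="f1").mean()
    for name, estimator in candidates.items()
}

best_name = max(scores, key=scores.get)
print(best_name, round(scores[best_name], 3))
```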
Feature importance — Native (tree models), coefficient-based (linear), or permutation importance fallback.
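One way to implement this three-level fallback is sketched below. The function name `feature_importance` here is a standalone helper written for illustration, not the `model.feature_importance()` method itself, and averaging absolute coefficients across classes is an assumption.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier


def feature_importance(model, X, y):
    """Return one importance value per feature for a fitted model."""
    # 1. Native importances (tree ensembles).
    if hasattr(model, "feature_importances_"):
        return np.asarray(model.feature_importances_)
    # 2. Coefficient magnitudes (linear models); average over classes if 2-D.
    if hasattr(model, "coef_"):
        coef = np.abs(np.asarray(model.coef_))
        return coef if coef.ndim == 1 else coef.mean(axis=0)
    # 3. Fallback: permutation importance works for any fitted estimator.
    result = permutation_importance(model, X, y, n_repeats=5, random_state=0)
    return result.importances_mean


X, y = make_classification(n_samples=150, n_features=6, random_state=0)
models = [
    RandomForestClassifier(n_estimators=30, random_state=0),
    LogisticRegression(max_iter=1000),
    KNeighborsClassifier(),  # has neither feature_importances_ nor coef_
]
importances = {
    type(m).__name__: feature_importance(m.fit(X, y), X, y) for m in models
}
for name, values in importances.items():
    print(name, np.round(values, 3))
```

Note that the three kinds of score are on different scales, so they are comparable within one model but not across models.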
Time series forecasting — moving average, exponential smoothing, or linear trend (auto-selected by in-sample RMSE).
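The auto-selection idea can be illustrated with NumPy alone. This is a sketch under stated assumptions, not anyml's code: the window size, the smoothing factor `alpha`, and the choice to score every method on one-step-ahead predictions from the second observation onward are all illustrative.

```python
import numpy as np


def rmse(actual, predicted):
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


def moving_average(y, horizon, window=3):
    # Predict each point as the mean of up to `window` preceding values.
    fitted = np.array([y[max(0, t - window):t].mean() for t in range(1, len(y))])
    return fitted, np.full(horizon, y[-window:].mean())


def exponential_smoothing(y, horizon, alpha=0.5):
    level = y[0]
    fitted = []
    for value in y[1:]:
        fitted.append(level)  # prediction made before seeing `value`
        level = alpha * value + (1 - alpha) * level
    return np.array(fitted), np.full(horizon, level)


def linear_trend(y, horizon):
    t = np.arange(len(y))
    slope, intercept = np.polyfit(t, y, 1)
    future_t = np.arange(len(y), len(y) + horizon)
    return slope * t[1:] + intercept, slope * future_t + intercept


def auto_forecast(y, horizon):
    """Fit all three methods and return the one with the lowest in-sample RMSE."""
    y = np.asarray(y, dtype=float)
    methods = {
        "moving_average": moving_average,
        "exponential_smoothing": exponential_smoothing,
        "linear_trend": linear_trend,
    }
    results = {}
    for name, method in methods.items():
        fitted, forecast = method(y, horizon)
        results[name] = (rmse(y[1:], fitted), forecast)
    best = min(results, key=lambda name: results[name][0])
    return best, results[best][1]


history = [10, 12, 14, 16, 18, 20, 22, 24]  # a perfectly linear toy series
best, forecast = auto_forecast(history, horizon=3)
print(best, np.round(forecast, 2))
```

In-sample error tends to favor the more flexible method, so a held-out tail of the series is often a safer basis for comparison when enough history is available.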
API Reference
| Function | Purpose |
|---|---|
| `anyml.classify(df, target, models=None, cv=5)` | AutoML classification |
| `anyml.regress(df, target, models=None, cv=5)` | AutoML regression |
| `anyml.forecast(df, time_col, target, horizon)` | Time-series forecasting |
| `anyml.profile(df)` | Quick data profile before training |
| `model.predict(df)` | Inference on new data |
| `model.predict_proba(df)` | Probabilities (classification) |
| `model.feature_importance()` | Ranked feature importances |
| `model.save(path)` / `anyml.load(path)` | Persistence |
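The on-disk format used by `model.save` is not described here. As a rough illustration of what a save and load round trip involves, the sketch below pickles a plain scikit-learn pipeline and checks that the restored copy predicts identically; treating persistence as pickle-based is an assumption suggested only by the `.pkl` extension used in the examples.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=100, n_features=5, random_state=0)

# Preprocessing and model live in one object, so both are saved together.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
]).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model.pkl"
    path.write_bytes(pickle.dumps(pipeline))
    restored = pickle.loads(path.read_bytes())

same_predictions = bool(np.array_equal(pipeline.predict(X), restored.predict(X)))
print(same_predictions)
```

Unpickling can execute arbitrary code, so pickle files should only be loaded from trusted sources.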
CLI Usage
anyml classify titanic.csv --target Survived --out titanic.pkl
anyml regress houses.csv --target price
anyml forecast sales.csv --time-col date --target sales --horizon 30
anyml predict titanic.pkl new_passengers.csv --out preds.csv
Examples
Classification with feature importance
import anyml
import pandas as pd
df = pd.read_csv("churn.csv")
model = anyml.classify(df, target="churn", cv=5)
print(f"Best model: {model.best_name} ({model.best_score:.3f})")
for feat, imp in model.feature_importance().head(10).items():
    print(f"  {feat}: {imp:.3f}")
model.save("churn.pkl")
Constrain which models are tried
import anyml
# df is a pandas DataFrame that contains the target column "y"
# Only try XGBoost and LightGBM (requires anyml[all])
model = anyml.classify(df, target="y", models=["xgboost", "lightgbm"])
Forecast next 90 days
import anyml
import pandas as pd
sales = pd.read_csv("sales.csv", parse_dates=["date"])
fc = anyml.forecast(sales, time_col="date", target="sales", horizon=90)
fc.plot() # matplotlib plot of history + forecast
fc.to_csv("forecast.csv")
License
MIT (c) Viet-Anh Nguyen
File details
Details for the file anyml-0.2.4.tar.gz.
File metadata
- Download URL: anyml-0.2.4.tar.gz
- Upload date:
- Size: 29.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 01c99877bce6b9704a18729bbae9f023f58ae643afc0a2b5c10d4be8c3b88568 |
| MD5 | ee498c242720190190cf24371bc312d7 |
| BLAKE2b-256 | 11088dacae6528d6cb98ba6b51c1ef2ae620c6c36a03e48f43617b048246203e |
File details
Details for the file anyml-0.2.4-py3-none-any.whl.
File metadata
- Download URL: anyml-0.2.4-py3-none-any.whl
- Upload date:
- Size: 24.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e5e22fad19a0cf64c27356d5e90d649191b967f10446da6dca330d1cbe2a7d40 |
| MD5 | fc09d289e850cc74c17b4e709988fe01 |
| BLAKE2b-256 | 714def03f1929e545806255abdb2957e2033ea48e1593c2751e79ad9d20d20c7 |