Simple AutoML wrapper for tabular data. One-liner API for classification and regression.

anyml

Simple AutoML for tabular data. One-liner API for classification and regression with automatic preprocessing, model selection, and evaluation.

Runs completely offline. All processing uses local scikit-learn models, plus XGBoost and LightGBM when those optional packages are installed. No cloud APIs or internet connection required.

Built by Viet-Anh Nguyen | nrl.ai

Installation

pip install anyml

With XGBoost support:

pip install "anyml[xgboost]"

With all optional dependencies:

pip install "anyml[full]"

Quick Start

Classification

import pandas as pd
import anyml

df = pd.read_csv("data.csv")
result = anyml.classify(df, target="label")

print(result.score)         # Best cross-validation accuracy
print(result.model_name)    # e.g. "random_forest"
print(result.report())      # Full classification report

Regression

result = anyml.regress(df, target="price")

print(result.score)         # Best cross-validation score: negated RMSE (closer to 0 is better)
print(result.report())      # RMSE, MAE, R2

Predict on New Data

predictions = result.predict(new_df)  # new_df: a DataFrame with the same feature columns used in training

Feature Importance

result.explain()
# Returns a DataFrame with feature names and importance scores

Compare Models

scores = anyml.compare(df, target="label")
# {'logistic_regression': 0.92, 'random_forest': 0.95, ...}
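Because compare() returns a plain dict, ordinary dict operations are enough to rank the candidates. A minimal sketch using the illustrative scores from the comment above (not real results); it assumes scores are "higher is better", which holds for accuracy and for the negated RMSE used in regression.

```python
# Illustrative scores copied from the README comment above (not real results).
scores = {"logistic_regression": 0.92, "random_forest": 0.95}

# Rank models best to worst, assuming higher scores are better.
ranking = sorted(scores.items(), key=lambda item: item[1], reverse=True)
best_name, best_score = ranking[0]

for name, score in ranking:
    print(f"{name:<20} {score:.3f}")
print("best:", best_name)
```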

Choose Specific Models

result = anyml.classify(df, target="label", models=["xgboost", "random_forest"])

Standalone Preprocessing

processed_df = anyml.preprocess(df)
# or with target separation:
X, y = anyml.preprocess(df, target="label")

Save and Load

result.save("model.joblib")
loaded = anyml.load_model("model.joblib")
loaded.predict(new_df)

How It Works

Automatic Preprocessing

anyml automatically detects column types and applies appropriate transformations:

| Data type | Handling |
|---|---|
| Numeric | Median imputation + StandardScaler |
| Categorical (low cardinality) | Mode imputation + OneHotEncoding |
| Categorical (high cardinality) | Mode imputation + OneHotEncoding |
| Datetime | Extract year, month, day, day-of-week |
| Missing values | Automatic imputation (median for numeric, mode for categorical) |
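The table maps each column type to a transformation. Below is a minimal standard-library sketch of those transformations for a single column given as a plain Python list, with None marking a missing value. It mirrors the table, not anyml's own code (which the README says is built on scikit-learn). The function names are hypothetical, and missing dates are not handled.

```python
import statistics
from collections import Counter
from datetime import date


def preprocess_numeric(values):
    """Median-impute missing values (None), then standardize to zero mean, unit variance."""
    median = statistics.median([v for v in values if v is not None])
    filled = [median if v is None else v for v in values]
    mean = statistics.fmean(filled)
    # Population std (ddof=0), as StandardScaler uses; constant columns get scale 1.
    std = statistics.pstdev(filled) or 1.0
    return [(v - mean) / std for v in filled]


def preprocess_categorical(values):
    """Mode-impute missing values (None), then one-hot encode: one 0/1 column per category."""
    mode = Counter(v for v in values if v is not None).most_common(1)[0][0]
    filled = [mode if v is None else v for v in values]
    return {c: [int(v == c) for v in filled] for c in sorted(set(filled))}


def preprocess_datetime(values):
    """Expand datetime.date values into year, month, day, and day-of-week (Monday=0)."""
    return {
        "year": [d.year for d in values],
        "month": [d.month for d in values],
        "day": [d.day for d in values],
        "day_of_week": [d.weekday() for d in values],
    }


print(preprocess_numeric([1.0, None, 3.0]))
print(preprocess_categorical(["red", None, "blue", "red"]))
print(preprocess_datetime([date(2024, 1, 15)]))
```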

Model Selection

anyml tries multiple models and selects the best one via cross-validation:

Classification:

  • Logistic Regression
  • Random Forest
  • XGBoost (optional)
  • LightGBM (optional)

Regression:

  • Linear Regression
  • Random Forest
  • XGBoost (optional)
  • LightGBM (optional)
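The selection step can be sketched in plain Python: score every candidate with k-fold cross-validation and keep the one with the best mean score. This illustrates the technique, not anyml's implementation. The two toy models (a majority-class baseline and 1-nearest-neighbour) are hypothetical stand-ins for the estimators listed above. For determinism, the folds are contiguous rather than shuffled or stratified.

```python
from collections import Counter


def majority_model(X_train, y_train):
    """Baseline: always predict the most common training label."""
    label = Counter(y_train).most_common(1)[0][0]
    return lambda x: label


def nearest_neighbor_model(X_train, y_train):
    """1-nearest-neighbour using squared Euclidean distance."""
    def predict(x):
        dists = [sum((a - b) ** 2 for a, b in zip(x, xt)) for xt in X_train]
        return y_train[dists.index(min(dists))]
    return predict


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        yield list(range(start, start + size))
        start += size


def cross_val_accuracy(fit, X, y, k=5):
    """Mean accuracy over k folds; each fold is held out once as the test set."""
    scores = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


def select_best(candidates, X, y, k=5):
    """Cross-validate every candidate and return (best name, all mean scores)."""
    all_scores = {name: cross_val_accuracy(fit, X, y, k) for name, fit in candidates.items()}
    return max(all_scores, key=all_scores.get), all_scores


# Toy data: label = parity of i, and the feature separates the two classes.
X = [[float(i % 2) + 0.01 * i] for i in range(10)]
y = [i % 2 for i in range(10)]
best, all_scores = select_best(
    {"majority": majority_model, "nearest_neighbor": nearest_neighbor_model}, X, y, k=5
)
print(best, all_scores)
```

Returning the per-model scores alongside the winner matches the shape of the documented AutoResult.all_scores attribute.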

Evaluation Metrics

Classification: Accuracy, F1 (weighted), F1 (macro), full classification report

Regression: RMSE, MAE, R2
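These metrics follow standard definitions, and a small standard-library version is handy for sanity-checking numbers in a report. The sketch assumes scikit-learn's conventions: "macro" F1 averages per-class F1 equally, "weighted" F1 weights each class by its support in y_true, and a class with no predictions or true samples gets F1 = 0. The function names are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """RMSE, MAE and R^2 (coefficient of determination)."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }


def classification_metrics(y_true, y_pred):
    """Accuracy plus macro and support-weighted F1, with F1 = 2TP / (2TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    f1, support = {}, {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        denom = 2 * tp + fp + fn
        f1[c] = 2 * tp / denom if denom else 0.0
        support[c] = sum(t == c for t in y_true)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "f1_macro": sum(f1.values()) / len(labels),
        "f1_weighted": sum(f1[c] * support[c] for c in labels) / len(y_true),
    }


print(regression_metrics([3, 5, 7], [2, 5, 9]))
print(classification_metrics(["a", "a", "a", "b"], ["a", "a", "b", "b"]))
```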

Comparison with Other AutoML Tools

| Feature | anyml | auto-sklearn | TPOT | H2O |
|---|---|---|---|---|
| One-liner API | Yes | No | No | No |
| No config needed | Yes | Partial | Partial | No |
| Lightweight | Yes | No | No | No |
| Preprocessing included | Yes | Yes | Yes | Yes |
| Explainability | Yes | No | No | Partial |
| Pure Python | Yes | Yes | Yes | No (Java) |

anyml is designed for simplicity. If you need extensive hyperparameter tuning or automated pipeline search, consider auto-sklearn or TPOT. If you want to go from data to predictions in one line, anyml is for you.

API Reference

anyml.classify(df, target, models=None, cv=5, scoring=None)

Automatically select and fit a classifier for a tabular dataset. Returns an AutoResult.

anyml.regress(df, target, models=None, cv=5, scoring=None)

Automatically select and fit a regressor for a tabular dataset. Returns an AutoResult.

anyml.compare(df, target, task=None, models=None, cv=5, scoring=None)

Compare multiple models. Returns a dict of {model_name: cv_score}.

anyml.preprocess(df, target=None)

Standalone preprocessing. Returns a processed DataFrame, or an (X, y) tuple if target is given.

AutoResult

| Attribute / Method | Description |
|---|---|
| .model | Fitted sklearn estimator |
| .score | Best mean CV score |
| .model_name | Name of the winning model |
| .metrics | Evaluation metrics dict |
| .feature_importances | Feature importance dict |
| .all_scores | CV scores for all models tried |
| .predict(df) | Predict on new data |
| .explain() | Feature importance DataFrame |
| .report() | Human-readable report string |
| .save(path) | Save to disk with joblib |

Local-First / Edge AI

This package is designed to work completely offline. All model training and inference use local libraries (scikit-learn, XGBoost, LightGBM). No internet connection or cloud APIs are required.

Development

git clone https://github.com/vietanhdev/anyml.git
cd anyml
pip install -e ".[dev]"
pytest tests/ -v

License

MIT License. See LICENSE for details.

Download files

Source Distribution

anyml-0.2.0.tar.gz (29.7 kB)

Uploaded Source

Built Distribution

anyml-0.2.0-py3-none-any.whl (24.6 kB)

Uploaded Python 3

File details

Details for the file anyml-0.2.0.tar.gz.

File metadata

  • Download URL: anyml-0.2.0.tar.gz
  • Upload date:
  • Size: 29.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for anyml-0.2.0.tar.gz
Algorithm Hash digest
SHA256 46fcd51d1daee29ebfb4d3cb2b083d0d7bbfe2b2cc6a0cfd3c273e7f53c1566f
MD5 42c59320c18bf5505b69475553fe9968
BLAKE2b-256 574a874b85029d47ce8949ce52b9b5ed00b124c4378852532c0b0be4ac94a94a
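To check a downloaded archive against the SHA256 digest above, hash the file locally and compare. A minimal standard-library sketch; the helper names are hypothetical, and the expected value is the sdist digest from this table.

```python
import hashlib

# SHA256 digest of anyml-0.2.0.tar.gz, copied from the hash table above.
EXPECTED_SDIST_SHA256 = "46fcd51d1daee29ebfb4d3cb2b083d0d7bbfe2b2cc6a0cfd3c273e7f53c1566f"


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so it never has to fit in memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_SDIST_SHA256):
    """Return True only if the file's SHA256 matches the published digest."""
    return sha256_of_file(path) == expected.lower()


# Usage (assumes the sdist was downloaded into the current directory):
# print(verify_download("anyml-0.2.0.tar.gz"))
```

pip's hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in a requirements file) performs the same comparison automatically during installation.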

File details

Details for the file anyml-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: anyml-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 24.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for anyml-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 af4003f767ad4412d38d8bcd24c3587bc15b482d8e343b60e749b33150d529e9
MD5 815ceb4f19afd9b51ef7c814d5f6522b
BLAKE2b-256 97ec657b835e6923ef5035e5c40c2f428adbb2f1a8a2f3d9c0e532f3982009d3
