anyml

AutoML in one line — classify and regress instantly

Simple AutoML for tabular data. One-liner API for classification and regression with automatic preprocessing, model selection, and evaluation.

Runs completely offline. All processing uses local scikit-learn models, plus XGBoost and LightGBM when those optional packages are installed. No cloud APIs or internet connection required.

Built by Viet-Anh Nguyen | nrl.ai

Installation

pip install anyml

With XGBoost support:

pip install "anyml[xgboost]"

With all optional dependencies:

pip install "anyml[full]"

Quick Start

Classification

import pandas as pd
import anyml

df = pd.read_csv("data.csv")

# Auto-preprocessing pipeline:
#   1. Detects column types (numeric, categorical, datetime)
#   2. Fills missing values (median for numeric, mode for categorical)
#   3. One-hot encodes categoricals, scales numerics
# Trains multiple models (LogisticRegression, RandomForest, XGBoost if installed)
# Selects best by stratified 5-fold cross-validation
# Returns AutoResult with .model, .score, .predict(), .explain()
result = anyml.classify(df, target="label")

print(result.score)         # Best cross-validation accuracy
print(result.model_name)    # e.g. "random_forest"
print(result.report())      # Full classification report

Regression

result = anyml.regress(df, target="price")

print(result.score)         # Best mean CV score, reported as negative RMSE (closer to 0 is better)
print(result.report())      # RMSE, MAE, R2

Predict on New Data

predictions = result.predict(new_df)

Feature Importance

result.explain()
# Returns a DataFrame with feature names and importance scores

Compare Models

scores = anyml.compare(df, target="label")
# {'logistic_regression': 0.92, 'random_forest': 0.95, ...}
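
Since compare is documented to return a plain dict of scores, ranking the candidates needs nothing beyond built-in Python. A minimal sketch, reusing the two illustrative accuracy values from the comment above (they are placeholders, not measured results):

```python
# Illustrative scores in the shape compare() is documented to return.
# The values are placeholders, not output from a real run.
scores = {"logistic_regression": 0.92, "random_forest": 0.95}

# Order model names from best to worst; higher is better for accuracy.
ranking = sorted(scores, key=scores.get, reverse=True)
best_name = ranking[0]

for name in ranking:
    print(f"{name:20s} {scores[name]:.3f}")
```

The same pattern should apply to regression if scores are reported as negative RMSE, because the largest value is then the smallest error.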

Choose Specific Models

result = anyml.classify(df, target="label", models=["xgboost", "random_forest"])

Standalone Preprocessing

processed_df = anyml.preprocess(df)
# or with target separation:
X, y = anyml.preprocess(df, target="label")

Save and Load

result.save("model.joblib")
loaded = anyml.load_model("model.joblib")
loaded.predict(new_df)

How It Works

Automatic Preprocessing

anyml automatically detects column types and applies appropriate transformations:

| Data Type | Handling |
|---|---|
| Numeric | Median imputation + StandardScaler |
| Categorical (low cardinality) | Mode imputation + OneHotEncoding |
| Categorical (high cardinality) | Mode imputation + OneHotEncoding |
| Datetime | Extract year, month, day, day-of-week |
| Missing values | Automatic imputation (median for numeric, mode for categorical) |
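
To make the transformations in the table concrete, here is a small standard-library sketch of each step on toy columns. It is not anyml's implementation: the helper names, the is_ prefix for one-hot columns, the dayofweek key, and the Monday = 0 convention are all assumptions made for illustration.

```python
from datetime import date
from statistics import median, mode, pstdev

def impute_and_scale(values):
    """Median-impute missing numerics, then standardize (zero mean, unit variance)."""
    fill = median(v for v in values if v is not None)
    filled = [fill if v is None else v for v in values]
    mean = sum(filled) / len(filled)
    std = pstdev(filled) or 1.0  # avoid dividing by zero on constant columns
    return [(v - mean) / std for v in filled]

def impute_and_one_hot(values):
    """Mode-impute missing categories, then emit one 0/1 column per category."""
    fill = mode([v for v in values if v is not None])
    filled = [fill if v is None else v for v in values]
    return {f"is_{c}": [int(v == c) for v in filled] for c in sorted(set(filled))}

def expand_date(d):
    """Split a date into year, month, day, and day-of-week (Monday = 0)."""
    return {"year": d.year, "month": d.month, "day": d.day, "dayofweek": d.weekday()}

ages = impute_and_scale([20.0, None, 40.0, 30.0])            # None is filled with 30.0
colors = impute_and_one_hot(["red", None, "blue", "red"])    # None is filled with "red"
parts = expand_date(date(2024, 1, 15))
```

Population standard deviation (pstdev) is used because that is what scikit-learn's StandardScaler computes.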

Model Selection

anyml tries multiple models and selects the best one via cross-validation:

Classification:

  • Logistic Regression
  • Random Forest
  • XGBoost (optional)
  • LightGBM (optional)

Regression:

  • Linear Regression
  • Random Forest
  • XGBoost (optional)
  • LightGBM (optional)
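
The selection step described above can be sketched directly in scikit-learn. This is a rough illustration of choosing a model by cross-validation, not anyml's internal code: the synthetic dataset, the hyperparameters, the shuffling, and the fixed random seeds are assumptions chosen to keep the example small and repeatable.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic data stands in for an already preprocessed feature matrix.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Candidate models, keyed by the names used elsewhere in this README.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Stratified 5-fold CV keeps class proportions similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
all_scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
    for name, model in candidates.items()
}

# Pick the highest mean score, then refit the winner on all of the data.
best_name = max(all_scores, key=all_scores.get)
best_model = candidates[best_name].fit(X, y)
print(best_name, round(all_scores[best_name], 3))
```

For regression the same loop would work with a plain KFold splitter and a scorer such as "neg_root_mean_squared_error", where max still picks the best model because the scores are negated errors.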

Evaluation Metrics

Classification: Accuracy, F1 (weighted), F1 (macro), full classification report

Regression: RMSE, MAE, R2
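
For reference, the three regression metrics follow their standard definitions and can be computed by hand. The sketch below uses toy values and a hypothetical helper name; it shows the formulas only and is not how anyml computes its report.

```python
from math import sqrt

def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, and R2 using their standard definitions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    return {
        "rmse": sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1 - ss_res / ss_tot,
    }

# Toy targets and predictions, chosen so the arithmetic is easy to follow.
metrics = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 9.0])
print(metrics)
```

RMSE penalizes large errors more heavily than MAE, while R2 reports the share of variance in the target that the predictions explain.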

Comparison with Other AutoML Tools

| Feature | anyml | auto-sklearn | TPOT | H2O |
|---|---|---|---|---|
| One-liner API | Yes | No | No | No |
| No config needed | Yes | Partial | Partial | No |
| Lightweight | Yes | No | No | No |
| Preprocessing included | Yes | Yes | Yes | Yes |
| Explainability | Yes | No | No | Partial |
| Pure Python | Yes | Yes | Yes | No (Java) |

anyml is designed for simplicity. If you need extensive hyperparameter tuning or automated pipeline search, consider auto-sklearn or TPOT. If you want to go from data to predictions in one line, anyml is for you.

API Reference

anyml.classify(df, target, models=None, cv=5, scoring=None)

Auto-classify a tabular dataset. Returns an AutoResult.

anyml.regress(df, target, models=None, cv=5, scoring=None)

Auto-regress a tabular dataset. Returns an AutoResult.

anyml.compare(df, target, task=None, models=None, cv=5, scoring=None)

Compare multiple models. Returns a dict of {model_name: cv_score}.

anyml.preprocess(df, target=None)

Standalone preprocessing. Returns processed DataFrame (or (X, y) tuple if target given).

AutoResult

| Attribute / Method | Description |
|---|---|
| .model | Fitted sklearn estimator |
| .score | Best mean CV score |
| .model_name | Name of the winning model |
| .metrics | Evaluation metrics dict |
| .feature_importances | Feature importance dict |
| .all_scores | CV scores for all models tried |
| .predict(df) | Predict on new data |
| .explain() | Feature importance DataFrame |
| .report() | Human-readable report string |
| .save(path) | Save to disk with joblib |

Local-First / Edge AI

This package is designed to work completely offline. All model training and inference uses local libraries (scikit-learn, XGBoost, LightGBM). No internet connection or cloud APIs are required.

Development

git clone https://github.com/vietanhdev/anyml.git
cd anyml
pip install -e ".[dev]"
pytest tests/ -v

License

MIT License. See LICENSE for details.
