anyml
AutoML in one line — classify and regress instantly
Simple AutoML for tabular data. One-liner API for classification and regression with automatic preprocessing, model selection, and evaluation.
Runs completely offline. All processing uses local libraries (scikit-learn, plus XGBoost and LightGBM when installed). No cloud APIs or internet connection required.
Built by Viet-Anh Nguyen | nrl.ai
Installation
pip install anyml
With XGBoost support:
pip install "anyml[xgboost]"
With all optional dependencies:
pip install "anyml[full]"
Quick Start
Classification
import pandas as pd
import anyml
df = pd.read_csv("data.csv")
# Auto-preprocessing pipeline:
# 1. Detects column types (numeric, categorical, datetime)
# 2. Fills missing values (median for numeric, mode for categorical)
# 3. One-hot encodes categoricals, scales numerics
# Trains multiple models (LogisticRegression, RandomForest, XGBoost if installed)
# Selects best by stratified 5-fold cross-validation
# Returns AutoResult with .model, .score, .predict(), .explain()
result = anyml.classify(df, target="label")
print(result.score) # Best cross-validation accuracy
print(result.model_name) # e.g. "random_forest"
print(result.report()) # Full classification report
Regression
result = anyml.regress(df, target="price")
print(result.score) # Best cross-validation score: negative RMSE (closer to 0 is better)
print(result.report()) # RMSE, MAE, R2
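The "(negative)" note on the regression score suggests a higher-is-better convention, as used by scikit-learn's `neg_root_mean_squared_error` scorer. The following is a minimal standard-library sketch of that convention, assuming anyml follows it; the model names and predictions are made-up illustration values, not anyml output.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error: lower is better."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

y_true = [3.0, 5.0, 7.0]
# Hypothetical predictions from two candidate models.
preds = {
    "model_a": [2.0, 5.0, 8.0],
    "model_b": [3.0, 5.0, 10.0],
}

# Negating RMSE turns "lower is better" into "higher is better",
# so the same max() selection works for accuracy and for error metrics.
neg_scores = {name: -rmse(y_true, p) for name, p in preds.items()}
best = max(neg_scores, key=neg_scores.get)
print(best, neg_scores[best])
```

Under this convention a score of -0.8 beats a score of -1.7, because it corresponds to the smaller error.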
Predict on New Data
predictions = result.predict(new_df)
Feature Importance
result.explain()
# Returns a DataFrame with feature names and importance scores
Compare Models
scores = anyml.compare(df, target="label")
# {'logistic_regression': 0.92, 'random_forest': 0.95, ...}
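Because the comparison result is described as a plain dict of model name to score, it can be ranked with ordinary Python. A small sketch, using only the two entries shown in the example above (the elided entries are left out) and assuming a higher score is better:

```python
# Scores copied from the example output; not produced by running anyml here.
scores = {"logistic_regression": 0.92, "random_forest": 0.95}

# Rank models from best to worst (assumes higher score is better).
ranking = sorted(scores.items(), key=lambda item: item[1], reverse=True)
best_name, best_score = ranking[0]

for name, score in ranking:
    print(f"{name:<20} {score:.3f}")
```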
Choose Specific Models
result = anyml.classify(df, target="label", models=["xgboost", "random_forest"])
Standalone Preprocessing
processed_df = anyml.preprocess(df)
# or with target separation:
X, y = anyml.preprocess(df, target="label")
Save and Load
result.save("model.joblib")
loaded = anyml.load_model("model.joblib")
loaded.predict(new_df)
How It Works
Automatic Preprocessing
anyml automatically detects column types and applies appropriate transformations:
| Data Type | Handling |
|---|---|
| Numeric | Median imputation + StandardScaler |
| Categorical (low cardinality) | Mode imputation + OneHotEncoding |
| Categorical (high cardinality) | Mode imputation + OneHotEncoding |
| Datetime | Extract year, month, day, day-of-week |
| Missing values | Automatic imputation (median for numeric, mode for categorical) |
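The transformations in the table can be illustrated on plain Python lists. This is a standard-library sketch of the ideas only (median imputation with standardization, mode imputation with one-hot encoding, datetime expansion); it is not anyml's implementation, and the helper names and the `is_` column prefix are hypothetical.

```python
from datetime import date
from statistics import median, mode, pstdev

def impute_and_scale(values):
    """Fill None with the median, then standardize to zero mean, unit variance."""
    present = [v for v in values if v is not None]
    fill = median(present)
    filled = [fill if v is None else v for v in values]
    mean = sum(filled) / len(filled)
    std = pstdev(filled) or 1.0  # population std; guard against constant columns
    return [(v - mean) / std for v in filled]

def impute_and_one_hot(values):
    """Fill None with the most frequent category, then one-hot encode."""
    fill = mode(v for v in values if v is not None)
    filled = [fill if v is None else v for v in values]
    categories = sorted(set(filled))
    return [{f"is_{c}": int(v == c) for c in categories} for v in filled]

def expand_date(d):
    """Split a date into year, month, day, and day-of-week (Monday = 0)."""
    return {"year": d.year, "month": d.month, "day": d.day, "dayofweek": d.weekday()}

ages = impute_and_scale([20.0, None, 40.0, 30.0])
colors = impute_and_one_hot(["red", "blue", None, "red"])
parts = expand_date(date(2024, 3, 15))

print(ages)
print(colors)
print(parts)
```

The missing age is replaced by the median (30.0) before scaling, and the missing color is replaced by the most frequent value ("red") before encoding.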
Model Selection
anyml tries multiple models and selects the best one via cross-validation:
Classification:
- Logistic Regression
- Random Forest
- XGBoost (optional)
- LightGBM (optional)
Regression:
- Linear Regression
- Random Forest
- XGBoost (optional)
- LightGBM (optional)
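Selection by cross-validation can be sketched without any ML library: score each candidate on stratified folds, then keep the one with the highest mean score. The two toy classifiers below are stand-ins chosen for brevity, not the models anyml trains, and all function names are hypothetical.

```python
from collections import Counter, defaultdict

def stratified_folds(labels, k):
    """Assign each sample index to one of k folds, keeping class ratios balanced."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)  # deal each class round-robin
    return folds

def majority_model(train_x, train_y):
    """Toy baseline: always predict the most common training label."""
    top = Counter(train_y).most_common(1)[0][0]
    return lambda x: top

def nearest_neighbor_model(train_x, train_y):
    """Toy 1-nearest-neighbor classifier on a single numeric feature."""
    def predict(x):
        nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
        return train_y[nearest]
    return predict

def cv_accuracy(fit, xs, ys, k=5):
    """Mean accuracy of a model factory over stratified k-fold splits."""
    scores = []
    for fold in stratified_folds(ys, k):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        correct = sum(predict(xs[i]) == ys[i] for i in fold)
        scores.append(correct / len(fold))
    return sum(scores) / len(scores)

# Made-up, well-separated data: 7 samples of class "a", 5 of class "b".
xs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 1.1, 1.2, 1.3, 1.4, 1.5]
ys = ["a"] * 7 + ["b"] * 5

candidates = {"majority": majority_model, "nearest_neighbor": nearest_neighbor_model}
all_scores = {name: cv_accuracy(fit, xs, ys) for name, fit in candidates.items()}
best_name = max(all_scores, key=all_scores.get)
print(all_scores, best_name)
```

Stratifying the folds keeps both classes represented in every split, which matters most when one class is rare.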
Evaluation Metrics
Classification: Accuracy, F1 (weighted), F1 (macro), full classification report
Regression: RMSE, MAE, R2
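For reference, the listed metrics follow standard definitions, which can be written out in a few lines. This is a standard-library sketch of those formulas on made-up values, not a copy of anyml's code; anyml presumably obtains them from scikit-learn, which is an assumption here.

```python
import math
from collections import Counter

def regression_metrics(y_true, y_pred):
    """RMSE, MAE, and R2 (coefficient of determination)."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e ** 2 for e in errors)                # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1 - ss_res / ss_tot,
    }

def f1_scores(y_true, y_pred):
    """Macro (unweighted) and weighted (by class support) averages of per-class F1."""
    support = Counter(y_true)  # classes are taken from the true labels only
    per_class = {}
    for label in support:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = support[label] - tp
        denom = 2 * tp + fp + fn
        per_class[label] = 2 * tp / denom if denom else 0.0
    macro = sum(per_class.values()) / len(per_class)
    weighted = sum(per_class[c] * support[c] for c in support) / len(y_true)
    return {"macro": macro, "weighted": weighted}

reg = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])
clf = f1_scores(["a", "a", "a", "b"], ["a", "a", "b", "b"])
print(reg)
print(clf)
```

Macro F1 treats every class equally, while weighted F1 favors the larger classes, so the two diverge on imbalanced data.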
Comparison with Other AutoML Tools
| Feature | anyml | auto-sklearn | TPOT | H2O |
|---|---|---|---|---|
| One-liner API | Yes | No | No | No |
| No config needed | Yes | Partial | Partial | No |
| Lightweight | Yes | No | No | No |
| Preprocessing included | Yes | Yes | Yes | Yes |
| Explainability | Yes | No | No | Partial |
| Pure Python | Yes | Yes | Yes | No (Java) |
anyml is designed for simplicity. If you need extensive hyperparameter tuning or automated pipeline search, consider auto-sklearn or TPOT. If you want to go from data to predictions in one line, anyml is for you.
API Reference
anyml.classify(df, target, models=None, cv=5, scoring=None)
Auto-classify a tabular dataset. Returns an AutoResult.
anyml.regress(df, target, models=None, cv=5, scoring=None)
Auto-regress a tabular dataset. Returns an AutoResult.
anyml.compare(df, target, task=None, models=None, cv=5, scoring=None)
Compare multiple models. Returns a dict of {model_name: cv_score}.
anyml.preprocess(df, target=None)
Standalone preprocessing. Returns processed DataFrame (or (X, y) tuple if target given).
AutoResult
| Attribute / Method | Description |
|---|---|
| .model | Fitted sklearn estimator |
| .score | Best mean CV score |
| .model_name | Name of the winning model |
| .metrics | Evaluation metrics dict |
| .feature_importances | Feature importance dict |
| .all_scores | CV scores for all models tried |
| .predict(df) | Predict on new data |
| .explain() | Feature importance DataFrame |
| .report() | Human-readable report string |
| .save(path) | Save to disk with joblib |
Local-First / Edge AI
This package is designed to work completely offline. All model training and inference uses local libraries (scikit-learn, XGBoost, LightGBM). No internet connection or cloud APIs are required.
Development
git clone https://github.com/vietanhdev/anyml.git
cd anyml
pip install -e ".[dev]"
pytest tests/ -v
License
MIT License. See LICENSE for details.