A lightweight AutoML and experiment tracking library with FastAPI backend and Python SDK. Works locally with MongoDB.

swiftpredict-v2

A fully local, zero-cloud AutoML and experiment tracking library. One class, five lines of code, a complete machine learning pipeline.


What swiftpredict-v2 Does Differently

Most AutoML libraries are built around the assumption that complexity is acceptable if results are good. You configure pipelines, manage preprocessors, tune encoders, handle class imbalance, split data, scale features, select models, cross-validate, compare results, and log everything yourself. That is hundreds of lines of boilerplate per experiment, repeated every time.

swiftpredict-v2 collapses all of that into a single fit() call.

It does not send your data anywhere. There is no API key, no cloud account, no rate limit, and no subscription. Everything runs on your machine, tracked in your local MongoDB instance, viewable in a local web UI that ships as a single HTML file with no build step.

The design philosophy is that a library should remove friction from the actual work, which is understanding your data and iterating on models. swiftpredict-v2 handles everything between loading a CSV and having trained, evaluated, production-ready models so that you can focus on what actually matters.


Features

Automatic preprocessing pipeline

Null handling uses statistical heuristics: rows are dropped when missingness is under 10%, otherwise numeric columns are filled with mean or mode depending on normality test results, categorical columns use mode, and datetime columns use interpolation. No configuration required.
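The drop-or-fill decision described above can be sketched in plain Python. This is only an illustration under several assumptions that the description does not confirm: missingness is measured per column as a fraction of rows, the mean is used when the column looks normal and the mode otherwise, and the normality verdict is supplied by the caller. The function name handle_nulls_sketch is hypothetical and is not part of the package.

```python
from statistics import mean, mode


def handle_nulls_sketch(rows, column, looks_normal, threshold=0.10):
    """Drop or fill missing values (None) in one numeric column.

    rows is a list of dicts. looks_normal is the caller's normality
    verdict for the column (an assumption of this sketch).
    """
    missing = sum(1 for r in rows if r[column] is None)
    if missing == 0:
        return rows
    if missing / len(rows) < threshold:
        # Few gaps: drop the incomplete rows
        return [r for r in rows if r[column] is not None]
    # Many gaps: fill with a single statistic of the observed values
    present = [r[column] for r in rows if r[column] is not None]
    fill = mean(present) if looks_normal else mode(present)
    return [{**r, column: fill} if r[column] is None else r for r in rows]


sparse_gaps = [{"x": float(i)} for i in range(19)] + [{"x": None}]
dense_gaps = [{"x": 1}, {"x": 1}, {"x": 4}, {"x": None}]

dropped = handle_nulls_sketch(sparse_gaps, "x", looks_normal=True)
filled_mean = handle_nulls_sketch(dense_gaps, "x", looks_normal=True)
filled_mode = handle_nulls_sketch(dense_gaps, "x", looks_normal=False)
```

In the package itself this work is done by handle_null_values, which takes a DataFrame and needs no configuration.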

Intelligent categorical encoding

Categorical columns with five or fewer unique values are one-hot encoded. High-cardinality columns are processed with spaCy lemmatization and stopword removal, then vectorized with TF-IDF and reduced with TruncatedSVD to the minimum number of components that explain 95% of variance. The fitted encoders are stored as attributes on the AutoML instance for reuse at inference time.
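The "minimum number of components that explain 95% of variance" rule can be illustrated without any ML library. Given per-component explained-variance ratios (scikit-learn's TruncatedSVD exposes these as explained_variance_ratio_), the sketch below picks the smallest count whose cumulative sum reaches the threshold. The function name and the sample ratios are hypothetical, and the package may compute this differently.

```python
from itertools import accumulate


def min_components_for_variance(ratios, threshold=0.95):
    """Return the smallest k such that the first k ratios sum to >= threshold.

    Falls back to len(ratios) if the threshold is never reached.
    """
    for k, total in enumerate(accumulate(ratios), start=1):
        if total >= threshold - 1e-12:  # small tolerance for float rounding
            return k
    return len(ratios)


# Hypothetical explained-variance ratios for five SVD components
ratios = [0.60, 0.25, 0.08, 0.04, 0.03]
k = min_components_for_variance(ratios)
```

With these sample ratios the cumulative sums are 0.60, 0.85, 0.93, 0.97, so four components are kept.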

Automatic task detection

The target column is inspected at runtime. String and category types map to classification. Integer targets with 20 or fewer unique values map to classification. Float targets and high-cardinality integers map to regression. No parameter needed.
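As a rough illustration of these rules, the following sketch applies them to a plain list of target values instead of a DataFrame column. The name sketch_detect_task is hypothetical; the packaged detect_task function works on a DataFrame and may differ in its edge cases.

```python
def sketch_detect_task(values, max_classes=20):
    """Return 'classification' or 'regression' for a list of target values."""
    non_null = [v for v in values if v is not None]
    # String labels always mean classification
    if any(isinstance(v, str) for v in non_null):
        return "classification"
    # Integer targets: classification only when there are few distinct values
    # (bool is a subclass of int in Python, so booleans land here too)
    if all(isinstance(v, int) for v in non_null):
        if len(set(non_null)) <= max_classes:
            return "classification"
        return "regression"
    # Floats (or mixed int/float) are treated as a continuous target
    return "regression"
```

For example, a 0/1 churn flag is detected as classification, while a sale price stored as floats is detected as regression.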

Class imbalance handling

For classification tasks, the minority-to-majority class ratio is computed. If it falls below 0.15, SMOTE is applied automatically before training. The original DataFrame is preserved; resampling happens only on the training split.
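The ratio check is simple enough to show directly. The sketch below assumes that, for more than two classes, "minority" and "majority" mean the least and most frequent class. The function names and the sample labels are hypothetical, and the resampling step itself is not shown.

```python
from collections import Counter


def minority_majority_ratio(labels):
    """Ratio of the rarest class count to the most common class count."""
    counts = Counter(labels)
    return min(counts.values()) / max(counts.values())


def needs_resampling(labels, threshold=0.15):
    """True when the class ratio falls below the threshold."""
    return minority_majority_ratio(labels) < threshold


# Hypothetical training labels: 90 negatives, 10 positives
labels = ["stay"] * 90 + ["churn"] * 10
ratio = minority_majority_ratio(labels)
resample = needs_resampling(labels)
```

Here the ratio is 10/90, roughly 0.11, which is below 0.15, so this training split would be resampled.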

Multi-model training with cross-validation

For classification: GaussianNB, XGBClassifier, RandomForestClassifier, LGBMClassifier, LogisticRegression. For regression: LinearRegression, XGBRegressor, LGBMRegressor, RandomForestRegressor. All models are trained and evaluated with 5-fold cross-validation. The best model per metric and the overall best model by majority vote are stored and returned.
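The majority vote can be pictured as counting how many metrics each model wins. The sketch below assumes a tie yields several overall winners, which would explain why the overall entry in the returned dictionary is a list; the package's actual tie-breaking is not documented here. The function name and the sample winners are hypothetical.

```python
from collections import Counter


def overall_best(best_per_metric):
    """Return the model name(s) that win the most metrics, sorted by name."""
    votes = Counter(best_per_metric.values())
    top = max(votes.values())
    return sorted(name for name, n in votes.items() if n == top)


# Sample per-metric winners
best_per_metric = {
    "accuracy": "RandomForestClassifier",
    "f1": "XGBClassifier",
    "precision": "XGBClassifier",
}
winners = overall_best(best_per_metric)
```

With these sample winners, XGBClassifier takes two of three metrics and is the only overall winner.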

Experiment tracking via SwiftPredict SDK

Every training run is automatically logged to a local MongoDB collection. Parameters, metrics, model names, run IDs, timestamps, tags, notes, and status are all persisted. The SDK can also be used independently of AutoML for tracking DL experiments epoch by epoch.

Local web UI

A single index.html file with no dependencies, no npm, no build step. Launch it with one CLI command. View all ML and DL projects, inspect run details and metrics, filter by status, add tags and notes, update run status, and delete runs or entire projects.


Prerequisites

  • Python 3.10 or higher
  • MongoDB Community Edition installed and running locally on the default port (27017)

MongoDB is required for experiment tracking. Install it from mongodb.com/try/download/community and ensure the service is running before using the SDK or launching the UI.

If MongoDB is running at a non-default URI, set the environment variable before running:

export MONGO_URI="mongodb://your-host:27017"

Installation

pip install swiftpredict-v2

AutoML Usage

Minimal example

from swiftpredict import AutoML

model = AutoML()
results = model.fit(
    project_name="churn-prediction",
    file_path="data/churn.csv",
    target_column="churned"
)

print(results)
# {
#   "accuracy": "RandomForestClassifier",
#   "f1": "XGBClassifier",
#   "precision": "XGBClassifier",
#   "overall": ["XGBClassifier"]
# }

That single fit() call handles null imputation, boolean and categorical encoding, text vectorization, correlation-based feature removal, stratified train-test splitting, feature scaling, class imbalance correction, multi-model training with cross-validation, and experiment logging. Written by hand, the same pipeline would take 150 to 300 lines of code, depending on the dataset.

Evaluating model performance

# Evaluate the overall best model on the held-out test set
metrics = model.evaluate_performance(key="overall")
print(metrics)
# {"accuracy": 0.94, "f1": 0.93, "roc_auc": 0.97, "precision": 0.94}

# Or evaluate a specific metric's best model
metrics = model.evaluate_performance(key="f1")

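# Or evaluate an external model using the same test split.
# Note: the fit below uses the test split only to keep the snippet short.
# Fitting on model.X_test leaks the test data and inflates the scores, so
# in real use, fit the external model on your own training data instead.
from sklearn.linear_model import LogisticRegression
external = LogisticRegression().fit(model.X_test, model.y_test)
metrics = model.evaluate_performance(model=external)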

Exporting a model

# Export the overall best model
model.export_model(model_path="models/best_model.pkl")

# Export the best model for a specific metric
model.export_model(model_path="models/best_f1.pkl", key="f1")
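To reuse an exported model in another process, something like the following may work. It assumes export_model writes a standard pickle file, which the .pkl extension suggests but this page does not confirm. If the package uses another serializer, load the file with the matching library instead. The helper name load_exported_model is hypothetical.

```python
import pickle
from pathlib import Path


def load_exported_model(model_path):
    """Load a pickled object from disk.

    Only unpickle files you created or trust: unpickling can run arbitrary code.
    """
    with Path(model_path).open("rb") as fh:
        return pickle.load(fh)


# Usage, assuming the file was written by the export call above:
# best = load_exported_model("models/best_model.pkl")
# predictions = best.predict(new_rows_preprocessed)  # new_rows_preprocessed: your own data
```

New data must go through the same preprocessing as the training data before it reaches the model.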

Accessing intermediate pipeline state

The AutoML instance retains all fitted preprocessors after training. You can access them directly instead of re-running preprocessing at inference time.

# The preprocessed DataFrame used for training
model.modified_df

# The fitted StandardScaler
model.std_scaler

# List of (column_index, fitted_OneHotEncoder) tuples
model.ohe_lst

# List of (column_index, fitted_TfidfVectorizer, fitted_TruncatedSVD) tuples
model.vectorizer_lst

# Columns removed during preprocessing (by index)
model.removed_columns

# The held-out test features (already scaled)
model.X_test

# The held-out test labels
model.y_test

# Detected task type: "classification" or "regression"
model.task

This means you do not need to refit any preprocessor when running inference on new data. Load the AutoML instance or individual components and transform directly.

Optional fit parameters

model.fit(
    project_name="price-regression",
    file_path="data/houses.csv",
    target_column="sale_price",
    drop_id=True,     # Drop columns whose name contains "id" or "index". Default: True
    drop_name=True    # Drop columns whose name is exactly "name". Default: True
)

Using the SwiftPredict SDK Independently

The SwiftPredict class can be used on its own for any experiment, not just AutoML. It is particularly useful for deep learning projects where you want to log metrics per epoch.

ML project (single metric value per model)

from swiftpredict import SwiftPredict

logger = SwiftPredict(project_name="sentiment-analysis", project_type="ML")

logger.log_params({"C": 1.0, "solver": "lbfgs"}, model_name="LogisticRegression")
logger.log_or_update_metric(key="accuracy", value=0.91, model_name="LogisticRegression")
logger.log_or_update_metric(key="f1_score", value=0.89, model_name="LogisticRegression")
logger.finalize_run(status="completed", notes="Baseline run", tags=["baseline", "v1"])

DL project (metric value per epoch)

logger = SwiftPredict(project_name="image-classifier", project_type="DL")

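# training_loop() is a placeholder for your own training code; here it is
# expected to yield one (train_loss, val_acc) pair per epoch.
for epoch, (train_loss, val_acc) in enumerate(training_loop()):
    logger.log_or_update_metric(key="loss", value=train_loss, model_name="ResNet18", step=epoch)
    logger.log_or_update_metric(key="val_accuracy", value=val_acc, model_name="ResNet18", step=epoch)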

logger.finalize_run(status="completed", tags=["resnet", "imagenet"])

Retrieving runs

runs = logger.find_project_runs()
for run in runs:
    print(run["run_id"], run["metrics"])

Standalone Preprocessing Utilities

All preprocessing functions used internally by AutoML are also exported at the top level for use in custom pipelines.

from swiftpredict import (
    handle_null_values,
    handle_imbalance,
    handle_cat_columns,
    detect_task,
    get_dtype_columns,
    text_preprocessor,
)

# Detect column types
col_types = get_dtype_columns(df)
# {"categorical": [...], "numeric": [...], "date": [...], "bool": [...]}

# Detect ML task from target column
task = detect_task(df, y="target")  # "classification" or "regression"

# Handle nulls
clean_df = handle_null_values(df)

# Encode categorical columns
encoded_df, ohe_encoders, tfidf_encoders = handle_cat_columns(df, cat_columns=["category", "description"])

# Fix class imbalance
X_resampled, y_resampled = handle_imbalance(df, target_column="label", X_train=X, y_train=y)

# Preprocess a text string
clean_text = text_preprocessor("The quick brown fox jumps!", handle_html=False)

Launching the UI

The web UI allows you to view all logged experiments, inspect run details, filter by status, add notes and tags, and delete runs, all from a browser with no extra setup.

Start the backend and open the UI:

swiftpredict launch ui

This command starts the FastAPI backend on http://localhost:8000 and opens index.html automatically in your default browser.

What you can do in the UI:

  • View all ML and DL projects with their run IDs and model names
  • Click into any run to see full details including metrics logged
  • Filter all projects by status (completed, running, failed, pending)
  • Log parameters, add tags, update status, and add notes to any run
  • Delete individual runs or entire projects

The UI communicates directly with your local FastAPI backend. MongoDB must be running for any data to appear.


Project Structure

swiftpredict-v2/
├── swiftpredict/
│   ├── __init__.py          # Public API exports
│   ├── cli.py               # CLI entry point
│   └── index.html           # Standalone web UI (ships with the package)
├── backend/
│   └── app/
│       ├── __init__.py
│       ├── api/
│       │   └── logger_apis.py     # FastAPI routes
│       ├── client/
│       │   └── swift_predict.py   # Experiment tracking SDK
│       ├── core/
│       │   └── config.py          # MongoDB schema and setup
│       └── services/
│           ├── automl_trainer.py  # AutoML class
│           └── preprocessing.py   # Full preprocessing pipeline
├── pyproject.toml
├── README.md
└── LICENSE

API Reference

The FastAPI backend exposes the following endpoints. All are accessible at http://localhost:8000 when the backend is running.

Method  Endpoint                                Description
GET     /                                       Health check
GET     /projects/ml                            All ML project runs
GET     /projects/dl                            All DL project runs
GET     /projects/{status}                      Runs filtered by status
GET     /{project}/runs/{run_id}                Details for a specific run
GET     /{project}/plots/available_metrics      Metrics logged for a project
GET     /{project}/plots/{metric}               Plot image for a DL metric (PNG stream)
POST    /{project}/runs/{run_id}/log_param      Log a parameter to a run
POST    /{project}/runs/{run_id}/add_tags       Add tags to a run
POST    /{project}/runs/{run_id}/update_status  Update run status
POST    /{project}/runs/{run_id}/add_notes      Add notes to a run
DELETE  /projects/delete                        Delete a run or entire project
DELETE  /delete_all                             Delete all data

Interactive documentation is available at http://localhost:8000/docs when the backend is running.
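The endpoints can also be called from a script using only the standard library. The sketch below builds the URL for the run-details route and fetches it. It assumes the route returns JSON, which is FastAPI's default for most handlers but is not stated here, and the response fields are not documented on this page. The helper names and the sample project and run ID are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"


def run_details_url(project, run_id, base_url=BASE_URL):
    """Build the URL for GET /{project}/runs/{run_id}, escaping both path parts."""
    return f"{base_url}/{quote(project, safe='')}/runs/{quote(run_id, safe='')}"


def get_json(url, timeout=5):
    """Fetch a URL and decode the body as JSON (the backend must be running)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


url = run_details_url("churn-prediction", "abc123")
# details = get_json(url)  # uncomment once the backend is running
```

Request bodies for the POST and DELETE routes are best checked in the interactive documentation at /docs.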


Environment Variables

Variable   Default                    Description
MONGO_URI  mongodb://localhost:27017  MongoDB connection string
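In your own scripts, the same variable-with-default lookup can be mirrored as below. This is a sketch of the usual pattern, not the package's internal code, and the helper name resolve_mongo_uri is hypothetical.

```python
import os

DEFAULT_MONGO_URI = "mongodb://localhost:27017"


def resolve_mongo_uri(environ=os.environ):
    """Return MONGO_URI from the environment, or the local default."""
    return environ.get("MONGO_URI", DEFAULT_MONGO_URI)


mongo_uri = resolve_mongo_uri()
```

Passing a plain dictionary as environ makes the lookup easy to test without touching the real environment.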

Contributing

Contributions are welcome. Fork the repository, create a branch, make your changes, and open a pull request with a clear description of what was changed and why.

Areas where contributions are particularly useful: additional model types, hyperparameter tuning strategies, time-series support, and UI improvements.


Author

Manas Ranjan Jena
GitHub: @ManasRanjanJena253
Email: mranjanjena253@gmail.com
LinkedIn: manasranjanjena253


License

MIT License. Free to use, modify, and distribute with attribution.
