DeepVariance Python AutoML SDK — LLM-driven pipelines for tabular ML and image classification
DeepVariance SDK
DeepVariance is a Python AutoML SDK that combines LLM-driven code generation with AutoGluon to automatically cast, clean, sample, preprocess, and train ML models on any tabular dataset — with a single pipeline.run() call.
Table of Contents
- How it works
- Requirements
- Installation
- Configuration
- Quickstart
- Pipeline output
- PipelineConfig reference
- Progress callbacks
- Build
- Development
- Documentation
How it works
The MLPipeline executes 7 sequential layers against your DataFrame:
| # | Layer | Type | What it does |
|---|---|---|---|
| 1 | AutoCastLayer | LLM → code | Infers and applies column types, encodes categoricals |
| 2 | DataProfilingLayer | Deterministic | Computes feature + target statistics |
| 3 | CorrelationLayer | Deterministic | Pearson correlation matrix + mutual information scores |
| 4 | SamplingLayer | LLM → code | Produces a stratified, representative sample |
| 5 | PreprocessingLayer | LLM → code | Generates and applies pandas transforms (imputation, scaling, …) |
| 6 | ModelRecommendationLayer | LLM → recommendation | Selects the best AutoGluon model codes for your task |
| 7 | ModelTrainingLayer | Deterministic | Trains and evaluates a TabularPredictor, returns metrics |
LLM-driven layers use a retry loop — if the generated code raises an exception, the error is fed back to the LLM for self-correction.
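The retry loop described above can be sketched as follows. This is an illustrative reconstruction, not the SDK's actual internals — `run_with_retries`, `generate_code`, and `execute` are hypothetical names:

```python
def run_with_retries(generate_code, execute, max_retries=3):
    """Ask the LLM for code, run it, and feed the error back on failure.

    generate_code(feedback) -> str: produces code; receives the previous
        error message (or None on the first attempt) for self-correction.
    execute(code): applies the generated code to the data.
    """
    feedback = None
    for attempt in range(max_retries):
        code = generate_code(feedback)      # LLM call (stubbed in tests)
        try:
            return execute(code)            # apply the generated transform
        except Exception as exc:
            # Capture the failure so the next LLM call can correct itself
            feedback = f"attempt {attempt + 1} failed: {exc!r}"
    raise RuntimeError(f"all {max_retries} attempts failed; last error: {feedback}")
```

The key design point is that the exception text, not just a failure flag, is returned to the LLM, which is what makes self-correction possible.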
Requirements
- Python ≥ 3.12
- A DeepVariance API key — email founders@deepvariance.com or fill the contact form at deepvariance.com
- An OpenAI or Groq API key
Installation
pip install deepvariance-sdk
Dependencies installed automatically: pandas, numpy, scipy, scikit-learn, psutil, openai, groq, autogluon.tabular, torch, torchvision
Dev install (from source)
git clone https://github.com/deepvariance/deepvariance-sdk
cd deepvariance-sdk
uv venv && source .venv/bin/activate # Windows: .venv\Scripts\activate
uv pip install -e ".[dev]" # installs all deps + pytest, ruff, cython
Configuration
The SDK reads credentials from environment variables. Set them in your shell before running:
export DV_API_KEY=your-deepvariance-api-key
export OPENAI_API_KEY=sk-...
export GROQ_API_KEY=gsk_... # fallback if OpenAI key is absent
The SDK resolves LLM providers in order: OpenAI → Groq. You only need one.
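The resolution order amounts to logic like the following sketch (the helper name `resolve_provider` is illustrative; only the env-var names come from this README):

```python
import os

def resolve_provider(env=None):
    """Return which LLM provider would be selected, OpenAI first, then Groq."""
    env = os.environ if env is None else env
    if env.get("OPENAI_API_KEY"):
        return "openai"
    if env.get("GROQ_API_KEY"):
        return "groq"   # fallback when no OpenAI key is present
    raise RuntimeError("set OPENAI_API_KEY or GROQ_API_KEY")
```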
Optional: load from a .env file (local dev)
python-dotenv is not required by the SDK, but it is a convenient way to manage keys during local development.
pip install python-dotenv
Create a .env file at the project root (see .env.example):
# .env
DV_API_KEY=dv_...
OPENAI_API_KEY=sk-...
GROQ_API_KEY=gsk_...
Then load it at the top of your script, before constructing PipelineConfig:
from dotenv import load_dotenv
load_dotenv()  # reads .env into os.environ

import os
from deepvariance.pipelines.ml import MLPipeline
from deepvariance.typings import PipelineConfig

config = PipelineConfig(
    dv_api_key=os.getenv("DV_API_KEY"),
    openai_api_key=os.getenv("OPENAI_API_KEY"),
)
Never commit your .env file — add it to .gitignore. The .env.example file in the repo root shows all available environment variables.
Quickstart
import os
import pandas as pd
from deepvariance.pipelines.ml import MLPipeline
from deepvariance.typings import PipelineConfig
# 1. Load your data
data = pd.read_csv("your_dataset.csv")
# 2. Configure
config = PipelineConfig(
    dv_api_key=os.getenv("DV_API_KEY"),
    openai_api_key=os.getenv("OPENAI_API_KEY"),
    groq_api_key=os.getenv("GROQ_API_KEY"),
    sample_percentage=0.1,  # train on a 10% stratified sample
)
# 3. Run
pipeline = MLPipeline(config=config)
result = pipeline.run(data, target="your_target_column")
# 4. Inspect results
print(result["metrics"])
print(result["leaderboard"])
Run the bundled examples directly:
# Binary classification — Australia weather dataset
.venv/bin/python examples/ml_quickstart.py
# Regression — medical insurance dataset
.venv/bin/python examples/insurance_regression.py
Pipeline output
pipeline.run() returns a dict:
| Key | Type | Description |
|---|---|---|
| metrics | dict[str, float] | Accuracy, F1, ROC-AUC, RMSE, R², … (task-dependent) |
| model | TabularPredictor | Trained AutoGluon predictor |
| leaderboard | pd.DataFrame | All candidate models ranked by validation score |
| feature_importance | pd.DataFrame \| None | Feature importance scores from the best model |
| run_stats | dict | Wall-clock duration and peak memory per layer |
Classification metrics
accuracy, f1_macro, f1_weighted, precision_macro, precision_weighted, recall_macro, recall_weighted, cohen_kappa, mcc, roc_auc (binary) / roc_auc_ovr (multiclass), log_loss
Regression metrics
rmse, mae, r2, median_ae, max_error, explained_var, mape
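Because the keys of result["metrics"] are task-dependent, a consumer can tell the task apart by which metrics are present. A minimal sketch, using the key names listed above (the helper `task_type` is illustrative, not part of the SDK):

```python
def task_type(metrics: dict) -> str:
    """Infer the task from the metric keys returned by pipeline.run()."""
    # Regression runs report rmse/mae/r2; classification runs report
    # accuracy/f1_macro/... — rmse is a reliable discriminator.
    return "regression" if "rmse" in metrics else "classification"
```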
PipelineConfig reference
@dataclass
class PipelineConfig:
    dv_api_key: str | None = None           # DeepVariance API key (or set DV_API_KEY env var)
    openai_api_key: str | None = None       # OpenAI API key
    groq_api_key: str | None = None         # Groq API key (fallback)
    sample_percentage: float | None = None  # e.g. 0.1 → 10% sample fed to AutoGluon
    extra: dict[str, Any] = field(default_factory=dict)  # pipeline-specific overrides
sample_percentage controls the fraction of rows passed to AutoGluon after the LLM sampling stage. For large datasets (> 100k rows) a value of 0.1–0.2 keeps training fast while preserving distribution.
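To make the arithmetic concrete, here is the back-of-the-envelope effect of the setting on a large dataset (the numbers are hypothetical):

```python
rows = 500_000            # hypothetical dataset size
sample_percentage = 0.15  # within the 0.1-0.2 range suggested above
sampled = int(rows * sample_percentage)  # rows that reach AutoGluon training
print(sampled)
```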
Progress callbacks
Pass an on_progress callable to get real-time stage updates:
def on_progress(stage: str, status: str) -> None:
    # stage  — e.g. "AutoCastLayer", "ModelTrainingLayer"
    # status — "start" | "complete" | "error"
    icon = {"start": "▶", "complete": "✓", "error": "✗"}.get(status, "·")
    print(f"  {icon} {stage}: {status}")
result = pipeline.run(data, target="label", on_progress=on_progress)
Build
The release wheel compiles all source to native C extensions via Cython — no Python source is included in the distributed package.
# Install build dependencies (one-time)
uv pip install -e ".[dev]"
# Compile extensions in-place (for local dev / running tests against .so)
just build-ext
# Build a release wheel (compiled .so only, no .py source)
just build-wheel
# → dist/deepvariance_sdk-1.0.0-cp312-cp312-macosx_10_9_universal2.whl
For CI, build on each target platform (macOS arm64, Linux x86_64) and upload all wheels to PyPI so users get the right binary for their machine.
Documentation
The project includes Sphinx-based documentation under the docs/ directory. To build the HTML locally:
# install docs dependencies (optional group)
uv pip install -e ".[docs]" # or the equivalent: pip install -e ".[docs]"
cd docs
make html # requires make; or run `sphinx-build -b html . _build/html`
The generated site will appear in docs/_build/html/index.html.
See docs/quickstart.rst for a getting-started guide and docs/api.rst for an auto-generated API reference.
Development
# Run tests
.venv/bin/python -m pytest tests/ -q
# Lint
.venv/bin/ruff check src/ tests/
# Format
.venv/bin/ruff format src/ tests/
All lint rules are configured in pyproject.toml under [tool.ruff].