ds-toolkit
Post-collection data science lifecycle toolkit.
From raw DataFrame to evaluated, tracked, and reported model — in composable, Jupyter-native Python.
What is ds-toolkit?
ds-toolkit is an opinionated, production-ready library that wraps the messy middle of data science work — everything after you have data and before you have a deployed model. It gives you:
- One-call profiling and validation before you touch a single row
- CV-safe preprocessing that cannot leak across fold boundaries by design
- Auto-selecting encoders and scalers that make sensible choices without configuration
- Multi-model CV harness that ranks every estimator in one call
- Optuna-powered tuning with pre-built search spaces for every major model
- SHAP explainability that auto-picks TreeExplainer or KernelExplainer
- MLflow experiment tracking as a context manager — zero boilerplate
- Model cards generated from your result objects in two lines
Every module is sklearn-compatible (fit / transform / fit_transform), returns typed result objects with a .display() method that renders inline in Jupyter, and mutates nothing.
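That contract can be sketched in miniature. The class name and fields below are illustrative only — not the library's actual types — but they show the shape every stage returns: an immutable, typed result with a `.display()` method:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)  # frozen: result objects are immutable, per the no-mutation rule
class ProfileResult:
    summary: dict
    warnings: list = field(default_factory=list)

    def display(self) -> None:
        # In Jupyter the real objects render rich HTML; plain text is this sketch's fallback.
        for key, value in self.summary.items():
            print(f"{key}: {value}")
        for warning in self.warnings:
            print(f"warning: {warning}")

ProfileResult(summary={"rows": 1000, "columns": 12}).display()
```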
Architecture
ds_toolkit/
├── core/ # Stage 1–2: profiling, validation, cleaning
├── features/ # Stage 3: encoding, engineering, selection
├── models/ # Stage 4: registry, CV, tuning, ensembles
├── eval/ # Stage 5: metrics, SHAP, plots, error analysis
├── infra/ # Stage 6: experiment logging, config, serialisation
└── reporting/ # Stage 7: notebook output, HTML export, model cards
Installation
Core (no optional deps):
pip install dstoolkit-adnan
With boosting libraries:
pip install "dstoolkit-adnan[boosting]" # XGBoost + LightGBM + CatBoost
With tuning + tracking:
pip install "dstoolkit-adnan[tune,track]" # Optuna + MLflow
With SHAP explanations:
pip install "dstoolkit-adnan[explain]" # shap
Everything:
pip install "dstoolkit-adnan[all]"
Development install (editable):
git clone https://github.com/ShadowGodd1/ds-toolkit.git
cd ds-toolkit
pip install -e ".[dev]"
Quick Start — Full Pipeline
import pandas as pd
from ds_toolkit.core import DataProfiler, SchemaValidator, MissingHandler, OutlierDetector, TypeCaster
from ds_toolkit.features import EncoderFactory, DatetimeDecomposer, FeatureSelector, Scaler
from ds_toolkit.models import ModelRegistry, CVHarness, TunerOptuna
from ds_toolkit.eval import MetricsReport, ExplainerSHAP, DiagnosticPlotter, ErrorAnalyser
from ds_toolkit.infra import ExperimentLogger, ConfigManager, PipelineSerialiser
from ds_toolkit.reporting import NotebookReporter, generate_model_card
df = pd.read_csv("data/my_dataset.csv")
target_col = "label"
# ── Stage 1: Understand ──────────────────────────────────────────────────
profile = DataProfiler().profile(df)
profile.display() # renders inline in Jupyter
schema = {
"age": {"nullable": False, "min": 0, "max": 120},
"email": {"regex": r".+@.+\..+"},
}
validation = SchemaValidator().check(df, schema)
validation.display()
# ── Stage 2: Clean ───────────────────────────────────────────────────────
X = df.drop(columns=[target_col])
y = df[target_col]
X = TypeCaster().cast(X)
X, outlier_report = OutlierDetector(method="iqr", action="cap").detect(X)
handler = MissingHandler(strategy="median")
X = handler.fit_transform(X)
# ── Stage 3: Features ────────────────────────────────────────────────────
X = DatetimeDecomposer().decompose(X)
encoder = EncoderFactory(task="clf")
X = encoder.fit_transform(X, y)
scaler = Scaler(method="standard")
X = scaler.fit_transform(X)
selector = FeatureSelector(method="rfecv", task="clf")
X = selector.fit_transform(X, y)
# ── Stage 4: Train ───────────────────────────────────────────────────────
models = ModelRegistry.get(task="clf")
harness = CVHarness(task="clf", n_splits=5, scoring="roc_auc")
cv_results = harness.run(models, X, y)
cv_results.display()
best_name, best_model = cv_results.best_model
# Optional: tune the best model
tuner = TunerOptuna(task="clf", n_trials=100)
tune_result = tuner.tune(best_model, X, y)
best_model.set_params(**tune_result.best_params)
best_model.fit(X, y)
# ── Stage 5: Evaluate ────────────────────────────────────────────────────
# For brevity this evaluates on the training data; hold out a test split in real use
y_pred = best_model.predict(X)
y_proba = best_model.predict_proba(X)
metrics = MetricsReport(task="clf").report(y, y_pred, y_proba)
metrics.display()
shap_result = ExplainerSHAP(top_n=10).explain(best_model, X)
shap_result.display()
diag = DiagnosticPlotter().diagnostics(best_model, X, y)
diag.display()
errors = ErrorAnalyser(n_worst=0.1).analyse(best_model, X, y)
errors.display()
# ── Stage 6: Track ───────────────────────────────────────────────────────
logger = ExperimentLogger(tracking_uri="./mlruns")
with logger.run("my_experiment", params={"model": best_name}) as run:
    logger.log_metrics(metrics.metrics_df["value"].to_dict())
    logger.log_model(best_model, name=best_name)
    logger.log_shap(shap_result)
serialiser = PipelineSerialiser(output_dir="./models")
save_result = serialiser.save(best_model, name=best_name)
# ── Stage 7: Report ──────────────────────────────────────────────────────
NotebookReporter().display(cv_results, metrics, shap_result)
card = generate_model_card(
best_model,
cv_results=cv_results,
eval_results=metrics,
shap_result=shap_result,
error_report=errors,
experiment_info={"run_id": run.run_id},
)
card.display()
print(card.to_md()) # export as Markdown
Stage Reference
Stage 1 — Data Understanding & Validation
DataProfiler
One-call dataset summary: shape, dtypes, memory, missing%, cardinality, skew, kurtosis, outlier flag.
from ds_toolkit.core import DataProfiler
profiler = DataProfiler(
cardinality_threshold=50, # columns with ≤N unique values → categorical
outlier_method="iqr", # 'iqr' | 'zscore' | 'both'
missing_threshold=0.05, # warn if missing% exceeds this
)
result = profiler.profile(df)
result.display() # Jupyter inline
result.summary_df # pd.DataFrame — one row per column
result.warnings # list[str]
SchemaValidator
Pydantic-backed schema enforcement. Raises in strict mode (strict=True) or returns a violations report.
from ds_toolkit.core import SchemaValidator
schema = {
"age": {"dtype": "numeric", "nullable": False, "min": 0, "max": 120},
"email": {"regex": r".+@.+\..+"},
"status": {"allowed": ["active", "inactive"]},
"id": {"unique": True, "nullable": False},
}
result = SchemaValidator(strict=False).check(df, schema)
result.passed # bool
result.violations_df # pd.DataFrame — [column, check, detail]
DistributionReport
Auto-generates histograms, KDE plots, QQ plots, box plots, and correlation heatmap. Exports self-contained HTML.
from ds_toolkit.core import DistributionReport
result = DistributionReport().run(df, output_dir="reports/")
result.html_path # Path to saved HTML
result.display() # inline in Jupyter
Stage 2 — Data Cleaning & Preprocessing
MissingHandler
Per-column imputation — CV-safe (fit on train only).
from ds_toolkit.core import MissingHandler
handler = MissingHandler(
strategy="median", # global fallback
col_strategies={"city": "mode", # per-column overrides
"note": "constant"},
fill_values={"note": "unknown"},
knn_neighbors=5,
)
X_train_clean = handler.fit_transform(X_train)
X_val_clean = handler.transform(X_val) # uses train statistics
Supported strategies: mean, median, mode, constant, knn, mice, none.
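The CV-safety contract above — learn statistics on train, reuse them on val/test — can be shown with a hand-rolled median imputer (a toy sketch, not the library's implementation):

```python
import statistics

class MedianImputer:
    """Minimal sketch: learn per-column medians on train, reuse them downstream."""

    def fit(self, columns: dict) -> "MedianImputer":
        self.medians_ = {
            name: statistics.median(v for v in values if v is not None)
            for name, values in columns.items()
        }
        return self

    def transform(self, columns: dict) -> dict:
        # Fill with *train* medians — validation rows never influence the statistic
        return {
            name: [self.medians_[name] if v is None else v for v in values]
            for name, values in columns.items()
        }

imputer = MedianImputer().fit({"age": [20, None, 40, 60]})
print(imputer.transform({"age": [None, 80]}))  # → {'age': [40, 80]}
```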
OutlierDetector
from ds_toolkit.core import OutlierDetector
detector = OutlierDetector(
method="iqr", # 'iqr' | 'zscore' | 'isoforest' | 'lof'
action="cap", # 'flag' | 'cap' | 'drop'
col_actions={"revenue": "drop"}, # per-column action override
iqr_factor=1.5,
)
result_df, report = detector.detect(df)
TypeCaster
from ds_toolkit.core import TypeCaster
caster = TypeCaster(
cardinality_threshold=50, # object cols with ≤N unique → category
downcast_numerics=True, # int64 → smallest safe int
parse_dates=True, # detect and parse date strings
)
df_typed = caster.cast(df)
caster.change_log # list of {column, from, to}
Deduplicator
from ds_toolkit.core import Deduplicator
dedup = Deduplicator(
keys=["patient_id", "visit_date"], # exact dedup keys
fuzzy_cols=["full_name"], # fuzzy dedup columns (requires rapidfuzz)
fuzzy_threshold=90,
)
df_clean = dedup.clean(df)
dedup.report() # pd.DataFrame — rows removed
Stage 3 — Feature Engineering
EncoderFactory
Auto-selects encoding by cardinality and task type.
| Condition | Strategy |
|---|---|
| Column has ordered metadata | OrdinalEncoder |
| Cardinality ≤ ohe_threshold (default 15) | OneHotEncoder |
| Cardinality > threshold + target available | TargetEncoder (smoothed, CV-safe) |
| Cardinality > threshold + no target | HashingEncoder |
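The selection rule in the table above can be sketched as a plain function (a hypothetical helper, not the library's internals):

```python
def pick_encoder(cardinality: int, has_order: bool, has_target: bool,
                 ohe_threshold: int = 15) -> str:
    """Mirror the dispatch table: ordered -> ordinal, low-cardinality -> one-hot,
    high-cardinality -> target encoding if y is available, else hashing."""
    if has_order:
        return "OrdinalEncoder"
    if cardinality <= ohe_threshold:
        return "OneHotEncoder"
    return "TargetEncoder" if has_target else "HashingEncoder"

print(pick_encoder(4, has_order=False, has_target=True))     # → OneHotEncoder
print(pick_encoder(500, has_order=False, has_target=False))  # → HashingEncoder
```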
from ds_toolkit.features import EncoderFactory
enc = EncoderFactory(
task="clf",
ohe_threshold=15,
ordered_cols={"size": ["S", "M", "L", "XL"]},
)
X_train_enc = enc.fit_transform(X_train, y_train)
X_val_enc = enc.transform(X_val)
enc.encoding_map # dict: column → strategy used
DatetimeDecomposer
from ds_toolkit.features import DatetimeDecomposer
dt = DatetimeDecomposer(
cols=["created_at"], # None = auto-detect all datetime cols
cyclical=True, # add sin/cos encodings for month, dow, hour
add_holidays=True, # requires: pip install holidays
country_code="KE", # ISO country code for holiday calendar
)
df_expanded = dt.decompose(df)
# Adds: created_at_year, _month, _day, _day_of_week, _is_weekend,
# _month_sin, _month_cos, _dow_sin, _dow_cos, ...
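The cyclical sin/cos encoding maps a periodic field onto the unit circle, so December (12) and January (1) end up adjacent rather than 11 units apart. A minimal version:

```python
import math

def cyclical_encode(value: int, period: int) -> tuple:
    """Map a periodic value (month, day-of-week, hour) to (sin, cos) coordinates."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)

dec = cyclical_encode(12, period=12)  # same point on the circle as month 0
jan = cyclical_encode(1, period=12)
# Euclidean distance Dec→Jan is small, unlike the raw gap |12 - 1| = 11
print(math.dist(dec, jan))
```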
InteractionBuilder
from ds_toolkit.features import InteractionBuilder
builder = InteractionBuilder(
cols=["age", "income", "score"],
include_types=["product", "ratio"], # 'polynomial' | 'product' | 'ratio'
prune_interactions=True, # drop near-zero-variance interactions
top_k=20, # optional: RF-based top-k selection
)
X_train_int = builder.fit_transform(X_train, y_train)
X_val_int = builder.transform(X_val)
builder.selected_features_ # list of surviving feature names
FeatureSelector
Multi-stage pipeline: variance → correlation → RFECV → SHAP (each stage toggleable).
from ds_toolkit.features import FeatureSelector
selector = FeatureSelector(
method="rfecv", # 'variance' | 'correlation' | 'rfecv' | 'shap'
task="clf",
correlation_threshold=0.95,
cv_folds=5,
)
X_train_sel = selector.fit_transform(X_train, y_train)
X_val_sel = selector.transform(X_val)
selector.selected_features_ # list of kept features
selector.report() # pd.DataFrame — [feature, stage, reason]
Scaler
from ds_toolkit.features import Scaler
scaler = Scaler(
method="standard", # 'standard' | 'minmax' | 'robust'
exclude_cols=["id", "flag"], # never scale these
)
X_train_sc = scaler.fit_transform(X_train)
X_val_sc = scaler.transform(X_val)
scaler.scaling_stats_ # pd.DataFrame — center/scale per column
Stage 4 — Model Training & Selection
ModelRegistry
from ds_toolkit.models import ModelRegistry
models = ModelRegistry.get(task="clf") # all available
models = ModelRegistry.get(task="clf",
include=["lr", "rf", "xgboost"]) # only these
models = ModelRegistry.get(task="clf",
exclude=["mlp"]) # all except these
Built-in keys: lr, rf, gbm, et, mlp, xgboost, lightgbm, catboost
CVHarness
from ds_toolkit.models import CVHarness
harness = CVHarness(
task="clf",
n_splits=5,
scoring="roc_auc",
verbose=True,
)
cv_results = harness.run(models, X_train, y_train)
cv_results.summary_df # ranked by mean_score
cv_results.best_model # (name, fitted estimator)
cv_results.display() # inline table in Jupyter
CV strategy is auto-selected:
| Condition | Strategy |
|---|---|
| task='clf', balanced | StratifiedKFold(n_splits=5) |
| task='clf', imbalanced | StratifiedKFold + class_weight='balanced' |
| task='reg' | KFold(n_splits=5, shuffle=True) |
| task='ts' | TimeSeriesSplit(n_splits=5) |
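The auto-selection above can be approximated with a small dispatch function (a hypothetical helper — in particular the 20% imbalance cutoff is an assumption, not the library's documented heuristic):

```python
from collections import Counter

def pick_cv_strategy(task: str, y=None, imbalance_cutoff: float = 0.2) -> str:
    """Return a description of the CV splitter the table above prescribes."""
    if task == "ts":
        return "TimeSeriesSplit(n_splits=5)"
    if task == "reg":
        return "KFold(n_splits=5, shuffle=True)"
    # classification: compare the minority-class share against the cutoff
    counts = Counter(y)
    minority_share = min(counts.values()) / len(y)
    if minority_share < imbalance_cutoff:
        return "StratifiedKFold + class_weight='balanced'"
    return "StratifiedKFold(n_splits=5)"

print(pick_cv_strategy("clf", y=[0] * 95 + [1] * 5))  # → StratifiedKFold + class_weight='balanced'
```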
TunerOptuna
from ds_toolkit.models import TunerOptuna
from sklearn.ensemble import RandomForestClassifier
tuner = TunerOptuna(
task="clf",
n_trials=100,
cv_folds=5,
scoring="roc_auc",
)
result = tuner.tune(RandomForestClassifier(), X_train, y_train)
result.best_params # dict — apply with model.set_params(**result.best_params)
result.best_score
result.study # optuna.Study for further analysis
Pre-built search spaces: LogisticRegression, Ridge, RandomForest, ExtraTrees, GradientBoosting, XGBoost, LightGBM.
EnsembleBuilder
from ds_toolkit.models import EnsembleBuilder
builder = EnsembleBuilder(
task="clf",
method="stack", # 'stack' | 'vote' | 'blend'
meta_learner="lr", # 'lr' | 'ridge' | any sklearn estimator
cv_folds=5,
)
ensemble = builder.build(models, X_train, y_train)
preds = ensemble.predict(X_val)
proba = ensemble.predict_proba(X_val)
Stage 5 — Evaluation & Diagnostics
MetricsReport
from ds_toolkit.eval import MetricsReport
result = MetricsReport(task="clf").report(y_true, y_pred, y_proba=y_proba)
result.metrics_df # pd.DataFrame — metric → value
result.display()
| Task | Primary Metrics | Secondary Metrics |
|---|---|---|
| Binary clf | ROC-AUC, F1, Precision, Recall | Log-loss, MCC, PR-AUC |
| Multi-class clf | Macro F1, Accuracy | Per-class P/R/F1 |
| Regression | RMSE, MAE, R² | MAPE, Adj. R², Max error |
ExplainerSHAP
from ds_toolkit.eval import ExplainerSHAP
result = ExplainerSHAP(top_n=10).explain(model, X)
result.display() # summary plot inline
result.values # raw SHAP values (n_samples × n_features)
result.figures # dict: 'summary', 'bar', 'dependence_<col>'
Auto-selects TreeExplainer for tree-based models, KernelExplainer for all others.
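That dispatch can be approximated by inspecting the model's class name (a sketch — the class list is illustrative, and the real check likely consults shap's own supported-model registry):

```python
TREE_MODEL_NAMES = {  # assumed set of tree-based estimator class names
    "DecisionTreeClassifier", "RandomForestClassifier", "ExtraTreesClassifier",
    "GradientBoostingClassifier", "XGBClassifier", "LGBMClassifier", "CatBoostClassifier",
}

def pick_explainer(model) -> str:
    """TreeExplainer for tree ensembles (fast, exact); KernelExplainer otherwise."""
    return "TreeExplainer" if type(model).__name__ in TREE_MODEL_NAMES else "KernelExplainer"

class RandomForestClassifier:  # stand-in so the sketch runs without sklearn
    pass

print(pick_explainer(RandomForestClassifier()))  # → TreeExplainer
```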
DiagnosticPlotter
from ds_toolkit.eval import DiagnosticPlotter
result = DiagnosticPlotter().diagnostics(model, X, y)
result.display()
result.figures # dict of matplotlib figures
Classification: confusion matrix (raw + normalised), ROC curve, PR curve, calibration plot
Regression: residuals vs fitted, Q-Q plot, scale-location, Cook's distance
ErrorAnalyser
from ds_toolkit.eval import ErrorAnalyser
result = ErrorAnalyser(n_worst=0.1).analyse(model, X, y)
result.segments_df # feature distribution shift: worst vs rest
result.worst_df # the n_worst mis-predicted rows
result.display()
Stage 6 — Experiment Tracking & Reproducibility
ExperimentLogger
from ds_toolkit.infra import ExperimentLogger
logger = ExperimentLogger(tracking_uri="./mlruns")
with logger.run("my_experiment", params={"model": "rf", "n_estimators": 200}) as run:
    model.fit(X_train, y_train)
    logger.log_metrics({"roc_auc": 0.91, "f1": 0.87})
    logger.log_model(model, name="random_forest")
    logger.log_shap(shap_result)
print(run.run_id)
print(run.artifact_uri)
Auto-logged per run: params, metrics, model artifact, SHAP plot, requirements.txt snapshot, git commit hash.
ConfigManager
from ds_toolkit.infra import ConfigManager
# config/experiment.yaml:
# model:
# n_estimators: 200
# task: clf
# data:
# target_col: ${TARGET_COL} # resolved from env var
cfg = ConfigManager.load(
"config/experiment.yaml",
required=["data.target_col", "model.task"],
)
cfg.model.n_estimators # 200
cfg.data.target_col # value from $TARGET_COL
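The ${VAR} resolution shown in the YAML comment can be done with a small regex pass over loaded values (a sketch, not the library's code):

```python
import os
import re

_ENV_PATTERN = re.compile(r"\$\{(\w+)\}")

def resolve_env(value: str) -> str:
    """Replace every ${NAME} in a config value with os.environ['NAME']."""
    def _sub(match):
        name = match.group(1)
        if name not in os.environ:
            raise KeyError(f"config references undefined env var: {name}")
        return os.environ[name]
    return _ENV_PATTERN.sub(_sub, value)

os.environ["TARGET_COL"] = "label"
print(resolve_env("target=${TARGET_COL}"))  # → target=label
```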
PipelineSerialiser
from ds_toolkit.infra import PipelineSerialiser
serial = PipelineSerialiser(output_dir="./models")
# Save with SHA-256 checksum + metadata sidecar
result = serial.save(
pipeline,
name="rf_v1",
metadata={"roc_auc": 0.91, "trained_on": "2024-01-15"},
)
print(result.path) # ./models/rf_v1_20240115_143022.pkl
print(result.checksum) # SHA-256 hex
# Load — raises ChecksumError if file was tampered
model = serial.load(result.path)
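The checksum guard can be sketched with hashlib and pickle (ChecksumError and the sidecar layout here are illustrative; the library's on-disk format may differ):

```python
import hashlib
import pickle
import tempfile
from pathlib import Path

class ChecksumError(Exception):
    """Raised when an artifact's bytes no longer match the recorded digest."""

def save_with_checksum(obj, path: Path) -> str:
    data = pickle.dumps(obj)
    path.write_bytes(data)
    digest = hashlib.sha256(data).hexdigest()
    path.with_suffix(".sha256").write_text(digest)  # metadata sidecar
    return digest

def load_verified(path: Path):
    data = path.read_bytes()
    expected = path.with_suffix(".sha256").read_text()
    if hashlib.sha256(data).hexdigest() != expected:
        raise ChecksumError(f"{path} was modified after saving")
    return pickle.loads(data)

with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp) / "model.pkl"
    save_with_checksum({"n_estimators": 200}, p)
    print(load_verified(p))  # → {'n_estimators': 200}
```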
Stage 7 — Reporting & Notebook Output
NotebookReporter
from ds_toolkit.reporting import NotebookReporter
NotebookReporter().display(
cv_results=cv_results,
eval_results=metrics,
shap_result=shap_result,
title="Patient Readmission Model — v1",
)
HTMLExporter
from ds_toolkit.reporting import HTMLExporter
result = HTMLExporter().export(
output_path="reports/experiment_v1.html",
cv_results=cv_results,
eval_results=metrics,
shap_result=shap_result,
diagnostic_result=diag,
title="Experiment Report",
)
# Self-contained HTML — no external deps, safe to email
ModelCard
from ds_toolkit.reporting import generate_model_card
card = generate_model_card(
model,
cv_results=cv_results,
eval_results=metrics,
shap_result=shap_result,
error_report=errors,
experiment_info={"run_id": run.run_id, "git_hash": "a1b2c3d"},
)
card.display() # inline in Jupyter
card.to_md() # Markdown string
card.to_html() # HTML string
Design Principles
- No side effects. Every module accepts a DataFrame or model and returns a new object. Nothing is mutated in place.
- CV-safety by default. Anything that learns statistics from the data (TargetEncoder, MissingHandler, Scaler, FeatureSelector) has a fit/transform split. Fit on train. Transform on val/test.
- Jupyter-native. Every result object has a .display() method that renders rich HTML inline. Nothing requires a separate report step.
- Stack-agnostic. XGBoost, LightGBM, CatBoost, and all sklearn estimators are first-class citizens across every stage.
- Optional dependencies stay optional. shap, optuna, mlflow, rapidfuzz, and the boosting libraries are never imported at the top level. They are imported at call time and fail with a clear install message.
Running Tests
# All 209 tests
pytest
# Specific stage
pytest tests/test_core/
pytest tests/test_features/
pytest tests/test_models/
pytest tests/test_eval/
# With coverage
pytest --cov=ds_toolkit --cov-report=html
Contributing
Contributions are welcome. See CONTRIBUTING.md for guidelines.
Quick contribution flow:
git clone https://github.com/ShadowGodd1/ds-toolkit.git
cd ds-toolkit
pip install -e ".[dev]"
git checkout -b feature/my-feature
# make changes
pytest
git push origin feature/my-feature
# open a Pull Request
Changelog
See CHANGELOG.md.
License
MIT — see LICENSE.
Author
Adnan Mohamud
CEO & Founder, PataDoc — The Partner in Health in Your Hand
github.com/ShadowGodd1