The missing middle layer between EDA and AutoML - deep data understanding meets model comparison

Maintainers

RB2004


DissectML

The missing middle layer between EDA and AutoML.

Deep data understanding meets model comparison -- the full journey from "What is my data?" to "Which model is best and WHY?", in as few as 3 function calls.

Why DissectML?

Most data science workflows look the same: run pandas-profiling (now ydata-profiling) for a quick summary, switch to scikit-learn for preprocessing, try a handful of models with PyCaret or LazyPredict, then stitch SHAP plots together in a notebook. By the time you have answers, you have imported 3-5 separate libraries, written hundreds of lines of glue code, and lost the thread that connects your data findings to your modelling decisions.

DissectML (dissectml) closes that gap. It is a single, unified pipeline that runs deep exploratory data analysis, pre-model intelligence checks (leakage detection, readiness scoring, algorithm recommendations), a multi-model battle arena, cross-model statistical comparison, and publication-ready HTML report generation -- all driven by a consistent API. Three function calls replace three notebooks.

Key Features

Exploratory Data Analysis

Unified correlation matrix -- Pearson, Cramer's V, and point-biserial correlation computed together and rendered in a single heatmap, regardless of column types.
Missing data intelligence -- Little's MCAR test plus MAR/MNAR classification, with automatic imputation strategy recommendations tailored to each column.
Statistical test battery -- Normality, independence, and variance tests auto-selected based on data type and sample size. No manual test selection required.
Auto cluster discovery -- K-Means and DBSCAN with automatically tuned parameters (elbow method, silhouette scoring) to surface natural groupings in your data.
Feature interaction and non-linearity detection -- Identifies non-linear relationships and interaction effects that linear models would miss.
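To make the unified correlation matrix concrete, here is a minimal standard-library sketch of choosing a measure per column pair: Pearson for two numeric columns, point-biserial for a numeric column against a two-level category, and Cramer's V for two categorical columns. The function names are hypothetical and this is not DissectML's implementation; numeric versus multi-category pairs raise an error because the source does not say which measure DissectML uses for them.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("Pearson r is undefined for a constant column")
    return sxy / math.sqrt(sxx * syy)


def point_biserial(binary, numeric):
    """Point-biserial r: Pearson r with the two categories coded as 0 and 1."""
    levels = sorted(set(binary))
    if len(levels) != 2:
        raise ValueError("point-biserial needs exactly two categories")
    return pearson([levels.index(v) for v in binary], numeric)


def cramers_v(x, y):
    """Cramer's V from the chi-square statistic of the x-by-y contingency table."""
    n = len(x)
    observed = Counter(zip(x, y))
    rows, cols = Counter(x), Counter(y)
    k = min(len(rows), len(cols))
    if k < 2:
        raise ValueError("Cramer's V needs at least two categories per column")
    chi2 = 0.0
    for r, r_total in rows.items():
        for c, c_total in cols.items():
            expected = r_total * c_total / n
            chi2 += (observed[(r, c)] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (k - 1)))


def is_numeric(column):
    """True if every value is an int or float (booleans excluded)."""
    return all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in column)


def unified_association(a, b):
    """Choose a measure from the two column types and return (name, value)."""
    num_a, num_b = is_numeric(a), is_numeric(b)
    if num_a and num_b:
        return "pearson", pearson(a, b)
    if not num_a and not num_b:
        return "cramers_v", cramers_v(a, b)
    cat, num = (b, a) if num_a else (a, b)
    if len(set(cat)) == 2:
        return "point-biserial", point_biserial(cat, num)
    raise NotImplementedError("numeric vs. multi-category pairs are not covered by this sketch")
```

For example, unified_association(["m", "f", "m", "f"], [1.0, 3.0, 1.0, 3.0]) returns ("point-biserial", -1.0); the sign depends on which category is coded as 1, here the alphabetically later one.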

Pre-Model Intelligence

Target leakage detection -- Four-pronged analysis covering correlation leakage, mutual information leakage, temporal leakage, and derived-feature leakage.
Data readiness score -- A 0-100 composite score with waterfall breakdown showing exactly what is dragging your data quality down (missing values, cardinality, class balance, outliers, and more).
Algorithm recommendations -- A rules engine that maps your EDA findings (data size, feature types, non-linearity, multicollinearity) to a ranked list of recommended model families.
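A 0-100 score with a waterfall breakdown can be modelled as a running total: start at 100, subtract one penalty per data-quality issue, and record the score before and after each step. The sketch below assumes that additive model; the Penalty class, example_penalties, and every weight in it are hypothetical, chosen only to show the mechanics and not taken from DissectML.

```python
from dataclasses import dataclass


@dataclass
class Penalty:
    reason: str
    points: float


def example_penalties(missing_frac, minority_frac, outlier_frac):
    """Hypothetical penalty rules; the weights are illustrative only."""
    return [
        Penalty("missing values", 40 * missing_frac),
        # Penalize a minority-class share below 50%, scaled to at most 30 points.
        Penalty("class imbalance", 30 * max(0.0, 0.5 - minority_frac) / 0.5),
        Penalty("outliers", 20 * outlier_frac),
    ]


def readiness_score(penalties, start=100.0):
    """Apply penalties in order and return (final score, waterfall steps)."""
    score = start
    steps = []
    for p in penalties:
        new_score = min(100.0, max(0.0, score - p.points))  # keep within 0-100
        steps.append((p.reason, score, new_score))
        score = new_score
    return score, steps


score, steps = readiness_score(example_penalties(0.10, 0.25, 0.05))
for reason, before, after in steps:
    print(f"{reason:<16} {before:6.1f} -> {after:6.1f}")
print(f"readiness score: {score:.1f}")
```

With 10% missing values, a 25% minority class, and 5% outliers, these illustrative weights step the score from 100 to 96, then 81, then 80.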

Model Comparison

36-model battle arena -- 19 classifiers and 17 regressors (plus optional XGBoost, LightGBM, and CatBoost) trained and evaluated with parallel cross-validation in a single call.
Cross-model error analysis -- Identifies the hardest samples, builds a model complementarity matrix, and highlights where ensemble strategies could improve performance.
Statistical significance testing -- McNemar's test for classifiers and corrected repeated k-fold paired t-test for regressors, so you know which performance differences are real.
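McNemar's test looks only at the samples on which two classifiers disagree about correctness: b samples that model A gets right and model B gets wrong, and c samples where the reverse holds. Those same pairwise counts are one plausible input for a complementarity matrix, since a large b + c means the models make different mistakes. The sketch below assumes the continuity-corrected form of the statistic; the function names are hypothetical and this is not DissectML's code.

```python
import math


def discordant_counts(y_true, pred_a, pred_b):
    """Return (b, c): b = A right and B wrong, c = A wrong and B right."""
    b = c = 0
    for truth, a, other in zip(y_true, pred_a, pred_b):
        a_ok, b_ok = a == truth, other == truth
        if a_ok and not b_ok:
            b += 1
        elif b_ok and not a_ok:
            c += 1
    return b, c


def mcnemar(b, c):
    """Continuity-corrected McNemar chi-square (1 df) and its p-value."""
    if b + c == 0:
        return 0.0, 1.0  # the models never disagree on correctness
    stat = max(0, abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of chi-square with 1 df: P(X > s) = erfc(sqrt(s / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))
```

For instance, mcnemar(10, 0) returns a statistic of 8.1 with p of about 0.0044, while equal counts such as mcnemar(5, 5) give p = 1.0. Clamping |b - c| - 1 at zero avoids over-correcting when b equals c.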

Reporting

Publication-ready HTML reports -- Interactive Plotly charts, narrative summaries, and structured sections covering every stage of the pipeline, exportable as a single self-contained HTML file.

Quick Start

import dissectml as dml

# Load a built-in dataset
df = dml.load_titanic()

1. Deep Exploratory Data Analysis

eda = dml.explore(df)

eda.overview.show()           # Shape, dtypes, memory usage
eda.correlations.heatmap()    # Unified correlation matrix
eda.missing.patterns()        # Missing data analysis with MCAR test
eda.outliers.plot()           # Outlier detection across numeric columns
eda.clusters.summary()        # Auto-discovered clusters

2. Model Battle Arena

models = dml.battle(df, target="survived")

models.leaderboard()          # Ranked models with CV scores
models.timing()               # Training time comparison

3. Full Pipeline (EDA + Intelligence + Battle + Compare + Report)

report = dml.analyze(df, target="survived", task="classification")

report.summary()              # High-level findings
report.export("report.html")  # Self-contained interactive report

The analyze function runs all five stages end-to-end: EDA, intelligence checks, model training, cross-model comparison, and report generation. For finer-grained control, call the stages individually, as with dml.explore and dml.battle in the first two steps above.

Installation

Core package

pip install dissectml

Optional extras

pip install "dissectml[boost]"     # XGBoost, LightGBM, CatBoost
pip install "dissectml[explain]"   # SHAP explainability
pip install "dissectml[report]"    # PDF export (WeasyPrint + Kaleido)
pip install "dissectml[scale]"     # Polars backend + Optuna tuning
pip install "dissectml[full]"      # Everything above

Development

git clone https://github.com/rupeshbharambe24/dissectML.git
cd dissectML
pip install -e ".[dev]"

Requirements: Python 3.10 or later.

Comparison with Alternatives

Feature	DissectML	PyCaret	LazyPredict	YData Profiling
Deep EDA	Yes	--	--	Yes
Statistical Tests	Yes	--	--	Partial
Model Training	Yes	Yes	Yes	--
Model Comparison	Yes	Yes	Partial	--
SHAP Analysis	Yes	Yes	--	--
Interactive Reports	Yes	--	--	Yes
Target Leakage Detection	Yes	--	--	--
Data Readiness Score	Yes	--	--	--

DissectML is the only library that covers the full spectrum from statistical data profiling through model comparison with a single, coherent API. Other tools excel at individual stages but leave you to bridge the gaps yourself.

Architecture

DissectML is organized into five pipeline stages, each backed by a dedicated subpackage:

Stage 1: EDA            dissectml.eda           9 sub-modules (overview, correlations,
                                                missing, outliers, univariate, bivariate,
                                                clusters, interactions, statistical_tests)

Stage 2: Intelligence   dissectml.intelligence  Leakage detection, multicollinearity,
                                                feature importance, readiness scoring,
                                                algorithm recommendations

Stage 3: Battle         dissectml.battle        Model catalog, preprocessing pipeline,
                                                parallel CV runner, hyperparameter tuner

Stage 4: Compare        dissectml.compare       Metrics tables, significance tests,
                                                error analysis, Pareto frontiers,
                                                ROC/PR curves, SHAP comparison

Stage 5: Report         dissectml.report        Jinja2 HTML builder, narrative generator,
                                                section renderers, PDF export

Configuration

DissectML uses a global configuration object for controlling default behavior:

import dissectml as dml

df = dml.load_housing()    # Regression dataset (house prices)

# View current config
print(dml.get_config())

# Temporarily override settings
with dml.config_context(n_jobs=4, cv_folds=10):
    report = dml.analyze(df, target="price")
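A scoped override such as config_context is commonly built from a module-level dictionary and contextlib.contextmanager, restoring the previous values on exit even if the body raises. This sketch assumes that pattern; the defaults (n_jobs=1, cv_folds=5) are invented for illustration and are not DissectML's.

```python
from contextlib import contextmanager

# Hypothetical defaults for illustration only.
_config = {"n_jobs": 1, "cv_folds": 5}


def get_config():
    """Return a copy so callers cannot mutate the global settings."""
    return dict(_config)


@contextmanager
def config_context(**overrides):
    """Temporarily override settings, restoring them on exit (even after errors)."""
    unknown = set(overrides) - set(_config)
    if unknown:
        raise KeyError(f"unknown config keys: {sorted(unknown)}")
    saved = dict(_config)
    _config.update(overrides)
    try:
        yield get_config()
    finally:
        _config.clear()
        _config.update(saved)
```

A plain global dictionary is not thread-safe; scikit-learn's own config_context, for example, keeps its settings in thread-local storage instead.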

Built-in Datasets

Two datasets are bundled for quick experimentation:

df_titanic = dml.load_titanic()    # Binary classification (survival)
df_housing = dml.load_housing()    # Regression (house prices)

Documentation

Full documentation, API reference, and tutorials are available at:

https://dissectml.readthedocs.io

Contributing

Contributions are welcome. Please see CONTRIBUTING.md for guidelines on setting up a development environment, running the test suite, and submitting pull requests.

If you find a bug or have a feature request, please open an issue on the GitHub issue tracker.

License

DissectML is released under the MIT License.

Built by Rupesh Bharambe

Project details

Release history

This version: 0.1.2 (Apr 7, 2026)

Download files

Source Distribution

dissectml-0.1.2.tar.gz (154.5 kB)

Uploaded Apr 7, 2026 Source

Built Distribution

dissectml-0.1.2-py3-none-any.whl (150.6 kB)

Uploaded Apr 7, 2026 Python 3

File details

Details for the file dissectml-0.1.2.tar.gz.

File metadata

Download URL: dissectml-0.1.2.tar.gz
Upload date: Apr 7, 2026
Size: 154.5 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for dissectml-0.1.2.tar.gz
Algorithm	Hash digest
SHA256	`f5d37e5c4a6ab24d596b9179f40e506a4f9c9b8b3bb231766380fadac8ffc647`
MD5	`e5e1eed54ea75d88dd2edb4f13d15d9a`
BLAKE2b-256	`ed0d7c5ed678bb4c0b2c551bd31bf85d2a8ca152a547718f461d02e799029805`
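To check a downloaded source distribution against the SHA256 digest listed above, hash the file and compare hex digests. This minimal sketch uses only hashlib; the function and constant names are illustrative.

```python
import hashlib

# SHA256 published above for dissectml-0.1.2.tar.gz.
EXPECTED_SDIST_SHA256 = "f5d37e5c4a6ab24d596b9179f40e506a4f9c9b8b3bb231766380fadac8ffc647"


def sha256_of(path, chunk_size=1 << 16):
    """Hash the file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SDIST_SHA256):
    """True if the file's SHA256 hex digest matches the expected value."""
    return sha256_of(path) == expected.strip().lower()
```

Call verify("dissectml-0.1.2.tar.gz") from the download directory. On Python 3.11 and later, hashlib.file_digest(f, "sha256") performs the chunked read for you, and pip can enforce digests itself through its hash-checking mode (pip install --require-hashes -r requirements.txt, with a --hash=sha256:... entry per requirement).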

Provenance

The following attestation bundles were made for dissectml-0.1.2.tar.gz:

Publisher: release.yml on rupeshbharambe24/dissectML

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: dissectml-0.1.2.tar.gz
- Subject digest: f5d37e5c4a6ab24d596b9179f40e506a4f9c9b8b3bb231766380fadac8ffc647
- Sigstore transparency entry: 1246154286
- Sigstore integration time: Apr 7, 2026
Source repository:
- Permalink: rupeshbharambe24/dissectML@fdd4d59da5c282d1a505ca5c514dbc0e9287b189
- Branch / Tag: refs/tags/v0.1.2
- Owner: https://github.com/rupeshbharambe24
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@fdd4d59da5c282d1a505ca5c514dbc0e9287b189
- Trigger Event: push

File details

Details for the file dissectml-0.1.2-py3-none-any.whl.

File metadata

Download URL: dissectml-0.1.2-py3-none-any.whl
Upload date: Apr 7, 2026
Size: 150.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for dissectml-0.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d9ed0a60f68e91e1a7e1604914efa13aa9da34de42a9baf4b2f715ce2bacfd72`
MD5	`f61c48e17f18ff994e9a8f7b647aaff3`
BLAKE2b-256	`5a7a90f6edc5d55b4ec7f1582d8cc656754622376bf86b67489d2292019a4146`

Provenance

The following attestation bundles were made for dissectml-0.1.2-py3-none-any.whl:

Publisher: release.yml on rupeshbharambe24/dissectML

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: dissectml-0.1.2-py3-none-any.whl
- Subject digest: d9ed0a60f68e91e1a7e1604914efa13aa9da34de42a9baf4b2f715ce2bacfd72
- Sigstore transparency entry: 1246154288
- Sigstore integration time: Apr 7, 2026
Source repository:
- Permalink: rupeshbharambe24/dissectML@fdd4d59da5c282d1a505ca5c514dbc0e9287b189
- Branch / Tag: refs/tags/v0.1.2
- Owner: https://github.com/rupeshbharambe24
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@fdd4d59da5c282d1a505ca5c514dbc0e9287b189
- Trigger Event: push
