clinikit

Prepared by Berat Kaan SEVEN

A lightweight, sklearn-compatible Python toolkit for tabular machine learning. clinikit bundles 14 hybrid classifiers, 5 experiment protocols, calibration utilities, label-noise diagnostics, fairness audits, and structured HTML reports behind a single drop-in package.

Research and development use only. This is an integration toolkit, not a regulated product and not a research paper of original methods. See CITATIONS.md for source-method references.


Why clinikit

clinikit is a complement to existing libraries, not a competitor.

| Library | Focus | Why clinikit is different |
| --- | --- | --- |
| scikit-learn | General-purpose ML | Adds curated experiment protocols, audit utilities, and structured reporting |
| Cleanlab | Label noise only | Integrates Cleanlab plus neighborhood conflict and LOO into one diagnostics module |
| MAPIE | Conformal prediction only | Includes selective classification as one of 14 bundled models |
| Fairlearn / AIF360 | Fairness only | The audit module bundles fairness, leakage, and documentation helpers |
| AutoGluon | AutoML | Library-first; thin AutoML wrappers exist but no auto-magic by default |
| PyHealth | Deep learning for sequence / multimodal | Tabular-only, classical ML focused, lightweight |

Installation

pip install clinikit

Optional dependency groups:

pip install "clinikit[diagnostics]"   # Cleanlab-based label-noise tools
pip install "clinikit[explain]"       # SHAP and LIME wrappers
pip install "clinikit[automl]"        # TabPFN, FLAML, AutoGluon wrappers
pip install "clinikit[synthetic]"     # CTGAN / TVAE wrappers
pip install "clinikit[conformal]"     # MAPIE conformal prediction
pip install "clinikit[all]"           # Everything

Supported Python versions: 3.10, 3.11, 3.12, 3.13.
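
To see which optional groups are usable in the current environment, a small standard-library check along the following lines may help. This is a sketch, not part of clinikit: the `EXTRA_IMPORTS` mapping and the `available_extras` helper are hypothetical, and the import names are assumptions about what each extra installs rather than values read from clinikit's packaging metadata.

```python
from importlib.util import find_spec

# Assumed mapping from clinikit extras to top-level import names of the
# third-party packages they pull in. These names are illustrative guesses.
EXTRA_IMPORTS = {
    "diagnostics": ["cleanlab"],
    "explain": ["shap", "lime"],
    "automl": ["tabpfn", "flaml", "autogluon"],
    "synthetic": ["ctgan"],
    "conformal": ["mapie"],
}


def available_extras(mapping=EXTRA_IMPORTS):
    """Return {extra: bool}; True only if every listed module can be found."""
    return {
        extra: all(find_spec(name) is not None for name in modules)
        for extra, modules in mapping.items()
    }


if __name__ == "__main__":
    for extra, ok in available_extras().items():
        status = "installed" if ok else f'missing (pip install "clinikit[{extra}]")'
        print(f"{extra:12s} {status}")
```

`find_spec` only locates a module without importing it, so the check stays fast even when heavy packages are present.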


Quickstart

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

from clinikit.datasets import load_pima
from clinikit.metrics import sensitivity, specificity
from clinikit.models import RuleAugmentedClassifier

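# Load the bundled PIMA benchmark and hold out a stratified 20% test split.
X, y = load_pima(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Wrap a plain logistic regression in one of the bundled hybrid classifiers.
model = RuleAugmentedClassifier(base_estimator=LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Report the true-positive and true-negative rates on the held-out split.
print("Sensitivity:", sensitivity(y_test, y_pred))
print("Specificity:", specificity(y_test, y_pred))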

For a complete walkthrough, see examples/quickstart.ipynb or open it in Colab.
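
For readers unfamiliar with the two metrics printed in the quickstart, the sketch below shows the standard definitions in plain Python: sensitivity is TP / (TP + FN) and specificity is TN / (TN + FP). The helper names (`confusion_counts`, `sensitivity_sketch`, `specificity_sketch`) are hypothetical and written for illustration; clinikit's own `sensitivity` and `specificity` may differ in details such as positive-label handling or zero-division behavior.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count (tp, fp, tn, fn) for a binary problem with the given positive label."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def sensitivity_sketch(y_true, y_pred, positive=1):
    """True-positive rate: TP / (TP + FN); NaN when there are no positives."""
    tp, _, _, fn = confusion_counts(y_true, y_pred, positive)
    return tp / (tp + fn) if (tp + fn) else float("nan")


def specificity_sketch(y_true, y_pred, positive=1):
    """True-negative rate: TN / (TN + FP); NaN when there are no negatives."""
    _, fp, tn, _ = confusion_counts(y_true, y_pred, positive)
    return tn / (tn + fp) if (tn + fp) else float("nan")


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
    print("Sensitivity:", sensitivity_sketch(y_true, y_pred))  # 3 of 4 positives found
    print("Specificity:", specificity_sketch(y_true, y_pred))  # 3 of 4 negatives found
```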


What is in the box

14 hybrid classifiers (clinikit.models)

All are sklearn-compatible and pass sklearn.utils.estimator_checks.check_estimator.

  • RuleAugmentedClassifier
  • BoundaryRefineClassifier
  • SubgroupThresholdClassifier
  • ErrorAwareCalibrator
  • MonotonicBooster
  • HardSampleWeightedEnsemble
  • ClassConditionalImputer
  • CrossDistributionDistiller
  • SelectiveClassifier
  • InstanceAdaptiveThreshold
  • DialecticalEnsemble
  • LatentSubtypeRouter
  • IterativeLabelRefiner
  • DualViewCoTrainer

Supporting modules

  • preprocessing — imputers, scalers, outlier flags, missing indicators
  • metrics — sensitivity, specificity, NPV, PPV, F2, MCC, Brier, ECE
  • curves — ROC, PR, calibration, Decision Curve Analysis
  • protocols — 5 experiment protocols (Defensible, MaxScore, OriginalOnly, Deployment, Audit)
  • leaderboard — experiment tracking CSV with 38 columns
  • report — HTML structured report generator (Jinja2 templates)
  • audit — leakage detection, subgroup fairness, documentation checks
  • governance — audit-trail manifest templates (documentation only)
  • reproducibility — manifest files (data hash + config + library versions)
  • datasets — UCI benchmarks (PIMA, Wisconsin, UCI Heart, Frankfurt)
  • cli — Typer-based CLI: train, benchmark, audit, validate, report
  • thresholds, calibration, statistics, diagnostics, cost_sensitive, monitor, modelcard, cross_val, explainability, automl, external_val, time_split, active_learning, synthetic
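
The reproducibility entry above describes a manifest made of a data hash, the run config, and library versions. A minimal standard-library sketch of that idea might look like the following. The function name `build_manifest`, the field names, and the default package list are assumptions made for illustration and do not reflect clinikit's actual manifest format.

```python
import hashlib
import json
import platform
from importlib import metadata


def build_manifest(data_bytes, config, packages=("scikit-learn", "numpy")):
    """Bundle a data hash, the run config, and installed library versions."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record absent packages explicitly instead of failing.
            versions[name] = None
    return {
        "data_sha256": hashlib.sha256(data_bytes).hexdigest(),
        "config": config,
        "python": platform.python_version(),
        "library_versions": versions,
    }


if __name__ == "__main__":
    # Hypothetical inputs: raw CSV bytes and a small config dict.
    manifest = build_manifest(b"a,b\n1,2\n", {"seed": 42, "test_size": 0.2})
    print(json.dumps(manifest, indent=2, sort_keys=True))
```

Hashing the raw bytes rather than a parsed table means any change to the file, including whitespace or column order, produces a different hash.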

Command-line interface

clinikit train      --config config.yaml
clinikit benchmark  --dataset pima --models all
clinikit audit      --data data.csv --report audit.html
clinikit validate   --model model.joblib --data data.csv
clinikit report     --leaderboard runs.csv --out report.html

Project notes

clinikit is an integration toolkit. The methods it bundles are adaptations of techniques published in the academic literature; see CITATIONS.md for source-method references. It is not a research paper of original methods, and it is not a regulated product. Research and development use only.


Contributing

Contributions are welcome. Please read CONTRIBUTING.md for the development workflow, coding standards, and pull-request process. By participating, you agree to abide by the Code of Conduct.


Citation

If you use clinikit in academic work, please cite it via the CITATION.cff file, or use:

@software{clinikit,
  author  = {SEVEN, Berat Kaan},
  title   = {clinikit: a tabular machine-learning toolkit},
  year    = {2026},
  url     = {https://github.com/clinikit/clinikit},
  version = {0.1.0}
}

License

Distributed under the MIT License. See LICENSE for the full text.
