Automated Data Intelligence System (ADIS) - Explainable ML Pipeline and AI Critic
ADIS — Automated Data Intelligence System
An explainability-first AutoML library with built-in AI vulnerability detection.
ADIS runs a complete data science pipeline — ingestion, cleaning, EDA, feature engineering, model benchmarking — and produces a human-readable explanation at every step. Its AI Critic then audits the entire pipeline for data leakage, metric illusions, overfitting risks, and production readiness.
Quick Start
Install
pip install adis-autoresearch
Basic Usage (3 lines)
from adis import ADISPipeline
pipeline = ADISPipeline(target_column="target")
results = pipeline.run("data.csv")
pipeline.save_report() # Saves report.json + report.md + cleaned_data.csv
Use Individual Modules
from adis import run_ingestion, run_cleaning, run_eda, run_critic
# Just ingest and inspect
result = run_ingestion("data.csv")
print(result["column_info"]) # Per-column type detection
print(result["validation"]) # Schema issues & warnings
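# Clean a DataFrame (assumes the ingestion result above carries "df", as every stage result does)
df, column_info = result["df"], result["column_info"]
cleaned = run_cleaning(df, column_info, strategy="knn")
print(cleaned["log"]) # Every cleaning action logged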
# Run the AI Critic on any pipeline results
critic = run_critic(pipeline_results)
for vuln in critic["vulnerabilities"]:
    print(f"[{vuln['severity']}] {vuln['issue']}")
Use the Autonomous Agent (Experimental)
from adis.agent import AutoResearchAgent
agent = AutoResearchAgent(
    filepath="data.csv",
    target_column="price",
    max_iterations=10,
)
# Requires: LLM_API_KEY env var + ADIS_ALLOW_EXEC=1
results = agent.optimize()
What Makes ADIS Different
| Feature | Typical AutoML | ADIS |
|---|---|---|
| Explainability | Post-hoc (SHAP/LIME) | Built into every step — what_happened, why, impact |
| Safety Audit | None | AI Critic detects leakage, metric illusions, overfitting |
| Pipeline Report | Metrics table | Full Markdown/JSON narrative with rationale |
| Leakage Prevention | Manual | Automatic — train/test split before feature engineering |
| Target | Best score | Best score that's safe for production |
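To make the "Leakage Prevention" row concrete, here is a minimal sketch of why the split has to come before feature engineering. It is not ADIS code: it uses only the standard library, and the helper names (`split_rows`, `fit_scaler`, `apply_scaler`) are hypothetical. The point is that scaling statistics are learned from the training rows only and then reused on the held-out rows.

```python
import random
import statistics


def split_rows(values, test_fraction=0.25, seed=0):
    """Shuffle a copy of the values and split them into (train, test)."""
    rng = random.Random(seed)
    shuffled = list(values)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_scaler(train_values):
    """Learn mean and standard deviation from the training rows only."""
    mean = statistics.fmean(train_values)
    std = statistics.pstdev(train_values) or 1.0  # avoid division by zero
    return mean, std


def apply_scaler(values, scaler):
    """Standardize values with statistics that were learned elsewhere."""
    mean, std = scaler
    return [(v - mean) / std for v in values]


values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 100.0]
train, test = split_rows(values)          # 1. split first
scaler = fit_scaler(train)                # 2. fit on the training rows only
train_scaled = apply_scaler(train, scaler)
test_scaled = apply_scaler(test, scaler)  # 3. reuse the train statistics on test
```

If the scaler were fitted on all eight values instead, the test rows would influence the mean and standard deviation used during training, which is the kind of leakage the table refers to.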
Pipeline Stages
CSV File
│
▼
┌─────────────────┐
│ Ingestion │ → Type detection, schema validation, warnings
├─────────────────┤
│ Cleaning │ → Imputation, dedup, outlier detection, type coercion
├─────────────────┤
│ EDA │ → Distributions, correlations, class imbalance, flags
├─────────────────┤
│ Feature Eng. │ → Log/sqrt transforms, binning, OHE, datetime decomposition
├─────────────────┤
│ Feature Sel. │ → Variance filter, correlation filter, mutual information
├─────────────────┤
│ Benchmarking │ → 3-4 models + dummy baseline, full metric suite
├─────────────────┤
│ AI Critic │ → Cross-signal vulnerability detection
└─────────────────┘
│
▼
JSON/Markdown Report + Cleaned CSV
Each stage returns a structured result dict with:
- df: The transformed DataFrame
- explanation: Human-readable {title, what_happened, why, impact}
- step: Stage identifier
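The result shape described above can be sketched with type hints. This is an illustration only: the `TypedDict` classes, the `render_explanation` helper, and the sample values are assumptions made for this example, not the Pydantic contracts that ADIS defines in `schemas.py`.

```python
from typing import Any, TypedDict


class Explanation(TypedDict):
    title: str
    what_happened: str
    why: str
    impact: str


class StageResult(TypedDict):
    df: Any  # the transformed DataFrame
    explanation: Explanation
    step: str


def render_explanation(result: StageResult) -> str:
    """Format one stage's explanation as a short Markdown section."""
    exp = result["explanation"]
    return (
        f"## {exp['title']} ({result['step']})\n"
        f"- What happened: {exp['what_happened']}\n"
        f"- Why: {exp['why']}\n"
        f"- Impact: {exp['impact']}"
    )


# Hypothetical stage result, made up for illustration only.
example: StageResult = {
    "df": None,
    "explanation": {
        "title": "Cleaning",
        "what_happened": "Imputed missing values in the 'age' column.",
        "why": "Downstream models cannot handle missing values.",
        "impact": "No rows were dropped.",
    },
    "step": "cleaning",
}
print(render_explanation(example))
```

Because every stage shares the same three keys, a helper like this could be applied to each stage's result in turn to build a narrative report.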
AI Critic — Vulnerability Detection
The Critic cross-references signals from across the pipeline to flag issues that single-stage analysis would miss:
| Vulnerability | What It Catches |
|---|---|
| Metric Illusion | High accuracy with low AUC on imbalanced data, which usually means the model is just predicting the majority class |
| Target Leakage | Near-perfect score driven by one dominant feature |
| Overfitting Risk | Complex model on tiny dataset |
| Temporal Leakage | Random split on time-series data |
| Production Blockers | Composite check — is this model safe to deploy? |
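The "Metric Illusion" row is easier to see with numbers. The sketch below is a standalone illustration, not the Critic's actual logic: the function `looks_like_metric_illusion` and its thresholds are assumptions chosen for this example. It computes accuracy and AUC by hand for a model that gives every row the same score on a 95/5 class split.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the chance that a random positive outranks a random negative.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def looks_like_metric_illusion(acc, auc, minority_share,
                               acc_floor=0.90, auc_ceiling=0.60,
                               imbalance_cutoff=0.20):
    """Hypothetical rule: high accuracy, near-chance AUC, imbalanced classes."""
    return (acc >= acc_floor
            and auc <= auc_ceiling
            and minority_share <= imbalance_cutoff)


# 95 negatives and 5 positives; the "model" gives every row the same score.
y_true = [0] * 95 + [1] * 5
scores = [0.1] * 100
y_pred = [1 if s >= 0.5 else 0 for s in scores]

acc = accuracy(y_true, y_pred)              # 0.95
auc = roc_auc(y_true, scores)               # 0.5
minority_share = sum(y_true) / len(y_true)  # 0.05
print(looks_like_metric_illusion(acc, auc, minority_share))  # True
```

Accuracy looks strong at 0.95 even though the model cannot rank a single positive above a negative, which is why a check of this kind has to read accuracy, AUC, and the class balance together.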
critic = results["critic"]
print(critic["is_structurally_safe"]) # True/False
for v in critic["vulnerabilities"]:
    print(f" [{v['severity']}] {v['issue']} (confidence: {v['confidence']})")
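One way to act on the Critic's output is to gate deployment on it. The following is a rough sketch under stated assumptions: `summarize_critic` is a hypothetical helper, the severity labels treated as blocking are guesses rather than documented ADIS values, and `fake_critic` is a hand-written stand-in for `results["critic"]`. Only the keys `is_structurally_safe`, `vulnerabilities`, `severity`, `issue`, and `confidence` come from the usage shown above.

```python
from collections import defaultdict


def summarize_critic(critic, blocking=("critical", "high")):
    """Group issues by severity and decide whether deployment should proceed.

    The labels in `blocking` are assumptions, not documented ADIS values.
    """
    by_severity = defaultdict(list)
    for vuln in critic["vulnerabilities"]:
        by_severity[str(vuln["severity"]).lower()].append(vuln["issue"])
    has_blocker = any(by_severity.get(level) for level in blocking)
    deployable = bool(critic["is_structurally_safe"]) and not has_blocker
    return dict(by_severity), deployable


# Hand-written stand-in for results["critic"], for illustration only.
fake_critic = {
    "is_structurally_safe": False,
    "vulnerabilities": [
        {"severity": "HIGH", "issue": "Possible target leakage", "confidence": 0.9},
        {"severity": "LOW", "issue": "Small dataset", "confidence": 0.4},
    ],
}
grouped, deployable = summarize_critic(fake_critic)
print(grouped)
print("deployable:", deployable)
```

In a CI job, a `False` value for `deployable` could be turned into a non-zero exit code so that an unsafe model never reaches the release step.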
Configuration
Environment Variables
| Variable | Required | Description |
|---|---|---|
| LLM_API_KEY | For agent only | API key for LLM-powered research agent |
| ADIS_ALLOW_EXEC | For agent only | Set to 1 to enable code execution sandbox |
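Before starting the agent, it can help to confirm that both variables are set. This is a small sketch using only the standard library; `missing_agent_settings` is a hypothetical helper written for this example and is not part of the ADIS API.

```python
import os


def missing_agent_settings(env=None):
    """Return the names of agent settings that are absent or not enabled."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("LLM_API_KEY"):
        problems.append("LLM_API_KEY")
    if env.get("ADIS_ALLOW_EXEC") != "1":
        problems.append("ADIS_ALLOW_EXEC")
    return problems


missing = missing_agent_settings()
if missing:
    print("Agent disabled; set:", ", ".join(missing))
else:
    print("Agent prerequisites look satisfied.")
```

Passing a plain dict as `env` makes the check easy to unit test without touching the real environment.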
Optional Dependencies
pip install -e ".[ui]" # Streamlit dashboard
pip install -e ".[agent]" # Autonomous research agent
pip install -e ".[imbalanced]" # SMOTE oversampling
pip install -e ".[all]" # Everything
pip install -e ".[dev]" # pytest + ruff
Streamlit Dashboard
A visual frontend is included for interactive exploration:
pip install -e ".[ui]"
streamlit run app.py
Development
# Install with dev dependencies
pip install -e ".[dev]"
# Run tests
pytest tests/ -v
# Lint
ruff check adis/ tests/
Project Structure
adis/
├── __init__.py # Public API: ADISPipeline + all run_* functions
├── schemas.py # Pydantic data contracts
├── pipeline.py # Pipeline orchestrator
├── agent.py # Autonomous research agent (experimental)
├── ingestion.py # CSV loading, type detection, validation
├── cleaning.py # Imputation, dedup, outliers, coercion
├── eda.py # Distributions, correlations, imbalance
├── feature_engineering.py # Transforms, binning, encoding, datetime
├── feature_selection.py # Variance, correlation, mutual information
├── model_recommendation.py # Problem type detection, model ranking
├── benchmarking.py # Multi-model training + evaluation
└── critic.py # AI vulnerability detection
License
MIT
File details
Details for the file adis_autoresearch-0.1.4.tar.gz.
File metadata
- Download URL: adis_autoresearch-0.1.4.tar.gz
- Upload date:
- Size: 43.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a428c78092ef5d0817eb1873267ec1423faa36775a2fb6130eb8153083045112 |
| MD5 | ddd5c80bd48e8514cbbd3e158ff4d35b |
| BLAKE2b-256 | e7129be92ddba34a38b3b1aafe0eceab03de899e5322af1daca6d9e53eafdde3 |
File details
Details for the file adis_autoresearch-0.1.4-py3-none-any.whl.
File metadata
- Download URL: adis_autoresearch-0.1.4-py3-none-any.whl
- Upload date:
- Size: 43.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 67591fdbef05d95bb67aedddabc332a5a7a787a0defe5c525dfc64818913c98c |
| MD5 | 5356ab7db31c59a3fbf92bf232050dac |
| BLAKE2b-256 | 0ff52079e2a8e9eb0f297dd39a3888bb96d73fcc64dacd9cde858d5283b97ba2 |