CardioMAS — Cardio Multi-Agent System for reproducible ECG dataset splits
A locally runnable multi-agent system that analyzes ECG datasets and generates reproducible train/validation/test splits. Outputs are saved locally by default. Publishing to vlbthambawita/ECGBench on HuggingFace is an explicit opt-in step that requires write access.
Architecture
flowchart TD
CLI(["`**CLI / Python API**
cardiomas analyze`"])
CLI --> ORCH
subgraph PIPELINE ["LangGraph Pipeline — V2 Hub-and-Spoke"]
ORCH(["🎯 **Orchestrator**
Dynamic routing hub
sets next_agent each turn"])
%% Worker agents (spokes)
NL["📝 NL Requirement
Parse natural-language
user requirements"]
DISC["🔍 Discovery
Identify dataset type,
source & metadata"]
PAPER["📄 Paper
Find & parse paper
Extract split methodology"]
ANALYSIS["📊 Analysis
Scan files, parse CSV
Compute statistics"]
SPLIT["✂️ Splitter
SHA-256 seeded
deterministic splits"]
SEC["🔒 Security
PII scan · raw-data scan ·
patient leakage check"]
CODER["💻 Coder
Generate custom
split code"]
PUB["☁️ Publisher
Push to HF
Update GitHub README"]
%% Orchestrator → workers (dispatch)
ORCH -->|dispatch| NL
ORCH -->|dispatch| DISC
ORCH -->|dispatch| PAPER
ORCH -->|dispatch| ANALYSIS
ORCH -->|dispatch| SPLIT
ORCH -->|dispatch| SEC
ORCH -->|dispatch| CODER
ORCH -->|dispatch| PUB
%% Workers → Orchestrator (return)
NL -->|done| ORCH
DISC -->|done| ORCH
PAPER -->|done| ORCH
ANALYSIS -->|done| ORCH
SPLIT -->|done| ORCH
SEC -->|done| ORCH
CODER -->|done| ORCH
PUB -->|done| ORCH
%% Terminal routes from orchestrator
ORCH -->|cache hit| EXISTING["↩ Return existing
splits from HF"]
ORCH -->|audit failed| ERR["❌ End with error
Blocked — not saved"]
ORCH -->|pipeline done| SAVED["💾 End — saved locally"]
end
PUB -.->|if --push| HF[("HuggingFace
vlbthambawita/ECGBench")]
SAVED --> OUT
OUT["`**output/<dataset>/**
splits.json
split_metadata.json
analysis_report.md`"]
style PIPELINE fill:#1a1a2e,stroke:#4a9eff,color:#fff
style ORCH fill:#0f3460,stroke:#4a9eff,color:#fff
style NL fill:#2d1b4e,stroke:#9b59b6,color:#fff
style DISC fill:#1b4332,stroke:#40916c,color:#fff
style PAPER fill:#3b1f5e,stroke:#9b59b6,color:#fff
style ANALYSIS fill:#7b4220,stroke:#e67e22,color:#fff
style SPLIT fill:#1a5276,stroke:#2e86c1,color:#fff
style SEC fill:#641e16,stroke:#e74c3c,color:#fff
style CODER fill:#1a3a4a,stroke:#00bcd4,color:#fff
style PUB fill:#145a32,stroke:#27ae60,color:#fff
style SAVED fill:#1a3a1a,stroke:#27ae60,color:#fff
style ERR fill:#3b0f0f,stroke:#e74c3c,color:#fff
style HF fill:#2d2d2d,stroke:#f5a623,color:#fff
style OUT fill:#2d2d2d,stroke:#4a9eff,color:#fff
style EXISTING fill:#2d2d2d,stroke:#4a9eff,color:#fff
style CLI fill:#0d0d0d,stroke:#4a9eff,color:#fff
Each node is a dedicated LLM-backed agent. The orchestrator is the central hub — it dynamically decides which agent to invoke next after each agent completes (hub-and-spoke pattern). Every worker agent returns to the orchestrator after finishing. Agents communicate only through the shared GraphState. The security agent is a hard gate — publishing is blocked if any check fails.
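The hub-and-spoke control flow can be sketched in a few lines of plain Python. This is illustrative only: the agent names, state keys, and routing rules below are hypothetical stand-ins, not the actual LangGraph implementation.

```python
# Minimal hub-and-spoke sketch: the orchestrator inspects shared state
# after every worker returns and decides who runs next. All names here
# are illustrative, not CardioMAS internals.

def orchestrator(state: dict) -> str:
    """Decide which agent runs next based on shared state."""
    if state.get("cached_splits"):
        return "END_CACHE_HIT"
    if state.get("audit_failed"):
        return "END_ERROR"
    for step in ("discovery", "analysis", "splitter", "security"):
        if step not in state["completed"]:
            return step
    return "END_SAVED"

def run_pipeline(agents: dict) -> dict:
    state = {"completed": set()}
    while True:
        nxt = orchestrator(state)
        if nxt.startswith("END"):
            state["result"] = nxt
            return state
        agents[nxt](state)           # worker mutates shared state...
        state["completed"].add(nxt)  # ...then control returns to the hub

agents = {
    "discovery": lambda s: s.update(dataset="ptb-xl"),
    "analysis":  lambda s: s.update(n_files=42),
    "splitter":  lambda s: s.update(splits={"train": [], "val": [], "test": []}),
    "security":  lambda s: s.update(audit_failed=False),
}
final = run_pipeline(agents)
print(final["result"])  # END_SAVED
```

Because every worker returns to the hub, adding or reordering agents only changes the orchestrator's routing logic, never the workers themselves.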
Requirements
- Python ≥ 3.10
- Ollama running locally with a model pulled (default: gemma4:e2b)

ollama pull gemma4:e2b
ollama serve

Then install the package:

pip install cardiomas
Quick Start
# Analyze a dataset — results saved to ./output/ptb-xl/
cardiomas analyze https://physionet.org/content/ptb-xl/1.0.3/
# Use a local directory
cardiomas analyze /data/ptb-xl/
# Stream all agent reasoning live
cardiomas analyze /data/ptb-xl/ --verbose
# Analyze and push to HuggingFace in one step (requires HF_TOKEN)
cardiomas analyze /data/ptb-xl/ --push
After analyze, the following files are written locally:
output/
└── ptb-xl/
├── splits.json # train/val/test record IDs + reproducibility config
├── split_metadata.json # seed, strategy, version, timestamp
└── analysis_report.md # LLM-generated dataset analysis
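Downstream training code can consume these files directly. A minimal sketch follows; the exact keys inside splits.json are an assumption based on the layout above, not a verified schema, and the example writes a tiny stand-in file so it runs on its own:

```python
# Sketch: reading back the locally saved splits in a training script.
import json
from pathlib import Path

out = Path("output/ptb-xl")
out.mkdir(parents=True, exist_ok=True)

# Tiny stand-in splits.json so the example is self-contained:
out.joinpath("splits.json").write_text(json.dumps({
    "train": ["rec_001", "rec_002"],
    "val":   ["rec_003"],
    "test":  ["rec_004"],
}))

splits = json.loads(out.joinpath("splits.json").read_text())
train_ids = splits["train"]
print(len(train_ids), len(splits["val"]), len(splits["test"]))  # 2 1 1
```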
CLI Reference
cardiomas analyze
Analyze a dataset and save splits locally.
cardiomas analyze DATASET_SOURCE [OPTIONS]
| Option | Default | Description |
|---|---|---|
| --local-path PATH | — | Explicit local data path (skips download) |
| --output-dir PATH | output | Where to save results |
| --seed INT | 42 | Reproducibility seed |
| --custom-split SPEC | — | e.g. train:0.7,val:0.15,test:0.15 |
| --stratify-by FIELD | — | Metadata field to stratify splits by |
| --ignore-official | — | Ignore official splits, generate fresh |
| --push | — | Also push to HuggingFace (requires HF_TOKEN) |
| --force-reanalysis | — | Re-run even if already analyzed |
| --use-cloud-llm | — | Use cloud LLM instead of local Ollama |
| --verbose / -v | — | Stream agent reasoning and LLM calls live |
| --json | — | Machine-readable JSON output |
Verbose output
--verbose (-v) prints every agent step and LLM prompt/response in real time instead of showing a spinner:
cardiomas analyze /data/ptb-xl/ -v
Verbose mode on — streaming agent output below.
[orchestrator] pipeline start — source: /data/ptb-xl/
[orchestrator] no cache hit — running full pipeline
[discovery] registry hit → ptb-xl
[paper] searching arXiv: 'ptb-xl ECG dataset electrocardiogram'
[paper] found 3 result(s)
[paper] calling LLM (ChatOllama)…
──────────────── paper — LLM call ────────────────
╭─ prompt ────────────────────────────────────────╮
│ Analyze this ECG dataset paper and extract: … │
╰─────────────────────────────────────────────────╯
╭─ response ──────────────────────────────────────╮
│ 1. Official splits: Yes (Section 2.3, page 4) │
╰─────────────────────────────────────────────────╯
[analysis] found 42 files
[splitter] saved splits → output/ptb-xl/splits.json
[security] audit PASSED — no PII, no raw data, no leakage
Each agent is color-coded. Without --verbose, only a spinner runs during the pipeline and a summary table is shown at the end.
cardiomas push
Push previously saved local splits to HuggingFace. Requires HF_TOKEN.
cardiomas push ptb-xl
cardiomas push ptb-xl --output-dir /my/results
Runs a security audit before uploading. Refuses to push if any check fails.
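The leakage part of that audit can be illustrated with a simple invariant: no patient may contribute records to more than one split. The sketch below is illustrative only; the mapping and function names are hypothetical, and the real agent also scans for PII and raw signal data.

```python
# Illustrative patient-leakage check: block publishing if any patient
# contributes records to more than one split. record_to_patient is a
# hypothetical record-ID -> patient-ID mapping.

def leakage_free(splits: dict, record_to_patient: dict) -> bool:
    seen: dict[str, str] = {}  # patient -> split it was first seen in
    for split_name, records in splits.items():
        for rec in records:
            patient = record_to_patient[rec]
            if seen.setdefault(patient, split_name) != split_name:
                return False  # same patient in two splits -> leakage
    return True

splits = {"train": ["r1", "r2"], "test": ["r3"]}
ok = leakage_free(splits, {"r1": "p1", "r2": "p2", "r3": "p1"})
print(ok)  # False: patient p1 appears in both train and test
```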
cardiomas status
Check if a dataset has published splits on HuggingFace.
cardiomas status ptb-xl
cardiomas list
cardiomas list # show known datasets (registry)
cardiomas list --remote # show datasets published on HuggingFace
cardiomas list --local # show locally cached datasets
cardiomas verify
Re-check reproducibility metadata of published splits.
cardiomas verify ptb-xl --seed 42
cardiomas contribute
Submit community splits to vlbthambawita/ECGBench.
cardiomas contribute ptb-xl --split-file my_splits.json
cardiomas config
cardiomas config --show
cardiomas config --set OLLAMA_MODEL=mistral
HuggingFace Publishing (opt-in)
Publishing requires write access to vlbthambawita/ECGBench. Set your token before pushing:
export HF_TOKEN=hf_...
cardiomas push ptb-xl
Only record identifiers are ever published — no raw ECG signals, no patient data.
Python API
from cardiomas import CardioMAS
mas = CardioMAS(ollama_model="gemma4:e2b", seed=42)
# Analyze and save locally
result = mas.analyze("/data/ptb-xl/")
print(result["local_output_dir"]) # output/ptb-xl
# Analyze and push to HuggingFace
mas.analyze("/data/ptb-xl/", push_to_hf=True)
# Read back published splits
splits = mas.get_splits("ptb-xl")
train_ids = splits["train"]
# Custom splits
mas.analyze(
"/data/ptb-xl/",
custom_split={"train": 0.7, "val": 0.15, "test": 0.15},
stratify_by="scp_codes",
seed=123,
)
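One common way to implement a SHA-256-seeded deterministic split, shown here as a sketch under assumptions rather than CardioMAS's exact algorithm: hash each record ID together with the seed and bucket records by the resulting value, so split membership depends only on (record ID, seed) and is stable across runs, machines, and Python versions.

```python
import hashlib

def assign_split(record_id: str, seed: int,
                 fractions={"train": 0.7, "val": 0.15, "test": 0.15}) -> str:
    """Map a record to a split using a SHA-256 hash of (seed, record_id)."""
    digest = hashlib.sha256(f"{seed}:{record_id}".encode()).digest()
    u = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    cum = 0.0
    for name, frac in fractions.items():
        cum += frac
        if u < cum:
            return name
    return name  # guard against floating-point rounding at u ~ 1.0

splits = {"train": [], "val": [], "test": []}
for rec in (f"rec_{i:05d}" for i in range(1000)):
    splits[assign_split(rec, seed=123)].append(rec)

print({k: len(v) for k, v in splits.items()})  # roughly 700 / 150 / 150
```

Unlike shuffling an index array, this scheme keeps a record's assignment fixed even when new records are added to the dataset later.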
Using Different Local Models
Any model available in Ollama works. Pull a model, then point CardioMAS at it:
# Default (recommended for full pipeline)
ollama pull gemma4:e2b
# Larger Gemma 4 variants for heavier reasoning tasks
ollama pull gemma4:e4b
ollama pull gemma4:27b
# DeepSeek Coder — best for the coding agent
ollama pull deepseek-coder:6.7b
# Use a specific model for the whole pipeline
OLLAMA_MODEL=gemma4:e2b cardiomas analyze /data/ptb-xl/
# Or set it permanently in .env
echo "OLLAMA_MODEL=gemma4:e2b" >> .env
Per-Agent LLM Configuration
Available in v0.2.0 (dev/v2-dynamic-orchestrator)
Each agent can use a different LLM. This is useful when you want a fast, lightweight model for simple tasks (discovery, security scan) and a more capable model for reasoning-heavy tasks (analysis, coding).
Via environment variables
# Fallback for all agents
OLLAMA_MODEL=gemma4:e2b
# Per-agent overrides (all optional)
AGENT_LLM_ORCHESTRATOR=gemma4:e2b
AGENT_LLM_NL_REQUIREMENT=gemma4:e2b
AGENT_LLM_DISCOVERY=gemma4:e2b
AGENT_LLM_PAPER=gemma4:e2b
AGENT_LLM_ANALYSIS=gemma4:e2b
AGENT_LLM_SPLITTER=gemma4:e2b
AGENT_LLM_SECURITY=gemma4:e2b
AGENT_LLM_CODER=deepseek-coder:6.7b
AGENT_LLM_PUBLISHER=gemma4:e2b
Set these in .env or export them before running cardiomas.
Via CLI flags
cardiomas analyze /data/ptb-xl/ \
--llm-coder deepseek-coder:6.7b \
--llm-analysis gemma4:e4b \
--llm-discovery gemma4:e2b
Via Python API
from cardiomas import CardioMAS
mas = CardioMAS(
agent_llms={
"coder": "deepseek-coder:6.7b",
"analysis": "gemma4:e4b",
"default": "gemma4:e2b", # fallback for all other agents
}
)
mas.analyze("/data/ptb-xl/")
Model recommendations
| Agent | Recommended model | Why |
|---|---|---|
| orchestrator | gemma4:e2b | Default — dynamic routing decisions |
| nl_requirement | gemma4:e2b | Simple parsing task |
| discovery | gemma4:e2b | Lookup + classification |
| paper | gemma4:e2b or gemma4:e4b | Needs to read and summarise papers |
| analysis | gemma4:e2b or gemma4:e4b | Statistical reasoning |
| splitter | gemma4:e2b | Deterministic — LLM role is minimal |
| security | gemma4:e2b | Pattern matching |
| coder | deepseek-coder:6.7b | Code generation |
| publisher | gemma4:e2b | Structured output |
Verbose LLM name display
With --verbose, each LLM call shows the model name and backend:
──────────── paper — LLM call [gemma4:e2b @ ollama] ────────────
Environment Variables
Copy .env.example to .env and fill in as needed.
| Variable | Required for | Default |
|---|---|---|
| OLLAMA_MODEL | local LLM (default for all agents) | gemma4:e2b |
| OLLAMA_BASE_URL | local LLM | http://localhost:11434 |
| AGENT_LLM_<AGENT> | per-agent model override | falls back to OLLAMA_MODEL |
| HF_TOKEN | --push / cardiomas push | — |
| GITHUB_TOKEN | GitHub README auto-update | — |
| CARDIOMAS_SEED | reproducibility | 42 |
| CLOUD_LLM_PROVIDER | --use-cloud-llm | none |
Links
- HuggingFace Dataset: vlbthambawita/ECGBench
- PyPI: cardiomas
- GitHub: vlbthambawita/CardioMAS
File details
Details for the file cardiomas-0.8.0.tar.gz.
File metadata
- Download URL: cardiomas-0.8.0.tar.gz
- Size: 124.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 18f4cb990b39d59e9f5077f7cc4713fccdd9d04a66142684c14330713a5c15a7 |
| MD5 | ab39ae8be6c58f411ef41da4cb39a4bb |
| BLAKE2b-256 | 0e40f2dcfe0b20768a30e06453e14d65a9287447b26461ca2ef9034c995f845d |
Provenance
The following attestation bundles were made for cardiomas-0.8.0.tar.gz:
- Publisher: publish.yml on vlbthambawita/CardioMAS
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: cardiomas-0.8.0.tar.gz
- Subject digest: 18f4cb990b39d59e9f5077f7cc4713fccdd9d04a66142684c14330713a5c15a7
- Sigstore transparency entry: 1308756242
- Permalink: vlbthambawita/CardioMAS@c7b691e499b4815eb495f1b9c8bd43b867076032
- Branch / Tag: refs/tags/v0.8.0
- Owner: https://github.com/vlbthambawita
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@c7b691e499b4815eb495f1b9c8bd43b867076032
- Trigger Event: push
File details
Details for the file cardiomas-0.8.0-py3-none-any.whl.
File metadata
- Download URL: cardiomas-0.8.0-py3-none-any.whl
- Size: 103.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b615ae520a8f8b4f45cbe1eea35acfc912989a3c3294435c782b6bf425252c0e |
| MD5 | 1bfb7a3f65cd5919c45bbe3cc54102ff |
| BLAKE2b-256 | a19dcfbe48c71828b38805f68daba5c9a41044337ebcd37968474a2a75c5d5f4 |
Provenance
The following attestation bundles were made for cardiomas-0.8.0-py3-none-any.whl:
- Publisher: publish.yml on vlbthambawita/CardioMAS
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: cardiomas-0.8.0-py3-none-any.whl
- Subject digest: b615ae520a8f8b4f45cbe1eea35acfc912989a3c3294435c782b6bf425252c0e
- Sigstore transparency entry: 1308756611
- Permalink: vlbthambawita/CardioMAS@c7b691e499b4815eb495f1b9c8bd43b867076032
- Branch / Tag: refs/tags/v0.8.0
- Owner: https://github.com/vlbthambawita
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@c7b691e499b4815eb495f1b9c8bd43b867076032
- Trigger Event: push