CardioMAS — Cardio Multi-Agent System for reproducible ECG dataset splits
A locally-runnable multi-agent system that analyzes ECG datasets and generates reproducible train/validation/test splits. Outputs are saved locally by default. Publishing to vlbthambawita/ECGBench on HuggingFace is an explicit opt-in step that requires write access.
Architecture
flowchart TD
    CLI(["`**CLI / Python API**
    cardiomas analyze`"])
    CLI --> ORCH
    subgraph PIPELINE ["LangGraph Pipeline"]
        ORCH["🎯 Orchestrator\nCheck HF cache"]
        ORCH -->|cache hit| EXISTING["Return existing\nsplits from HF"]
        ORCH -->|no cache / --force| DISC
        DISC["🔍 Discovery Agent\nIdentify dataset type,\nsource, metadata"]
        PAPER["📄 Paper Agent\nFind & parse paper\nExtract split methodology"]
        ANALYSIS["📊 Analysis Agent\nScan files, parse CSV metadata\nCompute statistics"]
        SPLIT["✂️ Splitter Agent\nSHA-256 seeded deterministic splits\nPatient-level / stratified"]
        SEC["🔒 Security Agent\nPII scan · raw-data check\nPatient leakage detection"]
        PUB["☁️ Publisher Agent\nPush to HF · Update GitHub README"]
        DISC --> PAPER --> ANALYSIS --> SPLIT --> SEC
        SEC -->|audit failed| ERR["❌ End with error\nBlocked, not saved"]
        SEC -->|passed, no --push| SAVED["💾 Saved locally"]
        SEC -->|passed + --push| PUB
        PUB --> HF[("HuggingFace\nvlbthambawita/ECGBench")]
    end
    SAVED --> OUT
    PUB --> OUT
    OUT["`**output/<dataset>/**
    splits.json
    split_metadata.json
    analysis_report.md`"]
    style PIPELINE fill:#1a1a2e,stroke:#4a9eff,color:#fff
    style ORCH fill:#0f4c75,stroke:#4a9eff,color:#fff
    style DISC fill:#1b4332,stroke:#40916c,color:#fff
    style PAPER fill:#3b1f5e,stroke:#9b59b6,color:#fff
    style ANALYSIS fill:#7b4220,stroke:#e67e22,color:#fff
    style SPLIT fill:#1a5276,stroke:#2e86c1,color:#fff
    style SEC fill:#641e16,stroke:#e74c3c,color:#fff
    style PUB fill:#145a32,stroke:#27ae60,color:#fff
    style SAVED fill:#1a3a1a,stroke:#27ae60,color:#fff
    style ERR fill:#3b0f0f,stroke:#e74c3c,color:#fff
    style HF fill:#2d2d2d,stroke:#f5a623,color:#fff
    style OUT fill:#2d2d2d,stroke:#4a9eff,color:#fff
    style EXISTING fill:#2d2d2d,stroke:#4a9eff,color:#fff
    style CLI fill:#0d0d0d,stroke:#4a9eff,color:#fff
Each node is a dedicated LLM-backed agent. Agents communicate only through the shared GraphState. The security agent is a hard gate — publishing is blocked if any check fails.
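The hard-gate behaviour can be pictured as a routing function that runs right after the security node. The sketch below is illustrative only: the state fields `audit_passed` and `push_requested` and the returned node names are hypothetical, not taken from the CardioMAS source, and the real pipeline wires this decision through LangGraph rather than a bare function.

```python
from typing import TypedDict


class GraphState(TypedDict, total=False):
    """Hypothetical slice of the shared state; the real fields may differ."""
    audit_passed: bool
    push_requested: bool


def route_after_security(state: GraphState) -> str:
    """Pick the next step once the security audit has run."""
    if not state.get("audit_passed", False):
        # Diagram: "audit failed" ends the run with an error.
        return "error"
    if state.get("push_requested", False):
        # Diagram: "passed + --push" goes to the publisher agent.
        return "publisher"
    # Diagram: "passed, no --push" keeps the results local.
    return "saved_locally"


print(route_after_security({"audit_passed": True, "push_requested": False}))
```

A missing `audit_passed` value is treated as a failure here, so the gate fails closed rather than open.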
Requirements
- Python ≥ 3.10
- Ollama running locally with a model pulled (default: llama3.1:8b)
ollama pull llama3.1:8b
ollama serve
pip install cardiomas
Quick Start
# Analyze a dataset — results saved to ./output/ptb-xl/
cardiomas analyze https://physionet.org/content/ptb-xl/1.0.3/
# Use a local directory
cardiomas analyze /data/ptb-xl/
# Stream all agent reasoning live
cardiomas analyze /data/ptb-xl/ --verbose
# Analyze and push to HuggingFace in one step (requires HF_TOKEN)
cardiomas analyze /data/ptb-xl/ --push
After analyze, the following files are written locally:
output/
└── ptb-xl/
    ├── splits.json          # train/val/test record IDs + reproducibility config
    ├── split_metadata.json  # seed, strategy, version, timestamp
    └── analysis_report.md   # LLM-generated dataset analysis
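The architecture diagram describes the splits as "SHA-256 seeded" and "patient-level". One common way to get that property is to hash the seed together with a patient ID and map the digest onto the split ratios. The following is a minimal sketch of that general technique, not the package's confirmed implementation; `assign_split`, the `seed:id` key format, and the sample IDs are all assumptions made for illustration.

```python
import hashlib

DEFAULT_RATIOS = (("train", 0.7), ("val", 0.15), ("test", 0.15))


def assign_split(group_id: str, seed: int = 42, ratios=DEFAULT_RATIOS) -> str:
    """Deterministically map an ID (e.g. a patient ID) to a split name."""
    digest = hashlib.sha256(f"{seed}:{group_id}".encode("utf-8")).digest()
    # Turn the first 8 bytes of the digest into a number in [0, 1].
    u = int.from_bytes(digest[:8], "big") / 2**64
    cumulative = 0.0
    for name, fraction in ratios:
        cumulative += fraction
        if u < cumulative:
            return name
    # Only reached through floating-point rounding at the upper edge.
    return ratios[-1][0]


# Hypothetical record -> patient mapping; hashing the patient ID keeps
# every record of one patient in the same split.
records = {"rec_001": "patient_A", "rec_002": "patient_A", "rec_003": "patient_B"}
splits = {"train": [], "val": [], "test": []}
for record_id, patient_id in sorted(records.items()):
    splits[assign_split(patient_id, seed=42)].append(record_id)
print(splits)
```

With this approach the same seed and IDs always yield the same assignment, but the split sizes only approximate the requested ratios, which matters for small datasets.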
CLI Reference
cardiomas analyze
Analyze a dataset and save splits locally.
cardiomas analyze DATASET_SOURCE [OPTIONS]
| Option | Default | Description |
|---|---|---|
| --local-path PATH | | Explicit local data path (skips download) |
| --output-dir PATH | output | Where to save results |
| --seed INT | 42 | Reproducibility seed |
| --custom-split SPEC | | e.g. train:0.7,val:0.15,test:0.15 |
| --stratify-by FIELD | | Metadata field to stratify splits by |
| --ignore-official | | Ignore official splits, generate fresh |
| --push | | Also push to HuggingFace (requires HF_TOKEN) |
| --force-reanalysis | | Re-run even if already analyzed |
| --use-cloud-llm | | Use cloud LLM instead of local Ollama |
| --verbose / -v | | Stream agent reasoning and LLM calls live |
| --json | | Machine-readable JSON output |
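The `--custom-split` value follows a `name:fraction` pattern separated by commas. As a rough sketch of how such a spec maps to the `custom_split` dictionary used by the Python API, the helper below parses and validates it. `parse_split_spec` is a hypothetical name, and the requirement that fractions sum to 1 is an assumption; the CLI's actual validation rules are not documented here.

```python
import math


def parse_split_spec(spec: str) -> dict[str, float]:
    """Parse a spec such as 'train:0.7,val:0.15,test:0.15' into a dict."""
    ratios: dict[str, float] = {}
    for part in spec.split(","):
        name, sep, value = part.partition(":")
        if not sep or not name.strip() or not value.strip():
            raise ValueError(f"expected 'name:fraction', got {part!r}")
        ratios[name.strip()] = float(value)
    total = sum(ratios.values())
    # Assumed rule: the fractions must cover the whole dataset.
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"fractions sum to {total}, expected 1.0")
    return ratios


print(parse_split_spec("train:0.7,val:0.15,test:0.15"))
```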
Verbose output
--verbose (-v) prints every agent step and LLM prompt/response in real time instead of showing a spinner:
cardiomas analyze /data/ptb-xl/ -v
Verbose mode on — streaming agent output below.
[orchestrator] pipeline start — source: /data/ptb-xl/
[orchestrator] no cache hit — running full pipeline
[discovery] registry hit → ptb-xl
[paper] searching arXiv: 'ptb-xl ECG dataset electrocardiogram'
[paper] found 3 result(s)
[paper] calling LLM (ChatOllama)…
──────────────── paper — LLM call ────────────────
╭─ prompt ────────────────────────────────────────╮
│ Analyze this ECG dataset paper and extract: … │
╰─────────────────────────────────────────────────╯
╭─ response ──────────────────────────────────────╮
│ 1. Official splits: Yes (Section 2.3, page 4) │
╰─────────────────────────────────────────────────╯
[analysis] found 42 files
[splitter] saved splits → output/ptb-xl/splits.json
[security] audit PASSED — no PII, no raw data, no leakage
Each agent is color-coded. Without --verbose, only a spinner runs during the pipeline and a summary table is shown at the end.
cardiomas push
Push previously saved local splits to HuggingFace. Requires HF_TOKEN.
cardiomas push ptb-xl
cardiomas push ptb-xl --output-dir /my/results
Runs a security audit before uploading. Refuses to push if any check fails.
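The architecture diagram lists patient leakage detection among the audit's checks. A minimal sketch of that kind of check is shown below, assuming the splits map split names to record IDs and that a record-to-patient lookup is available. The function name `find_leakage` and the `patient_of` mapping are hypothetical, and the package's own audit may work differently.

```python
from itertools import combinations


def find_leakage(
    splits: dict[str, list[str]],
    patient_of: dict[str, str],
) -> dict[tuple[str, str], set[str]]:
    """Return the patients that appear in more than one split."""
    patients = {
        name: {patient_of[record_id] for record_id in record_ids}
        for name, record_ids in splits.items()
    }
    leaks: dict[tuple[str, str], set[str]] = {}
    # Compare every pair of splits once, in a stable order.
    for a, b in combinations(sorted(patients), 2):
        shared = patients[a] & patients[b]
        if shared:
            leaks[(a, b)] = shared
    return leaks


# Toy data: patient p1 has one record in train and one in test.
splits = {"train": ["r1", "r2"], "test": ["r3"]}
patient_of = {"r1": "p1", "r2": "p2", "r3": "p1"}
print(find_leakage(splits, patient_of))
```

An empty result means no patient is shared between any two splits; a gate like the one described above would refuse to push on anything else.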
cardiomas status
Check if a dataset has published splits on HuggingFace.
cardiomas status ptb-xl
cardiomas list
cardiomas list # show known datasets (registry)
cardiomas list --remote # show datasets published on HuggingFace
cardiomas list --local # show locally cached datasets
cardiomas verify
Re-check reproducibility metadata of published splits.
cardiomas verify ptb-xl --seed 42
cardiomas contribute
Submit community splits to vlbthambawita/ECGBench.
cardiomas contribute ptb-xl --split-file my_splits.json
cardiomas config
cardiomas config --show
cardiomas config --set OLLAMA_MODEL=mistral
HuggingFace Publishing (opt-in)
Publishing requires write access to vlbthambawita/ECGBench. Set your token before pushing:
export HF_TOKEN=hf_...
cardiomas push ptb-xl
Only record identifiers are ever published — no raw ECG signals, no patient data.
Python API
from cardiomas import CardioMAS
mas = CardioMAS(ollama_model="llama3.1:8b", seed=42)
# Analyze and save locally
result = mas.analyze("/data/ptb-xl/")
print(result["local_output_dir"]) # output/ptb-xl
# Analyze and push to HuggingFace
mas.analyze("/data/ptb-xl/", push_to_hf=True)
# Read back published splits
splits = mas.get_splits("ptb-xl")
train_ids = splits["train"]
# Custom splits
mas.analyze(
    "/data/ptb-xl/",
    custom_split={"train": 0.7, "val": 0.15, "test": 0.15},
    stratify_by="scp_codes",
    seed=123,
)
Using Different Local Models
Any model available in Ollama works. Pull a model, then point CardioMAS at it:
# Default (recommended for full pipeline)
ollama pull llama3.1:8b
# Gemma models (Google) — lighter, fast on CPU
ollama pull gemma3:4b
ollama pull gemma3:12b
ollama pull gemma3:27b
# DeepSeek Coder — best for the coding agent
ollama pull deepseek-coder:6.7b
# Use a specific model for the whole pipeline
OLLAMA_MODEL=gemma3:4b cardiomas analyze /data/ptb-xl/
# Or set it permanently in .env
echo "OLLAMA_MODEL=gemma3:4b" >> .env
Per-Agent LLM Configuration
Available in v0.2.0 (dev/v2-dynamic-orchestrator)
Each agent can use a different LLM. This is useful when you want a fast, lightweight model for simple tasks (discovery, security scan) and a more capable model for reasoning-heavy tasks (analysis, coding).
Via environment variables
# Fallback for all agents
OLLAMA_MODEL=llama3.1:8b
# Per-agent overrides (all optional)
AGENT_LLM_ORCHESTRATOR=llama3.1:8b
AGENT_LLM_NL_REQUIREMENT=gemma3:4b
AGENT_LLM_DISCOVERY=gemma3:4b
AGENT_LLM_PAPER=llama3.1:8b
AGENT_LLM_ANALYSIS=llama3.1:8b
AGENT_LLM_SPLITTER=gemma3:4b
AGENT_LLM_SECURITY=gemma3:4b
AGENT_LLM_CODER=deepseek-coder:6.7b
AGENT_LLM_PUBLISHER=gemma3:4b
Set these in .env or export them before running cardiomas.
Via CLI flags
cardiomas analyze /data/ptb-xl/ \
    --llm-coder deepseek-coder:6.7b \
    --llm-analysis llama3.1:8b \
    --llm-discovery gemma3:4b
Via Python API
from cardiomas import CardioMAS
mas = CardioMAS(
    agent_llms={
        "coder": "deepseek-coder:6.7b",
        "analysis": "llama3.1:8b",
        "default": "gemma3:4b",  # fallback for all other agents
    }
)
mas.analyze("/data/ptb-xl/")
Model recommendations
| Agent | Recommended model | Why |
|---|---|---|
| orchestrator | llama3.1:8b | Reasoning-heavy routing decisions |
| nl_requirement | gemma3:4b | Simple parsing task |
| discovery | gemma3:4b | Lookup + classification |
| paper | llama3.1:8b | Needs to read and summarise papers |
| analysis | llama3.1:8b or llama3.1:70b | Statistical reasoning |
| splitter | gemma3:4b | Deterministic; LLM role is minimal |
| security | gemma3:4b | Pattern matching |
| coder | deepseek-coder:6.7b | Code generation |
| publisher | gemma3:4b | Structured output |
Verbose LLM name display
With --verbose, each LLM call shows the model name and backend:
──────────── paper — LLM call [llama3.1:8b @ ollama] ────────────
Environment Variables
Copy .env.example to .env and fill in as needed.
| Variable | Required for | Default |
|---|---|---|
| OLLAMA_MODEL | local LLM (default for all agents) | llama3.1:8b |
| OLLAMA_BASE_URL | local LLM | http://localhost:11434 |
| AGENT_LLM_<AGENT> | per-agent model override | (falls back to OLLAMA_MODEL) |
| HF_TOKEN | --push / cardiomas push | - |
| GITHUB_TOKEN | GitHub README auto-update | - |
| CARDIOMAS_SEED | reproducibility | 42 |
| CLOUD_LLM_PROVIDER | --use-cloud-llm | none |
Links
- HuggingFace Dataset: vlbthambawita/ECGBench
- PyPI: cardiomas
- GitHub: vlbthambawita/CardioMAS