
CardioMAS — Cardio Multi-Agent System for reproducible ECG dataset splits


A locally-runnable multi-agent system that analyzes ECG datasets and generates reproducible train/validation/test splits. Outputs are saved locally by default. Publishing to vlbthambawita/ECGBench on HuggingFace is an explicit opt-in step that requires write access.

Architecture

flowchart TD
    CLI(["`**CLI / Python API**
    cardiomas analyze`"])
    CLI --> ORCH

    subgraph PIPELINE ["LangGraph Pipeline"]
        ORCH["🎯 Orchestrator
        Check HF cache"]

        ORCH -->|cache hit| EXISTING["Return existing\nsplits from HF"]
        ORCH -->|no cache / --force| DISC

        DISC["🔍 Discovery Agent
        Identify dataset type,\nsource, metadata"]

        PAPER["📄 Paper Agent
        Find & parse paper\nExtract split methodology"]

        ANALYSIS["📊 Analysis Agent
        Scan files, parse CSV metadata\nCompute statistics"]

        SPLIT["✂️ Splitter Agent
        SHA-256 seeded deterministic splits\nPatient-level / stratified"]

        SEC["🔒 Security Agent
        PII scan · raw-data check\nPatient leakage detection"]

        PUB["☁️ Publisher Agent
        Push to HF · Update GitHub README"]

        DISC --> PAPER --> ANALYSIS --> SPLIT --> SEC

        SEC -->|audit failed| ERR["❌ End with error\nBlocked — not saved"]
        SEC -->|passed, no --push| SAVED["💾 Saved locally"]
        SEC -->|passed + --push| PUB
        PUB --> HF[("HuggingFace\nvlbthambawita/ECGBench")]
    end

    SAVED --> OUT
    PUB --> OUT

    OUT["`**output/<dataset>/**
    splits.json
    split_metadata.json
    analysis_report.md`"]

    style PIPELINE fill:#1a1a2e,stroke:#4a9eff,color:#fff
    style ORCH fill:#0f4c75,stroke:#4a9eff,color:#fff
    style DISC fill:#1b4332,stroke:#40916c,color:#fff
    style PAPER fill:#3b1f5e,stroke:#9b59b6,color:#fff
    style ANALYSIS fill:#7b4220,stroke:#e67e22,color:#fff
    style SPLIT fill:#1a5276,stroke:#2e86c1,color:#fff
    style SEC fill:#641e16,stroke:#e74c3c,color:#fff
    style PUB fill:#145a32,stroke:#27ae60,color:#fff
    style SAVED fill:#1a3a1a,stroke:#27ae60,color:#fff
    style ERR fill:#3b0f0f,stroke:#e74c3c,color:#fff
    style HF fill:#2d2d2d,stroke:#f5a623,color:#fff
    style OUT fill:#2d2d2d,stroke:#4a9eff,color:#fff
    style EXISTING fill:#2d2d2d,stroke:#4a9eff,color:#fff
    style CLI fill:#0d0d0d,stroke:#4a9eff,color:#fff

Each node is a dedicated LLM-backed agent. Agents communicate only through the shared GraphState. The security agent is a hard gate — publishing is blocked if any check fails.
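The shared-state pattern can be pictured as a plain typed dictionary plus a routing function for the security gate. A minimal sketch in Python, where the field names are illustrative assumptions rather than the package's actual `GraphState` schema:

```python
from typing import List, Optional, TypedDict

class GraphState(TypedDict, total=False):
    # Illustrative fields only; not the package's actual schema.
    dataset_source: str
    dataset_type: Optional[str]
    splits: dict            # {"train": [...], "val": [...], "test": [...]}
    push_requested: bool
    security_passed: bool
    errors: List[str]

def security_gate(state: GraphState) -> str:
    """Routing after the security agent: a failed audit is a hard stop."""
    if not state.get("security_passed", False):
        return "error"      # blocked, nothing is saved or pushed
    return "publish" if state.get("push_requested") else "save_local"
```

In LangGraph terms, a function like `security_gate` would drive the conditional edges out of the security node shown in the diagram.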

Requirements

  • Python ≥ 3.10
  • Ollama running locally with a model pulled (default: llama3.1:8b)

ollama pull llama3.1:8b
ollama serve

Install from PyPI:

pip install cardiomas

Quick Start

# Analyze a dataset — results saved to ./output/ptb-xl/
cardiomas analyze https://physionet.org/content/ptb-xl/1.0.3/

# Use a local directory
cardiomas analyze /data/ptb-xl/

# Stream all agent reasoning live
cardiomas analyze /data/ptb-xl/ --verbose

# Analyze and push to HuggingFace in one step (requires HF_TOKEN)
cardiomas analyze /data/ptb-xl/ --push

After analyze completes, the following files are written locally:

output/
└── ptb-xl/
    ├── splits.json           # train/val/test record IDs + reproducibility config
    ├── split_metadata.json   # seed, strategy, version, timestamp
    └── analysis_report.md    # LLM-generated dataset analysis
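Reading the saved splits back is a one-liner. A sketch (the train/val/test keys are documented above; the disjointness check is just a sanity guard, not part of the package):

```python
import json
from pathlib import Path

def load_splits(path: str) -> dict:
    """Load splits.json and check the three splits are pairwise disjoint."""
    splits = json.loads(Path(path).read_text())
    train, val, test = (set(splits[k]) for k in ("train", "val", "test"))
    if train & val or train & test or val & test:
        raise ValueError("overlapping record IDs between splits")
    return splits

# splits = load_splits("output/ptb-xl/splits.json")
# train_ids = splits["train"]
```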

CLI Reference

cardiomas analyze

Analyze a dataset and save splits locally.

cardiomas analyze DATASET_SOURCE [OPTIONS]
| Option | Default | Description |
|---|---|---|
| --local-path PATH | | Explicit local data path (skips download) |
| --output-dir PATH | output | Where to save results |
| --seed INT | 42 | Reproducibility seed |
| --custom-split SPEC | | e.g. train:0.7,val:0.15,test:0.15 |
| --stratify-by FIELD | | Metadata field to stratify splits by |
| --ignore-official | | Ignore official splits and generate fresh ones |
| --push | | Also push to HuggingFace (requires HF_TOKEN) |
| --force-reanalysis | | Re-run even if already analyzed |
| --use-cloud-llm | | Use a cloud LLM instead of local Ollama |
| --verbose / -v | | Stream agent reasoning and LLM calls live |
| --json | | Machine-readable JSON output |
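The --custom-split spec format can be parsed in a few lines. An illustrative sketch, not the package's actual parser:

```python
def parse_split_spec(spec: str) -> dict:
    """Parse a spec like 'train:0.7,val:0.15,test:0.15' into fractions."""
    fractions = {}
    for item in spec.split(","):
        name, value = item.split(":")
        fractions[name.strip()] = float(value)
    # The fractions must cover the whole dataset.
    if abs(sum(fractions.values()) - 1.0) > 1e-9:
        raise ValueError("split fractions must sum to 1.0")
    return fractions
```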

Verbose output

--verbose (-v) prints every agent step and LLM prompt/response in real time instead of showing a spinner:

cardiomas analyze /data/ptb-xl/ -v
Verbose mode on — streaming agent output below.

  [orchestrator] pipeline start — source: /data/ptb-xl/
  [orchestrator] no cache hit — running full pipeline
  [discovery]    registry hit → ptb-xl
  [paper]        searching arXiv: 'ptb-xl ECG dataset electrocardiogram'
  [paper]        found 3 result(s)
  [paper]        calling LLM (ChatOllama)…
──────────────── paper — LLM call ────────────────
╭─ prompt ────────────────────────────────────────╮
│ Analyze this ECG dataset paper and extract: …  │
╰─────────────────────────────────────────────────╯
╭─ response ──────────────────────────────────────╮
│ 1. Official splits: Yes (Section 2.3, page 4)  │
╰─────────────────────────────────────────────────╯
  [analysis]     found 42 files
  [splitter]     saved splits → output/ptb-xl/splits.json
  [security]     audit PASSED — no PII, no raw data, no leakage

Each agent is color-coded. Without --verbose, only a spinner runs during the pipeline and a summary table is shown at the end.

cardiomas push

Push previously saved local splits to HuggingFace. Requires HF_TOKEN.

cardiomas push ptb-xl
cardiomas push ptb-xl --output-dir /my/results

Runs a security audit before uploading. Refuses to push if any check fails.

cardiomas status

Check if a dataset has published splits on HuggingFace.

cardiomas status ptb-xl

cardiomas list

cardiomas list              # show known datasets (registry)
cardiomas list --remote     # show datasets published on HuggingFace
cardiomas list --local      # show locally cached datasets

cardiomas verify

Re-check reproducibility metadata of published splits.

cardiomas verify ptb-xl --seed 42
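The idea behind SHA-256 seeded deterministic splits is that a record's assignment depends only on its ID and the seed, so anyone with the same inputs can re-derive the same split. A sketch of that idea (not CardioMAS's exact algorithm):

```python
import hashlib

def assign_split(record_id: str, seed: int = 42, fractions=None) -> str:
    """Deterministically map a record ID to a split via SHA-256."""
    if fractions is None:
        fractions = {"train": 0.7, "val": 0.15, "test": 0.15}
    digest = hashlib.sha256(f"{seed}:{record_id}".encode()).hexdigest()
    u = int(digest[:16], 16) / 16**16   # uniform value in [0, 1)
    cumulative = 0.0
    for name, frac in fractions.items():
        cumulative += frac
        if u < cumulative:
            return name
    return list(fractions)[-1]          # guard against float rounding
```

Because the assignment is a pure function of (seed, record_id), verification only needs the seed and the ID list, never the raw signals.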

cardiomas contribute

Submit community splits to vlbthambawita/ECGBench.

cardiomas contribute ptb-xl --split-file my_splits.json

cardiomas config

cardiomas config --show
cardiomas config --set OLLAMA_MODEL=mistral

HuggingFace Publishing (opt-in)

Publishing requires write access to vlbthambawita/ECGBench. Set your token before pushing:

export HF_TOKEN=hf_...
cardiomas push ptb-xl

Only record identifiers are ever published — no raw ECG signals, no patient data.

Python API

from cardiomas import CardioMAS

mas = CardioMAS(ollama_model="llama3.1:8b", seed=42)

# Analyze and save locally
result = mas.analyze("/data/ptb-xl/")
print(result["local_output_dir"])   # output/ptb-xl

# Analyze and push to HuggingFace
mas.analyze("/data/ptb-xl/", push_to_hf=True)

# Read back published splits
splits = mas.get_splits("ptb-xl")
train_ids = splits["train"]

# Custom splits
mas.analyze(
    "/data/ptb-xl/",
    custom_split={"train": 0.7, "val": 0.15, "test": 0.15},
    stratify_by="scp_codes",
    seed=123,
)

Using Different Local Models

Any model available in Ollama works. Pull a model, then point CardioMAS at it:

# Default (recommended for full pipeline)
ollama pull llama3.1:8b

# Gemma models (Google) — lighter, fast on CPU
ollama pull gemma3:4b
ollama pull gemma3:12b
ollama pull gemma3:27b

# DeepSeek Coder — best for the coding agent
ollama pull deepseek-coder:6.7b

# Use a specific model for the whole pipeline
OLLAMA_MODEL=gemma3:4b cardiomas analyze /data/ptb-xl/

# Or set it permanently in .env
echo "OLLAMA_MODEL=gemma3:4b" >> .env

Per-Agent LLM Configuration

Available since v0.2.0 (dev/v2-dynamic-orchestrator)

Each agent can use a different LLM. This is useful when you want a fast, lightweight model for simple tasks (discovery, security scan) and a more capable model for reasoning-heavy tasks (analysis, coding).

Via environment variables

# Fallback for all agents
OLLAMA_MODEL=llama3.1:8b

# Per-agent overrides (all optional)
AGENT_LLM_ORCHESTRATOR=llama3.1:8b
AGENT_LLM_NL_REQUIREMENT=gemma3:4b
AGENT_LLM_DISCOVERY=gemma3:4b
AGENT_LLM_PAPER=llama3.1:8b
AGENT_LLM_ANALYSIS=llama3.1:8b
AGENT_LLM_SPLITTER=gemma3:4b
AGENT_LLM_SECURITY=gemma3:4b
AGENT_LLM_CODER=deepseek-coder:6.7b
AGENT_LLM_PUBLISHER=gemma3:4b

Set these in .env or export them before running cardiomas.
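The documented fallback order (AGENT_LLM_<AGENT>, then OLLAMA_MODEL, then a built-in default) can be sketched as a small resolver. Illustrative only, not the package's actual code:

```python
import os

def resolve_agent_model(agent: str, default: str = "llama3.1:8b") -> str:
    """Pick the model for an agent: per-agent override wins,
    then the global OLLAMA_MODEL, then the built-in default."""
    return (os.environ.get(f"AGENT_LLM_{agent.upper()}")
            or os.environ.get("OLLAMA_MODEL")
            or default)
```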

Via CLI flags

cardiomas analyze /data/ptb-xl/ \
  --llm-coder deepseek-coder:6.7b \
  --llm-analysis llama3.1:8b \
  --llm-discovery gemma3:4b

Via Python API

from cardiomas import CardioMAS

mas = CardioMAS(
    agent_llms={
        "coder":    "deepseek-coder:6.7b",
        "analysis": "llama3.1:8b",
        "default":  "gemma3:4b",   # fallback for all other agents
    }
)
mas.analyze("/data/ptb-xl/")

Model recommendations

| Agent | Recommended model | Why |
|---|---|---|
| orchestrator | llama3.1:8b | Reasoning-heavy routing decisions |
| nl_requirement | gemma3:4b | Simple parsing task |
| discovery | gemma3:4b | Lookup and classification |
| paper | llama3.1:8b | Needs to read and summarise papers |
| analysis | llama3.1:8b or llama3.1:70b | Statistical reasoning |
| splitter | gemma3:4b | Deterministic; the LLM role is minimal |
| security | gemma3:4b | Pattern matching |
| coder | deepseek-coder:6.7b | Code generation |
| publisher | gemma3:4b | Structured output |

Verbose LLM name display

With --verbose, each LLM call shows the model name and backend:

──────────── paper — LLM call [llama3.1:8b @ ollama] ────────────

Environment Variables

Copy .env.example to .env and fill in as needed.

| Variable | Required for | Default |
|---|---|---|
| OLLAMA_MODEL | local LLM (default for all agents) | llama3.1:8b |
| OLLAMA_BASE_URL | local LLM | http://localhost:11434 |
| AGENT_LLM_<AGENT> | per-agent model override | falls back to OLLAMA_MODEL |
| HF_TOKEN | --push / cardiomas push | |
| GITHUB_TOKEN | GitHub README auto-update | |
| CARDIOMAS_SEED | reproducibility | 42 |
| CLOUD_LLM_PROVIDER | --use-cloud-llm | none |

Download files

Download the file for your platform.

Source Distribution

cardiomas-0.5.0.tar.gz (97.4 kB)

Built Distribution


cardiomas-0.5.0-py3-none-any.whl (78.1 kB)

File details

Details for the file cardiomas-0.5.0.tar.gz.

File metadata

  • Download URL: cardiomas-0.5.0.tar.gz
  • Upload date:
  • Size: 97.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for cardiomas-0.5.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9c614209c1c02a806375312587c939c5966be454bbe112674844cc6674d9094b |
| MD5 | cbe33a71f877ced423e9098c141976c7 |
| BLAKE2b-256 | e24e7e2699c25dc46c326f4b17b60a19ff1f85278c61c9eb6ae070efc3e2c844 |
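To check a downloaded file against the digests above, compute its SHA-256 locally, for example:

```python
import hashlib

def sha256_of(path: str) -> str:
    """Compute a file's SHA-256 hex digest, reading in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# sha256_of("cardiomas-0.5.0.tar.gz") should match the SHA256 value above.
```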


Provenance

The following attestation bundles were made for cardiomas-0.5.0.tar.gz:

Publisher: publish.yml on vlbthambawita/CardioMAS


File details

Details for the file cardiomas-0.5.0-py3-none-any.whl.

File metadata

  • Download URL: cardiomas-0.5.0-py3-none-any.whl
  • Upload date:
  • Size: 78.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for cardiomas-0.5.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | a16f40d7e9968ddb78644fab798797256dedcc051d99a4925d5a901f6b62a6a9 |
| MD5 | 4aeee05d36db9fd09d218f92f5ba3d1e |
| BLAKE2b-256 | 1856cce11954f532a5198ba57bde30ba57e9a94dd383a4a475939033aa2c75b5 |


Provenance

The following attestation bundles were made for cardiomas-0.5.0-py3-none-any.whl:

Publisher: publish.yml on vlbthambawita/CardioMAS

