
CardioMAS — Cardio Multi-Agent System for reproducible ECG dataset splits


A locally-runnable multi-agent system that analyzes ECG datasets and generates reproducible train/validation/test splits. Outputs are saved locally by default. Publishing to vlbthambawita/ECGBench on HuggingFace is an explicit opt-in step that requires write access.

Architecture

flowchart TD
    CLI(["`**CLI / Python API**
    cardiomas analyze`"])
    CLI --> ORCH

    subgraph PIPELINE ["LangGraph Pipeline"]
        ORCH["🎯 Orchestrator
        Check HF cache"]

        ORCH -->|cache hit| EXISTING["Return existing\nsplits from HF"]
        ORCH -->|no cache / --force| DISC

        DISC["🔍 Discovery Agent
        Identify dataset type,\nsource, metadata"]

        PAPER["📄 Paper Agent
        Find & parse paper\nExtract split methodology"]

        ANALYSIS["📊 Analysis Agent
        Scan files, parse CSV metadata\nCompute statistics"]

        SPLIT["✂️ Splitter Agent
        SHA-256 seeded deterministic splits\nPatient-level / stratified"]

        SEC["🔒 Security Agent
        PII scan · raw-data check\nPatient leakage detection"]

        PUB["☁️ Publisher Agent
        Push to HF · Update GitHub README"]

        DISC --> PAPER --> ANALYSIS --> SPLIT --> SEC

        SEC -->|audit failed| ERR["❌ End with error\nBlocked — not saved"]
        SEC -->|passed, no --push| SAVED["💾 Saved locally"]
        SEC -->|passed + --push| PUB
        PUB --> HF[("HuggingFace\nvlbthambawita/ECGBench")]
    end

    SAVED --> OUT
    PUB --> OUT

    OUT["`**output/<dataset>/**
    splits.json
    split_metadata.json
    analysis_report.md`"]

    style PIPELINE fill:#1a1a2e,stroke:#4a9eff,color:#fff
    style ORCH fill:#0f4c75,stroke:#4a9eff,color:#fff
    style DISC fill:#1b4332,stroke:#40916c,color:#fff
    style PAPER fill:#3b1f5e,stroke:#9b59b6,color:#fff
    style ANALYSIS fill:#7b4220,stroke:#e67e22,color:#fff
    style SPLIT fill:#1a5276,stroke:#2e86c1,color:#fff
    style SEC fill:#641e16,stroke:#e74c3c,color:#fff
    style PUB fill:#145a32,stroke:#27ae60,color:#fff
    style SAVED fill:#1a3a1a,stroke:#27ae60,color:#fff
    style ERR fill:#3b0f0f,stroke:#e74c3c,color:#fff
    style HF fill:#2d2d2d,stroke:#f5a623,color:#fff
    style OUT fill:#2d2d2d,stroke:#4a9eff,color:#fff
    style EXISTING fill:#2d2d2d,stroke:#4a9eff,color:#fff
    style CLI fill:#0d0d0d,stroke:#4a9eff,color:#fff

Each node is a dedicated LLM-backed agent. Agents communicate only through the shared GraphState. The security agent is a hard gate — publishing is blocked if any check fails.
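The shared-state pattern can be pictured as a plain typed dictionary plus a routing function for the security gate. A minimal sketch in Python, where the field names are illustrative assumptions rather than the package's actual `GraphState` schema:

```python
from typing import List, Optional, TypedDict

class GraphState(TypedDict, total=False):
    # Illustrative fields only; not the package's actual schema.
    dataset_source: str
    dataset_type: Optional[str]
    splits: dict            # {"train": [...], "val": [...], "test": [...]}
    push_requested: bool
    security_passed: bool
    errors: List[str]

def security_gate(state: GraphState) -> str:
    """Routing after the security agent: a failed audit is a hard stop."""
    if not state.get("security_passed", False):
        return "error"      # blocked, nothing is saved or pushed
    return "publish" if state.get("push_requested") else "save_local"
```

In LangGraph terms, a function like `security_gate` would drive the conditional edges out of the security node shown in the diagram.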

Requirements

  • Python ≥ 3.10
  • Ollama running locally with a model pulled (default: llama3.1:8b)

ollama pull llama3.1:8b
ollama serve

Install from PyPI:

pip install cardiomas

Quick Start

# Analyze a dataset — results saved to ./output/ptb-xl/
cardiomas analyze https://physionet.org/content/ptb-xl/1.0.3/

# Use a local directory
cardiomas analyze /data/ptb-xl/

# Stream all agent reasoning live
cardiomas analyze /data/ptb-xl/ --verbose

# Analyze and push to HuggingFace in one step (requires HF_TOKEN)
cardiomas analyze /data/ptb-xl/ --push

After analyze completes, the following files are written locally:

output/
└── ptb-xl/
    ├── splits.json           # train/val/test record IDs + reproducibility config
    ├── split_metadata.json   # seed, strategy, version, timestamp
    └── analysis_report.md    # LLM-generated dataset analysis
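Reading the saved splits back is a one-liner. A sketch (the train/val/test keys are documented above; the disjointness check is just a sanity guard, not part of the package):

```python
import json
from pathlib import Path

def load_splits(path: str) -> dict:
    """Load splits.json and check the three splits are pairwise disjoint."""
    splits = json.loads(Path(path).read_text())
    train, val, test = (set(splits[k]) for k in ("train", "val", "test"))
    if train & val or train & test or val & test:
        raise ValueError("overlapping record IDs between splits")
    return splits

# splits = load_splits("output/ptb-xl/splits.json")
# train_ids = splits["train"]
```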

CLI Reference

cardiomas analyze

Analyze a dataset and save splits locally.

cardiomas analyze DATASET_SOURCE [OPTIONS]
| Option | Default | Description |
|---|---|---|
| --local-path PATH | | Explicit local data path (skips download) |
| --output-dir PATH | output | Where to save results |
| --seed INT | 42 | Reproducibility seed |
| --custom-split SPEC | | e.g. train:0.7,val:0.15,test:0.15 |
| --stratify-by FIELD | | Metadata field to stratify splits by |
| --ignore-official | | Ignore official splits and generate fresh ones |
| --push | | Also push to HuggingFace (requires HF_TOKEN) |
| --force-reanalysis | | Re-run even if already analyzed |
| --use-cloud-llm | | Use a cloud LLM instead of local Ollama |
| --verbose / -v | | Stream agent reasoning and LLM calls live |
| --json | | Machine-readable JSON output |
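The --custom-split spec format can be parsed in a few lines. An illustrative sketch, not the package's actual parser:

```python
def parse_split_spec(spec: str) -> dict:
    """Parse a spec like 'train:0.7,val:0.15,test:0.15' into fractions."""
    fractions = {}
    for item in spec.split(","):
        name, value = item.split(":")
        fractions[name.strip()] = float(value)
    # The fractions must cover the whole dataset.
    if abs(sum(fractions.values()) - 1.0) > 1e-9:
        raise ValueError("split fractions must sum to 1.0")
    return fractions
```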

Verbose output

--verbose (-v) prints every agent step and LLM prompt/response in real time instead of showing a spinner:

cardiomas analyze /data/ptb-xl/ -v
Verbose mode on — streaming agent output below.

  [orchestrator] pipeline start — source: /data/ptb-xl/
  [orchestrator] no cache hit — running full pipeline
  [discovery]    registry hit → ptb-xl
  [paper]        searching arXiv: 'ptb-xl ECG dataset electrocardiogram'
  [paper]        found 3 result(s)
  [paper]        calling LLM (ChatOllama)…
──────────────── paper — LLM call ────────────────
╭─ prompt ────────────────────────────────────────╮
│ Analyze this ECG dataset paper and extract: …  │
╰─────────────────────────────────────────────────╯
╭─ response ──────────────────────────────────────╮
│ 1. Official splits: Yes (Section 2.3, page 4)  │
╰─────────────────────────────────────────────────╯
  [analysis]     found 42 files
  [splitter]     saved splits → output/ptb-xl/splits.json
  [security]     audit PASSED — no PII, no raw data, no leakage

Each agent is color-coded. Without --verbose, only a spinner runs during the pipeline and a summary table is shown at the end.

cardiomas push

Push previously saved local splits to HuggingFace. Requires HF_TOKEN.

cardiomas push ptb-xl
cardiomas push ptb-xl --output-dir /my/results

Runs a security audit before uploading. Refuses to push if any check fails.

cardiomas status

Check if a dataset has published splits on HuggingFace.

cardiomas status ptb-xl

cardiomas list

cardiomas list              # show known datasets (registry)
cardiomas list --remote     # show datasets published on HuggingFace
cardiomas list --local      # show locally cached datasets

cardiomas verify

Re-check reproducibility metadata of published splits.

cardiomas verify ptb-xl --seed 42
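The idea behind SHA-256 seeded deterministic splits is that a record's assignment depends only on its ID and the seed, so anyone with the same inputs can re-derive the same split. A sketch of that idea (not CardioMAS's exact algorithm):

```python
import hashlib

def assign_split(record_id: str, seed: int = 42, fractions=None) -> str:
    """Deterministically map a record ID to a split via SHA-256."""
    if fractions is None:
        fractions = {"train": 0.7, "val": 0.15, "test": 0.15}
    digest = hashlib.sha256(f"{seed}:{record_id}".encode()).hexdigest()
    u = int(digest[:16], 16) / 16**16   # uniform value in [0, 1)
    cumulative = 0.0
    for name, frac in fractions.items():
        cumulative += frac
        if u < cumulative:
            return name
    return list(fractions)[-1]          # guard against float rounding
```

Because the assignment is a pure function of (seed, record_id), verification only needs the seed and the ID list, never the raw signals.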

cardiomas contribute

Submit community splits to vlbthambawita/ECGBench.

cardiomas contribute ptb-xl --split-file my_splits.json

cardiomas config

cardiomas config --show
cardiomas config --set OLLAMA_MODEL=mistral

HuggingFace Publishing (opt-in)

Publishing requires write access to vlbthambawita/ECGBench. Set your token before pushing:

export HF_TOKEN=hf_...
cardiomas push ptb-xl

Only record identifiers are ever published — no raw ECG signals, no patient data.

Python API

from cardiomas import CardioMAS

mas = CardioMAS(ollama_model="llama3.1:8b", seed=42)

# Analyze and save locally
result = mas.analyze("/data/ptb-xl/")
print(result["local_output_dir"])   # output/ptb-xl

# Analyze and push to HuggingFace
mas.analyze("/data/ptb-xl/", push_to_hf=True)

# Read back published splits
splits = mas.get_splits("ptb-xl")
train_ids = splits["train"]

# Custom splits
mas.analyze(
    "/data/ptb-xl/",
    custom_split={"train": 0.7, "val": 0.15, "test": 0.15},
    stratify_by="scp_codes",
    seed=123,
)

Using Different Local Models

Any model available in Ollama works. Pull a model, then point CardioMAS at it:

# Default (recommended for full pipeline)
ollama pull llama3.1:8b

# Gemma models (Google) — lighter, fast on CPU
ollama pull gemma3:4b
ollama pull gemma3:12b
ollama pull gemma3:27b

# DeepSeek Coder — best for the coding agent
ollama pull deepseek-coder:6.7b

# Use a specific model for the whole pipeline
OLLAMA_MODEL=gemma3:4b cardiomas analyze /data/ptb-xl/

# Or set it permanently in .env
echo "OLLAMA_MODEL=gemma3:4b" >> .env

Per-Agent LLM Configuration

Available since v0.2.0 (dev/v2-dynamic-orchestrator)

Each agent can use a different LLM. This is useful when you want a fast, lightweight model for simple tasks (discovery, security scan) and a more capable model for reasoning-heavy tasks (analysis, coding).

Via environment variables

# Fallback for all agents
OLLAMA_MODEL=llama3.1:8b

# Per-agent overrides (all optional)
AGENT_LLM_ORCHESTRATOR=llama3.1:8b
AGENT_LLM_NL_REQUIREMENT=gemma3:4b
AGENT_LLM_DISCOVERY=gemma3:4b
AGENT_LLM_PAPER=llama3.1:8b
AGENT_LLM_ANALYSIS=llama3.1:8b
AGENT_LLM_SPLITTER=gemma3:4b
AGENT_LLM_SECURITY=gemma3:4b
AGENT_LLM_CODER=deepseek-coder:6.7b
AGENT_LLM_PUBLISHER=gemma3:4b

Set these in .env or export them before running cardiomas.
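The documented fallback order (AGENT_LLM_<AGENT>, then OLLAMA_MODEL, then a built-in default) can be sketched as a small resolver. Illustrative only, not the package's actual code:

```python
import os

def resolve_agent_model(agent: str, default: str = "llama3.1:8b") -> str:
    """Pick the model for an agent: per-agent override wins,
    then the global OLLAMA_MODEL, then the built-in default."""
    return (os.environ.get(f"AGENT_LLM_{agent.upper()}")
            or os.environ.get("OLLAMA_MODEL")
            or default)
```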

Via CLI flags

cardiomas analyze /data/ptb-xl/ \
  --llm-coder deepseek-coder:6.7b \
  --llm-analysis llama3.1:8b \
  --llm-discovery gemma3:4b

Via Python API

from cardiomas import CardioMAS

mas = CardioMAS(
    agent_llms={
        "coder":    "deepseek-coder:6.7b",
        "analysis": "llama3.1:8b",
        "default":  "gemma3:4b",   # fallback for all other agents
    }
)
mas.analyze("/data/ptb-xl/")

Model recommendations

| Agent | Recommended model | Why |
|---|---|---|
| orchestrator | llama3.1:8b | Reasoning-heavy routing decisions |
| nl_requirement | gemma3:4b | Simple parsing task |
| discovery | gemma3:4b | Lookup and classification |
| paper | llama3.1:8b | Needs to read and summarise papers |
| analysis | llama3.1:8b or llama3.1:70b | Statistical reasoning |
| splitter | gemma3:4b | Deterministic; the LLM role is minimal |
| security | gemma3:4b | Pattern matching |
| coder | deepseek-coder:6.7b | Code generation |
| publisher | gemma3:4b | Structured output |

Verbose LLM name display

With --verbose, each LLM call shows the model name and backend:

──────────── paper — LLM call [llama3.1:8b @ ollama] ────────────

Environment Variables

Copy .env.example to .env and fill in as needed.

| Variable | Required for | Default |
|---|---|---|
| OLLAMA_MODEL | local LLM (default for all agents) | llama3.1:8b |
| OLLAMA_BASE_URL | local LLM | http://localhost:11434 |
| AGENT_LLM_<AGENT> | per-agent model override | falls back to OLLAMA_MODEL |
| HF_TOKEN | --push / cardiomas push | |
| GITHUB_TOKEN | GitHub README auto-update | |
| CARDIOMAS_SEED | reproducibility | 42 |
| CLOUD_LLM_PROVIDER | --use-cloud-llm | none |

Download files

Download the file for your platform.

Source Distribution

cardiomas-0.5.0.tar.gz (97.4 kB)

Built Distribution


cardiomas-0.5.0-py3-none-any.whl (78.1 kB)

File details

Details for the file cardiomas-0.5.0.tar.gz.

File metadata

  • Download URL: cardiomas-0.5.0.tar.gz
  • Upload date:
  • Size: 97.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for cardiomas-0.5.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9c614209c1c02a806375312587c939c5966be454bbe112674844cc6674d9094b |
| MD5 | cbe33a71f877ced423e9098c141976c7 |
| BLAKE2b-256 | e24e7e2699c25dc46c326f4b17b60a19ff1f85278c61c9eb6ae070efc3e2c844 |
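To check a downloaded file against the digests above, compute its SHA-256 locally, for example:

```python
import hashlib

def sha256_of(path: str) -> str:
    """Compute a file's SHA-256 hex digest, reading in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# sha256_of("cardiomas-0.5.0.tar.gz") should match the SHA256 value above.
```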


Provenance

The following attestation bundles were made for cardiomas-0.5.0.tar.gz:

Publisher: publish.yml on vlbthambawita/CardioMAS


File details

Details for the file cardiomas-0.5.0-py3-none-any.whl.

File metadata

  • Download URL: cardiomas-0.5.0-py3-none-any.whl
  • Upload date:
  • Size: 78.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for cardiomas-0.5.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | a16f40d7e9968ddb78644fab798797256dedcc051d99a4925d5a901f6b62a6a9 |
| MD5 | 4aeee05d36db9fd09d218f92f5ba3d1e |
| BLAKE2b-256 | 1856cce11954f532a5198ba57bde30ba57e9a94dd383a4a475939033aa2c75b5 |


Provenance

The following attestation bundles were made for cardiomas-0.5.0-py3-none-any.whl:

Publisher: publish.yml on vlbthambawita/CardioMAS

