
CardioMAS — Cardio Multi-Agent System for reproducible ECG dataset splits

Project description


A locally-runnable multi-agent system that analyzes ECG datasets and generates reproducible train/validation/test splits. Outputs are saved locally by default. Publishing to vlbthambawita/ECGBench on HuggingFace is an explicit opt-in step that requires write access.

Architecture

flowchart TD
    CLI(["`**CLI / Python API**
    cardiomas analyze`"])
    CLI --> ORCH

    subgraph PIPELINE ["LangGraph Pipeline — V2 Hub-and-Spoke"]
        ORCH(["🎯 **Orchestrator**
        Dynamic routing hub
        sets next_agent each turn"])

        %% Worker agents (spokes)
        NL["📝 NL Requirement
        Parse natural-language
        user requirements"]

        DISC["🔍 Discovery
        Identify dataset type,
        source & metadata"]

        PAPER["📄 Paper
        Find & parse paper
        Extract split methodology"]

        ANALYSIS["📊 Analysis
        Scan files, parse CSV
        Compute statistics"]

        SPLIT["✂️ Splitter
        SHA-256 seeded
        deterministic splits"]

        SEC["🔒 Security
        PII scan · raw-data scan
        Patient-leakage check"]

        CODER["💻 Coder
        Generate custom
        split code"]

        PUB["☁️ Publisher
        Push to HF
        Update GitHub README"]

        %% Orchestrator → workers (dispatch)
        ORCH -->|dispatch| NL
        ORCH -->|dispatch| DISC
        ORCH -->|dispatch| PAPER
        ORCH -->|dispatch| ANALYSIS
        ORCH -->|dispatch| SPLIT
        ORCH -->|dispatch| SEC
        ORCH -->|dispatch| CODER
        ORCH -->|dispatch| PUB

        %% Workers → Orchestrator (return)
        NL -->|done| ORCH
        DISC -->|done| ORCH
        PAPER -->|done| ORCH
        ANALYSIS -->|done| ORCH
        SPLIT -->|done| ORCH
        SEC -->|done| ORCH
        CODER -->|done| ORCH
        PUB -->|done| ORCH

        %% Terminal routes from orchestrator
        ORCH -->|cache hit| EXISTING["↩ Return existing<br/>splits from HF"]
        ORCH -->|audit failed| ERR["❌ End with error<br/>Blocked — not saved"]
        ORCH -->|pipeline done| SAVED["💾 End — saved locally"]
    end

    PUB -.->|if --push| HF[("HuggingFace
    vlbthambawita/ECGBench")]
    SAVED --> OUT

    OUT["`**output/<dataset>/**
    splits.json
    split_metadata.json
    analysis_report.md`"]

    style PIPELINE fill:#1a1a2e,stroke:#4a9eff,color:#fff
    style ORCH fill:#0f3460,stroke:#4a9eff,color:#fff
    style NL fill:#2d1b4e,stroke:#9b59b6,color:#fff
    style DISC fill:#1b4332,stroke:#40916c,color:#fff
    style PAPER fill:#3b1f5e,stroke:#9b59b6,color:#fff
    style ANALYSIS fill:#7b4220,stroke:#e67e22,color:#fff
    style SPLIT fill:#1a5276,stroke:#2e86c1,color:#fff
    style SEC fill:#641e16,stroke:#e74c3c,color:#fff
    style CODER fill:#1a3a4a,stroke:#00bcd4,color:#fff
    style PUB fill:#145a32,stroke:#27ae60,color:#fff
    style SAVED fill:#1a3a1a,stroke:#27ae60,color:#fff
    style ERR fill:#3b0f0f,stroke:#e74c3c,color:#fff
    style HF fill:#2d2d2d,stroke:#f5a623,color:#fff
    style OUT fill:#2d2d2d,stroke:#4a9eff,color:#fff
    style EXISTING fill:#2d2d2d,stroke:#4a9eff,color:#fff
    style CLI fill:#0d0d0d,stroke:#4a9eff,color:#fff

Each node is a dedicated LLM-backed agent. The orchestrator is the central hub — it dynamically decides which agent to invoke next after each agent completes (hub-and-spoke pattern). Every worker agent returns to the orchestrator after finishing. Agents communicate only through the shared GraphState. The security agent is a hard gate — publishing is blocked if any check fails.
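The hub-and-spoke control flow described above can be sketched in plain Python. This is an illustrative model only, not CardioMAS's actual internals: the worker functions, state keys, and the `run_pipeline` loop are all hypothetical stand-ins for the LangGraph nodes and `GraphState`.

```python
# Hypothetical sketch of the hub-and-spoke pattern: a central
# orchestrator inspects shared state after each worker returns and
# decides which agent runs next. All names here are illustrative.

def discovery(state):
    state["dataset"] = "ptb-xl"       # stand-in for the Discovery agent
    return state

def splitter(state):
    state["splits"] = {"train": [], "val": [], "test": []}
    return state

WORKERS = {"discovery": discovery, "splitter": splitter}

def orchestrator(state):
    """Pick the next agent based on what the shared state already holds."""
    if "dataset" not in state:
        return "discovery"
    if "splits" not in state:
        return "splitter"
    return None  # pipeline done

def run_pipeline(state):
    # Every spoke returns to the hub; the hub decides the next hop.
    while (next_agent := orchestrator(state)) is not None:
        state = WORKERS[next_agent](state)
    return state

final = run_pipeline({})
```

The key property is that workers never call each other: all routing decisions live in one place, which is what lets the real orchestrator short-circuit to the "cache hit" or "audit failed" terminal routes at any turn.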

Requirements

  • Python ≥ 3.10
  • Ollama running locally with a model pulled (default: gemma4:e2b)

# Pull the default model and start Ollama
ollama pull gemma4:e2b
ollama serve

# Install CardioMAS
pip install cardiomas

Quick Start

# Analyze a dataset — results saved to ./output/ptb-xl/
cardiomas analyze https://physionet.org/content/ptb-xl/1.0.3/

# Use a local directory
cardiomas analyze /data/ptb-xl/

# Stream all agent reasoning live
cardiomas analyze /data/ptb-xl/ --verbose

# Analyze and push to HuggingFace in one step (requires HF_TOKEN)
cardiomas analyze /data/ptb-xl/ --push

After a cardiomas analyze run, the following files are written locally:

output/
└── ptb-xl/
    ├── splits.json           # train/val/test record IDs + reproducibility config
    ├── split_metadata.json   # seed, strategy, version, timestamp
    └── analysis_report.md    # LLM-generated dataset analysis
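A minimal sketch of consuming the generated splits.json. The exact schema is an assumption based on the description above (per-split record IDs plus reproducibility metadata); the record IDs shown are invented for illustration.

```python
# Hypothetical splits.json content, inlined here so the sketch is
# self-contained; in practice you would json.load() the file from
# output/<dataset>/splits.json.
import json

example = json.dumps({
    "train":  ["00001_hr", "00002_hr"],
    "val":    ["00003_hr"],
    "test":   ["00004_hr"],
    "config": {"seed": 42, "strategy": "stratified"},
})

splits = json.loads(example)
train_ids = splits["train"]

# The whole point of published splits: train and test never overlap.
assert set(train_ids).isdisjoint(splits["test"])
```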

CLI Reference

cardiomas analyze

Analyze a dataset and save splits locally.

cardiomas analyze DATASET_SOURCE [OPTIONS]
Option                 Default   Description
--local-path PATH                Explicit local data path (skips download)
--output-dir PATH      output    Where to save results
--seed INT             42        Reproducibility seed
--custom-split SPEC              e.g. train:0.7,val:0.15,test:0.15
--stratify-by FIELD              Metadata field to stratify splits by
--ignore-official                Ignore official splits, generate fresh
--push                           Also push to HuggingFace (requires HF_TOKEN)
--force-reanalysis               Re-run even if already analyzed
--use-cloud-llm                  Use cloud LLM instead of local Ollama
--verbose / -v                   Stream agent reasoning and LLM calls live
--json                           Machine-readable JSON output

Verbose output

--verbose (-v) prints every agent step and LLM prompt/response in real time instead of showing a spinner:

cardiomas analyze /data/ptb-xl/ -v
Verbose mode on — streaming agent output below.

  [orchestrator] pipeline start — source: /data/ptb-xl/
  [orchestrator] no cache hit — running full pipeline
  [discovery]    registry hit → ptb-xl
  [paper]        searching arXiv: 'ptb-xl ECG dataset electrocardiogram'
  [paper]        found 3 result(s)
  [paper]        calling LLM (ChatOllama)…
──────────────── paper — LLM call ────────────────
╭─ prompt ────────────────────────────────────────╮
│ Analyze this ECG dataset paper and extract: …  │
╰─────────────────────────────────────────────────╯
╭─ response ──────────────────────────────────────╮
│ 1. Official splits: Yes (Section 2.3, page 4)  │
╰─────────────────────────────────────────────────╯
  [analysis]     found 42 files
  [splitter]     saved splits → output/ptb-xl/splits.json
  [security]     audit PASSED — no PII, no raw data, no leakage

Each agent is color-coded. Without --verbose, only a spinner runs during the pipeline and a summary table is shown at the end.

cardiomas push

Push previously saved local splits to HuggingFace. Requires HF_TOKEN.

cardiomas push ptb-xl
cardiomas push ptb-xl --output-dir /my/results

Runs a security audit before uploading. Refuses to push if any check fails.

cardiomas status

Check if a dataset has published splits on HuggingFace.

cardiomas status ptb-xl

cardiomas list

cardiomas list              # show known datasets (registry)
cardiomas list --remote     # show datasets published on HuggingFace
cardiomas list --local      # show locally cached datasets

cardiomas verify

Re-check reproducibility metadata of published splits.

cardiomas verify ptb-xl --seed 42

cardiomas contribute

Submit community splits to vlbthambawita/ECGBench.

cardiomas contribute ptb-xl --split-file my_splits.json

cardiomas config

cardiomas config --show
cardiomas config --set OLLAMA_MODEL=mistral

HuggingFace Publishing (opt-in)

Publishing requires write access to vlbthambawita/ECGBench. Set your token before pushing:

export HF_TOKEN=hf_...
cardiomas push ptb-xl

Only record identifiers are ever published — no raw ECG signals, no patient data.
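In the spirit of the guarantee above, a pre-publish gate can be sketched as a check that the payload holds nothing but record-ID strings. This is an illustrative check, not CardioMAS's actual security agent; the function name and payload shape are assumptions.

```python
# Illustrative pre-publish sanity check: accept a payload only if every
# split maps to a list of plain record-ID strings. Anything else (e.g.
# a list of floats that could be a raw ECG signal) is rejected.
def ids_only(payload: dict) -> bool:
    for split, records in payload.items():
        if not all(isinstance(r, str) for r in records):
            return False
    return True

assert ids_only({"train": ["00001_hr"], "test": ["00002_hr"]})
assert not ids_only({"train": [[0.1, 0.2, 0.3]]})  # raw signal → rejected
```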

Python API

from cardiomas import CardioMAS

mas = CardioMAS(ollama_model="gemma4:e2b", seed=42)

# Analyze and save locally
result = mas.analyze("/data/ptb-xl/")
print(result["local_output_dir"])   # output/ptb-xl

# Analyze and push to HuggingFace
mas.analyze("/data/ptb-xl/", push_to_hf=True)

# Read back published splits
splits = mas.get_splits("ptb-xl")
train_ids = splits["train"]

# Custom splits
mas.analyze(
    "/data/ptb-xl/",
    custom_split={"train": 0.7, "val": 0.15, "test": 0.15},
    stratify_by="scp_codes",
    seed=123,
)
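Conceptually, the SHA-256-seeded deterministic split the Splitter node names can be sketched as follows. This is a generic illustration of the technique, not CardioMAS's actual implementation: hashing seed + record ID gives every record a stable pseudo-uniform value, so the same seed reproduces the same split regardless of input order.

```python
# Sketch of a SHA-256-seeded deterministic split. Each record's bucket
# depends only on (seed, record_id), never on iteration order.
import hashlib

def assign(record_id: str, seed: int,
           ratios=(("train", 0.70), ("val", 0.15), ("test", 0.15))):
    digest = hashlib.sha256(f"{seed}:{record_id}".encode()).hexdigest()
    u = int(digest[:8], 16) / 0xFFFFFFFF  # stable value in [0, 1]
    edge = 0.0
    for split, frac in ratios:
        edge += frac
        if u <= edge:
            return split
    return "test"  # guard against floating-point edge cases

ids = [f"{i:05d}_hr" for i in range(1000)]
splits = {rid: assign(rid, seed=123) for rid in ids}

# Re-running with the same seed reproduces the split exactly.
assert splits == {rid: assign(rid, seed=123) for rid in ids}
```

Because assignment is per-record, adding or removing records never reshuffles the others, which is a useful property for dataset versioning.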

Using Different Local Models

Any model available in Ollama works. Pull a model, then point CardioMAS at it:

# Default (recommended for full pipeline)
ollama pull gemma4:e2b

# Larger Gemma 4 variants for heavier reasoning tasks
ollama pull gemma4:e4b
ollama pull gemma4:27b

# DeepSeek Coder — best for the coding agent
ollama pull deepseek-coder:6.7b

# Use a specific model for the whole pipeline
OLLAMA_MODEL=gemma4:e2b cardiomas analyze /data/ptb-xl/

# Or set it permanently in .env
echo "OLLAMA_MODEL=gemma4:e2b" >> .env

Per-Agent LLM Configuration

Available in v0.2.0 (dev/v2-dynamic-orchestrator)

Each agent can use a different LLM. This is useful when you want a fast, lightweight model for simple tasks (discovery, security scan) and a more capable model for reasoning-heavy tasks (analysis, coding).

Via environment variables

# Fallback for all agents
OLLAMA_MODEL=gemma4:e2b

# Per-agent overrides (all optional)
AGENT_LLM_ORCHESTRATOR=gemma4:e2b
AGENT_LLM_NL_REQUIREMENT=gemma4:e2b
AGENT_LLM_DISCOVERY=gemma4:e2b
AGENT_LLM_PAPER=gemma4:e2b
AGENT_LLM_ANALYSIS=gemma4:e2b
AGENT_LLM_SPLITTER=gemma4:e2b
AGENT_LLM_SECURITY=gemma4:e2b
AGENT_LLM_CODER=deepseek-coder:6.7b
AGENT_LLM_PUBLISHER=gemma4:e2b

Set these in .env or export them before running cardiomas.
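The fallback behaviour can be illustrated with a small resolver. The variable names match the table above; the resolution logic itself is an assumption about how the overrides compose, not CardioMAS source code.

```python
# Sketch of per-agent model resolution: AGENT_LLM_<AGENT> wins if set,
# otherwise fall back to OLLAMA_MODEL, otherwise the package default.
import os

def resolve_model(agent: str, env=os.environ) -> str:
    return env.get(f"AGENT_LLM_{agent.upper()}",
                   env.get("OLLAMA_MODEL", "gemma4:e2b"))

env = {"OLLAMA_MODEL": "gemma4:e2b",
       "AGENT_LLM_CODER": "deepseek-coder:6.7b"}

assert resolve_model("coder", env) == "deepseek-coder:6.7b"   # override
assert resolve_model("analysis", env) == "gemma4:e2b"         # fallback
```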

Via CLI flags

cardiomas analyze /data/ptb-xl/ \
  --llm-coder deepseek-coder:6.7b \
  --llm-analysis gemma4:e4b \
  --llm-discovery gemma4:e2b

Via Python API

from cardiomas import CardioMAS

mas = CardioMAS(
    agent_llms={
        "coder":    "deepseek-coder:6.7b",
        "analysis": "gemma4:e4b",
        "default":  "gemma4:e2b",   # fallback for all other agents
    }
)
mas.analyze("/data/ptb-xl/")

Model recommendations

Agent            Recommended model           Why
orchestrator     gemma4:e2b                  Default — dynamic routing decisions
nl_requirement   gemma4:e2b                  Simple parsing task
discovery        gemma4:e2b                  Lookup + classification
paper            gemma4:e2b or gemma4:e4b    Needs to read and summarise papers
analysis         gemma4:e2b or gemma4:e4b    Statistical reasoning
splitter         gemma4:e2b                  Deterministic — LLM role is minimal
security         gemma4:e2b                  Pattern matching
coder            deepseek-coder:6.7b         Code generation
publisher        gemma4:e2b                  Structured output

Verbose LLM name display

With --verbose, each LLM call shows the model name and backend:

──────────── paper — LLM call [gemma4:e2b @ ollama] ────────────

Environment Variables

Copy .env.example to .env and fill in as needed.

Variable             Required for                          Default
OLLAMA_MODEL         local LLM (default for all agents)    gemma4:e2b
OLLAMA_BASE_URL      local LLM                             http://localhost:11434
AGENT_LLM_<AGENT>    per-agent model override              falls back to OLLAMA_MODEL
HF_TOKEN             --push / cardiomas push
GITHUB_TOKEN         GitHub README auto-update
CARDIOMAS_SEED       reproducibility                       42
CLOUD_LLM_PROVIDER   --use-cloud-llm                       none

Download files

Download the file for your platform.

Source Distribution

cardiomas-0.8.0.tar.gz (124.8 kB)

Uploaded Source

Built Distribution


cardiomas-0.8.0-py3-none-any.whl (103.2 kB)

Uploaded Python 3

File details

Details for the file cardiomas-0.8.0.tar.gz.

File metadata

  • Download URL: cardiomas-0.8.0.tar.gz
  • Upload date:
  • Size: 124.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for cardiomas-0.8.0.tar.gz
Algorithm Hash digest
SHA256 18f4cb990b39d59e9f5077f7cc4713fccdd9d04a66142684c14330713a5c15a7
MD5 ab39ae8be6c58f411ef41da4cb39a4bb
BLAKE2b-256 0e40f2dcfe0b20768a30e06453e14d65a9287447b26461ca2ef9034c995f845d


Provenance

The following attestation bundles were made for cardiomas-0.8.0.tar.gz:

Publisher: publish.yml on vlbthambawita/CardioMAS

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file cardiomas-0.8.0-py3-none-any.whl.

File metadata

  • Download URL: cardiomas-0.8.0-py3-none-any.whl
  • Upload date:
  • Size: 103.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for cardiomas-0.8.0-py3-none-any.whl
Algorithm Hash digest
SHA256 b615ae520a8f8b4f45cbe1eea35acfc912989a3c3294435c782b6bf425252c0e
MD5 1bfb7a3f65cd5919c45bbe3cc54102ff
BLAKE2b-256 a19dcfbe48c71828b38805f68daba5c9a41044337ebcd37968474a2a75c5d5f4


Provenance

The following attestation bundles were made for cardiomas-0.8.0-py3-none-any.whl:

Publisher: publish.yml on vlbthambawita/CardioMAS

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
