
STEM BIO-AI


Deterministic evidence-surface scanner for bio/medical AI repositories.
No LLM. No API key. No model runtime. No secrets sent anywhere.

CI · v1.5.10 · Python 3.9+ · PyPI · Apache 2.0 · HF Space


Why STEM BIO-AI

Bio and medical AI repositories vary enormously in evidence quality — from rigorous academic tools to marketing-grade demos that carry clinical language with no data provenance, no reproducibility path, and no clinical-use disclaimer. Manual review is slow and inconsistent.

STEM BIO-AI scans the observable repository surface — README, docs, code structure, CI configuration, dependency manifests, changelogs — and maps detected signals to a structured evidence tier (T0–T4). The scan runs in seconds on a local clone, produces machine-readable JSON and PDF reports, and makes every scoring decision traceable to a specific file, line, and pattern.

A T4 score means strong observable evidence signals. It does not mean the repository is safe for clinical deployment — that requires independent expert validation.


Quick Start

# install the released package from PyPI
pip install stem-ai

# or: clone and do an editable local install with PDF output support
git clone https://github.com/flamehaven01/STEM-BIO-AI.git
cd STEM-BIO-AI
pip install -e .[pdf]

# 1-page executive dashboard
stem /path/to/bio-ai-repo --level 1 --format all

# 3-page stage analysis
stem /path/to/bio-ai-repo --level 2 --format all

# 5-page full evidence packet with proof trace
stem /path/to/bio-ai-repo --level 3 --format all --explain
# Advisory mode (no provider API call required)
stem /path/to/bio-ai-repo --advisory validate
stem /path/to/bio-ai-repo --advisory packet
stem /path/to/bio-ai-repo --advisory call
stem /path/to/bio-ai-repo --advisory-response provider_advisory.json

Clone the target repository first; the CLI operates on local paths only.
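The scan's machine-readable JSON result can be consumed programmatically. A minimal sketch, assuming the score.final_score and score.formal_tier field names documented in the advisory contract; the surrounding JSON layout is an assumption and may differ from the shipped schema:

```python
import json

def summarize(result: dict) -> str:
    """Return a one-line summary from a STEM BIO-AI result object.

    Field names (score.final_score, score.formal_tier) come from the
    advisory contract docs; the enclosing layout is an assumption.
    """
    score = result["score"]
    return f"final_score={score['final_score']} tier={score['formal_tier']}"

# Typical use after a scan (path follows the artifact naming shown below):
# result = json.loads(open("stem_output/myrepo_experiment_results.json").read())
# print(summarize(result))
```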

Proof surfaces


Triage Tiers

| Tier | Score | Recommended Action |
|------|-------|--------------------|
| T0 Rejected | 0–39 | Insufficient evidence — do not rely on it without independent expert validation |
| T1 Quarantine | 40–54 | Exploratory review only — expert validation required before any use |
| T2 Caution | 55–69 | Research reference and supervised non-clinical technical review only |
| T3 Supervised | 70–84 | Supervised institutional review candidate |
| T4 Candidate | 85–100 | Strong evidence posture — clinical deployment still requires independent validation |

Clinical-adjacent repositories without an explicit disclaimer are hard-capped at T2 (score ≤ 69). Repositories with unbounded CA-DIRECT claims are hard-capped at T0 (score ≤ 39).

Tier boundary derivation and calibration gap disclosures: docs/SCORING_RATIONALE.md.
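The tier boundaries and hard caps above can be sketched in a few lines. Function names are illustrative, not the package's API; the shipped implementation may differ:

```python
def triage_tier(score: float) -> str:
    """Map a 0-100 evidence score to its triage tier (boundaries from the table above)."""
    if score >= 85:
        return "T4"
    if score >= 70:
        return "T3"
    if score >= 55:
        return "T2"
    if score >= 40:
        return "T1"
    return "T0"

def apply_hard_caps(score: float, clinical_adjacent: bool,
                    has_disclaimer: bool, unbounded_ca_direct: bool) -> float:
    """Apply the documented hard caps: unbounded CA-DIRECT claims cap at T0
    (score <= 39); a clinical-adjacent repo without a disclaimer caps at T2
    (score <= 69)."""
    if unbounded_ca_direct:
        return min(score, 39)
    if clinical_adjacent and not has_disclaimer:
        return min(score, 69)
    return score
```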


Scoring Model

Final = (Stage 1 × 0.40) + (Stage 2R × 0.20) + (Stage 3 × 0.40) − C1 Penalty

| Stage | Weight | What Is Measured |
|-------|--------|------------------|
| Stage 1 README Evidence | 40% | Bio-domain vocabulary; H1–H6 hype-claim penalties; R1–R5 responsibility signals (limitations, regulatory framing, clinical disclaimer, demographic-bias disclosure, reproducibility) |
| Stage 2R Repo-Local Consistency | 20% | Vocabulary overlap across README, docs, package metadata, CI, and tests; limitation repetition; contradiction, staleness, and unsupported-workflow deductions |
| Stage 3 Code/Bio Responsibility | 40% | CI presence; domain test coverage; changelog hygiene (T3); data provenance and IRB/dataset citation (B1); bias/limitation measurement evidence (B2); conflict-of-interest disclosure (B3) |
| Stage 4 Replication Evidence | Separate lane | Containers; reproducibility targets; dependency locks/pins; dataset and model artifact references; seed, CLI, and citation signals; license/use-scope restrictions |
| C1–C4 Code Integrity | Penalty / advisory | Hardcoded credentials (C1, −10 pts); dependency pinning (C2); deprecated patient-adjacent paths (C3); fail-open exception handlers (C4) |

Stage 4 is reported as replication_score / replication_tier and does not affect score.final_score. Full scoring rationale and calibration gap disclosures are in docs/SCORING_RATIONALE.md.
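The weighted formula can be checked in a few lines. The weights and C1 penalty are from the model above; clamping the result to [0, 100] is an assumption, not something the docs state:

```python
def final_score(stage1: float, stage2r: float, stage3: float,
                c1_penalty: float = 0.0) -> float:
    """Final = (Stage 1 x 0.40) + (Stage 2R x 0.20) + (Stage 3 x 0.40) - C1 penalty.

    Inputs are per-stage scores on a 0-100 scale; clamping is an assumption.
    """
    raw = stage1 * 0.40 + stage2r * 0.20 + stage3 * 0.40 - c1_penalty
    return max(0.0, min(100.0, raw))

# A repo with strong README evidence (80), weak consistency (50), solid code
# responsibility (70), and a hardcoded credential (-10) lands mid-range:
# final_score(80, 50, 70, c1_penalty=10) -> 60.0, i.e. T2 Caution.
```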


Architecture

```mermaid
flowchart LR
    A[Target repository] --> B[LOCAL_ANALYSIS scanner]
    B --> C[Stage 1\nREADME evidence]
    B --> D[Stage 2R\nRepo-local consistency]
    B --> E[Stage 3\nCode/bio responsibility]
    B --> F[Stage 4\nReplication lane]
    B --> K[C1–C4\nCode integrity]
    C --> G[Weighted evidence score]
    D --> G
    E --> G
    K --> G
    F --> H[replication_score / tier]
    G --> I[JSON result]
    H --> I
    I --> L[Evidence ledger]
    I --> M[Explain trace]
    G --> N[Markdown report]
    G --> O[PDF packet]
```

Core modules: stem_ai/scanner.py, stem_ai/render.py, stem_ai/cli.py, stem_ai/detectors.py, stem_ai/detector_surface.py, stem_ai/detector_ast.py, stem_ai/detector_bio.py, stem_ai/detector_stage4.py, stem_ai/evidence.py, stem_ai/app.py


Output Artifacts

Each run writes to --out DIR (default: stem_output/).

| Level | Pages | Audience | Artifacts |
|-------|-------|----------|-----------|
| --level 1 | 1 | Executive / triage | Score, tier, stage cards, code integrity summary |
| --level 2 | 3 | Standard audit review | Level 1 + Stage 1/2R/3 breakdown, gap analysis |
| --level 3 | 5 | Full evidence packet | Level 2 + code integrity deep dive, classification analysis, remediation roadmap |
<repo>_experiment_results.json   # machine-readable score + full evidence object
<repo>_report.md                 # human-readable audit report
<repo>_brief_1p.pdf              # Level 1 executive dashboard
<repo>_detailed_3p.pdf           # Level 2 stage analysis
<repo>_detailed_5p.pdf           # Level 3 deep review packet
<repo>_explain.txt               # --explain: file/line/snippet proof trace

Report Preview

STEM BIO-AI Level 3 report — page 1

Sample PDF: Download the 5-page Level 3 report


Detection Methods

Every scored item maps to a concrete, inspectable detection method. No inference, no LLM judgment.

Full detection table
| Component | Detection Method |
|-----------|------------------|
| Stage 1 baseline | Non-zero README present (+60 base) |
| Stage 1 domain signal | Bio-domain keyword regex in README and package metadata |
| Stage 1 hype penalties (H1–H6) | Regex: clinical certainty, regulatory approval, autonomous replacement, breakthrough marketing, universal generalization, perfect accuracy claims |
| Stage 1 responsibility signals (R1–R5) | Regex: limitations section, regulatory framework, clinical disclaimer (CA-severity-weighted), demographic-bias disclosure, reproducibility provisions |
| Stage 2R consistency | Vocabulary set intersection across README/docs/package/tests; limitation repetition; clinical-boundary contradiction, version-staleness, and workflow-support deductions |
| Stage 3 T1 CI | .github/workflows/ contains at least one file |
| Stage 3 T2 domain tests | tests/ directory text contains bio-domain vocabulary (regex) |
| Stage 3 T3 changelog | CHANGELOG file presence + bug-fix/patch/security entry detection (3-tier: 0/+5/+15) |
| Stage 3 B1 data provenance | Dependency manifest presence + IRB/dataset-citation language detection (3-tier: 0/+10/+15) |
| Stage 3 B2 bias measurement | Bias/limitations vocabulary + quantitative measurement evidence (subgroup analysis, AUROC, demographic parity) (3-tier: 0/+8/+15) |
| Stage 3 B3 COI/funding | Funding, grant, sponsor, conflict-of-interest language in README/docs/FUNDING.md |
| Stage 4 containers | Dockerfile or compose file present |
| Stage 4 reproducibility target | Makefile with reproduce/eval/benchmark/test targets |
| Stage 4 dependency lock | Environment/lock/requirements file; exact pins or hash evidence |
| Stage 4 artifact references | Dataset/model/checkpoint URLs or checksum files |
| Stage 4 citation/interface | CITATION.cff; argparse CLI entry points (AST) |
| Stage 4 license restriction | Non-commercial, research-only, academic-only, no-clinical-use restrictions in LICENSE/README |
| CA severity | Clinical/diagnostic phrase regex in README, docs, and package metadata |
| C1 credentials | AWS AKIA*, OpenAI sk-*, GitHub ghp_*, api_key=... patterns; obvious placeholders excluded from penalty |
| C2 dependency pinning | == or hash pin vs. loose >=, ~=, <, > ranges |
| C3 deprecated paths | Patient-metadata patterns in deprecated/, legacy/, archive/ directories |
| C4 fail-open | except Exception: pass or except: pass in Python source (AST) |
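The C4 check can be sketched with Python's standard ast module. This is an illustrative reimplementation of the documented pattern, not the shipped detector (stem_ai/detector_ast.py), which may differ:

```python
import ast

def find_fail_open_handlers(source: str) -> list:
    """Return line numbers of fail-open handlers: a bare `except:` or
    `except Exception:` whose body is only `pass` (the C4 pattern)."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ExceptHandler):
            bare = node.type is None
            broad = isinstance(node.type, ast.Name) and node.type.id == "Exception"
            only_pass = len(node.body) == 1 and isinstance(node.body[0], ast.Pass)
            if (bare or broad) and only_pass:
                hits.append(node.lineno)
    return hits
```

Because this walks the AST rather than grepping text, it ignores matches inside strings and comments, and it does not flag handlers that re-raise or log.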

AI Advisory Contract

The advisory system exports a sanitized, provider-neutral handoff packet and validates provider responses — without making any provider API call.

stem /path/to/repo --advisory validate          # offline contract check
stem /path/to/repo --advisory packet            # export sanitized input packet
stem /path/to/repo --advisory-response FILE     # validate provider JSON response

Non-negotiable rules (enforced by the validator):

  • Provider output cannot override score.final_score or score.formal_tier
  • Every advisory item must cite exact finding_id strings from allowed_finding_ids
  • Raw repository source text is not included in provider packets
  • Responses containing clinical safety, efficacy, regulatory, or medical-advice claims are rejected
  • allowed_finding_ids is capped at 40 entries per packet
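The first two rules can be sketched as an offline check. Field names such as advisory_items are assumptions for illustration; the real validator ships with the package and enforces the full contract:

```python
def validate_advisory_response(response: dict, allowed_finding_ids: set) -> list:
    """Return a list of violations (empty means these checks pass).

    Sketch of two of the non-negotiable rules above; field names other than
    finding_id / allowed_finding_ids are hypothetical.
    """
    errors = []
    # Rule 1: provider output cannot override the deterministic score or tier.
    for forbidden in ("final_score", "formal_tier"):
        if forbidden in response:
            errors.append(f"provider attempted to set {forbidden}")
    # Rule 2: every advisory item must cite an allowed finding_id.
    for item in response.get("advisory_items", []):
        fid = item.get("finding_id")
        if fid not in allowed_finding_ids:
            errors.append(f"unknown finding_id: {fid}")
    return errors
```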

Packet hardening added in v1.5.7:

  • provider_request now carries a secret-free request schema plus deterministic argument-validation status
  • contract_schemas exports the advisory input/output contract shapes for downstream validators
  • packet_contract confirms allowlist parity, snippet omission, and non-negative omission counts before handoff

Secret boundary hardening added in v1.5.9:

  • provider-specific environment variables are recognized before the generic advisory key fallback
  • provider handoff metadata exports endpoint-policy validation and the expected env-var name, never the key value
  • embedded-credential URLs are rejected; cloud providers require https; plain http is limited to localhost
  • .env files are ignored by default; .env.example documents supported variable names only
  • --advisory call is now the explicit provider-call boundary, with centralized redaction, logging-policy export, child-env allowlist reporting, and artifact pre-write sanitization

  • Full contract: docs/API_CONTRACT.md
  • Secret policy: docs/ADVISORY_SECRET_HANDLING.md
  • Runtime boundary: docs/ADVISORY_RUNTIME.md


MICA Memory Layer

The repository keeps a versioned MICA memory layer under memory/ for agent-session initialization, drift control, and release provenance. Historical snapshots are retained as archive; the active layer is selected by memory/mica.yaml.

Operational reference: docs/MICA_MEMORY.md


Web Demo

Live demo: huggingface.co/spaces/Flamehaven/stem-bio-ai


The Space runs the same deterministic local scanner on public GitHub repositories. No provider API call is made.

Run locally:

pip install -e .[demo]
python app.py

Repository Structure

STEM-BIO-AI/
  stem_ai/              # Core Python package
  docs/                 # API contract, advisory runtime/secret policy, scoring rationale, MICA policy, report previews
  memory/               # Versioned MICA archive/playbook/lessons; active layer selected by mica.yaml
  audits/               # Reference benchmark artifacts
  scripts/              # Benchmark and validation scripts
  tests/                # Regression test suite
  app.py                # HuggingFace Spaces / Gradio entry point
  pyproject.toml        # Package metadata and extras
  SKILL.md              # Universal agent skill definition
  CHANGELOG.md          # Version history

Agent Skill Install

# Claude Code
git clone --depth 1 https://github.com/flamehaven01/STEM-BIO-AI.git ~/.claude/skills/stem-bio-ai

# Generic agent frameworks
git clone --depth 1 https://github.com/flamehaven01/STEM-BIO-AI.git ~/.agents/skills/stem-bio-ai

Contributing

See CONTRIBUTING.md. High-value areas: rubric discrimination examples, clinical-adjacency trigger refinements, additional bio-domain benchmark repositories, report rendering improvements.


Citation

@software{stem-bio-ai,
  author  = {Yun, Kwansub},
  title   = {STEM BIO-AI: Deterministic Evidence-Surface Scanner for Bio/Medical AI Repositories},
  version = {1.5.10},
  year    = {2026},
  url     = {https://github.com/flamehaven01/STEM-BIO-AI}
}

License

Apache 2.0. See LICENSE.

Maintained by flamehaven01



Download files


Source Distribution

stem_ai-1.5.10.tar.gz (140.8 kB)

Built Distribution


stem_ai-1.5.10-py3-none-any.whl (159.9 kB)

File details

Details for the file stem_ai-1.5.10.tar.gz.

File metadata

  • Download URL: stem_ai-1.5.10.tar.gz
  • Upload date:
  • Size: 140.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for stem_ai-1.5.10.tar.gz

| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | 1400ab824e1fbf1f99c5c12cadc2d26dd3863fa1b3822843efa56b9ddc415cc6 |
| MD5 | 221dd7ce0388479b4a750ca8548a7fa0 |
| BLAKE2b-256 | 29e2a87d91a5e8d0cd79025ae5f1d098b866c88ccbae1f6b47aca35db47d3804 |
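To check a downloaded distribution against the published digests, standard hashlib usage suffices (this helper is illustrative, not part of STEM BIO-AI):

```python
import hashlib

def sha256_of(path: str) -> str:
    """Compute the SHA256 hex digest of a file, reading in chunks so large
    distributions do not need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# sha256_of("stem_ai-1.5.10.tar.gz") should match the SHA256 value above.
```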


Provenance

The following attestation bundles were made for stem_ai-1.5.10.tar.gz:

Publisher: publish.yml on flamehaven01/STEM-BIO-AI

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file stem_ai-1.5.10-py3-none-any.whl.

File metadata

  • Download URL: stem_ai-1.5.10-py3-none-any.whl
  • Upload date:
  • Size: 159.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for stem_ai-1.5.10-py3-none-any.whl

| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | a4e854afc82147f042acf3c64fa50d8bf10a7340066cd1245e8a36e75b76205e |
| MD5 | c60e6fa6498bf42ac186da8a6b496e1f |
| BLAKE2b-256 | bfc27bca3c81daad024f815be70ca19b83d43f41e4306587a53fdf24be9f7830 |


Provenance

The following attestation bundles were made for stem_ai-1.5.10-py3-none-any.whl:

Publisher: publish.yml on flamehaven01/STEM-BIO-AI

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
