STEM BIO-AI
Deterministic evidence-surface scanner for bio/medical AI repositories.
No LLM. No API key. No model runtime. No secrets sent anywhere.
Why STEM BIO-AI
Bio and medical AI repositories vary enormously in evidence quality — from rigorous academic tools to marketing-grade demos that carry clinical language with no data provenance, no reproducibility path, and no clinical-use disclaimer. Manual review is slow and inconsistent.
STEM BIO-AI scans the observable repository surface — README, docs, code structure, CI configuration, dependency manifests, changelogs — and maps detected signals to a structured evidence tier (T0–T4). The scan runs in seconds on a local clone, produces machine-readable JSON and PDF reports, and makes every scoring decision traceable to a specific file, line, and pattern.
A T4 score means strong observable evidence signals. It does not mean the repository is safe for clinical deployment — that requires independent expert validation.
Quick Start
# install from PyPI
pip install stem-ai
# or: editable local install with PDF output support
git clone https://github.com/flamehaven01/STEM-BIO-AI.git
cd STEM-BIO-AI
pip install -e .[pdf]
# 1-page executive dashboard
stem /path/to/bio-ai-repo --level 1 --format all
# 3-page stage analysis
stem /path/to/bio-ai-repo --level 2 --format all
# 5-page full evidence packet with proof trace
stem /path/to/bio-ai-repo --level 3 --format all --explain
# Advisory workflow (validate/packet run offline; call is the explicit provider boundary)
stem /path/to/bio-ai-repo --advisory validate
stem /path/to/bio-ai-repo --advisory packet
stem /path/to/bio-ai-repo --advisory call
stem /path/to/bio-ai-repo --advisory-response provider_advisory.json
Clone the target repository first; the CLI operates on local paths only.
Proof surfaces
- Demo: Hugging Face Space
- API contract: docs/API_CONTRACT.md
- Secret handling: docs/ADVISORY_SECRET_HANDLING.md
- Advisory runtime boundary: docs/ADVISORY_RUNTIME.md
- Example audits: docs/EXAMPLE_AUDITS.md
- Scoring rationale: docs/SCORING_RATIONALE.md
- Deterministic diagnostics: docs/DETERMINISTIC_DIAGNOSTICS.md
- Regulatory traceability mapping: docs/REGULATORY_MAPPING.md
- Regulatory basis registry: docs/regulatory_basis_registry.v1.json
Triage Tiers
| Tier | Score | Recommended Action |
|---|---|---|
| T0 Rejected | 0 – 39 | Insufficient evidence — do not rely on without independent expert validation |
| T1 Quarantine | 40 – 54 | Exploratory review only — expert validation required before any use |
| T2 Caution | 55 – 69 | Research reference and supervised non-clinical technical review only |
| T3 Supervised | 70 – 84 | Supervised institutional review candidate |
| T4 Candidate | 85 – 100 | Strong evidence posture — clinical deployment still requires independent validation |
Clinical-adjacent repositories without an explicit disclaimer are hard-capped at T2 (score ≤ 69). Repositories with unbounded CA-DIRECT claims are hard-capped at T0 (score ≤ 39).
Tier boundary derivation and calibration gap disclosures: docs/SCORING_RATIONALE.md.
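The tier boundaries and hard-caps above can be sketched as a small lookup. This is a minimal illustration, not the project's implementation; the function and flag names are hypothetical, and only the thresholds come from the table:

```python
def assign_tier(score: float,
                clinical_adjacent_no_disclaimer: bool = False,
                unbounded_ca_direct: bool = False) -> str:
    """Map a 0-100 evidence score to a triage tier, applying the hard-caps."""
    if unbounded_ca_direct:
        score = min(score, 39)   # CA-DIRECT hard-cap: T0
    elif clinical_adjacent_no_disclaimer:
        score = min(score, 69)   # missing-disclaimer hard-cap: T2
    for tier, lower in (("T4", 85), ("T3", 70), ("T2", 55), ("T1", 40)):
        if score >= lower:
            return tier
    return "T0"
```

Note that the caps clamp the score before tier lookup, so a repository scoring 90 with no clinical disclaimer still lands at T2.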
Scoring Model
Final = (Stage 1 × 0.40) + (Stage 2R × 0.20) + (Stage 3 × 0.40) − C1 Penalty
| Stage | Weight | What Is Measured |
|---|---|---|
| Stage 1 README Evidence | 40% | Bio-domain vocabulary; H1–H6 hype-claim penalties; R1–R5 responsibility signals (limitations, regulatory framing, clinical disclaimer, demographic-bias, reproducibility) |
| Stage 2R Repo-Local Consistency | 20% | Vocabulary overlap across README, docs, package metadata, CI, and tests; limitation repetition; contradiction, staleness, and unsupported-workflow deductions |
| Stage 3 Code/Bio Responsibility | 40% | CI presence; domain test coverage; changelog hygiene (T3); data provenance and IRB/dataset citation (B1); bias/limitation measurement evidence (B2); conflict-of-interest disclosure (B3) |
| Stage 4 Replication Evidence | Separate lane | Containers; reproducibility targets; dependency locks/pins; dataset and model artifact references; seed, CLI, and citation signals; license/use-scope restrictions |
| C1–C4 Code Integrity | Penalty / advisory | Hardcoded credentials (C1, −10 pts); dependency pinning (C2); deprecated patient-adjacent paths (C3); fail-open exception handlers (C4) |
Stage 4 is reported as replication_score / replication_tier and does not affect score.final_score. Full scoring rationale and calibration gap disclosures are in docs/SCORING_RATIONALE.md.
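The blend above is straightforward to express in code. A sketch, with illustrative variable names; the weights and the −10 pt C1 penalty come from the formula and tables:

```python
def final_score(stage1: float, stage2r: float, stage3: float,
                c1_credential_hit: bool) -> float:
    """Weighted evidence score; Stage 4 is a separate lane and excluded."""
    penalty = 10.0 if c1_credential_hit else 0.0
    raw = stage1 * 0.40 + stage2r * 0.20 + stage3 * 0.40 - penalty
    return max(0.0, min(100.0, raw))  # clamp to the 0-100 tier range
```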
Architecture
flowchart LR
A[Target repository] --> B[LOCAL_ANALYSIS scanner]
B --> C[Stage 1\nREADME evidence]
B --> D[Stage 2R\nRepo-local consistency]
B --> E[Stage 3\nCode/bio responsibility]
B --> F[Stage 4\nReplication lane]
B --> K[C1–C4\nCode integrity]
C --> G[Weighted evidence score]
D --> G
E --> G
K --> G
F --> H[replication_score / tier]
G --> I[JSON result]
H --> I
I --> L[Evidence ledger]
I --> M[Explain trace]
G --> N[Markdown report]
G --> O[PDF packet]
Core modules: stem_ai/scanner.py, stem_ai/render.py, stem_ai/cli.py, stem_ai/detectors.py, stem_ai/detector_surface.py, stem_ai/detector_ast.py, stem_ai/detector_bio.py, stem_ai/detector_stage4.py, stem_ai/evidence.py, stem_ai/app.py
Output Artifacts
Each run writes to --out DIR (default: stem_output/).
| Level | Pages | Audience | Artifacts |
|---|---|---|---|
| --level 1 | 1 | Executive / triage | Score, tier, stage cards, code integrity summary |
| --level 2 | 3 | Standard audit review | Level 1 + Stage 1/2R/3 breakdown, gap analysis |
| --level 3 | 5 | Full evidence packet | Level 2 + code integrity deep dive, classification analysis, remediation roadmap |
<repo>_experiment_results.json # machine-readable score + full evidence object
<repo>_report.md # human-readable audit report
<repo>_brief_1p.pdf # Level 1 executive dashboard
<repo>_detailed_3p.pdf # Level 2 stage analysis
<repo>_detailed_5p.pdf # Level 3 deep review packet
<repo>_explain.txt # --explain: file/line/snippet proof trace
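The JSON artifact is the natural entry point for downstream tooling. A minimal consumption sketch; the field names `score.final_score` and `score.formal_tier` follow the advisory contract described below, while the surrounding structure is an assumption:

```python
import json
from pathlib import Path

def load_result(out_dir: str, repo: str) -> tuple:
    """Read score and tier from a <repo>_experiment_results.json file."""
    path = Path(out_dir) / f"{repo}_experiment_results.json"
    result = json.loads(path.read_text(encoding="utf-8"))
    score = result["score"]
    return score["final_score"], score["formal_tier"]
```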
Report Preview
Sample PDF: Download the 5-page Level 3 report
Detection Methods
Every scored item maps to a concrete, inspectable detection method. No inference, no LLM judgment.
Full detection table
| Component | Detection Method |
|---|---|
| Stage 1 baseline | Non-zero README present (+60 base) |
| Stage 1 domain signal | Bio-domain keyword regex in README and package metadata |
| Stage 1 hype penalties (H1–H6) | Regex: clinical certainty, regulatory approval, autonomous replacement, breakthrough marketing, universal generalization, perfect accuracy claims |
| Stage 1 responsibility signals (R1–R5) | Regex: limitations section, regulatory framework, clinical disclaimer (CA-severity-weighted), demographic-bias disclosure, reproducibility provisions |
| Stage 2R consistency | Vocabulary set intersection across README/docs/package/tests; limitation repetition; clinical-boundary contradiction, version-staleness, and workflow-support deductions |
| Stage 3 T1 CI | .github/workflows/ contains at least one file |
| Stage 3 T2 domain tests | tests/ directory text contains bio-domain vocabulary (regex) |
| Stage 3 T3 changelog | CHANGELOG file presence + bug-fix/patch/security entry detection (3-tier: 0/+5/+15) |
| Stage 3 B1 data provenance | Dependency manifest presence + IRB/dataset-citation language detection (3-tier: 0/+10/+15) |
| Stage 3 B2 bias measurement | Bias/limitations vocabulary + quantitative measurement evidence (subgroup analysis, AUROC, demographic parity) (3-tier: 0/+8/+15) |
| Stage 3 B3 COI/funding | Funding, grant, sponsor, conflict-of-interest language in README/docs/FUNDING.md |
| Stage 4 containers | Dockerfile or compose file present |
| Stage 4 reproducibility target | Makefile with reproduce/eval/benchmark/test targets |
| Stage 4 dependency lock | Environment/lock/requirements file; exact pins or hash evidence |
| Stage 4 artifact references | Dataset/model/checkpoint URLs or checksum files |
| Stage 4 citation/interface | CITATION.cff; argparse CLI entry points (AST) |
| Stage 4 license restriction | Non-commercial, research-only, academic-only, no-clinical-use restrictions in LICENSE/README |
| CA severity | Clinical/diagnostic phrase regex in README, docs, and package metadata |
| C1 credentials | AWS AKIA*, OpenAI sk-*, GitHub ghp_*, api_key=... patterns; obvious placeholders excluded from penalty |
| C2 dependency pinning | == or hash pin vs. loose >=, ~=, <, > ranges |
| C3 deprecated paths | Patient-metadata patterns in deprecated/, legacy/, archive/ directories |
| C4 fail-open | except Exception: pass or except: pass in Python source (AST) |
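The C4 fail-open check, for example, reduces to a small AST walk. This is a sketch of the technique, not the project's exact implementation:

```python
import ast

def find_fail_open(source: str) -> list:
    """Return line numbers of bare or broad `except ...: pass` handlers."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ExceptHandler):
            # bare `except:` or broad `except Exception:`
            broad = node.type is None or (
                isinstance(node.type, ast.Name) and node.type.id == "Exception")
            body_is_pass = all(isinstance(s, ast.Pass) for s in node.body)
            if broad and body_is_pass:
                hits.append(node.lineno)
    return hits
```

Working on the AST rather than raw text avoids false positives from comments and strings that merely mention `except Exception`.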
AI Advisory Contract
The advisory system exports a sanitized, provider-neutral handoff packet and validates provider responses — without making any provider API call.
stem /path/to/repo --advisory validate # offline contract check
stem /path/to/repo --advisory packet # export sanitized input packet
stem /path/to/repo --advisory-response FILE # validate provider JSON response
Non-negotiable rules (enforced by the validator):
- Provider output cannot override `score.final_score` or `score.formal_tier`
- Every advisory item must cite exact `finding_id` strings from `allowed_finding_ids`
- Raw repository source text is not included in provider packets
- Responses containing clinical safety, efficacy, regulatory, or medical-advice claims are rejected
- `allowed_finding_ids` is capped at 40 entries per packet
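A validator enforcing the citation and cap rules above might look like this. An illustrative sketch: only the 40-entry cap and the `finding_id` / `allowed_finding_ids` / locked-score-field names come from the contract description, and the `advisory_items` key is an assumed response shape:

```python
def validate_response(response: dict, allowed_finding_ids: set) -> list:
    """Return a list of contract violations; empty means the response passes."""
    errors = []
    if len(allowed_finding_ids) > 40:
        errors.append("allowed_finding_ids exceeds the 40-entry packet cap")
    # every advisory item must cite an allowed finding_id
    for item in response.get("advisory_items", []):
        fid = item.get("finding_id")
        if fid not in allowed_finding_ids:
            errors.append(f"uncited or unknown finding_id: {fid!r}")
    # provider output may not override the locked score fields
    for locked in ("final_score", "formal_tier"):
        if locked in response.get("score", {}):
            errors.append(f"provider may not override score.{locked}")
    return errors
```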
Packet hardening added in v1.5.7:
- `provider_request` now carries a secret-free request schema plus deterministic argument-validation status
- `contract_schemas` exports the advisory input/output contract shapes for downstream validators
- `packet_contract` confirms allowlist parity, snippet omission, and non-negative omission counts before handoff
Secret boundary hardening added in v1.5.9:
- provider-specific environment variables are recognized before the generic advisory key fallback
- provider handoff metadata exports endpoint-policy validation and the expected env-var name, never the key value
- embedded-credential URLs are rejected; cloud providers require `https`; plain `http` is limited to localhost
- `.env` files are ignored by default; `.env.example` documents supported variable names only
- `--advisory call` is now the explicit provider-call boundary, with centralized redaction, logging-policy export, child-env allowlist reporting, and artifact pre-write sanitization
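The provider-key resolution order can be sketched as follows. A hypothetical illustration: the variable names (including the generic `STEM_ADVISORY_API_KEY` fallback) are assumptions, and only the provider-specific-before-generic order and the `.env`-ignored rule come from the text above:

```python
import os

def resolve_api_key(provider: str) -> tuple:
    """Return (expected_env_var, value): the provider-specific variable wins,
    then the generic advisory key fallback. `.env` files are never read."""
    specific = f"{provider.upper()}_API_KEY"   # e.g. OPENAI_API_KEY
    if os.environ.get(specific):
        return specific, os.environ[specific]
    return "STEM_ADVISORY_API_KEY", os.environ.get("STEM_ADVISORY_API_KEY")
```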
Full contract: docs/API_CONTRACT.md
Secret policy: docs/ADVISORY_SECRET_HANDLING.md
Runtime boundary: docs/ADVISORY_RUNTIME.md
MICA Memory Layer
The repository keeps a versioned MICA memory layer under memory/ for agent-session initialization,
drift control, and release provenance. Historical snapshots are retained as archive; the active layer
is selected by memory/mica.yaml.
Operational reference: docs/MICA_MEMORY.md
Web Demo
Live demo: huggingface.co/spaces/Flamehaven/stem-bio-ai
The Space runs the same deterministic local scanner on public GitHub repositories. No provider API call is made.
Run locally:
pip install -e .[demo]
python app.py
Repository Structure
STEM-BIO-AI/
stem_ai/ # Core Python package
docs/ # API contract, advisory runtime/secret policy, scoring rationale, MICA policy, report previews
memory/ # Versioned MICA archive/playbook/lessons; active layer selected by mica.yaml
audits/ # Reference benchmark artifacts
scripts/ # Benchmark and validation scripts
tests/ # Regression test suite
app.py # HuggingFace Spaces / Gradio entry point
pyproject.toml # Package metadata and extras
SKILL.md # Universal agent skill definition
CHANGELOG.md # Version history
Agent Skill Install
# Claude Code
git clone --depth 1 https://github.com/flamehaven01/STEM-BIO-AI.git ~/.claude/skills/stem-bio-ai
# Generic agent frameworks
git clone --depth 1 https://github.com/flamehaven01/STEM-BIO-AI.git ~/.agents/skills/stem-bio-ai
Contributing
See CONTRIBUTING.md. High-value areas: rubric discrimination examples, clinical-adjacency trigger refinements, additional bio-domain benchmark repositories, report rendering improvements.
Citation
@software{stem-bio-ai,
author = {Yun, Kwansub},
title = {STEM BIO-AI: Deterministic Evidence-Surface Scanner for Bio/Medical AI Repositories},
version = {1.6.0},
year = {2026},
url = {https://github.com/flamehaven01/STEM-BIO-AI}
}
License
Apache 2.0. See LICENSE.
Maintained by flamehaven01
File details
Details for the file stem_ai-1.6.0.tar.gz.
File metadata
- Download URL: stem_ai-1.6.0.tar.gz
- Size: 152.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 83a1a3b1b43edc1e03083807c5c7847d4c24c70d12ee3a3733cab356b7fe7ce6 |
| MD5 | 90f60443d4f8c95df137e805abb6ac96 |
| BLAKE2b-256 | 7fd469239f75e498557b04665d216e9617cd38ee590d54225596254155f4504a |
Provenance
The following attestation bundles were made for stem_ai-1.6.0.tar.gz:
Publisher: publish.yml on flamehaven01/STEM-BIO-AI
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: stem_ai-1.6.0.tar.gz
- Subject digest: 83a1a3b1b43edc1e03083807c5c7847d4c24c70d12ee3a3733cab356b7fe7ce6
- Sigstore transparency entry: 1451620197
- Permalink: flamehaven01/STEM-BIO-AI@bfe5056b4514064be7613af97f59803503e1119a
- Branch / Tag: refs/tags/v1.6.0
- Owner: https://github.com/flamehaven01
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@bfe5056b4514064be7613af97f59803503e1119a
- Trigger Event: release
File details
Details for the file stem_ai-1.6.0-py3-none-any.whl.
File metadata
- Download URL: stem_ai-1.6.0-py3-none-any.whl
- Size: 171.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 39f35c12c94c3a5ee487111a81bfb3b64c3fc717c1da6b12d5cb447669a82e16 |
| MD5 | 774a195c6622b0903e8238976137d1e0 |
| BLAKE2b-256 | 9e9f4b0ee160af34565de4a9814d11bdbd4276f98d4cd2deef940ac37a9440fd |
Provenance
The following attestation bundles were made for stem_ai-1.6.0-py3-none-any.whl:
Publisher: publish.yml on flamehaven01/STEM-BIO-AI
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: stem_ai-1.6.0-py3-none-any.whl
- Subject digest: 39f35c12c94c3a5ee487111a81bfb3b64c3fc717c1da6b12d5cb447669a82e16
- Sigstore transparency entry: 1451620348
- Permalink: flamehaven01/STEM-BIO-AI@bfe5056b4514064be7613af97f59803503e1119a
- Branch / Tag: refs/tags/v1.6.0
- Owner: https://github.com/flamehaven01
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@bfe5056b4514064be7613af97f59803503e1119a
- Trigger Event: release