A rule-based validation engine for RNA-seq count matrices and sample metadata
BioFlowValidator
A transparent, rule-based validator for RNA-seq differential expression analysis workflows.
BioFlowValidator catches common scientific and computational errors in RNA-seq data before expensive analysis begins, acting as a pre-analysis guard rail for wet-lab biologists, students, and clinical researchers.
Features
- 32 validation rules across 5 categories (format, sample, gene ID, normalization, biology)
- Detects sample mismatches, mixed gene ID namespaces, pre-normalized counts, too few replicates, library size outliers, and more
- Human-readable HTML report + machine-readable JSON
- REST API (FastAPI) + React/TypeScript frontend
- Single-command Docker startup
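As an illustration of two of the checks listed above (mixed gene ID namespaces and too few replicates), here is a minimal sketch. It is not BioFlowValidator's implementation: the helper names, the Ensembl-style regular expression, and the default minimum of 2 replicates are all assumptions made for this example.

```python
import re
from collections import Counter

# Assumption: Ensembl-style gene IDs look like ENSG00000141510 or
# ENSMUSG00000059552, optionally followed by a version suffix such as ".8".
ENSEMBL_GENE_RE = re.compile(r"^ENS[A-Z]*G\d{11}(\.\d+)?$")


def classify_gene_id(gene_id: str) -> str:
    """Label an ID as 'ensembl' or 'other' (hypothetical helper)."""
    return "ensembl" if ENSEMBL_GENE_RE.match(gene_id) else "other"


def namespace_counts(gene_ids):
    """Count how many IDs fall into each namespace label."""
    return Counter(classify_gene_id(g) for g in gene_ids)


def conditions_with_too_few_replicates(sample_to_condition, minimum=2):
    """Return conditions that have fewer than `minimum` samples.

    The threshold of 2 is an assumption for this sketch.
    """
    per_condition = Counter(sample_to_condition.values())
    return sorted(c for c, n in per_condition.items() if n < minimum)
```

More than one label in the result of `namespace_counts` suggests a mixed namespace, and any condition returned by `conditions_with_too_few_replicates` has too few samples for a replicate-based comparison.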
Quick Start
Docker (recommended)
git clone https://github.com/Rashidmstar12/BioFlowValidator.git
cd BioFlowValidator
docker compose up --build
Open http://localhost:3000 in your browser.
Local Development
Backend:
cd backend
pip install -r requirements.txt
uvicorn app.main:app --reload --port 8000
Frontend:
cd frontend
npm install
npm run dev
Open http://localhost:5173.
Inputs
| File | Format | Required |
|---|---|---|
| Count matrix | TSV / CSV / XLSX (genes × samples or samples × genes) | Yes |
| Sample metadata | TSV / CSV (sample IDs + condition column) | Optional |
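To make the two inputs concrete, the sketch below builds a tiny genes × samples count matrix and a matching metadata table in memory, then compares their sample IDs. The column names `gene_id`, `sample_id`, and `condition`, the helper functions, and the count values are assumptions chosen for illustration; the tool may accept other headers.

```python
import csv
import io

# Toy genes x samples matrix: first column holds gene IDs, the rest are samples.
COUNTS_TSV = (
    "gene_id\tctrl_1\tctrl_2\ttreat_1\ttreat_2\n"
    "ENSG00000141510\t120\t98\t310\t287\n"
    "ENSG00000012048\t15\t22\t9\t11\n"
)

# Toy metadata: one row per sample, with a condition column.
METADATA_TSV = (
    "sample_id\tcondition\n"
    "ctrl_1\tcontrol\n"
    "ctrl_2\tcontrol\n"
    "treat_1\ttreated\n"
    "treat_2\ttreated\n"
)


def matrix_sample_ids(text: str) -> list:
    """Read the header row and drop the gene ID column."""
    header = next(csv.reader(io.StringIO(text), delimiter="\t"))
    return header[1:]


def metadata_sample_ids(text: str, id_column: str = "sample_id") -> list:
    """Collect the sample IDs listed in the metadata table."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [row[id_column] for row in reader]


# Samples present in one file but not the other indicate a mismatch.
mismatched = set(matrix_sample_ids(COUNTS_TSV)) ^ set(metadata_sample_ids(METADATA_TSV))
```

With these toy inputs `mismatched` is empty; renaming a sample in only one of the two files would make it non-empty.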
Validation Rule Categories
| Category | Rules | Description |
|---|---|---|
| Format | FMT-001 to FMT-008 | Encoding, delimiters, headers, duplicates, non-negatives, matrix orientation |
| Sample | SMP-001 to SMP-005 | Sample ID matching, duplicates, replicates, near-identical replicate diagnostics |
| Gene ID | GEN-001 to GEN-005 | Namespace consistency, duplicates, version suffixes, organism detection |
| Normalization | NRM-001 to NRM-006 | Integer counts, library size ratios, zero genes, duplicate count profiles |
| Biology | BIO-001 to BIO-008 | Single condition, MT fraction, label sanity, batch confounding, ERCC spike-ins |
See docs/validation_rules.md for the full rule reference.
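For a rough sense of what normalization-category checks such as "integer counts" and "library size ratios" might compute, consider the sketch below. The function names and the exact definitions are assumptions for illustration and are not taken from docs/validation_rules.md.

```python
def non_integer_fraction(rows):
    """Fraction of values that are not whole numbers.

    `rows` is a list of per-gene lists, one value per sample. A high
    fraction hints that the matrix was normalized (e.g. TPM or FPKM)
    rather than holding raw counts.
    """
    values = [float(v) for row in rows for v in row]
    if not values:
        return 0.0
    return sum(1 for v in values if v != round(v)) / len(values)


def library_size_ratio(rows):
    """Ratio of the largest to the smallest per-sample total count."""
    sizes = [sum(column) for column in zip(*rows)]
    if not sizes:
        raise ValueError("matrix has no samples")
    smallest = min(sizes)
    # A sample with zero total reads makes the ratio unbounded.
    return float("inf") if smallest == 0 else max(sizes) / smallest
```

A validator could compare these numbers against documented thresholds and, in line with the conservative design described in this README, report borderline values as warnings rather than errors.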
Running Tests
cd backend
python -m pytest tests/ -v
Run the dataset benchmark:
python datasets/benchmark.py
API Reference
See docs/api_spec.md or browse the interactive docs at http://localhost:8000/docs.
Repository Structure
BioFlowValidator/
├── backend/              # Python FastAPI application
│   ├── app/
│   │   ├── engine/       # FileParser, RuleRegistry, RuleRunner
│   │   ├── models/       # RuleResult, ValidationReport, ValidationContext
│   │   ├── rules/        # format/, sample/, gene/, normalization/, biology/
│   │   ├── report/       # JSONExporter, HTMLExporter
│   │   └── routers/      # FastAPI route handlers
│   └── tests/            # Unit + integration tests
├── frontend/             # React + TypeScript + Vite SPA
├── datasets/             # Valid + faulty example datasets + benchmark
├── docs/                 # API spec, validation rules reference
├── Dockerfile.backend
├── Dockerfile.frontend
└── docker-compose.yml
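The class names `RuleRegistry`, `RuleRunner`, `RuleResult`, and `ValidationContext` appear in the tree above, but their real interfaces are not shown here. The following is a hypothetical sketch of how a registry/runner pair of this kind could fit together; every field, method signature, and the rule ID `DEMO-001` is an assumption, not the project's API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ValidationContext:
    # Assumed shape: gene ID -> one count per sample.
    counts: Dict[str, List[float]]


@dataclass(frozen=True)
class RuleResult:
    rule_id: str
    status: str  # assumed values: "PASS", "WARNING", "ERROR"
    message: str


Rule = Callable[[ValidationContext], RuleResult]


class RuleRegistry:
    """Maps rule IDs to rule functions (hypothetical interface)."""

    def __init__(self) -> None:
        self._rules: Dict[str, Rule] = {}

    def register(self, rule_id: str) -> Callable[[Rule], Rule]:
        def decorator(func: Rule) -> Rule:
            if rule_id in self._rules:
                raise ValueError(f"duplicate rule ID: {rule_id}")
            self._rules[rule_id] = func
            return func
        return decorator

    def items(self):
        # Sorting by ID keeps the execution order deterministic.
        return sorted(self._rules.items())


class RuleRunner:
    """Runs every registered rule against one context."""

    def __init__(self, registry: RuleRegistry) -> None:
        self._registry = registry

    def run(self, context: ValidationContext) -> List[RuleResult]:
        return [rule(context) for _, rule in self._registry.items()]


registry = RuleRegistry()


@registry.register("DEMO-001")  # made-up ID, not one of the 32 real rules
def non_negative_counts(context: ValidationContext) -> RuleResult:
    bad = sorted(g for g, vals in context.counts.items() if any(v < 0 for v in vals))
    if bad:
        return RuleResult("DEMO-001", "ERROR", f"negative counts in: {', '.join(bad)}")
    return RuleResult("DEMO-001", "PASS", "all counts are non-negative")
```

Calling `RuleRunner(registry).run(context)` returns one `RuleResult` per registered rule, which an exporter could then serialize to JSON or HTML.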
Design Principles
- Validation only: no analysis, no statistical computation
- Transparent: every rule has a documented ID, description, and suggestion
- Auditable: the JSON report includes the file's SHA-256 hash and a timestamp
- Scientifically conservative: ambiguous cases produce a WARNING, not an ERROR
- Reproducible: the same inputs always produce identical outputs
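The auditable and reproducible principles can be illustrated with a small sketch that hashes the input bytes with SHA-256 and serializes a report with sorted keys. The field names (`file_sha256`, `timestamp`, `results`) and the function `build_report` are assumptions for this example; the actual JSON schema may differ. The timestamp is passed in as an argument so that the function itself stays deterministic.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_report(file_bytes: bytes, results: list, timestamp: datetime) -> str:
    """Serialize a minimal audit report (hypothetical schema)."""
    report = {
        "file_sha256": hashlib.sha256(file_bytes).hexdigest(),
        "timestamp": timestamp.isoformat(),
        # Sorting by rule ID keeps the output independent of execution order.
        "results": sorted(results, key=lambda r: r["rule_id"]),
    }
    # sort_keys gives a stable key order, so equal inputs yield equal text.
    return json.dumps(report, sort_keys=True, indent=2)


when = datetime(2024, 1, 1, tzinfo=timezone.utc)
demo_results = [
    {"rule_id": "DEMO-002", "status": "WARNING"},
    {"rule_id": "DEMO-001", "status": "PASS"},
]
report_text = build_report(b"", demo_results, when)
```

Because the hash depends only on the file contents, anyone holding the same input file can recompute it and confirm that a given report refers to that exact file.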
License
MIT