Skip to main content

Experiment Readiness Checker

Project description

Experiment Readiness Checker (expready)

expready is a CLI tool that checks whether study inputs are analysis-ready before downstream workflows. It focuses on metadata quality, design quality, and sample-ID consistency across files.

Setup

pip install expready

Commands

  • validate: check your inputs and create an HTML report.
  • fix: clean common formatting issues in metadata/manifest files.

Input file requirements

Supported formats for all inputs: .csv, .tsv, .txt. Delimiter handling is flexible: expready auto-detects common delimiters (comma, tab, semicolon, pipe, or whitespace-separated columns). All input files must include a header row (column names in the first row). Headerless files are not supported.

metadata file (--metadata):

  • One row per sample.
  • Required columns:
    • Metadata sample-ID column (default sample_id, or the column passed via --sample-id / --metadata-id)
    • Condition column (default condition; if default condition is not found, treatment is tried; or use --condition)
  • If you pass --batch, --pair, or --covars, those columns must exist in metadata.
  • Metadata sample-ID values should be unique and non-empty.

matrix file (--matrix):

  • Feature-by-sample table (rows = features, sample IDs in columns).
  • Sample column names should match metadata sample-ID values exactly when both files are used.
  • If metadata is not provided, expready infers metadata from matrix sample columns.
  • Caveat: matrix-only runs are supported, but inferred metadata can weaken design checks; use a real metadata file for more reliable design validation.
  • Common annotation headers such as gene_id, feature_id, or #OTU ID are supported as non-sample columns.

manifest file (--manifest):

  • Sample inventory table (for example, sample ID + file path columns).
  • Must contain the sample-ID column specified by --sample-id / --manifest-id (default sample_id).
  • Sample IDs in this column should match metadata sample-ID values exactly.

Minimal examples:

Metadata (metadata.csv)

sample_id,condition,batch
S1,Control,B1
S2,Control,B1
S3,Treated,B2
S4,Treated,B2

Matrix (matrix.tsv)

gene_id	S1	S2	S3	S4
GeneA	10	12	4	6
GeneB	0	1	8	9

Manifest (manifest.tsv)

sample_id	file_path
S1	/data/S1.fastq.gz
S2	/data/S2.fastq.gz
S3	/data/S3.fastq.gz
S4	/data/S4.fastq.gz

Common Study Templates

Use these as starting points for common experiment types.

Bulk RNA-seq (metadata + count matrix):

expready validate --metadata metadata.csv --matrix counts.tsv --condition condition --output reports/rnaseq

Microbiome / 16S (feature table + metadata):

expready validate --metadata metadata.tsv --matrix feature_table.tsv --condition group --output reports/microbiome

Metabolomics (intensity matrix with batch effect checks):

expready validate --metadata metadata.csv --matrix intensities.csv --condition condition --batch batch --output reports/metabolomics

Paired or blocked design (for paired samples/subjects):

expready validate --metadata metadata.csv --matrix counts.tsv --condition condition --pair pair_id --output reports/paired

Manifest consistency check (sample inventory + paths):

expready validate --metadata metadata.csv --manifest manifest.tsv --output reports/manifest_check

Test runs

Pass example (metadata + matrix):

expready validate --metadata examples/metadata_valid.csv --matrix examples/matrix_valid.tsv --output reports/test_pass --report pass_report

Expected:

  • Console shows Status: PASS
  • Writes reports/test_pass/pass_report.html

Fail example (matrix only):

expready validate --matrix examples/matrix_valid.tsv --output reports/test_fail --report fail_report

Expected:

  • Console may show Status: FAIL (this is expected for this demo path)
  • Writes reports/test_fail/fail_report.html
  • Writes reports/test_fail/metadata.inferred.csv

Input options

  • validate: requires at least one of --metadata or --matrix
  • fix: requires at least one of --metadata, --matrix, or --manifest
  • Supported tabular file formats: .csv, .tsv, .txt (delimiter is auto-detected)

Optional arguments by file type:

Metadata file options

  • --metadata FILE: metadata table path.
  • --sample-id COLUMN: shared sample-ID column name for metadata and manifest (default: sample_id).
  • --metadata-id COLUMN: metadata sample-ID column name (overrides --sample-id for metadata).
  • --condition COLUMN: main grouping column (default: condition, matched case-insensitively; if default condition is not found, treatment is tried).
  • --batch COLUMN: optional batch column.
  • --pair COLUMN: optional pair/block column.
  • --covars COLS...: optional covariate columns (space-separated names).
  • --contrast A_vs_B: optional contrast in GroupA_vs_GroupB format.

Matrix file options

  • --matrix FILE: feature-by-sample matrix path.
  • Sample IDs are read from matrix sample columns.
  • If --metadata is omitted, metadata is inferred from matrix sample columns.

Manifest file options

  • --manifest FILE: manifest table path.
  • --manifest-id COLUMN: manifest sample-ID column name (overrides --sample-id for manifest).
  • --manifest-path COLUMN: manifest column containing file paths.
  • --check-paths: check whether manifest paths exist on disk (uses --manifest-path or common names like file_path).
  • Used for cross-file sample-ID consistency checks against metadata.

Other command options:

  • --report NAME: output report filename for validate (.html is added if omitted).
  • --format FMT: fixed-table output format for fix (tsv or csv, default: tsv).

Outputs

validate

Writes:

  • report.html (or your --report filename)
  • metadata.inferred.csv (only when --metadata is omitted)

Behavior:

  • Provide at least one of --metadata or --matrix.
  • Input-contract errors fail fast and do not write a report (for example: missing required columns requested by CLI options, inconsistent delimiters, or invalid manifest sample/path column settings).
  • If you provide only --matrix, expready builds metadata from matrix sample columns and saves it as metadata.inferred.csv.
  • Matrix-only validation is useful for quick checks, but inferred metadata depends on sample-name patterns and may reduce design-check accuracy.
  • Sample-ID and condition column-name matching is case-insensitive, and treats _, -, and spaces as equivalent for matching.
  • If you provide --manifest, expready compares metadata sample-ID values (from --sample-id or --metadata-id) to the manifest column set by --sample-id or --manifest-id.
  • --sample-id sets a shared default; --metadata-id and --manifest-id override it per file.
  • If the manifest sample column is missing, validation exits with an input error and does not generate a report.
  • validate expects sample IDs to match exactly across files.

Examples:

# matrix-only validation (metadata inferred automatically)
expready validate --matrix counts.tsv --output reports/validate_matrix_only

# metadata + matrix validation
expready validate --metadata metadata.csv --matrix counts.tsv --output reports/validate_meta_matrix

# metadata + manifest validation (manifest column is named "rownames")
expready validate --metadata metadata.csv --manifest manifest.tsv --manifest-id rownames --output reports/validate_meta_manifest

fix

Writes:

  • metadata.fixed.<fmt> (if --metadata is provided, or inferred from --matrix)
  • manifest.fixed.<fmt> (if manifest is provided)
  • fix.log

Behavior:

  • --metadata is optional.
  • With --matrix and no --metadata, metadata is inferred and saved to metadata.fixed.<fmt>.
  • With only --manifest, no metadata.fixed.<fmt> is written.
  • --format controls fixed table format and extension (tsv -> .tsv, csv -> .csv).
  • fix does not map different sample-ID schemes. It only does safe cleanup (trim spaces, standardize empty-like values, remove fully empty rows).

Examples:

# metadata fixed + manifest fixed + fix.log
expready fix --metadata metadata.csv --manifest manifest.tsv --output reports/fix_meta_manifest

# metadata inferred from matrix, then fixed + fix.log
expready fix --matrix counts.tsv --output reports/fix_from_matrix

# only manifest fixed + fix.log
expready fix --manifest manifest.tsv --output reports/fix_manifest_only

# optional: write fixed tables as CSV instead of the default TSV
expready fix --metadata metadata.csv --manifest manifest.tsv --output reports/fix_csv --format csv

Understanding outputs and issues

What each output file means:

  • report.html: main validation report with status, issue list, and suggested fixes.
  • metadata.inferred.csv: metadata generated from matrix sample columns (only when metadata input is omitted).
  • metadata.fixed.<fmt>: cleaned metadata written by fix (fmt is tsv by default, or csv if selected).
  • manifest.fixed.<fmt>: cleaned manifest written by fix (fmt is tsv by default, or csv if selected).
  • fix.log: summary of what fix changed (empty-like values standardized, fully empty rows removed, headers normalized/skipped).

How to read validation status:

  • PASS: no blocking issues were found.
  • FAIL: at least one blocking issue was found.

How to prioritize issues in report.html:

  • Extreme: blocking issue; fix these first.
  • Moderate: non-blocking but important quality risk.
  • None: informational check passed/no action required.

Issue sections:

  • Metadata: schema and sample-ID quality checks.
  • Design: group structure and model-readiness checks.
  • Cross-file: sample-ID consistency across metadata, matrix, and manifest.

Report wording guide

Common report language and what it means:

  • Blocking issue: an issue severe enough to set overall status to FAIL.
  • Some metadata sample IDs are missing in the matrix: sample IDs exist in metadata but are not found in matrix sample columns.
  • Some matrix sample IDs are not listed in metadata: sample IDs exist in matrix columns but not in metadata.
  • Some metadata sample IDs are missing in the manifest: sample IDs exist in metadata but are not found in the manifest sample-ID column.
  • Manifest sample-ID column was not found: the column passed via --manifest-id does not exist in manifest.
  • Manifest path column was not found: the column passed via --manifest-path does not exist in manifest.
  • Header names contain spaces: non-blocking warning; run fix to normalize headers with underscores.
  • Input file appears to have inconsistent delimiters: rows do not have a consistent column structure (often caused by mixed tabs/spaces/commas). In CLI validate, this is treated as an input error and the run exits before report generation.
  • Duplicate sample IDs: the same metadata sample-ID value appears in more than one metadata row.
  • Required metadata fields are empty: required columns (like metadata sample ID or condition) contain missing values.
  • Some condition groups have too few replicates: at least one condition group has fewer than 2 samples.
  • Condition and batch are fully linked: condition and batch are one-to-one, so their effects cannot be separated.
  • A category value appears only once: a value in condition, batch, or covariates appears for only one sample.
  • Model setup is too complex for the sample count: estimated model terms are too many for available samples.

Help

expready --help
expready validate --help
expready fix --help

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

expready-0.1.1.tar.gz (35.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

expready-0.1.1-py3-none-any.whl (36.9 kB view details)

Uploaded Python 3

File details

Details for the file expready-0.1.1.tar.gz.

File metadata

  • Download URL: expready-0.1.1.tar.gz
  • Upload date:
  • Size: 35.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for expready-0.1.1.tar.gz
Algorithm Hash digest
SHA256 81cd021b70536d23ab68e37c39f247d4ae6e071228f08a9cd4063c1fc22c9d53
MD5 78c5991a692c69188074905fdafb1d6b
BLAKE2b-256 92aec6b199df669e0ef7abd5d1cea2e7fad7f791bde0f10f8b645ea823b5d4db

See more details on using hashes here.

File details

Details for the file expready-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: expready-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 36.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for expready-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 9b986927b7c278f153e6925769adcdc673edd9e9613aacd676f4eae8b76b41fe
MD5 a79d5335e6067402b7e73f0fd6c2c5bc
BLAKE2b-256 c986389d8e8297f66cf8326efe729f690029f4e30ce1cab4b09b406ee80d2000

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page