Bioinformatics software for quality control based on species-specific criteria
speccheck
speccheck is a modular command-line tool for collecting, validating, and summarizing quality control (QC) metrics from genomic analysis pipelines. It automatically detects and processes outputs from multiple bioinformatics tools, validates them against customizable criteria, and generates comprehensive reports with optional interactive visualizations.
Features
- Automatic Module Detection: Supports CheckM, QUAST, Speciator, ARIBA, and Sylph outputs
- Flexible QC Validation: Define organism-specific quality criteria with pass/fail checks
- Interactive Reports: Generate HTML dashboards with Plotly visualizations
- Metadata Integration: Merge external sample metadata into QC reports
- Rich Logging: Beautiful console output with Rich library
- Docker Support: Pre-built Docker images available
Installation
From Source
Clone the repository and install with pip:
git clone https://github.com/happykhan/speccheck.git
cd speccheck
pip install -e .
Development Installation
For development with testing and linting tools:
pip install -e '.[dev]'
Note: This project uses modern Python packaging with pyproject.toml (PEP 517/621). See MIGRATION.md for details on the migration from setup.py.
Docker
A Docker image is available for containerized execution:
docker pull happykhan/speccheck
Quick Start
- Collect QC data from analysis outputs:
speccheck collect tests/practice_data/Sample_* --output-file results.csv
- Generate summary report with visualizations:
speccheck summary qc_results/ --plot
- Validate criteria file:
speccheck check --criteria-file criteria.csv
Usage
Command: collect
Collect and validate QC metrics from bioinformatics tool outputs.
speccheck collect [OPTIONS] FILEPATHS...
Options
| Option | Type | Default | Description |
|---|---|---|---|
| FILEPATHS | Positional | Required | File paths (supports wildcards like data/*/*.tsv) |
| --organism | String | Auto-detect | Organism name for criteria matching |
| --sample | String | None | Sample identifier |
| --criteria-file | Path | criteria.csv | CSV file with QC criteria |
| --output-file | Path | qc_results/collected_data.csv | Output CSV path |
| --metadata | Path | None | CSV with additional metadata (requires sample_id column) |
| -v, --verbose | Flag | False | Enable debug logging |
| --version | Flag | - | Show version and exit |
Examples
Basic collection:
speccheck collect data/sample1/*.tsv --sample sample1
With organism specification:
speccheck collect data/ecoli_* --organism "Escherichia coli" --output-file ecoli_qc.csv
With metadata merging:
speccheck collect data/* --metadata sample_info.csv --output-file merged_results.csv
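When many samples need collecting, the command line can be assembled from a script. The sketch below is an illustration, not part of speccheck: `build_collect_command` is a hypothetical helper, the file names `checkm.tsv` and `quast.tsv` are placeholders, and writing one output CSV per sample is an assumption about the workflow. It uses only the `collect` options documented above and builds the command without running it.

```python
import glob
import os
import shlex
import tempfile


def build_collect_command(sample_dir, output_file):
    """Build a `speccheck collect` command line for one sample directory."""
    # Use the directory name as the sample identifier (an assumption).
    sample = os.path.basename(os.path.normpath(sample_dir))
    # A shell would expand data/sample1/*.tsv; here glob does it explicitly.
    files = sorted(glob.glob(os.path.join(sample_dir, "*.tsv")))
    return ["speccheck", "collect", *files,
            "--sample", sample, "--output-file", output_file]


# Demonstrate with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as root:
    sample_dir = os.path.join(root, "sample1")
    os.makedirs(sample_dir)
    for name in ("checkm.tsv", "quast.tsv"):
        open(os.path.join(sample_dir, name), "w").close()
    command = build_collect_command(sample_dir, "qc_results/sample1.csv")
    print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` once speccheck is installed.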
Supported Modules
The collect command automatically detects outputs from:
- CheckM: Completeness, contamination, genome metrics
- QUAST: Assembly statistics (N50, contigs, GC content)
- Speciator: Species identification and confidence
- ARIBA: Antimicrobial resistance gene detection
- Sylph: Metagenomic profiling and ANI values
Command: summary
Generate consolidated reports from multiple collected QC files.
speccheck summary [OPTIONS] DIRECTORY
Options
| Option | Type | Default | Description |
|---|---|---|---|
| DIRECTORY | Positional | Required | Directory containing CSV QC reports |
| --output | Path | qc_report | Output directory for summary |
| --species | String | Speciator.speciesName | Column name for species field |
| --sample | String | sample_id | Column name for sample identifier |
| --templates | Path | templates/report.html | HTML template file |
| --plot | Flag | False | Generate interactive plots |
| -v, --verbose | Flag | False | Enable debug logging |
| --version | Flag | - | Show version and exit |
Examples
Basic summary:
speccheck summary qc_results/
With plotting enabled:
speccheck summary qc_results/ --plot --output final_report/
Custom field names:
speccheck summary results/ --sample SampleID --species Species --plot
Output
- report.csv: Consolidated QC metrics with sorted columns (sample_id, all_checks_passed, .check columns, other fields)
- report.html: Interactive HTML dashboard (when --plot is enabled)
Command: check
Validate the structure and content of a criteria file.
speccheck check [OPTIONS]
Options
| Option | Type | Default | Description |
|---|---|---|---|
| --criteria-file | Path | criteria.csv | Path to criteria CSV file |
| -v, --verbose | Flag | False | Enable debug logging |
| --version | Flag | - | Show version and exit |
Example
speccheck check --criteria-file config/custom_criteria.csv
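The kind of structural check this command performs can be sketched in a few lines of Python. This is an illustration only and may differ from what speccheck actually checks: `validate_criteria` is a hypothetical function, and the required columns and allowed operators are taken from the Criteria File Format section of this README.

```python
import csv
import io

# Column names and operators as documented in this README.
REQUIRED_COLUMNS = ["organism", "software", "field", "operator", "threshold"]
ALLOWED_OPERATORS = {">=", "<=", "==", ">", "<"}


def validate_criteria(text):
    """Return a list of human-readable problems found in criteria CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]
    problems = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        if row["operator"] not in ALLOWED_OPERATORS:
            problems.append(f"line {line_no}: unknown operator {row['operator']!r}")
        try:
            float(row["threshold"])
        except (TypeError, ValueError):
            problems.append(f"line {line_no}: threshold {row['threshold']!r} is not numeric")
    return problems


good = "organism,software,field,operator,threshold\nEscherichia coli,Checkm,Completeness,>=,95\n"
bad = "organism,software,field,operator,threshold\nEscherichia coli,Quast,N50,=>,high\n"
print(validate_criteria(good))  # no problems reported
print(validate_criteria(bad))   # one operator problem, one threshold problem
```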
Criteria File Format
The criteria file defines organism-specific QC thresholds in CSV format:
organism,software,field,operator,threshold
Escherichia coli,Checkm,Completeness,>=,95
Escherichia coli,Checkm,Contamination,<=,5
Escherichia coli,Quast,N50,>=,50000
Columns:
- organism: Species or genus name (use "all" for universal criteria)
- software: Tool name (CheckM, QUAST, Speciator, ARIBA, Sylph)
- field: Metric name from tool output
- operator: Comparison operator (>=, <=, ==, >, <)
- threshold: Numeric threshold value
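To make the semantics of these columns concrete, here is a minimal sketch of how such rules could be applied to a sample's metrics. It is not speccheck's implementation: the `evaluate` function is hypothetical, and both the `software.field` key format and the `.check` suffix are assumptions modeled on the column names shown under Output Format.

```python
import operator

# Map the operator strings from the criteria file onto Python comparisons.
OPERATORS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}


def evaluate(criteria, organism, metrics):
    """Apply every rule for the organism (or for "all") to the metrics."""
    results = {}
    for rule in criteria:
        if rule["organism"] not in (organism, "all"):
            continue
        key = f"{rule['software']}.{rule['field']}"
        if key not in metrics:
            continue  # Skipping unmeasured fields is a choice made for this sketch.
        compare = OPERATORS[rule["operator"]]
        results[f"{key}.check"] = compare(float(metrics[key]), float(rule["threshold"]))
    return results


criteria = [
    {"organism": "Escherichia coli", "software": "Checkm", "field": "Completeness",
     "operator": ">=", "threshold": "95"},
    {"organism": "Escherichia coli", "software": "Checkm", "field": "Contamination",
     "operator": "<=", "threshold": "5"},
    {"organism": "Escherichia coli", "software": "Quast", "field": "N50",
     "operator": ">=", "threshold": "50000"},
]
metrics = {"Checkm.Completeness": 89.3, "Checkm.Contamination": 0.8}

checks = evaluate(criteria, "Escherichia coli", metrics)
print(checks)                # completeness fails, contamination passes
print(all(checks.values()))  # False
```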
Metadata Integration
Add external sample metadata using the --metadata option:
metadata.csv:
sample_id,location,sequencing_date,batch
sample1,Lab A,2024-01-15,Batch1
sample2,Lab B,2024-01-16,Batch1
speccheck collect data/* --metadata metadata.csv --output-file results.csv
Metadata columns are automatically merged with QC metrics based on sample_id.
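The merge behaves like a join on sample_id. The sketch below shows the idea with the standard library only. It is an illustration under assumptions, not speccheck's code: `merge_on_sample_id` is a hypothetical helper, the QC column is a placeholder, and keeping samples that have no metadata row is a choice made for this example.

```python
import csv
import io

qc_csv = "sample_id,Checkm.Completeness\nsample1,98.5\nsample2,89.3\nsample3,97.0\n"
metadata_csv = (
    "sample_id,location,sequencing_date,batch\n"
    "sample1,Lab A,2024-01-15,Batch1\n"
    "sample2,Lab B,2024-01-16,Batch1\n"
)


def merge_on_sample_id(qc_text, metadata_text):
    """Attach metadata columns to each QC row that shares a sample_id."""
    metadata = {row["sample_id"]: row
                for row in csv.DictReader(io.StringIO(metadata_text))}
    merged = []
    for row in csv.DictReader(io.StringIO(qc_text)):
        extra = metadata.get(row["sample_id"], {})
        # QC values win if a column name appears in both files (an assumption).
        merged.append({**extra, **row})
    return merged


merged = merge_on_sample_id(qc_csv, metadata_csv)
for row in merged:
    print(row)
```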
Output Format
CSV Column Order
Output files are automatically organized for readability:
- Sample identifier (sample_id or Sample)
- Overall checks (columns ending with all_checks_passed)
- Individual checks (columns ending with .check), sorted alphabetically
- Metrics (remaining columns), sorted alphabetically
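This ordering can be expressed as a sort key. The following is a sketch rather than speccheck's own code: `order_columns` is a hypothetical function, and placing the overall all_checks_passed column ahead of the per-tool ones is an assumption that mirrors the Example Output in this README.

```python
def order_columns(columns):
    """Sort columns into the four groups described above."""
    def sort_key(name):
        if name in ("sample_id", "Sample"):
            group = 0
        elif name.endswith("all_checks_passed"):
            group = 1
        elif name.endswith(".check"):
            group = 2
        else:
            group = 3
        # Within the overall-checks group, the global column comes first.
        is_per_tool = name != "all_checks_passed"
        return (group, is_per_tool, name)

    return sorted(columns, key=sort_key)


columns = [
    "Checkm.Contamination",
    "Checkm.Completeness.check",
    "all_checks_passed",
    "sample_id",
    "Checkm.Completeness",
    "Checkm.Contamination.check",
    "Checkm.all_checks_passed",
]
ordered = order_columns(columns)
print(ordered)
```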
Example Output
sample_id,all_checks_passed,Checkm.all_checks_passed,Checkm.Completeness.check,Checkm.Contamination.check,Checkm.Completeness,Checkm.Contamination
sample1,True,True,True,True,98.5,1.2
sample2,False,False,False,True,89.3,0.8
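A report in this layout is easy to post-process. As a hedged example, the snippet below reads the two rows above and lists which individual checks each failing sample did not pass. `failed_checks` is a hypothetical helper, and it assumes booleans are written as the strings True and False, as in the example.

```python
import csv
import io

report = (
    "sample_id,all_checks_passed,Checkm.all_checks_passed,Checkm.Completeness.check,"
    "Checkm.Contamination.check,Checkm.Completeness,Checkm.Contamination\n"
    "sample1,True,True,True,True,98.5,1.2\n"
    "sample2,False,False,False,True,89.3,0.8\n"
)


def failed_checks(report_text):
    """Map each failing sample to the individual checks it did not pass."""
    failures = {}
    for row in csv.DictReader(io.StringIO(report_text)):
        if row["all_checks_passed"] == "True":
            continue
        failures[row["sample_id"]] = [
            name for name, value in row.items()
            if name.endswith(".check") and value != "True"
        ]
    return failures


failures = failed_checks(report)
print(failures)  # sample2 fails only the completeness check
```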
Development
Running Tests
pytest
pytest --cov=speccheck # With coverage
Code Quality
pylint speccheck/
Project Structure
speccheck/
├── speccheck/
│   ├── __init__.py
│   ├── main.py              # Core logic
│   ├── collect.py           # File collection & writing
│   ├── criteria.py          # Criteria validation
│   ├── report.py            # Report generation
│   ├── modules/             # Tool-specific parsers
│   │   ├── checkm.py
│   │   ├── quast.py
│   │   ├── speciator.py
│   │   ├── ariba.py
│   │   └── sylph.py
│   └── plot_modules/        # Visualization modules
│       ├── plot_checkm.py
│       ├── plot_quast.py
│       └── ...
├── tests/                   # Pytest test suite
├── templates/               # HTML templates
├── speccheck.py             # CLI entry point
└── setup.py                 # Package configuration
Dependencies
- Core: rich, typer, pandas, jinja2, plotly
- Dev: pytest, pytest-cov, pylint, coverage
Version
Check the installed version:
speccheck --version
License
This project is licensed under the GNU General Public License v3.0 (GPLv3). See LICENSE for details.
Contributing
Contributions are welcome! We appreciate bug reports, feature requests, documentation improvements, and code contributions.
Quick Start for Contributors
- Fork the repository
- Install development dependencies: pip install -e '.[dev]'
- Install pre-commit hooks: pre-commit install
- Create a feature branch: git checkout -b feature/your-feature
- Make your changes and add tests
- Run checks: pytest --cov=speccheck && ruff check speccheck/
- Submit a pull request
For detailed guidelines, see CONTRIBUTING.md.
Code Quality
This project uses:
- Black for code formatting
- Ruff for fast linting
- Pylint for comprehensive code analysis
- pytest with coverage reporting
- pre-commit hooks for automated checks
All PRs must pass CI checks including tests on Python 3.10, 3.11, and 3.12 across Ubuntu, macOS, and Windows.
Citation
If you use speccheck in your research, please cite:
[Citation information to be added]
Support
- Issues: GitHub Issues
- Documentation: This README
- Contact: See setup.py for author information