SSI Ambiguous Site Detection Tool
ssiamb — Ambiguous Sites Counter
Author: Povilas Matusevicius pmat@ssi.dk
Repository: https://github.com/ssi-dk/ssiamb
License: MIT
Minimum Python: 3.12
Overview
ssiamb computes an "ambiguous sites" metric for bacterial whole genome sequencing (WGS) as a measure of within-sample heterogeneity. The tool modernizes and standardizes the lab's prior definition and adds robust packaging, a command-line interface, and Galaxy integration.
What are "Ambiguous Sites"?
An ambiguous site is a genomic position with:
- Sufficient coverage: Depth ≥ dp_min (default: 10)
- Minor-allele signal: Minor-allele fraction (MAF) ≥ maf_min (default: 0.10)
Ambiguous sites are determined from variant calls after normalization and atomization and are counted once per locus: a multi-allelic site counts once if any ALT allele passes the thresholds.
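The definition above can be sketched in Python as follows. This is a minimal illustration, not ssiamb's implementation: the Locus record and both function names are hypothetical, and it assumes each ALT allele's fraction is compared directly against maf_min, with no upper bound or folding applied.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Locus:
    """One normalized, atomized variant record (hypothetical type)."""
    contig: str
    pos: int
    depth: int
    alt_fractions: tuple[float, ...]  # allele fraction of each ALT in the record


def is_ambiguous(locus: Locus, dp_min: int = 10, maf_min: float = 0.10) -> bool:
    """True if depth >= dp_min and at least one ALT fraction >= maf_min."""
    if locus.depth < dp_min:
        return False
    return any(af >= maf_min for af in locus.alt_fractions)


def count_ambiguous(loci: Iterable[Locus], dp_min: int = 10, maf_min: float = 0.10) -> int:
    """Count ambiguous sites once per (contig, pos), even if several records share it."""
    seen: set[tuple[str, int]] = set()
    for locus in loci:
        if is_ambiguous(locus, dp_min, maf_min):
            seen.add((locus.contig, locus.pos))
    return len(seen)
```

Collecting positions in a set is what makes a multi-allelic site, or a site split into several atomized records, count only once.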
Supported Modes
Self-mapping Mode (ssiamb self)
- Input: Reads → Sample's own assembly
- Use case: Analyze heterogeneity against the sample's assembled genome
- Mapping space: Uses the assembly as reference
Reference-mapped Mode (ssiamb ref)
- Input: Reads → Species canonical reference
- Use case: Compare against standardized reference genomes
- Reference selection: Via admin directory or user override
Key Features
- Flexible mapping: Support for minimap2 (default) and bwa-mem2
- Multiple variant callers: BBTools (default) and bcftools
- Comprehensive outputs: Summary TSV (always) + optional VCF, BED, matrices, per-contig analysis
- Depth analysis: Using mosdepth (default) or samtools
- Reusable workflows: Accept pre-computed BAM/VCF files
- Galaxy integration: Designed for workflow environments
- Quality control: Configurable thresholds with sensible defaults
Installation
This project is under active development. You can install the package for development (editable) from the repository so changes to the source are immediately available:
# Install in editable/development mode (recommended for contributors)
pip install -e .
# After editable install you can run the CLI via the console script or module:
ssiamb --help
# or
python -m ssiamb --help
When a stable release is published the package will also be available via PyPI and Bioconda (example future commands):
# Future installation via pip (PyPI)
pip install ssiamb
# Future installation via conda (Bioconda)
conda install -c bioconda ssiamb
Quick Start
# Check what would be done (dry run)
ssiamb --dry-run self --r1 sample_R1.fastq.gz --r2 sample_R2.fastq.gz --assembly sample.fna
# Self-mapping mode: analyze reads against sample's own assembly
ssiamb self --r1 sample_R1.fastq.gz --r2 sample_R2.fastq.gz --assembly sample.fna
# Reference-mapped mode: analyze against species reference
ssiamb ref --r1 sample_R1.fastq.gz --r2 sample_R2.fastq.gz --species "Escherichia coli"
# Summarize existing VCF and BAM files
ssiamb summarize --vcf sample.vcf.gz --bam sample.bam
# With custom thresholds and optional outputs
ssiamb self --r1 reads_R1.fastq.gz --r2 reads_R2.fastq.gz --assembly assembly.fna \
--dp-min 15 --maf-min 0.05 --emit-vcf --emit-bed
# Output to stdout (no files written)
ssiamb self --r1 reads_R1.fastq.gz --r2 reads_R2.fastq.gz --assembly assembly.fna --stdout
Error Codes
ssiamb follows a structured exit code system for programmatic handling:
- 0: Success
- 1: CLI/input errors (missing files, invalid sample names, bad arguments)
- 2: Reference mode selection errors (species not found, Bracken failures)
- 3: Reuse compatibility errors (VCF/BAM mismatch with reference)
- 4: External tool failures (missing tools, tool execution errors)
- 5: QC failures (only when --qc-action fail is enabled)
Errors include helpful suggestions and available options when applicable.
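A calling script might map these exit codes to names before deciding how to react. The following is a minimal sketch: the enum and its member names are hypothetical labels for the codes listed above, not identifiers exported by ssiamb.

```python
from enum import IntEnum


class SsiambExit(IntEnum):
    """Exit codes as documented above; member names are illustrative."""
    SUCCESS = 0
    INPUT_ERROR = 1
    REFERENCE_SELECTION_ERROR = 2
    REUSE_COMPATIBILITY_ERROR = 3
    EXTERNAL_TOOL_FAILURE = 4
    QC_FAILURE = 5


def classify(returncode: int) -> str:
    """Return the symbolic name for a return code, or 'UNKNOWN' if undocumented."""
    try:
        return SsiambExit(returncode).name
    except ValueError:
        return "UNKNOWN"
```

For example, a wrapper could treat EXTERNAL_TOOL_FAILURE as an environment problem and QC_FAILURE as a sample-level result.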
Output
Primary Output
ambiguous_summary.tsv: Single-row summary with ambiguous site counts and quality metrics
Optional Outputs (via flags)
- --emit-vcf: Variant calls with ambiguity annotations
- --emit-bed: BED file of ambiguous sites
- --emit-matrix: Depth×MAF cumulative count matrix
- --emit-per-contig: Per-contig breakdown
- --emit-provenance: Analysis provenance and parameters
- --emit-multiqc: MultiQC-compatible reports
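To illustrate what a depth×MAF cumulative count matrix could look like, here is a small sketch. It assumes that each cell holds the number of sites with depth at or above the row threshold and MAF at or above the column threshold; the function name, the bin values, and the exact layout ssiamb writes are assumptions.

```python
from typing import Iterable, Sequence


def cumulative_matrix(
    sites: Iterable[tuple[int, float]],
    depth_bins: Sequence[int],
    maf_bins: Sequence[float],
) -> list[list[int]]:
    """Rows follow depth_bins, columns follow maf_bins.

    Cell [i][j] counts sites with depth >= depth_bins[i] and maf >= maf_bins[j].
    """
    site_list = list(sites)
    return [
        [sum(1 for depth, maf in site_list if depth >= d and maf >= m) for m in maf_bins]
        for d in depth_bins
    ]
```

Under this reading, the ambiguous-site count for any threshold pair in the bins can be looked up in the matrix without re-running variant calling.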
Dependencies
Required External Tools
- Mapping: minimap2 or bwa-mem2
- Variant calling: BBTools (callvariants.sh) or bcftools
- Depth analysis: mosdepth or samtools
- VCF processing: bcftools (for normalization)
Python Dependencies
- Python 3.12+
- typer[all], rich, pyyaml, pandas, numpy, pysam, biopython
Install Python deps in a virtual environment (example):
python -m venv .venv
source .venv/bin/activate
pip install -e .
Running tests
The project includes a pytest test-suite. Run all tests with:
python -m pytest tests/ -v
Contributors should install in editable mode and run the tests before opening PRs.
Test dependencies:
- numpy
- pysam
- biopython
Development Status
This project is currently in active development. The implementation follows a structured approach:
- ✅ Planning & Specification - Comprehensive requirements defined
- 🚧 Repository Bootstrap - Setting up package structure
- ⏳ Core Implementation - CLI, models, and processing pipelines
- ⏳ External Tool Integration - Mapping and variant calling
- ⏳ Testing & Validation - Unit tests and integration testing
- ⏳ Packaging & Distribution - Bioconda, containers, Galaxy tools
Contributing
This project is developed by the SSI team. For questions or contributions, please contact:
- Povilas Matusevicius pmat@ssi.dk
Release Process
This project publishes to PyPI, Bioconda, and Galaxy ToolShed. PyPI publishing is automated on tag push; the Bioconda and Galaxy ToolShed steps are manual. The release process is as follows:
1. Version Update
- Update version in pyproject.toml:
[project]
version = "1.0.0"  # Update this
- Update version in recipes/ssiamb/meta.yaml:
{% set version = "1.0.0" %}  # Update this
- Update version in galaxy/ssiamb.xml:
<tool id="ssiamb" name="Ambiguous Sites Counter" version="1.0.0+galaxy0">
2. Create Release
- Commit version changes:
git add pyproject.toml recipes/ssiamb/meta.yaml galaxy/ssiamb.xml
git commit -m "Bump version to v1.0.0"
git push origin main
- Create and push tag:
git tag v1.0.0
git push origin v1.0.0
3. Automated Publishing
PyPI Publishing (Automatic)
- GitHub Actions automatically publishes to PyPI on tag push
- Uses PyPI Trusted Publishing (OIDC) - no tokens needed
- Creates signed GitHub release with artifacts
Bioconda Publishing (Manual)
- Wait for PyPI release to complete
- Update recipes/ssiamb/meta.yaml with correct SHA256:
# Get SHA256 from PyPI release
# (--no-binary makes pip fetch the source tarball instead of the wheel)
pip download ssiamb==1.0.0 --no-deps --no-binary ssiamb
shasum -a 256 ssiamb-1.0.0.tar.gz
- Copy recipes/ssiamb/ to recipes/ssiamb/ in your fork of bioconda-recipes
- Create pull request to bioconda-recipes
- Address review feedback and wait for merge
Galaxy ToolShed Publishing (Manual)
- Install planemo: pip install planemo
- Test wrapper: planemo test galaxy/ssiamb.xml (may fail until bioconda is available)
- Create account on Galaxy ToolShed
- Upload wrapper:
cd galaxy/
planemo shed_upload --shed_target toolshed
4. Post-Release
- Verify all distributions:
  - PyPI: https://pypi.org/project/ssiamb/
  - Bioconda: https://anaconda.org/bioconda/ssiamb
  - Galaxy ToolShed: https://toolshed.g2.bx.psu.edu/
  - BioContainers: https://quay.io/repository/biocontainers/ssiamb
- Update documentation if needed
- Announce release
Version Numbering
- Use semantic versioning: MAJOR.MINOR.PATCH
- Galaxy wrapper versions: SOFTWARE_VERSION+galaxy0 (increment galaxy# for wrapper-only changes)
- Pre-releases: 1.0.0rc1, 1.0.0a1, etc.
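The Galaxy wrapper convention above can be expressed as two small helpers. This is a sketch under the stated scheme only; the function names are hypothetical and not part of ssiamb.

```python
import re

# SOFTWARE_VERSION+galaxyN, e.g. "1.0.0+galaxy0"
_WRAPPER_RE = re.compile(r"^(?P<software>.+)\+galaxy(?P<n>\d+)$")


def wrapper_version_for(software_version: str) -> str:
    """Wrapper version for a new software release: the suffix restarts at galaxy0."""
    return f"{software_version}+galaxy0"


def bump_wrapper_only(wrapper_version: str) -> str:
    """Increment the galaxy# suffix for a wrapper-only change."""
    match = _WRAPPER_RE.match(wrapper_version)
    if match is None:
        raise ValueError(f"not a Galaxy wrapper version: {wrapper_version!r}")
    return f"{match['software']}+galaxy{int(match['n']) + 1}"
```

The software part is left untouched by bump_wrapper_only, so pre-release strings such as 1.0.0rc1 pass through unchanged.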
Troubleshooting
See PYPI_SETUP.md for PyPI Trusted Publishing configuration details.
Citation
Note: Citation information will be provided upon publication.
License
This project is licensed under the MIT License - see the LICENSE file for details.