A package for calling HGVS variants in single SAM/BAM alignments
pebbles
Pebbles is a package for calling variants from single reads into HGVS format.
In many use cases we are interested in calling variants from a single read rather than the more typical variant calling case where evidence from multiple reads is accumulated by genome position. Examples of such uses include saturation genome editing and other multiplex assays of variant effect.
Pebbles is a lightweight converter of BAM or SAM files to genomic HGVS. This means that the resulting calls will be in the format REFNAME:g.POSITION (for example AY286018:g.59A>T) and will not be corrected to be the most 3' variant (due to the way most aligners work, they will be 5' shifted on the + reference strand). For most uses you will want to process the output of pebbles with an HGVS validator and usually project the calls into a different coordinate space, such as the c. coordinates of the gene/transcript of interest. This can be done with packages such as https://hgvs.readthedocs.io or websites like mutalyzer.nl.
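As a minimal sketch of handling these calls before handing them to a validator, the following splits a genomic HGVS string of the form shown in this README into its reference name and variant description. The function name split_genomic_hgvs is hypothetical and not part of pebbles, and the code performs no validation or 3' normalisation.

```python
from typing import Tuple


def split_genomic_hgvs(call: str) -> Tuple[str, str]:
    """Split 'REFNAME:g.DESCRIPTION' into (REFNAME, DESCRIPTION).

    Hypothetical helper for illustration only; it checks the overall
    shape of the string, not whether the variant is valid HGVS.
    """
    refname, separator, description = call.partition(":g.")
    if not separator or not refname or not description:
        raise ValueError(f"not a genomic HGVS string: {call!r}")
    return refname, description


refname, description = split_genomic_hgvs("AY286018:g.59A>T")
print(refname, description)
```

The reference name recovered this way is what a downstream tool would need in order to look up the sequence when projecting to another coordinate space.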
Currently pebbles ignores per-base quality scores and assumes each read is a single, unpaired, unique observation. In most uses it will be desirable to merge overlapping read pairs from paired-end sequencing and to trim reads before calling.
Installation
To install from PyPI using pip:
pip install pebbles
To install from github using pip:
pip install git+https://github.com/genomematt/pebbles
Pebbles requires pysam and has only been tested with versions >= 0.20.0 (htslib 1.6). Hatch is used as the build system and will be required for source installs (you will also need an up-to-date pip).
Usage
For input, pebbles requires a SAM or BAM file of alignment segments with MD tags. When using minimap2 you will need to map with the --MD argument.
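To illustrate what the MD tag carries, here is a small sketch that tokenises an MD string into matches, mismatches and deletions. It is an illustration of the SAM MD format only, not pebbles' own implementation, and the names MD_TOKEN and parse_md are hypothetical. Note that the MD tag does not record insertions; those come from the CIGAR string.

```python
import re

# One token is a run of matching bases, a deletion (^ followed by the
# deleted reference bases), or a single mismatched reference base.
MD_TOKEN = re.compile(r"(\d+)|(\^[A-Za-z]+)|([A-Za-z])")


def parse_md(md: str):
    """Return a list of (kind, value) tuples for an MD tag string."""
    operations = []
    position = 0
    for match in MD_TOKEN.finditer(md):
        if match.start() != position:
            raise ValueError(f"unexpected character in MD tag: {md!r}")
        position = match.end()
        if match.group(1) is not None:
            operations.append(("match", int(match.group(1))))
        elif match.group(2) is not None:
            # Drop the leading '^' and keep the deleted reference bases.
            operations.append(("deletion", match.group(2)[1:]))
        else:
            operations.append(("mismatch", match.group(3)))
    if position != len(md):
        raise ValueError(f"unexpected character in MD tag: {md!r}")
    return operations


print(parse_md("10A5^AC6"))
```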
Pebbles is in early and active development. Features and usage are likely to change as it is integrated into other tools.
Pebbles can be used to call variants per read or to count occurrences of variants.
To call per read:
pebbles call myalignedsequences.bam > output.tsv
This will produce a tab-separated file with the read name and the variants identified in the read, written as a Python list. For the test data, in which reads are named after the expected variant in the tammar wallaby opsin gene, the output looks like:
readname call
WT None
16_18delGAC ['AY286018:g.16_18delGAC']
18_19insATG ['AY286018:g.18_19insATG']
19_20delinsAG ['AY286018:g.19_20delinsAG']
19_20delinsAG ['AY286018:g.19_20delinsAG']
19_21delinsATG ['AY286018:g.19_21delinsATG']
59A>T ['AY286018:g.59A>T']
59A>T ['AY286018:g.59A>T']
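Because the call column is written as a Python literal, it can be read back safely with ast.literal_eval. The following is a sketch under the assumption that the file has a header row and two tab-separated columns as in the sample above; read_calls is a hypothetical helper, not part of pebbles.

```python
import ast
import csv
import io


def read_calls(handle):
    """Yield (readname, variants) pairs from per-read call output.

    Assumes a header line followed by tab-separated rows whose second
    column is either None or a Python list of HGVS strings.
    """
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the header row
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or malformed lines
        readname, call = row[0], row[1]
        # literal_eval only evaluates literals, so it is safe on text input.
        variants = ast.literal_eval(call) or []
        yield readname, variants


example = "readname\tcall\nWT\tNone\n59A>T\t['AY286018:g.59A>T']\n"
for readname, variants in read_calls(io.StringIO(example)):
    print(readname, variants)
```

With a real file, the io.StringIO object would be replaced by an open file handle for output.tsv.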
To generate counts:
pebbles count myalignedsequences.bam > output.tsv
This will produce a tab-separated file with a column of variants and a column of counts. For the test data this output looks like:
variant count
AY286018:g.16_18delGAC 1
AY286018:g.18_19insATG 1
AY286018:g.19_20delinsAG 2
AY286018:g.19_21delinsATG 1
AY286018:g.59A>T 2
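As a sketch of downstream use, the count table can be loaded into a dictionary and converted to the fraction of listed variant observations. This assumes the header names variant and count shown above and tab-separated columns; read_counts and frequencies are hypothetical helpers, not part of pebbles.

```python
import csv
import io


def read_counts(handle):
    """Return {variant: count} from tab-separated count output."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["variant"]: int(row["count"]) for row in reader}


def frequencies(counts):
    """Return each variant's share of the total listed observations."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {variant: n / total for variant, n in counts.items()}


example_counts = (
    "variant\tcount\n"
    "AY286018:g.16_18delGAC\t1\n"
    "AY286018:g.18_19insATG\t1\n"
    "AY286018:g.19_20delinsAG\t2\n"
    "AY286018:g.19_21delinsATG\t1\n"
    "AY286018:g.59A>T\t2\n"
)
counts = read_counts(io.StringIO(example_counts))
for variant, fraction in frequencies(counts).items():
    print(f"{variant}\t{fraction:.3f}")
```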
For more detailed usage information see:
pebbles --help
Usage as a CountESS plugin
The CountESS project is a graphical workflow manager for analysing count-based datasets, in particular Deep Mutational Scanning (DMS) and other Multiplex Assays of Variant Effect. CountESS is built with an entry point and inheritance-based plugin system, which pebbles implements.
To use pebbles in a CountESS workflow, both Pebbles and CountESS need to be installed in the same Python environment. Once correctly installed, the BAM and SAM parsing workflow steps should be detected automatically by CountESS and made available in the user interface.
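One way to check that the plugin is visible in the current environment is to list the installed entry points with the standard library. This is a sketch only: the group name countess_plugins is an assumption about how CountESS registers plugins and should be checked against the CountESS documentation, and list_plugins is a hypothetical helper.

```python
from importlib.metadata import entry_points


def list_plugins(group="countess_plugins"):
    """Return the sorted names of entry points registered under a group.

    The default group name is an assumption, not confirmed by this README.
    """
    all_entry_points = entry_points()
    if hasattr(all_entry_points, "select"):
        # Python 3.10 and later expose a selection interface.
        selected = all_entry_points.select(group=group)
    else:
        # Older versions return a mapping of group name to entry points.
        selected = all_entry_points.get(group, [])
    return sorted(entry_point.name for entry_point in selected)


print(list_plugins())
```

An empty list means that nothing is registered under that group in the active environment, which may indicate that pebbles and CountESS were installed into different environments.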
For further information on CountESS see https://github.com/CountESS-Project/CountESS
Contributing to Pebbles
Pebbles is licensed under the BSD-3-Clause license.
You are free to fork this repository under the terms of that license.
If you have suggested changes please start by raising an issue in the issue tracker.
Pull requests are welcome and will be included at the discretion of the author.
Pull requests should be based on the 'develop' branch (with the exception of bugfixes where develop has diverged from main).
Bug reports should be made to the issue tracker.
Difficulty in understanding how to use the software is a documentation bug, and should also be raised on the issue tracker so your question and my response are easily found by others.
Pebbles aims to maintain a respectful and inclusive community and adopts the Contributor Covenant v2.1.
Citing Pebbles
Pebbles is currently unpublished. The current release can be cited using the Zenodo DOI.
License
Pebbles is released under the BSD 3 Clause License https://opensource.org/license/bsd-3-clause/