A collection of scripts that are useful for dealing with viral RNA NGS data.
Project description
The smallgenomeutilities are a collection of scripts that are useful for handling and manipulating NGS data of small viral genomes. They are written in Python 3 and have a small number of dependencies.
The smallgenomeutilities are part of the V-pipe workflow for analysing NGS data of short viral genomes.
Dependencies
You can install these Python modules using either pip or bioconda:
biopython
bcbio-gff
numpy
pandas
progress
pysam
pysamstats
scikit-learn (sklearn)
matplotlib
pyyaml
In addition to these modules, frameshift_deletions_checks currently requires mafft to be installed; it is also available on bioconda.
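A quick way to check that an environment is ready is to look up each module by its import name and search for the `mafft` executable on PATH. The following is a minimal sketch using only the standard library; the import names (for example `Bio` for biopython, `BCBio` for bcbio-gff and `yaml` for pyyaml) are the usual ones for these packages, and the helper functions are hypothetical, not part of smallgenomeutilities.

```python
import importlib.util
import shutil

# Hypothetical helper script, not part of smallgenomeutilities.
# Import names of the dependencies; several differ from the pip/bioconda
# package names (e.g. biopython -> Bio, pyyaml -> yaml).
REQUIRED_MODULES = [
    "Bio",         # biopython
    "BCBio",       # bcbio-gff
    "numpy",
    "pandas",
    "progress",
    "pysam",
    "pysamstats",
    "sklearn",     # scikit-learn
    "matplotlib",
    "yaml",        # pyyaml
]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def mafft_available():
    """True if a `mafft` executable is found on PATH."""
    return shutil.which("mafft") is not None


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("missing modules:", ", ".join(missing) if missing else "none")
    print("mafft on PATH:", mafft_available())
```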
Installation
The recommended way to install the smallgenomeutilities is using the bioconda package:
mamba install smallgenomeutilities
Another possibility is using pip:
# install from the current directory
pip install --editable .
# install from GitHub
pip install git+https://github.com/cbg-ethz/smallgenomeutilities.git
# install from Pypi
pip install smallgenomeutilities
Description of utilities
aln2basecnt
Extract base counts and coverage information from a single alignment file.
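To illustrate what per-position base counts and coverage mean, here is a minimal sketch that tallies bases from reads already placed on the reference. It is not how aln2basecnt works internally: it assumes a hypothetical input of gapless `(start, sequence)` pairs with 0-based starts, whereas a real alignment file also carries insertions, deletions and clipping that have to be handled.

```python
from collections import Counter


def base_counts(reads, ref_length):
    """Per-position base counts and coverage from gapless aligned reads.

    Hypothetical helper for illustration. `reads` is an iterable of
    (start, sequence) pairs, where `start` is the 0-based reference
    position of the first base; indels and clipping are ignored.
    """
    counts = [Counter() for _ in range(ref_length)]
    for start, seq in reads:
        for offset, base in enumerate(seq.upper()):
            pos = start + offset
            if 0 <= pos < ref_length:  # ignore bases hanging off the reference
                counts[pos][base] += 1
    coverage = [sum(c.values()) for c in counts]
    return counts, coverage


reads = [(0, "ACGT"), (2, "GTTA"), (3, "TA")]
counts, coverage = base_counts(reads, ref_length=6)
print(coverage)  # [1, 1, 2, 3, 2, 1]
```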
compute_mds
Compute multidimensional scaling for visualizing distances among reconstructed haplotypes.
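One common way to produce such a visualization is classical (Torgerson) MDS on a matrix of pairwise Hamming distances. The sketch below uses NumPy only; it is a swapped-in illustration of the technique, and compute_mds may use a different distance or MDS variant. The haplotypes are made-up example data.

```python
import numpy as np


def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def classical_mds(dist, k=2):
    """Classical (Torgerson) MDS: embed a distance matrix into k dimensions.

    Illustrative sketch, not the compute_mds implementation.
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Double-centre the squared distances: B = -1/2 * J D^2 J
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j
    # eigh returns ascending eigenvalues; keep the k largest
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:k]
    lam = np.clip(eigvals[order], 0, None)  # drop tiny negative values
    return eigvecs[:, order] * np.sqrt(lam)


haplotypes = ["ACGTAC", "ACGTTC", "AGGTTC", "TGGTTA"]
dist = [[hamming(a, b) for b in haplotypes] for a in haplotypes]
coords = classical_mds(dist, k=2)
print(coords.shape)  # (4, 2)
```

Each row of `coords` can then be plotted as a point, so that haplotypes with small Hamming distances end up close together.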
convert_qr
Convert QuasiRecomb output of a transmitter and recipient set of haplotypes to a combined set of haplotypes, where gaps have been filtered. Optionally translate to peptide sequence.
convert_reference
Perform a genomic liftover: transform an alignment in SAM or BAM format from one reference sequence to another. Can optionally replace CIGAR M operations with = (match) and X (mismatch) operations.
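In a CIGAR string, M only says that a read base is aligned to a reference base, whether or not the two are identical; = and X distinguish matches from mismatches. The following sketch (a hypothetical helper, not the convert_reference code) rewrites M runs by comparing each aligned base with the reference, assuming that the read sequence includes soft-clipped bases, as in SAM.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def expand_match_ops(cigar, read, ref, ref_start):
    """Rewrite CIGAR M operations as = (match) and X (mismatch) runs.

    Hypothetical helper for illustration. `read` is the stored query
    sequence (soft clips included), `ref` the reference sequence and
    `ref_start` the 0-based reference position where the alignment begins.
    """
    ops = []
    q, r = 0, ref_start  # current query and reference positions
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op == "M":
            for i in range(length):
                same = read[q + i].upper() == ref[r + i].upper()
                ops.append("=" if same else "X")
            q += length
            r += length
        else:
            ops.extend(op * length)
            if op in "IS=X":  # operations that consume the query
                q += length
            if op in "DN=X":  # operations that consume the reference
                r += length
    # Run-length encode the per-base operations back into a CIGAR string
    out = []
    for op in ops:
        if out and out[-1][1] == op:
            out[-1][0] += 1
        else:
            out.append([1, op])
    return "".join(f"{n}{op}" for n, op in out)


print(expand_match_ops("2S4M1D2M", "NNACGTAC", "ACTTGAC", 0))  # 2S2=1X1=1D2=
```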
coverage
Calculate average coverage for a target region on a different contig.
coverage_stats
Calculate average coverage for a target region of an alignment.
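Average coverage over a region is the sum of per-position depths divided by the region length. A minimal sketch, assuming a per-position depth list and 0-based, half-open coordinates (coverage_stats may use a different convention); the function name is hypothetical.

```python
def average_coverage(depth, start, end):
    """Mean depth over the 0-based, half-open region [start, end).

    Hypothetical helper for illustration. `depth` holds per-position depths
    for one contig; positions beyond the end of the list count as zero.
    """
    if end <= start:
        raise ValueError("empty region")
    total = sum(depth[start:min(end, len(depth))])
    return total / (end - start)


depth = [0, 5, 10, 10, 20, 0]
print(average_coverage(depth, 1, 5))  # 11.25
```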
extract_consensus
Build consensus sequences including either the majority base or the ambiguous bases from an alignment (BAM) file.
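The difference between a majority and an ambiguous consensus can be sketched from per-position base counts: the majority consensus takes the most frequent base, while the ambiguous one encodes every sufficiently frequent base as an IUPAC code. The `min_freq` cut-off and the input format are assumptions for illustration; extract_consensus has its own options and thresholds.

```python
# IUPAC nucleotide codes keyed by the set of bases they represent
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def consensus(columns, min_freq=None):
    """Build a consensus from per-position base counts.

    Hypothetical helper for illustration. `columns` is a list of dicts such
    as {"A": 10, "G": 3}. With min_freq=None the majority base is used;
    otherwise every base with relative frequency >= min_freq contributes
    to an IUPAC code. Positions without coverage become "N".
    """
    out = []
    for counts in columns:
        total = sum(counts.values())
        if total == 0:
            out.append("N")
        elif min_freq is None:
            out.append(max(counts, key=counts.get))
        else:
            bases = frozenset(b for b, n in counts.items() if n / total >= min_freq)
            out.append(IUPAC.get(bases, "N"))
    return "".join(out)


columns = [{"A": 20}, {"C": 12, "T": 8}, {"G": 6, "A": 5, "T": 1}, {}]
print(consensus(columns))                # ACGN
print(consensus(columns, min_freq=0.2))  # AYRN
```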
extract_coverage_intervals
Extract regions with sufficient coverage for running ShoRAH. The returned intervals are half-open, [start, end), and use 0-based indexing.
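With 0-based, half-open intervals, the length of a region is simply `end - start` and adjacent intervals share no positions. The sketch below shows how such intervals can be derived from a per-position depth list; the function and its `min_depth` threshold are hypothetical, and the tool's actual coverage criteria may differ.

```python
def coverage_intervals(depth, min_depth):
    """Return 0-based, half-open [start, end) intervals with depth >= min_depth.

    Hypothetical helper for illustration.
    """
    intervals = []
    start = None
    for pos, d in enumerate(depth):
        if d >= min_depth and start is None:
            start = pos                     # entering a covered run
        elif d < min_depth and start is not None:
            intervals.append((start, pos))  # pos is the first uncovered base
            start = None
    if start is not None:                   # run extends to the contig end
        intervals.append((start, len(depth)))
    return intervals


print(coverage_intervals([0, 5, 7, 2, 9, 9], min_depth=5))  # [(1, 3), (4, 6)]
```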
extract_sam
Extract subsequences of an alignment, with the option of converting it to peptide sequences. Can filter on the basis of subsequence frequency or gap frequencies in subsequences.
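Filtering on subsequence frequency and gap content can be illustrated on a set of equal-length fragments covering one window. In this sketch, `min_freq` and `max_gap_frac` are hypothetical parameter names, frequencies are relative to all fragments in the window, and "-" marks a gap; extract_sam's actual options may be defined differently.

```python
from collections import Counter


def filter_subsequences(subseqs, min_freq=0.0, max_gap_frac=1.0):
    """Keep distinct subsequences by relative frequency and gap content.

    Hypothetical helper for illustration. `subseqs` are equal-length
    fragments covering one window, with "-" marking gaps.
    Returns {subsequence: relative frequency}.
    """
    counts = Counter(subseqs)
    total = sum(counts.values())
    kept = {}
    for seq, n in counts.items():
        freq = n / total
        gap_frac = seq.count("-") / len(seq)
        if freq >= min_freq and gap_frac <= max_gap_frac:
            kept[seq] = freq
    return kept


window = ["ACGT", "ACGT", "ACGT", "AC-T", "A--T"]
print(filter_subsequences(window, min_freq=0.1, max_gap_frac=0.25))
# {'ACGT': 0.6, 'AC-T': 0.2}
```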
extract_seq
Extract sequences of alignments into a FASTA file where the sequence id matches a given string.
frameshift_deletions_checks
Produce a report about frameshifting indels in consensus sequences.
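An indel shifts the reading frame when its length is not a multiple of 3. Given a pairwise alignment of a consensus against its reference (for example one produced by mafft), this criterion can be sketched as follows; the function is hypothetical and covers only the length check, not the full report that frameshift_deletions_checks produces.

```python
import re


def frameshift_indels(ref_aln, cons_aln):
    """Report indels whose length is not a multiple of 3.

    Hypothetical helper for illustration. `ref_aln` and `cons_aln` are the
    reference and consensus rows of a pairwise alignment (equal length,
    "-" for gaps). Positions are 0-based reference coordinates; an
    insertion is reported at the reference base it follows (-1 if it
    precedes the first base).
    """
    if len(ref_aln) != len(cons_aln):
        raise ValueError("aligned sequences must have equal length")
    # Reference position of each alignment column
    ref_pos, pos = [], -1
    for c in ref_aln:
        if c != "-":
            pos += 1
        ref_pos.append(pos)
    report = []
    # Gaps in the consensus are deletions, gaps in the reference are insertions
    for kind, row in (("deletion", cons_aln), ("insertion", ref_aln)):
        for m in re.finditer(r"-+", row):
            length = m.end() - m.start()
            if length % 3 != 0:
                report.append((kind, ref_pos[m.start()], length))
    return report


print(frameshift_indels("ATGAAACCC--GGGTTT", "ATGAA-CCCTTGGGTTT"))
# [('deletion', 5, 1), ('insertion', 8, 2)]
```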
gather_coverage
Gather per-sample coverage information from multiple samples into a single unified file.
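Conceptually, gathering coverage means joining per-sample depth tables on position. A minimal sketch with the standard library; the input dictionary, the function name and the TSV layout (a `pos` column plus one column per sample, missing positions filled with 0) are assumptions, not the tool's documented file format.

```python
import csv
import io


def gather_coverage(per_sample):
    """Merge {sample: {position: depth}} into one TSV table.

    Hypothetical helper for illustration. Rows are positions, columns are
    samples (sorted by name); positions missing from a sample are written as 0.
    """
    samples = sorted(per_sample)
    positions = sorted({pos for cov in per_sample.values() for pos in cov})
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["pos", *samples])
    for pos in positions:
        writer.writerow([pos, *(per_sample[s].get(pos, 0) for s in samples)])
    return buf.getvalue()


print(gather_coverage({"s2": {1: 7, 2: 9}, "s1": {0: 3, 1: 4}}), end="")
```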
mapper
Determine the genomic offsets on a target contig, given an initial contig and offsets. Can be used to map between reference genomes.
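Offsets between two contigs can be derived from a pairwise alignment by walking both gapped rows in parallel and advancing each position counter only on non-gap characters. This is a sketch of that idea with a hypothetical helper; positions are 0-based, and mapper's own input format and edge-case handling may differ.

```python
def build_offset_map(aln_a, aln_b):
    """Map 0-based positions of sequence A to sequence B via a pairwise alignment.

    Hypothetical helper for illustration. `aln_a` and `aln_b` are the gapped
    rows of an alignment of the two contigs; positions of A that face a gap
    in B map to None.
    """
    mapping = {}
    pos_a = pos_b = 0
    for a, b in zip(aln_a, aln_b):
        if a != "-":
            mapping[pos_a] = pos_b if b != "-" else None
            pos_a += 1
        if b != "-":
            pos_b += 1
    return mapping


offsets = build_offset_map("ACGT-ACGT", "AC-TTACGT")
print([offsets[p] for p in (0, 2, 3, 4)])  # [0, None, 2, 4]
```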
min_coverage
Find the minimum coverage within a region of an alignment.
minority_freq
Extract frequencies of minority variants from multiple samples. A region of interest is also supported.
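At a single position, the minority variants are the bases other than the most frequent one, and their frequency is their count divided by the total depth. A minimal sketch from base counts; restricting the analysis to a region of interest amounts to slicing the list of positions first. The function name and input format are assumptions.

```python
def minority_frequencies(columns):
    """Per-position frequencies of all non-majority bases.

    Hypothetical helper for illustration. `columns` is a list of base-count
    dicts (one per position); returns a list of {base: frequency} dicts,
    empty where there is no minority base or no coverage.
    """
    result = []
    for counts in columns:
        total = sum(counts.values())
        if total == 0:
            result.append({})
            continue
        major = max(counts, key=counts.get)
        result.append({b: n / total for b, n in counts.items() if b != major and n > 0})
    return result


print(minority_frequencies([{"A": 90, "G": 10}, {"C": 50}, {}]))
# [{'G': 0.1}, {}, {}]
```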
pair_sequences
Compare sequences from a multiple sequence alignment from transmitter and recipient samples in order to determine the optimal matching of transmitters to recipients.
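The description does not say which criterion pair_sequences optimizes. As one plausible illustration, the sketch below pairs each transmitter with a distinct recipient so that the total Hamming distance is minimal, using brute force over permutations; the function names are hypothetical, and for more than a few sequences an assignment solver such as SciPy's `linear_sum_assignment` would be the practical choice.

```python
from itertools import permutations


def hamming(a, b):
    """Number of mismatching positions between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))


def best_pairing(transmitters, recipients):
    """Assign each transmitter to a distinct recipient minimising total distance.

    Hypothetical helper for illustration. Brute force over all permutations,
    so only suitable for a handful of sequences; assumes
    len(transmitters) <= len(recipients).
    Returns ([(transmitter_index, recipient_index), ...], total_distance).
    """
    best_cost, best = None, None
    for perm in permutations(range(len(recipients)), len(transmitters)):
        cost = sum(hamming(transmitters[i], recipients[j]) for i, j in enumerate(perm))
        if best_cost is None or cost < best_cost:
            best_cost, best = cost, list(enumerate(perm))
    return best, best_cost


print(best_pairing(["AAAA", "CCCC"], ["CCCA", "AAAT"]))  # ([(0, 1), (1, 0)], 2)
```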
predict_num_reads
Predict number of reads after quality preprocessing.
prepare_primers
Starting with a primers BED file, generate the other files used by V-pipe (an inserts BED file, and TSV and FASTA files of the primer sequences).
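An amplicon's insert is the stretch between its left and right primers. The sketch below derives inserts from primer BED records; it assumes ARTIC-style primer names ending in `_LEFT`/`_RIGHT`, keeps the innermost primer when an amplicon has several, and uses made-up coordinates. The exact files V-pipe expects are not described here, so treat the helper and its output layout as hypothetical.

```python
def primer_inserts(primers):
    """Derive amplicon insert regions from primer BED records.

    Hypothetical helper for illustration. `primers` holds
    (chrom, start, end, name) tuples in 0-based, half-open BED coordinates;
    names are assumed to look like "<scheme>_<amplicon>_LEFT" or "..._RIGHT".
    The insert runs from the end of the left primer to the start of the
    right primer.
    """
    left_end, right_start, chroms = {}, {}, {}
    for chrom, start, end, name in primers:
        amplicon, side = name.rsplit("_", 1)
        chroms[amplicon] = chrom
        if side == "LEFT":     # innermost left primer: largest end
            left_end[amplicon] = max(end, left_end.get(amplicon, end))
        elif side == "RIGHT":  # innermost right primer: smallest start
            right_start[amplicon] = min(start, right_start.get(amplicon, start))
    return [
        (chroms[a], left_end[a], right_start[a], a)
        for a in left_end
        if a in right_start
    ]


primers = [
    ("ref", 30, 54, "scheme_1_LEFT"),
    ("ref", 385, 410, "scheme_1_RIGHT"),
    ("ref", 320, 342, "scheme_2_LEFT"),
    ("ref", 704, 726, "scheme_2_RIGHT"),
]
for row in primer_inserts(primers):
    print(*row, sep="\t")
```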
remove_gaps_msa
Given a multiple sequence alignment, remove loci with a gap fraction above a certain threshold.
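Column filtering by gap fraction can be sketched in a few lines, assuming a list of equal-length aligned sequences with "-" as the gap character; the function name is hypothetical, and remove_gaps_msa's own input handling (such as reading FASTA) is not shown.

```python
def remove_gappy_columns(msa, max_gap_frac):
    """Drop alignment columns whose gap fraction exceeds `max_gap_frac`.

    Hypothetical helper for illustration. `msa` is a non-empty list of
    equal-length aligned sequences with "-" as the gap character.
    """
    n = len(msa)
    keep = [
        i for i in range(len(msa[0]))
        if sum(seq[i] == "-" for seq in msa) / n <= max_gap_frac
    ]
    return ["".join(seq[i] for i in keep) for seq in msa]


msa = ["AC-GT", "A--GT", "ACTG-"]
print(remove_gappy_columns(msa, max_gap_frac=0.5))  # ['ACGT', 'A-GT', 'ACG-']
```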
Contributions
David Seifert <david.seifert@bsse.ethz.ch>
Susana Posada Cespedes <susana.posada@bsse.ethz.ch>
Ivan Blagoev Topolsky <ivan.topolsky@sib.swiss>
Lara Fuhrmann <lara.fuhrmann@bsse.ethz.ch>
Mateo Carrara <carrara@nexus.ethz.ch>
Download files
Source Distribution
Built Distribution
File details
Details for the file smallgenomeutilities-0.4.1.tar.gz.
File metadata
- Download URL: smallgenomeutilities-0.4.1.tar.gz
- Upload date:
- Size: 69.3 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.0.0 CPython/3.12.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 0fbf5077b8147bbcd445ee9e761f1234e947a093d3cc908bb1c3dfc76c1ce1e1
MD5 | 76428073928c0214630b7f7d42b6cb0e
BLAKE2b-256 | 63409e88ea1f2ff942b1c22d3ac91a99313b6d6689ec4620da43e7a5d4bbaa0e
File details
Details for the file smallgenomeutilities-0.4.1-py3-none-any.whl.
File metadata
- Download URL: smallgenomeutilities-0.4.1-py3-none-any.whl
- Upload date:
- Size: 78.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.0.0 CPython/3.12.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 975f158fc6107ff63371cfd54ea6e30f9dea295ecb1eb1dcf932c66076863b4d
MD5 | 02122bc5a7a6581de20e0496a9533190
BLAKE2b-256 | e5a1266b7a8f0c7518746f53d6ee50cd04548ef176dc093198cacac0351a8131