A collection of scripts that are useful for dealing with viral RNA NGS data.

Project description

The smallgenomeutilities are a collection of scripts for handling and manipulating NGS data of small viral genomes. They are written in Python 3 and have a small number of dependencies.

The smallgenomeutilities are part of the V-pipe workflow for analysing NGS data of short viral genomes.

Dependencies

You can install these Python modules using either pip or bioconda:

  • biopython

  • bcbio-gff

  • numpy

  • pandas

  • progress

  • pysam

  • pysamstats

  • scikit-learn (imported as sklearn)

  • matplotlib

  • pyyaml

In addition to these modules, frameshift_deletions_checks currently requires mafft to be installed; it is also available on bioconda.
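To see which of the listed modules are already importable in the current environment, a small check along the following lines can help. This is a sketch and not part of smallgenomeutilities: the names IMPORT_NAMES and missing_modules are made up for illustration. The table reflects that several distributions are imported under a different name (biopython as Bio, bcbio-gff as BCBio, pyyaml as yaml, and sklearn is provided by the scikit-learn distribution).

```python
import shutil
from importlib.util import find_spec

# Distribution name -> top-level import name (hypothetical helper table).
IMPORT_NAMES = {
    "biopython": "Bio",
    "bcbio-gff": "BCBio",
    "numpy": "numpy",
    "pandas": "pandas",
    "progress": "progress",
    "pysam": "pysam",
    "pysamstats": "pysamstats",
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
    "pyyaml": "yaml",
}


def missing_modules(import_names=IMPORT_NAMES):
    """Return the distribution names whose import name cannot be found."""
    return sorted(
        dist for dist, module in import_names.items() if find_spec(module) is None
    )


if __name__ == "__main__":
    missing = missing_modules()
    print("missing modules:", ", ".join(missing) if missing else "none")
    # mafft is an external program, so look for it on PATH instead.
    print("mafft on PATH:", shutil.which("mafft") is not None)
```

The check only looks up whether each module can be located; it does not import anything or verify versions.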

Installation

The recommended way to install the smallgenomeutilities is using the bioconda package:

mamba install smallgenomeutilities

Another possibility is using pip:

# install from the current directory
pip install --editable .

# install from GitHub
pip install git+https://github.com/cbg-ethz/smallgenomeutilities.git

# install from PyPI
pip install smallgenomeutilities
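After either installation route, one way to confirm that the package is visible to the current Python interpreter is to query its installed version through the standard library. This is only a sketch: the helper name installed_version is hypothetical, and it assumes the distribution is registered under the name smallgenomeutilities, as in the pip commands above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="smallgenomeutilities"):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found is not None else "smallgenomeutilities is not installed")
```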

Description of utilities

aln2basecnt

Extract base counts and coverage information from a single alignment file.

compute_mds

Compute multidimensional scaling for visualizing distances among reconstructed haplotypes.

convert_qr

Convert QuasiRecomb output of a transmitter and recipient set of haplotypes to a combined set of haplotypes, where gaps have been filtered. Optionally translate to peptide sequence.

convert_reference

Perform a genomic liftover. Transform an alignment in SAM or BAM format from one reference sequence to another. Can replace M states by =/X.
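To illustrate what replacing M states by =/X means, the sketch below expands each M operation of a CIGAR into runs of = (read base equals reference base) and X (mismatch). It is not the code of convert_reference: the function name expand_match_states, the (length, operation) tuple layout and the plain-string inputs are assumptions chosen to keep the example free of dependencies.

```python
def expand_match_states(cigar, read, ref, ref_start=0):
    """Replace every M in `cigar` by runs of = and X.

    cigar     -- list of (length, op) tuples, e.g. [(4, "M"), (1, "D")]
    read      -- read sequence as stored in the alignment record
    ref       -- reference sequence
    ref_start -- 0-based reference position of the first aligned base
    """
    out = []
    ref_pos, read_pos = ref_start, 0
    for length, op in cigar:
        if op != "M":
            out.append((length, op))
            if op in ("I", "S", "=", "X"):  # operations that consume the read
                read_pos += length
            if op in ("D", "N", "=", "X"):  # operations that consume the reference
                ref_pos += length
            continue
        for _ in range(length):
            new_op = "=" if read[read_pos] == ref[ref_pos] else "X"
            if out and out[-1][1] == new_op:
                out[-1] = (out[-1][0] + 1, new_op)  # extend the current run
            else:
                out.append((1, new_op))
            read_pos += 1
            ref_pos += 1
    return out


print(expand_match_states([(4, "M"), (1, "D"), (3, "M")], "ACGACGT", "ACGTACGT"))
# [(3, '='), (1, 'X'), (1, 'D'), (3, '=')]
```

In the example the fourth read base (A) differs from the reference (T), so the first 4M becomes 3= followed by 1X, while the deletion is passed through unchanged.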

coverage

Calculate average coverage for a target region on a different contig.

coverage_stats

Calculate average coverage for a target region of an alignment.

extract_consensus

Build consensus sequences including either the majority base or the ambiguous bases from an alignment (BAM) file.
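The difference between a majority call and an ambiguous call can be shown on a single alignment column. The sketch below is illustrative only and does not reproduce extract_consensus: the function name consensus_calls and the 0.25 frequency threshold are assumptions, and the column is assumed to be non-empty and to contain only A, C, G and T.

```python
from collections import Counter

# IUPAC nucleotide ambiguity codes, keyed by the set of bases they represent.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def consensus_calls(column, min_freq=0.25):
    """Return (majority_base, ambiguity_code) for one column of base calls.

    min_freq is a hypothetical threshold: every base observed at or above
    this frequency contributes to the ambiguity code.
    """
    counts = Counter(column)
    total = sum(counts.values())
    # Sorting first makes ties resolve to the alphabetically first base.
    majority = max(sorted(counts), key=counts.get)
    frequent = frozenset(b for b, n in counts.items() if n / total >= min_freq)
    if not frequent:  # no base reaches the threshold: fall back to the majority
        frequent = frozenset(majority)
    return majority, IUPAC[frequent]


print(consensus_calls("AAAAAAGG"))  # ('A', 'R')
print(consensus_calls("AAAAAAAG"))  # ('A', 'A')
```

In the first column G reaches the threshold (2 of 8 reads), so the ambiguous call is R (A or G); in the second it does not, and both calls agree.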

extract_coverage_intervals

Extract regions with sufficient coverage for running ShoRAH. Half-open intervals are returned, [start:end), and 0-based indexing is used.
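The interval convention can be made concrete with a short sketch. It scans a list of per-position depths and reports 0-based, half-open [start, end) runs whose depth meets a threshold. The function name coverage_intervals and the list-of-integers input are assumptions for illustration; the actual tool works on alignment files.

```python
def coverage_intervals(coverage, min_cov):
    """Return 0-based half-open (start, end) runs where coverage >= min_cov."""
    intervals = []
    start = None
    for pos, depth in enumerate(coverage):
        if depth >= min_cov:
            if start is None:
                start = pos  # a new run begins here
        elif start is not None:
            intervals.append((start, pos))  # `pos` itself is excluded
            start = None
    if start is not None:  # the last run extends to the end of the sequence
        intervals.append((start, len(coverage)))
    return intervals


depths = [0, 5, 12, 15, 3, 20, 25]
print(coverage_intervals(depths, 10))  # [(2, 4), (5, 7)]
```

With half-open intervals, (2, 4) covers positions 2 and 3 only, so the length of a run is simply end minus start and depths[start:end] slices exactly the covered positions.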

extract_sam

Extract subsequences of an alignment, with the option of converting it to peptide sequences. Can filter on the basis of subsequence frequency or gap frequencies in subsequences.

extract_seq

Extract sequences of alignments into a FASTA file where the sequence id matches a given string.

frameshift_deletions_checks

Produce a report about frameshifting indels in consensus sequences.

gather_coverage

Gather per-sample coverage information from multiple samples into a single unified file.

mapper

Determine the genomic offsets on a target contig, given an initial contig and offsets. Can be used to map between reference genomes.
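One way such an offset mapping can be computed, assuming a pairwise alignment of the two contigs is already available, is to walk the alignment columns and count non-gap characters on each side. The sketch below shows the idea only; it is not necessarily how mapper works, and the function name lift_position and the gapped-string inputs are hypothetical.

```python
def lift_position(aligned_src, aligned_dst, src_pos, gap="-"):
    """Map a 0-based position on the source contig to the target contig.

    aligned_src and aligned_dst are the two rows of a pairwise alignment.
    Returns None when the source base is aligned to a gap in the target.
    """
    if len(aligned_src) != len(aligned_dst):
        raise ValueError("aligned sequences must have the same length")
    s = d = 0  # ungapped offsets reached so far on source and target
    for a, b in zip(aligned_src, aligned_dst):
        if a != gap and s == src_pos:
            return d if b != gap else None
        if a != gap:
            s += 1
        if b != gap:
            d += 1
    raise IndexError("src_pos lies beyond the end of the source sequence")


src = "ACG-TA"
dst = "A-GCTA"
print([lift_position(src, dst, p) for p in range(5)])  # [0, None, 1, 3, 4]
```

Source position 1 (the C) has no counterpart because the target has a gap there, and the insertion in the target shifts all later offsets by one.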

min_coverage

Find the minimum coverage in a region of an alignment.

minority_freq

Extract frequencies of minority variants from multiple samples. A region of interest is also supported.

pair_sequences

Compare sequences from a multiple sequence alignment from transmitter and recipient samples in order to determine the optimal matching of transmitters to recipients.

predict_num_reads

Predict number of reads after quality preprocessing.

prepare_primers

Starting with a primers BED file, generate the other files used by V-pipe (inserts BED file, and TSV and FASTA files of primer sequences).

remove_gaps_msa

Given a multiple sequence alignment, remove loci with a gap fraction above a certain threshold.
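The filtering step can be sketched in a few lines of plain Python. The example treats the alignment as a list of equal-length strings and drops every column whose gap fraction exceeds the threshold. The function name remove_gap_columns and its parameters are assumptions for illustration and do not mirror the command-line options of remove_gaps_msa.

```python
def remove_gap_columns(sequences, max_gap_fraction, gap="-"):
    """Drop alignment columns whose gap fraction exceeds max_gap_fraction."""
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all aligned sequences must have the same length")
    keep = [
        col
        for col in range(length)
        if sum(seq[col] == gap for seq in sequences) / len(sequences)
        <= max_gap_fraction
    ]
    return ["".join(seq[col] for col in keep) for seq in sequences]


msa = ["AC-GT", "A--GT", "ACTG-"]
print(remove_gap_columns(msa, 0.5))  # ['ACGT', 'A-GT', 'ACG-']
```

Only the third column is removed here: two of its three entries are gaps (fraction 0.67), whereas the columns with a single gap (fraction 0.33) stay below the 0.5 threshold.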

Contributions
