
Bacterial species unassigner


Unassigner


Evaluate consistency with named bacterial species for short 16S rRNA marker gene sequences.

Summary

The 16S rRNA gene is found in all bacteria, and its sequence is highly conserved. Amplifying and sequencing bacterial 16S rRNA genes is a common way to survey bacterial communities in microbiome research. However, high-throughput sequencing instruments cannot read the entire gene, which is roughly 1,500 bases long. Instead, a short region of the gene is selected for amplification and sequencing.

The resultant sequences, spanning part of the 16S gene, can be used to identify the types of bacteria present in a specimen. For example, one sequence might be assigned to the Streptococcus genus based on sequence similarity. Many programs are available to carry out such taxonomic assignment.

It is generally thought that the 16S rRNA gene is not suitable for assignment of bacterial species. We agree, but with a catch: the gene sequence is suitable for ruling out assignment to many bacterial species. This software is designed to rule out all the species designations that are inconsistent with a partial 16S rRNA gene sequence. For those species that are not definitively ruled out, we assign a probability that the sequence is inconsistent with the species.

Because the software is geared towards ruling out species rather than deciding on the best assignment, we call it the unassigner. It's a cheesy joke, but we've decided to roll with it.

The unassigner library provides a command-line program, unassign, that takes a FASTA file of DNA sequences from a region of the 16S gene and reports, for each sequence, the probability that it is inconsistent with nearby bacterial species.

Installation

The Python library and command-line program can be installed using pip.

pip install unassigner

Besides the Python libraries listed in the setup file, this program requires vsearch. vsearch is used to search for the closest-matching bacterial species and to return pairwise sequence alignments. It is available through conda, which is our recommended installation method.

conda create --name unassigner
conda activate unassigner
conda install -c bioconda vsearch
pip install unassigner
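Before running unassign, it can help to confirm that vsearch is reachable on your PATH. The snippet below is a minimal sketch using only the Python standard library; the function name find_vsearch is hypothetical and is not part of the unassigner package, which may locate vsearch in its own way.

```python
import shutil
from typing import Optional


def find_vsearch() -> Optional[str]:
    """Return the full path to the vsearch executable, or None if it is not on PATH.

    Hypothetical helper for illustration only; not part of unassigner.
    """
    return shutil.which("vsearch")


if __name__ == "__main__":
    path = find_vsearch()
    if path is None:
        print("vsearch not found; try: conda install -c bioconda vsearch")
    else:
        print(f"vsearch found at {path}")
```

If the check prints "not found" inside the activated conda environment, the vsearch install step likely did not complete.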

Alternative Installation

If pip install unassigner isn't working, or if you want to use a development version, you can also install from the git repository.

conda create --name unassigner
conda activate unassigner
conda install -c bioconda vsearch
git clone https://github.com/kylebittinger/unassigner.git
cd unassigner
pip install -r requirements.txt
pip install .

If you don't want to use conda, see the vsearch repo for alternative install methods.

Usage

The unassign program requires one argument, a FASTA-formatted file of short 16S sequences:

unassign my_sequences.fasta
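If your sequences are held in Python rather than in a file, a minimal sketch like the one below can write them out in standard FASTA format (a ">" header line followed by the sequence). The helper names format_fasta and write_fasta are hypothetical and not part of unassigner; the sketch assumes unassign accepts ordinary single-line FASTA records.

```python
from pathlib import Path
from typing import Iterable, Tuple


def format_fasta(records: Iterable[Tuple[str, str]]) -> str:
    """Format (sequence ID, DNA sequence) pairs as FASTA text.

    Each record becomes a ">ID" header line followed by the sequence on one line.
    """
    lines = []
    for seq_id, seq in records:
        lines.append(f">{seq_id}")
        lines.append(seq)
    return "\n".join(lines) + "\n" if lines else ""


def write_fasta(records: Iterable[Tuple[str, str]], path: str) -> Path:
    """Write records to path in FASTA format and return the path (hypothetical helper)."""
    out = Path(path)
    out.write_text(format_fasta(records))
    return out
```

For example, write_fasta(my_records, "my_sequences.fasta") produces a file that can then be passed to unassign, where my_records is your own list of (ID, sequence) pairs.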

On its first run, unassign automatically downloads the bacterial species data it needs and formats its reference files. It then creates an output directory named my_sequences_unassigned and writes a table of results there, along with some auxiliary output files. The output directory is created in the same directory as my_sequences.fasta.
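To call unassign from a Python pipeline, a minimal sketch is shown below. It assumes, based only on the my_sequences.fasta example above, that the output directory is named after the input file's stem plus "_unassigned" and sits next to the input file; the helper names expected_output_dir and run_unassign are hypothetical, and the real program may name directories differently for other file extensions.

```python
import subprocess
from pathlib import Path


def expected_output_dir(fasta_path: str) -> Path:
    """Predict the output directory for a given input FASTA file.

    Assumption: <stem>_unassigned in the same directory as the input,
    inferred from the my_sequences.fasta -> my_sequences_unassigned example.
    """
    p = Path(fasta_path)
    return p.parent / f"{p.stem}_unassigned"


def run_unassign(fasta_path: str) -> Path:
    """Run `unassign <fasta_path>` and return the expected output directory.

    Requires unassign and vsearch to be installed; raises
    subprocess.CalledProcessError if the command exits with an error.
    """
    subprocess.run(["unassign", str(fasta_path)], check=True)
    return expected_output_dir(fasta_path)
```

Checking that the returned directory exists after the call is a cheap way to catch a mismatch between this assumption and the program's actual behavior.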

Please see the output of unassign --help for a list of the available options.

Contributing

We welcome ideas from our users about how to improve this software. Please open an issue if you have a question or would like to suggest a feature.



Download files


Source Distribution

unassigner-0.0.5.tar.gz (32.8 kB)

Uploaded Source

Built Distribution

unassigner-0.0.5-py3-none-any.whl (28.5 kB)

Uploaded Python 3

File details

Details for the file unassigner-0.0.5.tar.gz.

File metadata

  • Download URL: unassigner-0.0.5.tar.gz
  • Size: 32.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.16

File hashes

Hashes for unassigner-0.0.5.tar.gz
Algorithm Hash digest
SHA256 2f92021e99a1749db7e371b74beb9b4d504675c3278cae37f1a508c7f5907180
MD5 da543e461156246ca692e71694cc2749
BLAKE2b-256 fceb263b0854dbac7301646e8d802bc2f95aa269f82c04f6384bc5d7bc2623c8


File details

Details for the file unassigner-0.0.5-py3-none-any.whl.

File metadata

  • Download URL: unassigner-0.0.5-py3-none-any.whl
  • Size: 28.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.16

File hashes

Hashes for unassigner-0.0.5-py3-none-any.whl
Algorithm Hash digest
SHA256 6cc30fce9c46f316d160450b9ea15f5b340d6eec5fdc98fc7412c4da3c996799
MD5 b6f8b52bfabf0529743cbd0f12fe4aae
BLAKE2b-256 cadd5b811259c3e549443d3d0bd97ae75f8f4695129e5e2f28ec06595d96658c
