Bacterial species unassigner
Unassigner
Evaluate consistency with named bacterial species for short 16S rRNA marker gene sequences.
Summary
The 16S rRNA gene is found in all bacteria, and its sequence is highly conserved. Amplification and sequencing of bacterial 16S rRNA genes is a common method for surveying bacterial communities in microbiome research. However, high-throughput instruments are unable to sequence the entire gene, so a short region of the gene is selected for amplification and sequencing.
The resultant sequences, spanning part of the 16S gene, can be used to identify the types of bacteria present in a specimen. For example, one sequence might be assigned to the Streptococcus genus based on sequence similarity. Many programs are available to carry out such taxonomic assignment.
It is generally thought that the 16S rRNA gene is not suitable for assignment of bacterial species. We agree, but with a catch: the gene sequence is suitable for ruling out assignment to many bacterial species. This software is designed to rule out all the species designations that are inconsistent with a partial 16S rRNA gene sequence. For those species that are not definitively ruled out, we assign a probability that the sequence is inconsistent with the species.
Because the software is geared towards ruling out species rather than deciding on the best assignment, we call it the unassigner. It's a cheesy joke, but we've decided to roll with it.
The unassigner library provides a command-line program, unassign, that takes a FASTA file of DNA sequences in a 16S gene region, and gives the probability that the sequence is inconsistent with nearby bacterial species.
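For readers unfamiliar with the input format, here is a minimal sketch of how a FASTA file of short 16S reads could be written and read back in Python. The sequence IDs, the sequences themselves, and the helper names write_fasta and read_fasta are made up for illustration; they are not part of the unassigner library.

```python
import os
import tempfile


def write_fasta(path, records):
    """Write (sequence_id, sequence) pairs in FASTA format."""
    with open(path, "w") as f:
        for seq_id, seq in records:
            # Each record is a ">" header line followed by the sequence.
            f.write(f">{seq_id}\n{seq}\n")


def read_fasta(path):
    """Yield (sequence_id, sequence) pairs from a FASTA file."""
    seq_id, chunks = None, []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if seq_id is not None:
                    yield seq_id, "".join(chunks)
                seq_id, chunks = line[1:], []
            else:
                # Sequences may be wrapped over several lines.
                chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


# Hypothetical reads; real input would come from a sequencing run.
example_records = [
    ("read_1", "TACGTAGGTGGCAAGCGTTATCC"),
    ("read_2", "TACGGAGGGTGCAAGCGTTAATC"),
]

fasta_path = os.path.join(tempfile.mkdtemp(), "my_sequences.fasta")
write_fasta(fasta_path, example_records)
parsed = list(read_fasta(fasta_path))
```

A file written this way has the same shape as the my_sequences.fasta input used in the Usage section.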
Installation
The Python library and command-line program can be installed using pip.
pip install unassigner
Besides the Python libraries listed in the setup file, this program requires wget and vsearch to be installed. The wget program is used to download data on bacterial species the first time unassign is run. The program vsearch is used to search for the closest matching bacterial species and return pairwise sequence alignments.
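Before the first run, it can help to confirm that both external programs are on the PATH. The following is a small sketch using only the Python standard library; the helper name missing_programs is hypothetical and is not provided by unassigner.

```python
import shutil


def missing_programs(names):
    """Return the program names that cannot be found on the PATH."""
    return [name for name in names if shutil.which(name) is None]


# wget and vsearch are the two external requirements named above.
missing = missing_programs(["wget", "vsearch"])
if missing:
    print("Please install before running unassign:", ", ".join(missing))
else:
    print("wget and vsearch are both available.")
```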
Usage
The unassign program requires one argument, a FASTA-formatted file of short 16S sequences:
unassign my_sequences.fasta
If the program has not been run before, it will automatically download the bacterial species data it needs and format its reference files. It then creates an output directory named my_sequences_unassigned and writes a table of results there, along with some auxiliary output files.
Please see the output of unassign --help for a list of the available options.
Contributing
We welcome ideas from our users about how to improve this software. Please open an issue if you have a question or would like to suggest a feature.
File details
Details for the file unassigner-0.0.3.tar.gz.
File metadata
- Download URL: unassigner-0.0.3.tar.gz
- Upload date:
- Size: 16.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.21.0 setuptools/40.4.2 requests-toolbelt/0.8.0 tqdm/4.26.0 CPython/3.6.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 1988ee8dd40782d9e73b4a2f7c818495938ad8e28de87add48bd94fd5d29aaf1
MD5 | 030d943bdef88967bf132b9d934ef206
BLAKE2b-256 | 85d8215ec0e20422f889c867546f6f8439ed6c934464835ad9a1ad56da21d8cb
File details
Details for the file unassigner-0.0.3-py3-none-any.whl.
File metadata
- Download URL: unassigner-0.0.3-py3-none-any.whl
- Upload date:
- Size: 18.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.21.0 setuptools/40.4.2 requests-toolbelt/0.8.0 tqdm/4.26.0 CPython/3.6.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | f2f8c7c73de818aa4631d7c07ec0f7167a2e0fba3ac9ca0e85c58117fba0933c
MD5 | c751ba215e7deb9ffe4a99bff78fa617
BLAKE2b-256 | fd6903257831b88c0e21ad74d83c3f4af2f2544cfa9b20989a4419746c38f596