PlasIDome

A tool to detect plasmids and contamination in bacterial and archaeal genome assemblies

Introduction

In bacterial genome assembly, it is important to account for every contig. PlasIDome finds all contigs in an assembly file that are shorter than a specified length (default: 200,000 bp) and uses sequence homology to categorize each one as chromosomal, plasmid, or contamination. This automates a key step in assessing assembly quality and can be integrated into existing workflows.
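The length-based selection described above can be illustrated with a minimal sketch. This is not PlasIDome's own code: the function names `read_fasta` and `short_contigs` are hypothetical, and treating the cutoff as a strict "less than" comparison is an assumption taken from the wording of the introduction.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first whitespace-delimited token of the header as the name.
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def short_contigs(lines: Iterable[str], max_length: int = 200_000) -> Dict[str, str]:
    """Return {name: sequence} for contigs shorter than max_length.

    Assumption: the cutoff is exclusive; the real tool may treat the
    boundary differently.
    """
    return {name: seq for name, seq in read_fasta(lines) if len(seq) < max_length}


# Tiny in-memory example: contig_1 is 40 bp, contig_2 is 6 bp.
example = [">contig_1", "ACGT" * 10, ">contig_2 plasmid-like", "ACGTAC"]
print(sorted(short_contigs(example, max_length=10)))
```

To try this on a real assembly, pass an open file handle in place of the `example` list.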

Getting Started

Requirements

  • Python 3.7+
  • BLAST+ 2.10.1+
  • bioseq

Installation

PlasIDome is available on PyPI and can be installed using pip:

pip install plasIDome

Usage

As input, PlasIDome takes a genome assembly file in FASTA format (.fasta, .fa, .fna).

Example:

plasidome -b path/to/blastn -f genome.fasta

Examine contigs up to 50,000 bp in length:

plasidome -b path/to/blastn -f genome.fasta -l 50000

Set the output directory and output name:

plasidome -b path/to/blastn -f genome.fasta -p path/to/output -o output_name
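For batch processing, the commands above could be assembled and run from Python. The sketch below uses only the flags shown in the usage examples (-b, -f, -l, -p, -o). The helper names `build_command` and `run_batch` are hypothetical, and it is an assumption that -l can be combined with -p and -o in a single call and that the file stem is a suitable output name.

```python
import subprocess
from pathlib import Path


def build_command(blastn, fasta, max_length=None, out_dir=None, out_name=None):
    """Build a plasidome argument list from the flags shown in the usage examples."""
    cmd = ["plasidome", "-b", str(blastn), "-f", str(fasta)]
    if max_length is not None:
        cmd += ["-l", str(max_length)]
    if out_dir is not None:
        cmd += ["-p", str(out_dir)]
    if out_name is not None:
        cmd += ["-o", str(out_name)]
    return cmd


def run_batch(blastn, fasta_paths, out_dir):
    """Run plasidome once per assembly, naming each output after the file stem."""
    for fasta in fasta_paths:
        cmd = build_command(blastn, fasta, out_dir=out_dir, out_name=Path(fasta).stem)
        # check=True raises CalledProcessError if plasidome exits with an error.
        subprocess.run(cmd, check=True)


# Only builds and prints the command; nothing is executed here.
print(build_command("path/to/blastn", "genome.fasta", max_length=50000))
```

Calling `run_batch("path/to/blastn", ["a.fasta", "b.fasta"], "path/to/output")` would then invoke the tool once per file, provided plasidome is on the PATH.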

Output Files

PlasIDome generates three outputs:

  • report.tsv
  • alignment_results.tsv
  • directory of contigs

The report.tsv file summarizes the alignment results as a table. From left to right, the columns are the contig name, its classification, and its contamination status, followed by the number of matches to chromosomal, plasmid, undetermined, and human sequences in the database.

The alignment_results.tsv file contains the raw, uninterpreted alignment results as a table. From left to right, the columns are the contig name, subject name, subject taxonomic ID, percent sequence identity, query coverage, query coverage per HSP (qcovhsp), alignment length, and e-value. The raw data lets the investigator parse the alignment results manually. Only homology matches in which the subject and query share at least 95% sequence identity and the subject covers at least 95% of the length of the query sequence are written to this file.
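Because the file is already limited to matches at 95% identity and 95% coverage, manual parsing is most useful for applying stricter cutoffs. The following is a minimal sketch under stated assumptions: the file is tab-delimited, the columns appear in the order listed above, and any header row can be recognised because its numeric fields do not parse. The field names in `COLUMNS` and the function `passing_hits` are hypothetical labels, not names taken from the tool.

```python
import csv
import io

# Assumed column order, following the description of alignment_results.tsv.
COLUMNS = ["contig", "subject", "taxid", "pident", "qcov", "qcovhsp", "length", "evalue"]


def passing_hits(handle, min_identity=95.0, min_coverage=95.0):
    """Return rows whose identity and query coverage meet the given thresholds."""
    hits = []
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != len(COLUMNS):
            continue  # skip blank or malformed lines
        record = dict(zip(COLUMNS, row))
        try:
            pident = float(record["pident"])
            qcov = float(record["qcov"])
        except ValueError:
            continue  # skip a header row, if one is present
        if pident >= min_identity and qcov >= min_coverage:
            hits.append(record)
    return hits


# Made-up rows for illustration only; these are not real alignment results.
sample = io.StringIO(
    "contig_1\tsubject_A\t562\t99.8\t100\t100\t5000\t0.0\n"
    "contig_2\tsubject_B\t9606\t96.0\t97\t97\t1200\t1e-50\n"
)
strict = passing_hits(sample, min_identity=99.0, min_coverage=99.0)
print([hit["contig"] for hit in strict])
```

With a real file, replace `sample` with `open("alignment_results.tsv", newline="")`.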

FASTA files with each contig sequence are saved to a subdirectory called single_contigs so the user can re-align any sequence without having to find and isolate the contig themselves.
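Re-aligning one of the saved contigs could look like the sketch below, which builds a blastn command with standard BLAST+ options. The database name, the contig file name, and the helper `blastn_command` are placeholders, and the output fields and thresholds are chosen to resemble the columns described above; they are not confirmed to be the settings PlasIDome uses internally.

```python
import subprocess

# Tabular output fields resembling the alignment_results.tsv columns (assumption).
OUTFMT = "6 qseqid sseqid staxids pident qcovs qcovhsp length evalue"


def blastn_command(blastn, contig_fasta, database, out_path,
                   min_identity=95, min_hsp_coverage=95):
    """Build a blastn argument list for re-aligning a single contig."""
    return [
        str(blastn),
        "-query", str(contig_fasta),
        "-db", str(database),
        "-outfmt", OUTFMT,
        "-perc_identity", str(min_identity),
        # Note: this filters on per-HSP coverage, which only approximates
        # a filter on total query coverage.
        "-qcov_hsp_perc", str(min_hsp_coverage),
        "-out", str(out_path),
    ]


# Placeholder paths; "my_database" and "contig_1.fasta" are hypothetical names.
cmd = blastn_command("path/to/blastn", "single_contigs/contig_1.fasta",
                     "my_database", "contig_1_hits.tsv")
print(" ".join(cmd))
# To actually run the search: subprocess.run(cmd, check=True)
```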

At this time, PlasIDome is not able to determine if a plasmid is complete or if multiple contigs are fragments of the same plasmid. PlasIDome also does not comment on the taxonomic identification of the species. If the user wants to see the taxonomic makeup of the sample, it is best to review the raw alignment results in the alignment_results.tsv file.
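One way to review the taxonomic makeup from the raw results is to count how often each subject taxonomic ID appears per contig. This is a rough sketch, assuming a tab-delimited file with the contig name in the first column, the taxonomic ID in the third, and percent identity in the fourth; the function name `taxid_counts` is hypothetical. Mapping IDs to species names is left to the reader.

```python
import csv
import io
from collections import Counter, defaultdict


def taxid_counts(handle):
    """Count subject taxonomic IDs per contig in a tab-delimited alignment table."""
    counts = defaultdict(Counter)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 4:
            continue  # skip blank or malformed lines
        try:
            float(row[3])  # percent identity; fails on a header row
        except ValueError:
            continue
        contig, taxid = row[0], row[2]
        counts[contig][taxid] += 1
    return counts


# Made-up rows for illustration only; these are not real alignment results.
sample = io.StringIO(
    "contig_1\tsubject_A\t562\t99.8\t100\t100\t5000\t0.0\n"
    "contig_1\tsubject_B\t562\t98.9\t99\t99\t4990\t0.0\n"
    "contig_1\tsubject_C\t573\t97.0\t96\t96\t4800\t0.0\n"
    "contig_2\tsubject_D\t9606\t99.0\t98\t98\t1200\t1e-50\n"
)
counts = taxid_counts(sample)
for contig, tally in sorted(counts.items()):
    print(contig, tally.most_common())
```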
