RFPlasmid Predicting plasmid contigs

Project description

RFPlasmid: predicting plasmid contigs from assemblies using single-copy marker genes, plasmid genes and k-mers

Linda van der Graaf-van Bloois, Jaap Wagenaar, Aldert Zomer

Introduction: Antimicrobial resistance (AMR) genes in bacteria are often carried on plasmids. Because plasmids can spread AMR genes between bacteria, it is important to know whether these genes are located on highly transferable plasmids or on the more stable chromosome. Whole genome sequence (WGS) analysis makes it easy to determine whether a strain contains a resistance gene. It is much harder to determine whether that gene is located on the chromosome or on a plasmid, because genome sequence assembly generally results in 50-300 DNA fragments (contigs). Our newly developed prediction tool analyzes the composition of these contigs to predict their likely source, plasmid or chromosome. This information can be used to determine whether a resistance gene is chromosomally located or plasmid-borne. The tool is optimized for 19 different bacterial species, including Campylobacter, E. coli, and Salmonella, and can also be used for metagenomic assemblies.

Methods: The tool uses CheckM and DIAMOND Blast to count the chromosomal marker genes, plasmid replication genes and plasmid typing genes on each contig, and it determines the pentamer frequencies and size of each contig. A Random Forest prediction model was trained on an extensive set of plasmids and chromosomes from 19 different bacterial species and validated on separate test sets of known chromosomal and plasmid contigs from these bacteria.

Results: Prediction of plasmid contigs was nearly perfect when measured as the number of correctly predicted bases, with up to 99% specificity and 99% sensitivity. Prediction of small contigs remains difficult, since these contigs consist primarily of repeated sequences present in both plasmids and chromosomes, e.g. transposases.
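The pentamer frequencies and contig size named in the Methods can be illustrated with a short Python sketch. This is not RFPlasmid's own code: the function names are hypothetical, and the choices to skip windows containing ambiguous bases and to count each strand's pentamers as written (without merging reverse complements) are assumptions made for the example.

```python
from collections import Counter
from itertools import product

K = 5  # pentamers, as named in the Methods
ALL_KMERS = ["".join(p) for p in product("ACGT", repeat=K)]  # 4**5 = 1024 k-mers


def kmer_frequencies(sequence, k=K):
    """Return the relative frequency of every unambiguous DNA k-mer.

    Assumption: windows containing anything other than A, C, G or T
    are ignored, and reverse complements are not merged.
    """
    seq = sequence.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ALL_KMERS if k == K else ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[kmer] for kmer in kmers)
    if total == 0:
        # Contig shorter than k, or no unambiguous window at all.
        return {kmer: 0.0 for kmer in kmers}
    return {kmer: counts[kmer] / total for kmer in kmers}


def contig_features(sequence):
    """Hypothetical feature vector: contig size followed by 1024 pentamer frequencies."""
    freqs = kmer_frequencies(sequence)
    return [len(sequence)] + [freqs[kmer] for kmer in ALL_KMERS]
```

A vector of this shape, extended with the marker-gene and plasmid-gene counts, is the kind of per-contig input a Random Forest classifier could be trained on.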

Conclusion: The newly developed tool determines whether contigs are chromosomal or plasmid-derived with very high specificity and sensitivity (up to 99%), making it useful for analyzing WGS data of bacterial genomes and their antimicrobial resistance genes.
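The specificity and sensitivity above are reported per correctly predicted base rather than per contig. The sketch below shows one way such base-weighted figures could be computed. It assumes plasmid is treated as the positive class, the function name is hypothetical, and the contig lengths in the usage note are invented purely for illustration; they are not results from the tool.

```python
def base_weighted_metrics(contigs):
    """Compute sensitivity and specificity weighted by contig length.

    `contigs` is an iterable of (length_bp, true_label, predicted_label),
    where labels are "plasmid" or "chromosome". Treating "plasmid" as the
    positive class is an assumption of this sketch.
    """
    tp = fn = tn = fp = 0
    for length, truth, predicted in contigs:
        if truth == "plasmid":
            if predicted == "plasmid":
                tp += length  # plasmid bases correctly called plasmid
            else:
                fn += length  # plasmid bases missed
        else:
            if predicted == "chromosome":
                tn += length  # chromosomal bases correctly called chromosome
            else:
                fp += length  # chromosomal bases wrongly called plasmid
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity
```

With made-up input such as one correctly predicted 100,000 bp plasmid contig and one misclassified 1,000 bp plasmid contig, the base-weighted sensitivity is 100000 / 101000, about 0.99, whereas counting contigs would give 0.5. This is why misclassified small contigs have little effect on the per-base figures.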

A web interface for testing single FASTA files is available at: http://klif.uu.nl/rfplasmid/

Download files

Download the file for your platform.

Source Distribution

rfplasmid-0.0.2.tar.gz (52.4 MB)

Uploaded Source

Built Distribution

rfplasmid-0.0.2-py3-none-any.whl (52.4 MB)

Uploaded Python 3

File details

Details for the file rfplasmid-0.0.2.tar.gz.

File metadata

  • Download URL: rfplasmid-0.0.2.tar.gz
  • Size: 52.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.15.0 pkginfo/1.5.0.1 requests/2.9.1 setuptools/45.2.0 requests-toolbelt/0.9.1 tqdm/4.40.2 CPython/3.5.2

File hashes

Hashes for rfplasmid-0.0.2.tar.gz
Algorithm Hash digest
SHA256 c9640c8c83c9ab4bdd27deb4fa297f371bd0bb6a07476547defc18556fdc9338
MD5 bdbf22bd44c5e312aade65013da836b2
BLAKE2b-256 0694e2c4f97cf86b28486f08cd33b156daaeae9e5fa4324168545f912c8427be

File details

Details for the file rfplasmid-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: rfplasmid-0.0.2-py3-none-any.whl
  • Size: 52.4 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.15.0 pkginfo/1.5.0.1 requests/2.9.1 setuptools/45.2.0 requests-toolbelt/0.9.1 tqdm/4.40.2 CPython/3.5.2

File hashes

Hashes for rfplasmid-0.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 f3b540e8153373dafcaa6e9610d0c389982cb66d310bdcb970dbb6a29d5209c4
MD5 62e71a52f24a1d8fde0deaefab9b6041
BLAKE2b-256 1640f0a43ef4869ad191b489ce1b01322619ef94f4f08009d8b4d5834e2f3fa6
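
The digests listed above can be compared against a downloaded file. A minimal sketch using Python's standard hashlib module follows. The helper name is hypothetical, and the local path in the usage note assumes the file was saved in the current directory.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, `sha256_of_file("rfplasmid-0.0.2.tar.gz")` should equal the SHA256 value listed for that file if the download is intact.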
