Automated annotation of engineered plasmids using sequence similarity searches

This project has been archived.

The maintainers of this project have marked this project as archived. No new releases are expected.

pLannotate-python

Automated annotation of engineered plasmids

pLannotate-python is a Python package for automatically annotating engineered plasmids using sequence similarity searches against curated databases. It is a Python-friendly wrapper around command-line tools, which it runs in parallel. The external tools must therefore be installed first (see External Tools Required); the databases they rely on can be downloaded automatically or set up manually.

Features

  • Fast, parallel annotation: Uses Diamond, BLAST, and Infernal concurrently
  • Multiple databases: Protein (fpbase, swissprot), nucleotide (snapgene), RNA (Rfam)
  • Circular plasmid support: Handles origin-crossing features
  • Automatic database setup: Downloads and configures databases (~900MB)
  • Flexible output: GenBank files, CSV reports, or pandas DataFrames

Installation

# Install with uv (recommended)
uv add plannotate-python

# Or with pip
pip install plannotate-python

External Tools Required

# macOS (Homebrew)
brew install diamond blast infernal ripgrep

# Linux (conda/mamba)
conda install -c conda-forge -c bioconda diamond blast infernal ripgrep

# Ubuntu/Debian
sudo apt install diamond-aligner ncbi-blast+ infernal ripgrep
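
After installing, it can help to confirm that the tools are discoverable on PATH before the first annotation run. The following is a minimal sketch using only the standard library. The executable names (diamond, blastn, cmscan, rg) are assumptions based on the binaries these packages usually install, not a list taken from pLannotate-python itself.

```python
import shutil

# Executable names are assumptions: the usual binaries shipped by
# Diamond, BLAST+, Infernal, and ripgrep respectively.
REQUIRED_TOOLS = ("diamond", "blastn", "cmscan", "rg")


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the names from `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All external tools found.")
```

An empty list means every tool was located. Any names returned are the ones to install or add to PATH.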

SSL Certificate Fix (macOS)

If you encounter SSL certificate errors during database download:

# Replace X.Y with your Python version (e.g., 3.11)
open "/Applications/Python X.Y/Install Certificates.command"

"Quick" Start

Automatic Database Setup:

>>> import os
>>> os.environ["PLANNOTATE_AUTO_DOWNLOAD"] = "1"  # Enable auto-download of databases
>>> from plannotate.annotate import annotate

>>> # First run will download databases (~900MB, with progress bars)
>>> sequence="tgaccaggcatcaaataaaacgaaaggctcagtcgaaagactgggcctttcgttttatctgttgtttgtcggtgaacgctctctactagagtcacactggctcaccttcgggtgggcctttctgcgtttataggtctcaatccacgggtacgggtatggagaaacagtagagagttgcgataaaaagcgtcaggtagtatccgctaatcttatggataaaaatgctatggcatagcaaagtgtgacgccgtgcaaataatcaatgtggacttttctgccgtgattatagacacttttgttacgcgtttttgtcatggctttggtcccgctttgttacagaatgcttttaataagcggggttaccggtttggttagcgagaagagccagtaaaagacgcagtgacggcaatgtctgatgcaatatggacaattggtttcttgtaatcgttaatccgcaaataacgtaaaaacccgcttcggcgggtttttttatggggggagtttagggaaagagcatttgtcatttgtttatttttctaaatacattcaaatatgtatccgctcatgagacaataaccctgataaatgcttcaataatattgaaaaaggaagagtatgagtattcaacatttccgtgtcgcccttattcccttttttgcgg"  # Your plasmid sequence
>>> result = annotate(sequence, linear=False)  # False for circular plasmids

>>> result
   qstart  qend              sseqid   pident  slen                                               qseq  length  ...  wiggle  wstart  wend  kind  qstart_dup qend_dup fragment
0     523   615   AmpR_promoter_(5)  100.000    92  TTTGTTTATTTTTCTAAATACATTCAAATATGTATCCGCTCATGAG...      92  ...      13     536   601     1         523      614    False
1      11    83  rrnB_T1_terminator  100.000    72  CAAATAAAACGAAAGGCTCAGTCGAAAGACTGGGCCTTTCGTTTTA...      72  ...      10      21    72     1          11       82    False
2     155   440     araBAD_promoter   99.649   285  ATGGAGAAACAGTAGAGAGTTGCGATAAAAAGCGTCAGGTAGTATC...     285  ...      42     197   397     1         816     1100    False
3      98   126     T7Te_terminator  100.000    28                       GGCTCACCTTCGGGTGGGCCTTTCTGCG      28  ...       4     102   121     1          98      125    False
4     615   661                AmpR  100.000   861     ATGAGTATTCAACATTTCCGTGTCGCCCTTATTCCCTTTTTTGCGG      46  ...       6     621   654     1         615      660     True

[5 rows x 28 columns]
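
In the table above, the qstart_dup and qend_dup columns appear to hold coordinates on a doubled copy of the sequence, which is one common way to find features that cross the origin of a circular plasmid. This is an interpretation of the printed output, not documented behavior. Assuming the example sequence is 661 bp long (inferred from the last hit ending at 661), the sketch below reproduces the qstart and qend values from the duplicated coordinates. The function name is hypothetical.

```python
def wrap_to_circular(start_dup, end_dup, seq_len):
    """Map a hit on a doubled template (sequence + sequence) back onto the circle.

    `start_dup` is 0-based inclusive and `end_dup` is inclusive, matching how
    the *_dup columns look in the table above (an assumption).
    Returns (start, end, crosses_origin), where `end` is exclusive.
    """
    start = start_dup % seq_len
    end = end_dup % seq_len + 1
    # If the end lands at or before the start, the feature wraps past the origin.
    crosses_origin = end <= start
    return start, end, crosses_origin


SEQ_LEN = 661  # assumed length of the example sequence

# araBAD_promoter row: the hit was found on the second copy of the sequence.
print(wrap_to_circular(816, 1100, SEQ_LEN))
# AmpR_promoter_(5) row: the hit was found on the first copy.
print(wrap_to_circular(523, 614, SEQ_LEN))
# Hypothetical hit spanning the origin (not taken from the table).
print(wrap_to_circular(650, 670, SEQ_LEN))
```

The first two calls give back the qstart and qend values shown for those rows. The third illustrates a feature whose end coordinate is smaller than its start because it wraps around the origin.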

Manual Database Setup:

from plannotate.resources import download_db
download_db()  # Downloads with progress bars and SSL error handling

Generate GenBank Files:

from plannotate.resources import get_gbk
gbk_content = get_gbk(result, sequence, is_linear=False)
with open("my_plasmid.gbk", "w") as f:
    f.write(gbk_content)
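
For repeated use, the annotate and export steps can be chained in one helper. This is a minimal sketch, not part of pLannotate-python: the function name and output naming scheme are hypothetical, and it relies only on the annotate() and get_gbk() calls shown in this README plus the standard pandas to_csv() method on the returned DataFrame.

```python
from pathlib import Path


def annotate_to_files(sequence, out_prefix, linear=False):
    """Annotate `sequence`, then write <out_prefix>.gbk and <out_prefix>.csv.

    Hypothetical helper; it only chains calls shown elsewhere in this README.
    """
    # Imported inside the function so the helper can be defined even where
    # plannotate is not installed yet.
    from plannotate.annotate import annotate
    from plannotate.resources import get_gbk

    result = annotate(sequence, linear=linear)

    gbk_path = Path(f"{out_prefix}.gbk")
    gbk_path.write_text(get_gbk(result, sequence, is_linear=linear))

    # `result` is a pandas DataFrame, so to_csv() gives a CSV report.
    csv_path = Path(f"{out_prefix}.csv")
    result.to_csv(csv_path, index=False)
    return gbk_path, csv_path
```

Calling annotate_to_files(sequence, "my_plasmid") would then produce my_plasmid.gbk and my_plasmid.csv in the current directory.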

Configuration

Environment Variables:

  • PLANNOTATE_AUTO_DOWNLOAD=1 - Auto-download databases without prompting
  • PLANNOTATE_DB_DIR=/path - Custom database directory
  • PLANNOTATE_SKIP_DB_DOWNLOAD=1 - Skip database downloads entirely
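
These variables can also be set from Python. The sketch below mirrors the Quick Start, where the variable is set before plannotate is imported; whether changes made after import are picked up is not stated here, so setting them first is the cautious choice. The directory name is a hypothetical example.

```python
import os
from pathlib import Path

# Hypothetical location; any writable directory should do.
db_dir = Path.home() / "plannotate_db"

os.environ["PLANNOTATE_DB_DIR"] = str(db_dir)
os.environ["PLANNOTATE_AUTO_DOWNLOAD"] = "1"

# Import only after the environment is configured, as in the Quick Start:
# from plannotate.annotate import annotate
```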

Core Functions:

  • annotate(sequence, linear=False) - Annotate DNA sequence
  • get_gbk(annotations, sequence, is_linear=False) - Generate GenBank file contents as a string
  • download_db() - Download databases with progress bars

Troubleshooting

  • SSL Certificate Errors: Run the SSL certificate fix command above
  • Empty Results: Sequence may not match database features
  • Tool Errors: Ensure external tools are installed and in PATH

Citation

If you use pLannotate-python in your research, please cite the original pLannotate paper:

McGuffin, M.J., Thiel, M.C., Pineda, D.L. et al. pLannotate: engineered plasmid annotation. Nucleic Acids Research (2021).

License

This project is licensed under the GPL v3 License - see the LICENSE file for details.
