
Automated annotation of engineered plasmids using sequence similarity searches

This project has been archived.

The maintainers of this project have marked this project as archived. No new releases are expected.


pLannotate-python


Automated annotation of engineered plasmids

pLannotate-python is a Python package that automatically annotates engineered plasmids using sequence similarity searches against curated databases. It runs the searches in parallel and can set up its databases automatically. The package is a Python-friendly wrapper around command-line tools, so those tools (and the databases they rely on) must be installed before it can be used.

Features

  • Fast, parallel annotation: Uses Diamond, BLAST, and Infernal concurrently
  • Multiple databases: Protein (fpbase, swissprot), nucleotide (snapgene), RNA (Rfam)
  • Circular plasmid support: Handles origin-crossing features
  • Automatic database setup: Downloads and configures databases (~900MB)
  • Flexible output: GenBank files, CSV reports, or pandas DataFrames
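To illustrate what origin-crossing support involves, here is a minimal sketch of one common approach: search the sequence concatenated with itself, then map each hit back onto the circular sequence with modulo arithmetic. This is an assumption for illustration, not a description of pLannotate-python's internals; the function name `to_circular_segments` is hypothetical, and coordinates are treated as 0-based and half-open.

```python
def to_circular_segments(start_dup, end_dup, seq_len):
    """Map a half-open hit [start_dup, end_dup) found on a doubled
    sequence back onto a circular sequence of length seq_len.

    Returns one segment for an ordinary hit, or two segments when
    the hit crosses the origin. Hypothetical helper, for illustration.
    """
    length = end_dup - start_dup
    if not 0 < length <= seq_len:
        raise ValueError("hit must be non-empty and no longer than the sequence")
    start = start_dup % seq_len
    end = start + length
    if end <= seq_len:
        # The hit fits without wrapping around the origin.
        return [(start, end)]
    # The hit wraps: split it at the origin into a tail and a head segment.
    return [(start, seq_len), (0, end - seq_len)]


# A hit located entirely in the second copy maps back to a single segment.
ordinary = to_circular_segments(1100, 1200, 1000)
# A hit spanning the junction between the two copies crosses the origin.
crossing = to_circular_segments(950, 1030, 1000)
```

With these illustrative numbers, `ordinary` is a single segment and `crossing` is split into a tail segment ending at the sequence length and a head segment starting at 0.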

Installation

# Install with uv (recommended)
uv add plannotate-python

# Or with pip
pip install plannotate-python

External Tools Required

# macOS (Homebrew)
brew install diamond blast infernal ripgrep

# Linux (conda/mamba)
conda install -c bioconda diamond blast infernal ripgrep

# Ubuntu/Debian
sudo apt install diamond-aligner ncbi-blast+ infernal ripgrep
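Before the first annotation run, it can help to confirm that the tools are reachable from Python. The sketch below uses the standard library's `shutil.which`. The executable names (`diamond`, `blastn`, `cmscan`, `rg`) are the usual binaries shipped by Diamond, BLAST+, Infernal, and ripgrep; which of them the package actually invokes is an assumption here, and `find_missing_tools` is a hypothetical helper.

```python
import shutil

# Assumed executable names; adjust if your installation differs.
REQUIRED_TOOLS = ("diamond", "blastn", "cmscan", "rg")


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the names of the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All external tools found.")
```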

SSL Certificate Fix (macOS)

If you encounter SSL certificate errors during database download:

# Replace X.Y with your Python version (e.g., 3.11)
open "/Applications/Python X.Y/Install Certificates.command"

"Quick" Start

Automatic Database Setup:

>>> import os
>>> os.environ["PLANNOTATE_AUTO_DOWNLOAD"] = "1"  # Enable auto-download of databases
>>> from plannotate.annotate import annotate
>>>
>>> # The first run downloads the databases (~900MB, with progress bars)
>>> sequence="tgaccaggcatcaaataaaacgaaaggctcagtcgaaagactgggcctttcgttttatctgttgtttgtcggtgaacgctctctactagagtcacactggctcaccttcgggtgggcctttctgcgtttataggtctcaatccacgggtacgggtatggagaaacagtagagagttgcgataaaaagcgtcaggtagtatccgctaatcttatggataaaaatgctatggcatagcaaagtgtgacgccgtgcaaataatcaatgtggacttttctgccgtgattatagacacttttgttacgcgtttttgtcatggctttggtcccgctttgttacagaatgcttttaataagcggggttaccggtttggttagcgagaagagccagtaaaagacgcagtgacggcaatgtctgatgcaatatggacaattggtttcttgtaatcgttaatccgcaaataacgtaaaaacccgcttcggcgggtttttttatggggggagtttagggaaagagcatttgtcatttgtttatttttctaaatacattcaaatatgtatccgctcatgagacaataaccctgataaatgcttcaataatattgaaaaaggaagagtatgagtattcaacatttccgtgtcgcccttattcccttttttgcgg"  # Your plasmid sequence
>>> result = annotate(sequence, linear=False)  # False for circular plasmids

>>> result
   qstart  qend              sseqid   pident  slen                                               qseq  length  ...  wiggle  wstart  wend  kind  qstart_dup qend_dup fragment
0     523   615   AmpR_promoter_(5)  100.000    92  TTTGTTTATTTTTCTAAATACATTCAAATATGTATCCGCTCATGAG...      92  ...      13     536   601     1         523      614    False
1      11    83  rrnB_T1_terminator  100.000    72  CAAATAAAACGAAAGGCTCAGTCGAAAGACTGGGCCTTTCGTTTTA...      72  ...      10      21    72     1          11       82    False
2     155   440     araBAD_promoter   99.649   285  ATGGAGAAACAGTAGAGAGTTGCGATAAAAAGCGTCAGGTAGTATC...     285  ...      42     197   397     1         816     1100    False
3      98   126     T7Te_terminator  100.000    28                       GGCTCACCTTCGGGTGGGCCTTTCTGCG      28  ...       4     102   121     1          98      125    False
4     615   661                AmpR  100.000   861     ATGAGTATTCAACATTTCCGTGTCGCCCTTATTCCCTTTTTTGCGG      46  ...       6     621   654     1         615      660     True

[5 rows x 28 columns]
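Because the result is a pandas DataFrame, it can be filtered and exported with ordinary pandas operations. The sketch below builds a small stand-in DataFrame from a few of the columns and values shown in the output above (so it runs without the databases), keeps only complete features, and renders them as CSV text. The variable names `complete` and `csv_text` are illustrative.

```python
import pandas as pd

# Stand-in for the DataFrame returned by annotate(), limited to a few
# of the columns and values shown in the example output.
result = pd.DataFrame(
    {
        "qstart": [523, 11, 155, 98, 615],
        "qend": [615, 83, 440, 126, 661],
        "sseqid": [
            "AmpR_promoter_(5)",
            "rrnB_T1_terminator",
            "araBAD_promoter",
            "T7Te_terminator",
            "AmpR",
        ],
        "pident": [100.0, 100.0, 99.649, 100.0, 100.0],
        "fragment": [False, False, False, False, True],
    }
)

# Keep complete (non-fragment) features, ordered by position on the plasmid.
complete = (
    result.loc[~result["fragment"], ["sseqid", "qstart", "qend", "pident"]]
    .sort_values("qstart")
    .reset_index(drop=True)
)

# Render the summary as CSV text; pass a file path to to_csv() to save it.
csv_text = complete.to_csv(index=False)
```

The same calls should apply to a real `annotate()` result, since these column names appear in its output.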

Manual Database Setup:

from plannotate.resources import download_db
download_db()  # Downloads with progress bars and SSL error handling

Generate GenBank Files:

from plannotate.resources import get_gbk
gbk_content = get_gbk(result, sequence, is_linear=False)
with open("my_plasmid.gbk", "w") as f:
    f.write(gbk_content)

Configuration

Environment Variables:

  • PLANNOTATE_AUTO_DOWNLOAD=1 - Auto-download databases without prompting
  • PLANNOTATE_DB_DIR=/path - Custom database directory
  • PLANNOTATE_SKIP_DB_DOWNLOAD=1 - Skip database downloads entirely
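These variables can also be set from Python. The sketch below sets them before `plannotate` is imported, as the Quick Start does; whether the package reads them at import time or at call time is not stated here, so setting them first is the cautious choice. The helper `configure_plannotate` and the directory `~/plannotate_db` are hypothetical.

```python
import os
from pathlib import Path


def configure_plannotate(db_dir, auto_download=True):
    """Set the pLannotate-python environment variables for this process.

    Hypothetical convenience helper; it only touches os.environ.
    """
    db_path = Path(db_dir).expanduser()
    os.environ["PLANNOTATE_DB_DIR"] = str(db_path)
    if auto_download:
        os.environ["PLANNOTATE_AUTO_DOWNLOAD"] = "1"
    else:
        # Remove the flag so the package falls back to its default behavior.
        os.environ.pop("PLANNOTATE_AUTO_DOWNLOAD", None)
    return db_path


# Configure first, then import the package.
db_path = configure_plannotate("~/plannotate_db")
# from plannotate.annotate import annotate
```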

Core Functions:

  • annotate(sequence, linear=False) - Annotate DNA sequence
  • get_gbk(annotations, sequence, is_linear=False) - Generate GenBank file content
  • download_db() - Download databases with progress bars

Troubleshooting

SSL Certificate Errors: Run the SSL certificate fix command above
Empty Results: Sequence may not match database features
Tool Errors: Ensure external tools are installed and in PATH

Citation

If you use pLannotate-python in your research, please cite the original pLannotate paper:

McGuffie, M.J. and Barrick, J.E. pLannotate: engineered plasmid annotation. Nucleic Acids Research (2021).

License

This project is licensed under the GPL v3 License - see the LICENSE file for details.
