Automated annotation of engineered plasmids using sequence similarity searches

pLannotate

Automated annotation of engineered plasmids

pLannotate is a Python package for automatically annotating engineered plasmids using sequence similarity searches against curated databases. This is a streamlined, installable version of the original pLannotate tool, designed for programmatic use and integration into bioinformatics workflows.

Features

Fast annotation: Uses Diamond, BLAST, and Infernal for comprehensive sequence searches
Multiple databases: Search against protein (fpbase, swissprot), nucleotide (snapgene), and RNA (Rfam) databases
Circular plasmid support: Handles origin-crossing features in circular plasmids
Flexible output: Generate GenBank files, CSV reports, or work with pandas DataFrames
Batch processing: Annotate multiple plasmids programmatically
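
To make the "origin-crossing" item above concrete, here is a minimal sketch of how a feature that runs past the end of a circular sequence can be split into two linear segments. It is illustrative only and is not taken from pLannotate's implementation; the function name `split_circular_feature` and the 0-based, end-exclusive coordinate convention are assumptions made for this example.

```python
def split_circular_feature(start, length, seq_len):
    """Split a feature on a circular sequence into linear (start, end) segments.

    Coordinates are 0-based and end-exclusive (an assumption for this sketch).
    """
    if not 0 < length <= seq_len:
        raise ValueError("feature length must be between 1 and seq_len")
    start %= seq_len
    end = start + length
    if end <= seq_len:
        return [(start, end)]
    # The feature runs past the last base and continues from position 0.
    return [(start, seq_len), (0, end - seq_len)]


print(split_circular_feature(10, 20, 100))  # [(10, 30)]
print(split_circular_feature(90, 20, 100))  # [(90, 100), (0, 10)]
```

A feature that fits before the origin comes back as one segment; a wrapping feature comes back as two, which is roughly how GenBank expresses such features with a joined location.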

Installation

1. Install pLannotate

# Install from PyPI (the distribution is published as plannotate-python)
pip install plannotate-python

# Or install from source
git clone https://github.com/McClain-Thiel/pLannotate.git
cd pLannotate
pip install -e .

# Or install from the source checkout with uv (recommended)
uv pip install -e .

2. Install External Tools

pLannotate requires external bioinformatics tools for sequence searching:

On macOS (using Homebrew)

# Install Homebrew if not already installed
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# Install bioinformatics tools
brew install diamond
brew install blast
brew install infernal

On Linux (using Conda/Mamba)

# Install conda/mamba if not already installed
# Then install bioinformatics tools
conda install -c bioconda diamond blast infernal

# Or with mamba (faster)
mamba install -c bioconda diamond blast infernal

On Linux (using package managers)

Ubuntu/Debian:

sudo apt update
sudo apt install diamond-aligner ncbi-blast+ infernal

CentOS/RHEL/Fedora:

# Install EPEL repository first
sudo yum install epel-release
sudo yum install diamond ncbi-blast+ infernal

3. Verify Installation

# Check that tools are installed
diamond version
blastn -version
cmscan -h

# Test pLannotate import
python -c "from plannotate.annotate import annotate; print('✓ pLannotate installed successfully')"
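
The same check can be done from Python, which is convenient at the top of a script or in a test. This is a small sketch using only the standard library; the helper name `find_missing_tools` is hypothetical, and the tool names are the executables used in the commands above.

```python
import shutil

# Executables named in the verification commands above.
REQUIRED_TOOLS = ("diamond", "blastn", "cmscan")


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the names of tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = find_missing_tools()
if missing:
    print("Missing from PATH:", ", ".join(missing))
else:
    print("All external tools found")
```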

Quick Start

Basic Usage

from plannotate.annotate import annotate

# Annotate a plasmid sequence
sequence = "ATGGTGAGCAAGGGCGAGGAGCTG..."  # Your plasmid sequence
result = annotate(sequence, linear=False)  # False for circular plasmids

# View results
print(f"Found {len(result)} annotations")
print(result[['Feature', 'Type', 'qstart', 'qend', 'pident']].head())
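
Sequences pasted from other tools often carry whitespace, line numbers, or lowercase letters. This README does not say how `annotate()` treats such input, so the sketch below normalizes a sequence before it is passed in. The helper `clean_sequence` is hypothetical, and accepting the full IUPAC nucleotide alphabet is an assumption.

```python
import re

# IUPAC nucleotide codes. Whether annotate() accepts ambiguity codes is not
# stated in this README, so this sketch simply lets them through.
_IUPAC = set("ACGTRYSWKMBDHVN")


def clean_sequence(raw):
    """Strip whitespace and digits, upper-case, and reject non-IUPAC characters."""
    seq = re.sub(r"[\s\d]", "", raw).upper()
    bad = sorted(set(seq) - _IUPAC)
    if bad:
        raise ValueError(f"unexpected characters in sequence: {''.join(bad)}")
    return seq


print(clean_sequence("atgg tgag\ncaag 61 ggcg"))  # ATGGTGAGCAAGGGCG
```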

Generate GenBank File

from plannotate.annotate import annotate
from plannotate.resources import get_gbk

# Annotate sequence
sequence = "ATGGTGAGCAAGGGCGAGGAGCTG..."
annotations = annotate(sequence, linear=False)

# Generate GenBank file
gbk_content = get_gbk(annotations, sequence, is_linear=False)

# Save to file
with open("my_plasmid.gbk", "w") as f:
    f.write(gbk_content)

Working with Sample Plasmids

from pathlib import Path
from Bio import SeqIO
from plannotate.annotate import annotate

# Sample plasmids ship in the source tree; this relative path assumes you run from the repository root
sample_dir = Path("plannotate/data/fastas")
for fasta_file in sorted(sample_dir.glob("*.fa")):
    # Load the first record in the file
    record = next(SeqIO.parse(fasta_file, "fasta"))
    sequence = str(record.seq)

    # Annotate
    result = annotate(sequence, linear=False)
    print(f"{fasta_file.name}: {len(result)} features found")

Database Setup

For full functionality, you need to set up sequence databases:

1. Download/Create Databases

Protein Databases (Diamond format):

fpbase: Fluorescent proteins database
swissprot: SwissProt protein database

Nucleotide Databases (BLAST format):

snapgene: Common cloning features

RNA Databases (Infernal format):

Rfam: RNA families database

2. Example Database Setup

# Create database directory
mkdir -p databases

# Example: Create fpbase diamond database
# (You need to obtain the fpbase protein sequences)
diamond makedb --in fpbase.fasta --db databases/fpbase

# Example: Create BLAST nucleotide database
makeblastdb -in snapgene.fasta -dbtype nucl -out databases/snapgene

# Example: Download and prepare Rfam (large download ~2GB)
wget ftp://ftp.ebi.ac.uk/pub/databases/Rfam/CURRENT/Rfam.cm.gz
gunzip Rfam.cm.gz
mv Rfam.cm databases/
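
If you prefer to drive the database build from Python, the two indexing commands above can be assembled as argument lists. This is a sketch under the assumption that the input FASTA files already exist; `build_db_commands` is a hypothetical helper, and it only prints the commands rather than running them.

```python
import shlex
from pathlib import Path


def build_db_commands(fpbase_fasta, snapgene_fasta, db_dir="databases"):
    """Return the database-building commands shown above as argument lists."""
    db_dir = Path(db_dir)
    return [
        ["diamond", "makedb", "--in", str(fpbase_fasta),
         "--db", str(db_dir / "fpbase")],
        ["makeblastdb", "-in", str(snapgene_fasta), "-dbtype", "nucl",
         "-out", str(db_dir / "snapgene")],
    ]


for cmd in build_db_commands("fpbase.fasta", "snapgene.fasta"):
    # Printed only; pass cmd to subprocess.run(cmd, check=True) to execute it.
    print(shlex.join(cmd))
```

Building argument lists instead of shell strings avoids quoting problems when paths contain spaces.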

3. Update Database Configuration

Edit plannotate/data/data/databases.yml to point to your database files:

fpbase:
  method: diamond
  location: /path/to/your/databases/fpbase.dmnd
  priority: 1
  # ... other settings

snapgene:
  method: blastn
  location: /path/to/your/databases/snapgene
  priority: 1
  # ... other settings

Advanced Usage

Custom Database Configuration

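from plannotate.annotate import annotate

# Use a custom database configuration
sequence = "ATGGTGAGCAAGGGCGAGGAGCTG..."  # Your plasmid sequence
custom_config = "my_databases.yml"
result = annotate(sequence, yaml_file=custom_config, linear=False)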

Batch Processing

import pandas as pd
from plannotate.annotate import annotate

sequences = {
    "plasmid1": "ATGGTGAGCAAG...",
    "plasmid2": "ATGGTGAGCAAG...",
    # ... more sequences
}

results = []
for name, seq in sequences.items():
    annotations = annotate(seq, linear=False)
    annotations['plasmid_name'] = name
    results.append(annotations)

# Combine all results
all_annotations = pd.concat(results, ignore_index=True)
all_annotations.to_csv("batch_annotations.csv", index=False)
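
In practice the `sequences` dictionary is usually read from a file rather than typed by hand. The sketch below builds it from a multi-record FASTA file using only the standard library; `read_fasta` is a hypothetical helper (Biopython's `SeqIO.parse` would work equally well), and the demo file contents are placeholders.

```python
import tempfile
from pathlib import Path


def read_fasta(path):
    """Parse a FASTA file into a {record_name: sequence} dict."""
    records = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Use the first whitespace-delimited token of the header as the name.
                header = line[1:].split()
                name = header[0] if header else f"record{len(records) + 1}"
                records[name] = []
            elif name is not None:
                records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


# Demo with a throwaway two-record file.
with tempfile.TemporaryDirectory() as tmp:
    fasta = Path(tmp) / "plasmids.fa"
    fasta.write_text(">plasmid1 test\nATGG\nTGAG\n>plasmid2\nCAAG\n")
    sequences = read_fasta(fasta)

print(sequences)  # {'plasmid1': 'ATGGTGAG', 'plasmid2': 'CAAG'}
```

Records that share a name overwrite one another in this sketch, so make sure the headers are unique before batching.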

Filter Results

# `result` is the DataFrame returned by annotate()

# Get only CDS features with high identity
cds_features = result[
    (result['Type'] == 'CDS') &
    (result['pident'] > 90)
]

# Get features above a certain score threshold
high_score_features = result[result['score'] > 100]

API Reference

Core Functions

`annotate(sequence, yaml_file=None, linear=False, is_detailed=False)`

Annotate a DNA sequence.

Parameters:

sequence (str): DNA sequence to annotate
yaml_file (str, optional): Path to database configuration file
linear (bool): True for linear DNA, False for circular plasmids
is_detailed (bool): Include detailed feature information

Returns:

pandas.DataFrame: Annotation results

`get_gbk(annotations_df, sequence, is_linear=False, record=None)`

Generate GenBank format output.

Parameters:

annotations_df (DataFrame): Annotation results from annotate()
sequence (str): Original DNA sequence
is_linear (bool): True for linear DNA, False for circular
record (SeqRecord, optional): Existing SeqRecord to annotate

Returns:

str: GenBank formatted text

DataFrame Columns

The annotation results DataFrame contains these key columns:

Feature: Feature name/description
Type: Feature type (CDS, misc_feature, etc.)
qstart, qend: Start and end positions (0-based)
pident: Percent identity
length: Feature length
score: Annotation confidence score
fragment: Boolean indicating if feature is truncated
db: Source database
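
As an illustration of how these columns combine, the sketch below counts complete (non-fragment) features per source database above an identity cutoff. It works on plain dictionaries of the shape `result.to_dict("records")` would produce, so it runs without pandas. The helper `summarize` is hypothetical and the rows are made-up placeholders, not real pLannotate output.

```python
from collections import Counter


def summarize(records, min_pident=90.0):
    """Count non-fragment features per database at or above an identity cutoff."""
    kept = [
        row for row in records
        if not row["fragment"] and row["pident"] >= min_pident
    ]
    return Counter(row["db"] for row in kept)


# Hand-written rows standing in for result.to_dict("records"); values are made up.
rows = [
    {"Feature": "feature A", "db": "snapgene", "pident": 100.0, "fragment": False},
    {"Feature": "feature B", "db": "fpbase", "pident": 85.0, "fragment": False},
    {"Feature": "feature C", "db": "snapgene", "pident": 99.0, "fragment": True},
]
print(summarize(rows))  # Counter({'snapgene': 1})
```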

Troubleshooting

Common Issues

"Tool not found in PATH"

# Ensure tools are installed and accessible
which diamond blastn cmscan

# If using conda, activate the environment
conda activate your_environment

"No such file or directory" for databases

Verify database paths in databases.yml
Ensure database files exist and have correct permissions
Check that Diamond databases have .dmnd extension
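
The checks above can be automated. The sketch below takes a dictionary mirroring databases.yml (for example, one loaded with PyYAML's `yaml.safe_load`) and reports missing paths. `check_database_paths` is a hypothetical helper, and it relies only on the `method` and `location` keys shown in the configuration example; any other settings are ignored.

```python
from pathlib import Path


def check_database_paths(config):
    """Return a list of human-readable problems found in a databases config dict."""
    problems = []
    for name, entry in config.items():
        location = Path(entry["location"])
        if entry.get("method") == "diamond":
            if location.suffix != ".dmnd":
                problems.append(f"{name}: Diamond database should end in .dmnd")
            if not location.is_file():
                problems.append(f"{name}: file not found: {location}")
        elif not location.parent.is_dir():
            # For other methods (e.g. BLAST, whose databases are several files
            # sharing a prefix), only the containing directory is checked.
            problems.append(f"{name}: directory not found: {location.parent}")
    return problems


config = {
    "fpbase": {"method": "diamond", "location": "/path/to/your/databases/fpbase.dmnd"},
    "snapgene": {"method": "blastn", "location": "/path/to/your/databases/snapgene"},
}
for problem in check_database_paths(config):
    print(problem)
```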

Empty results

Sequence may not have matches in current databases
Try lowering identity thresholds in database parameters
Verify databases contain relevant sequences for your plasmids

Performance Tips

Use smaller, curated databases for faster searches
Adjust database parameters (identity thresholds, max targets)
For batch processing, consider parallel execution
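
One way to approach the parallel-execution tip is a thread pool, on the assumption that most of the time is spent waiting on the external search tools. Whether `annotate()` is safe to call concurrently is not stated in this README, so treat the following as a sketch to validate on your own data. `annotate_many` is a hypothetical helper, and `len` stands in for the real annotation call so the example runs on its own.

```python
from concurrent.futures import ThreadPoolExecutor


def annotate_many(sequences, annotate_fn, max_workers=4):
    """Apply annotate_fn to each sequence concurrently; return {name: result}."""
    names = list(sequences)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(annotate_fn, (sequences[name] for name in names))
        return dict(zip(names, results))


# Stand-in for the real call, e.g. lambda seq: annotate(seq, linear=False).
demo = annotate_many({"plasmid1": "ATGG", "plasmid2": "ATGGTG"}, annotate_fn=len)
print(demo)  # {'plasmid1': 4, 'plasmid2': 6}
```

If concurrent calls turn out to interfere with each other, the same structure works with `ProcessPoolExecutor`, provided the function passed in is picklable.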

Contributing

Contributions welcome! Please:

Fork the repository
Create a feature branch
Add tests for new functionality
Submit a pull request

Citation

If you use pLannotate in your research, please cite:

McGuffie, M. J. and Barrick, J. E. pLannotate: engineered plasmid annotation. Nucleic Acids Research 49(W1), W516-W522 (2021).

License

This project is licensed under the GPL v3 License - see the LICENSE file for details.

plannotate-python 1.2.3
