A Python wrapper for the DIAMOND bioinformatics tool

💎 Diamondonpy 🐍

A Python wrapper for the ultra-fast DIAMOND sequence alignment tool. This package provides a clean, Pythonic API for DIAMOND's powerful sequence search capabilities with seamless pandas integration for efficient bioinformatics data analysis and processing. Perfect for researchers and bioinformaticians working with large genomic datasets who need both speed and ease of use.

✨ Features

🚀 Support for all major DIAMOND commands
📊 Results returned as pandas DataFrames for easy analysis
🗑️ Automatic temporary file management
🔍 Type hints for better IDE support
🧪 Comprehensive test suite
📦 Minimal dependencies (only pandas and numpy)

📥 Installation

First, ensure you have DIAMOND installed and accessible in your PATH. Then install this package:

pip install diamondonpy

For installation directly from the GitHub repository, use the following command:

pip install git+https://github.com/EnzoAndree/diamondonpy.git

For a development installation (including test dependencies), run the following from a local clone of the repository:

pip install -e ".[dev]"
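
Because the wrapper depends on an external DIAMOND binary, it can help to confirm that the binary is discoverable before running any searches. The following is a minimal sketch using only the standard library; the helper names `find_diamond` and `diamond_version` are hypothetical and are not part of this package.

```python
import shutil
import subprocess
from typing import Optional


def find_diamond(executable: str = "diamond") -> Optional[str]:
    """Return the absolute path of the DIAMOND binary, or None if it is not on PATH."""
    return shutil.which(executable)


def diamond_version(executable: str = "diamond") -> Optional[str]:
    """Return the text printed by `diamond version`, or None if the binary is missing."""
    path = find_diamond(executable)
    if path is None:
        return None
    completed = subprocess.run(
        [path, "version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        check=False,
    )
    # Fall back to stderr in case a build reports its version there.
    return completed.stdout.strip() or completed.stderr.strip()


version = diamond_version()
if version is None:
    print("DIAMOND was not found on PATH; install it before using the wrapper.")
else:
    print(version)
```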

🚀 Usage

Basic Usage

from diamondonpy import Diamond

# Initialize the wrapper
diamond = Diamond()

# Create a database
diamond.makedb(
    db="mydb.dmnd",
    input_file="sequences.fasta",
    threads=4
)

# Run BLASTP search - results as DataFrame
results_df = diamond.blastp(
    db="mydb.dmnd",
    query="query.fasta",
    evalue=1e-10,
    threads=4
)

# Access results using pandas
print(results_df.head())
print(f"Found {len(results_df)} hits")
print(f"Average identity: {results_df['pident'].mean():.2f}%")

# Filter results
significant_hits = results_df[
    (results_df['evalue'] < 1e-30) & 
    (results_df['pident'] > 90)
]

📊 Working with Results

By default, all BLAST-like commands (blastp, blastx) return pandas DataFrames with the following columns, which match the standard BLAST tabular format:

qseqid: Query sequence identifier
sseqid: Subject sequence identifier
pident: Percentage of identical matches
length: Alignment length
mismatch: Number of mismatches
gapopen: Number of gap openings
qstart: Start of alignment in query
qend: End of alignment in query
sstart: Start of alignment in subject
send: End of alignment in subject
evalue: Expect value
bitscore: Bit score
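
To make the column layout concrete, here is a small sketch that maps one tab-separated result line onto the twelve columns listed above. It uses only the standard library so it runs without DIAMOND or pandas installed; the `parse_tabular` helper and the sample line are made up for illustration and do not reflect how the package parses results internally.

```python
import csv
import io

COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
INT_COLUMNS = {"length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"}
FLOAT_COLUMNS = {"pident", "evalue", "bitscore"}


def parse_tabular(handle):
    """Parse tab-separated alignment lines into a list of typed dictionaries."""
    rows = []
    for record in csv.DictReader(handle, fieldnames=COLUMNS, delimiter="\t"):
        for name in INT_COLUMNS:
            record[name] = int(record[name])
        for name in FLOAT_COLUMNS:
            record[name] = float(record[name])
        rows.append(record)
    return rows


# A made-up alignment line in the default column order.
sample = "query1\tsubjectA\t98.5\t200\t3\t0\t1\t200\t5\t204\t1.2e-100\t400.0\n"
hits = parse_tabular(io.StringIO(sample))
print(hits[0]["qseqid"], hits[0]["sseqid"], hits[0]["pident"])
```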

# BLASTP with output file
results_df = diamond.blastp(
    db="mydb.dmnd",
    query="query.fasta",
    out="results.txt",  # Optional: save to file
    evalue=1e-10
)

# Clustering with results as DataFrame
clusters_df = diamond.cluster(
    db="mydb.dmnd",
    approx_id=90.0
)
print(clusters_df.head())

# Bidirectional Best Hit analysis
bbh_df = diamond.bidirectional_best_hit(
    db1="db1.dmnd",
    db2="db2.dmnd",
    evalue=1e-10
)
print(bbh_df.head())
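
For readers unfamiliar with the idea behind a bidirectional best hit, the sketch below shows the general technique on in-memory data: two sequences form a pair when each is the other's highest-scoring hit. This is an illustration under assumptions (hits given as `(query, subject, bitscore)` tuples, ties broken by first occurrence), not the implementation used by `bidirectional_best_hit`.

```python
def best_hits(hits):
    """Map each query to its highest-bitscore subject.

    `hits` is an iterable of (qseqid, sseqid, bitscore) tuples.
    """
    best = {}
    for query, subject, bitscore in hits:
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward, reverse):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    return sorted((a, b) for a, b in fwd.items() if rev.get(b) == a)


# Made-up hits: set 1 searched against set 2, then set 2 against set 1.
forward = [("a1", "b1", 250.0), ("a1", "b2", 90.0), ("a2", "b2", 180.0), ("a3", "b1", 120.0)]
reverse = [("b1", "a1", 245.0), ("b2", "a2", 175.0), ("b2", "a1", 88.0)]

pairs = reciprocal_best_hits(forward, reverse)
print(pairs)  # [('a1', 'b1'), ('a2', 'b2')]
```

Here `a3` is dropped because its best hit `b1` prefers `a1` in the reverse direction.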

🛠️ Available Commands

All major DIAMOND commands are supported with enhanced result handling:

makedb: Build DIAMOND database from a FASTA file
blastp: Align protein query sequences against a protein database (returns DataFrame)
blastx: Align translated DNA query sequences against a protein database (returns DataFrame)
view: View DAA files (returns DataFrame for tabular output)
cluster: Cluster sequences (returns DataFrame)
linclust: Linear-time clustering (returns DataFrame)
getseq: Retrieve sequences
dbinfo: Database information
bidirectional_best_hit: Perform bidirectional best hit analysis between two databases
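
Each of these commands ultimately corresponds to a `diamond <subcommand>` invocation with long options such as `--db`, `--query`, `--evalue`, and `--threads`. As a rough sketch of how keyword arguments could be turned into such a command line, consider the hypothetical `build_command` helper below. It is an assumption about the general approach, not this package's actual code, and it does not handle renamed parameters (for example, `input_file` in `makedb`, which DIAMOND itself receives as `--in`).

```python
def build_command(subcommand, executable="diamond", **options):
    """Build an argument list for subprocess from keyword options.

    Underscores in option names become hyphens; True adds a bare flag,
    while None and False are skipped.
    """
    args = [executable, subcommand]
    for name, value in options.items():
        if value is None or value is False:
            continue
        flag = "--" + name.replace("_", "-")
        if value is True:
            args.append(flag)
        else:
            args.extend([flag, str(value)])
    return args


cmd = build_command("blastp", db="mydb.dmnd", query="query.fasta", evalue=1e-10, threads=4)
print(" ".join(cmd))
# diamond blastp --db mydb.dmnd --query query.fasta --evalue 1e-10 --threads 4
```

Returning a list rather than a single string means the result can be passed to `subprocess.run` without shell quoting concerns.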

🧠 Advanced Features

Custom Output Formats

# Custom BLAST output format
results_df = diamond.blastp(
    db="mydb.dmnd",
    query="query.fasta",
    outfmt="6 qseqid sseqid pident evalue bitscore qcovhsp"
)

# Non-tabular output
text_output = diamond.view(
    daa="alignment.daa",
    outfmt=0  # BLAST pairwise format
)
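
With a custom tabular format, the fields named after the leading `6` determine which columns appear in the output and in what order. The sketch below derives the expected column names from an `outfmt` value; the `columns_from_outfmt` helper is hypothetical, and it is only an assumption that the returned DataFrame uses these names as its column labels.

```python
DEFAULT_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def columns_from_outfmt(outfmt):
    """Return the column names implied by a tabular (format 6) outfmt value."""
    parts = str(outfmt).split()
    if not parts or parts[0] != "6":
        raise ValueError("only tabular format 6 has named columns")
    # A bare "6" means the default twelve columns.
    return parts[1:] or list(DEFAULT_COLUMNS)


print(columns_from_outfmt("6 qseqid sseqid pident evalue bitscore qcovhsp"))
# ['qseqid', 'sseqid', 'pident', 'evalue', 'bitscore', 'qcovhsp']
```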

🧹 Temporary File Management

The package automatically manages temporary files:

with Diamond() as diamond:
    results_df = diamond.blastp(
        db="mydb.dmnd",
        query="query.fasta"
    )
    # Temporary files are automatically cleaned up
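
The general pattern behind this kind of cleanup can be sketched with the standard library alone. The `TempWorkspace` class below is a hypothetical stand-in that creates a scratch directory on entry and removes it on exit; it illustrates the context-manager technique and is not the package's own implementation.

```python
import os
import shutil
import tempfile


class TempWorkspace:
    """Create a scratch directory on entry and delete it on exit."""

    def __enter__(self):
        self.path = tempfile.mkdtemp(prefix="diamond_")
        return self

    def __exit__(self, exc_type, exc, tb):
        # Remove the directory even if the body raised; do not swallow the error.
        shutil.rmtree(self.path, ignore_errors=True)
        return False

    def new_file(self, name):
        """Return a path inside the scratch directory."""
        return os.path.join(self.path, name)


with TempWorkspace() as workspace:
    out_path = workspace.new_file("results.tsv")
    with open(out_path, "w") as handle:
        handle.write("query1\tsubjectA\n")
    existed_inside = os.path.exists(out_path)

print(existed_inside, os.path.exists(workspace.path))  # True False
```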

🔄 Cluster Analysis

Analyze clustering results with the built-in parser:

# Perform clustering
clusters_df = diamond.cluster(
    db="mydb.dmnd",
    approx_id=90.0
)

# Data contains cluster IDs and members
print(f"Number of clusters: {clusters_df['cluster_id'].nunique()}")
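
Beyond counting clusters, a common follow-up is to look at cluster sizes and singletons. The sketch below works on made-up `(cluster_id, member)` pairs in plain Python; it assumes each row pairs a cluster identifier with one member sequence, and the exact column names in the real DataFrame other than `cluster_id` are not confirmed here.

```python
from collections import Counter

# Made-up (cluster_id, member) rows; each cluster is named after its representative.
rows = [
    ("seqA", "seqA"), ("seqA", "seqB"), ("seqA", "seqC"),
    ("seqD", "seqD"),
    ("seqE", "seqE"), ("seqE", "seqF"),
]

# Count how many members each cluster has.
sizes = Counter(cluster_id for cluster_id, _ in rows)

# Clusters with a single member.
singletons = sorted(cluster_id for cluster_id, count in sizes.items() if count == 1)

print(f"Number of clusters: {len(sizes)}")
print(f"Largest cluster: {sizes.most_common(1)[0]}")
print(f"Singletons: {singletons}")
```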

📋 Requirements

Python ≥ 3.6
DIAMOND (must be installed separately and accessible in PATH)
pandas ≥ 1.0.0
numpy ≥ 1.18.0

🧪 Development

First, install development dependencies:

pip install -e ".[dev]"

To run tests:

# Run basic tests
pytest

# Run tests with coverage report
pytest --cov=diamondonpy

# Run tests with detailed coverage report
pytest --cov=diamondonpy --cov-report=term-missing

# Run tests verbosely
pytest -v

# Run a specific test file
pytest tests/test_diamond.py

# Run a specific test function
pytest tests/test_diamond.py::test_blastp

📚 References

This package is a wrapper for the DIAMOND bioinformatics tool.

When using DIAMOND in published research, please cite:

Buchfink B, Reuter K, Drost HG, "Sensitive protein alignments at tree-of-life scale using DIAMOND", Nature Methods 18, 366–368 (2021). doi:10.1038/s41592-021-01101-x

For sequence clustering:

Buchfink B, Ashkenazy H, Reuter K, Kennedy JA, Drost HG, "Sensitive clustering of protein sequences at tree-of-life scale using DIAMOND DeepClust", bioRxiv 2023.01.24.525373. doi:10.1101/2023.01.24.525373

Original publication to cite DIAMOND until v0.9.25:

Buchfink B, Xie C, Huson DH, "Fast and sensitive protein alignment using DIAMOND", Nature Methods 12, 59–60 (2015). doi:10.1038/nmeth.3176

📄 License

This project is licensed under the GNU General Public License v3 (GPLv3) - see the LICENSE file for details.

Project details

Version 0.1.1, released Feb 28, 2025

Download files

Source Distribution

diamondonpy-0.1.1.tar.gz (33.6 kB view details)

Uploaded Feb 28, 2025 Source

Built Distribution

diamondonpy-0.1.1-py3-none-any.whl (33.7 kB view details)

Uploaded Feb 28, 2025 Python 3

File details

Details for the file diamondonpy-0.1.1.tar.gz.

File metadata

Download URL: diamondonpy-0.1.1.tar.gz
Upload date: Feb 28, 2025
Size: 33.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.2

File hashes

Hashes for diamondonpy-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`38c88155b7075271755f982bb8ddac5986a01f0a1199abfb10e69d2ef30a07f1`
MD5	`6edbb187593de5e53a0e33c20d260642`
BLAKE2b-256	`7a57a0b49798719bb26c2528163a2c016ba46a5fe74aff90aad8f84699d2c38d`

File details

Details for the file diamondonpy-0.1.1-py3-none-any.whl.

File metadata

Download URL: diamondonpy-0.1.1-py3-none-any.whl
Upload date: Feb 28, 2025
Size: 33.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.2

File hashes

Hashes for diamondonpy-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`0bae7fded5d80c5a7b44c13868ee8a245b270b3200419098caf8a906a147397c`
MD5	`03199c0c028db5b48926d9468d86fc69`
BLAKE2b-256	`a38210eac9aab14ed0dc2c761e50e9e1b585cd5af393c8256675926665e9227a`
