SNPraefentia: SNP Prioritization Tool

SNPraefentia: SNP Prioritization from Metagenomic Variants

SNPraefentia is a tool for prioritizing Single Nucleotide Polymorphisms (SNPs) among metagenomic variants. It helps identify potentially significant variants by analyzing sequencing depth, amino acid changes, and protein domain information.

Installation

Setting up the conda environment

First, create a new conda environment with Python 3.11:

# Create a new conda environment
conda create -y -n snpraefentia python=3.11

# Activate the environment
conda activate snpraefentia

Installing SNPraefentia

With the conda environment activated, install SNPraefentia using pip:

# Install from PyPI
pip install snpraefentia

Or you can install from source:

# Clone the repository
git clone https://github.com/muneebdev7/SNPraefentia.git
cd SNPraefentia

# Install the package
pip install .

# Or install in development (editable) mode if you want to modify the code
pip install -e .

Dependencies

SNPraefentia requires the following Python packages (see requirements.txt):

  • pandas (≥1.0.0)
  • numpy (≥1.18.0)
  • requests (≥2.22.0)
  • ete3 (≥3.1.1)
  • openpyxl (≥3.1.5)
  • matplotlib (≥3.0.0)
  • seaborn (≥0.10.0)
  • adjustText (≥0.7.3)
  • rich (≥13.0.0)

These dependencies will be installed automatically when using pip.

First-time Setup

NCBI Taxonomy Database

On first use, SNPraefentia needs to download the NCBI taxonomy database. This is a one-time download that requires upwards of 600 MB of disk space:

# This happens automatically on first use, but might take a few minutes
# You can also trigger it manually before running SNPraefentia:
python -c "from ete3 import NCBITaxa; ncbi = NCBITaxa()"

The database will be stored in ~/.etetoolkit/ by default.
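
Before a long run, it can be useful to check whether the database is already on disk. The following is a minimal sketch using only the standard library. The file name taxa.sqlite is an assumption about what ete3 writes into ~/.etetoolkit/, and the helper names taxonomy_db_path and taxonomy_db_present are hypothetical, not part of SNPraefentia:

```python
from pathlib import Path
from typing import Optional

# Assumption: ete3 stores the taxonomy database as "taxa.sqlite" inside
# ~/.etetoolkit/. Adjust DB_FILENAME if your ete3 version uses another name.
DB_FILENAME = "taxa.sqlite"


def taxonomy_db_path(db_dir: Optional[Path] = None) -> Path:
    """Return the expected location of the taxonomy database file."""
    base = db_dir if db_dir is not None else Path.home() / ".etetoolkit"
    return base / DB_FILENAME


def taxonomy_db_present(db_dir: Optional[Path] = None) -> bool:
    """Return True if the database file exists and is not empty."""
    path = taxonomy_db_path(db_dir)
    return path.is_file() and path.stat().st_size > 0


if __name__ == "__main__":
    if taxonomy_db_present():
        print(f"Taxonomy database found at {taxonomy_db_path()}")
    else:
        print("Taxonomy database not found; it will be downloaded on first use.")
```

The check only looks at the file's presence and size; it does not verify that the database is complete or current.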

Verifying Installation

To verify that the package is installed correctly:

snpraefentia --version

This should display the current version (2.0.0).
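
The same check can be made from Python, which may help in scripts or notebooks where the command line is not convenient. This sketch relies on the standard library's importlib.metadata and assumes the distribution name is snpraefentia, as used in the pip command above; installed_version is a hypothetical helper name:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "snpraefentia") -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("snpraefentia is not installed in this environment")
    else:
        print(f"snpraefentia {found}")
```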

Usage

Command Line Interface

Basic Usage

# For CSV input/output
snpraefentia --input your_data.csv --specie "Bacteroides uniformis" --output results.csv

# For Excel input/output
snpraefentia --input your_data.xlsx --specie "Bacteroides uniformis" --output results.xlsx

Required Arguments

  • --input, -i: Path to input file containing SNP data (CSV or Excel: .csv, .xlsx)
  • --specie, -s: Bacterial species name (e.g., 'Bacteroides uniformis')
  • --output, -o: Path to save output file (CSV or Excel: .csv, .xlsx)

Optional Arguments

  • --format, -f: Output format override (by default, the format is determined from the output file extension)
  • --uniprot-tolerance, -ut: Length tolerance when matching UniProt entries (default: 50)

Logging Options

  • --verbose, -v: Increase output verbosity (shows DEBUG messages)
  • --quiet, -q: Suppress all non-error output
  • --log-file, -l: Path to save log file

Help and Version

  • --help, -h: Show help message and exit
  • --version: Show version and exit
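
When SNPraefentia is called from a pipeline script, the command line can be assembled from the options listed above. The sketch below uses only flags documented in this section; build_command is a hypothetical helper, and its refusal to combine --verbose with --quiet is an assumption of this example, not documented behaviour of the CLI:

```python
import shlex
from typing import List, Optional


def build_command(
    input_path: str,
    specie: str,
    output_path: str,
    uniprot_tolerance: Optional[int] = None,
    verbose: bool = False,
    quiet: bool = False,
    log_file: Optional[str] = None,
) -> List[str]:
    """Assemble an argument list for the snpraefentia command."""
    # Assumption of this sketch: the two verbosity flags are not combined.
    if verbose and quiet:
        raise ValueError("choose either verbose or quiet, not both")

    cmd = [
        "snpraefentia",
        "--input", str(input_path),
        "--specie", specie,
        "--output", str(output_path),
    ]
    if uniprot_tolerance is not None:
        cmd += ["--uniprot-tolerance", str(uniprot_tolerance)]
    if verbose:
        cmd.append("--verbose")
    if quiet:
        cmd.append("--quiet")
    if log_file:
        cmd += ["--log-file", str(log_file)]
    return cmd


if __name__ == "__main__":
    cmd = build_command("snps.csv", "Bacteroides uniformis", "results.csv", verbose=True)
    # Print a shell-safe version of the command (the species name gets quoted).
    print(shlex.join(cmd))
    # To execute it, with snpraefentia on PATH:
    #   import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with species names that contain a space.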

Python API

You can also use SNPraefentia programmatically in your Python scripts:

from snpraefentia.core import SNPAnalyst

# Initialize with default settings
analyst = SNPAnalyst()

# Process an input file
results = analyst.run(
    input_file="path/to/input.csv",
    specie="Bacteroides uniformis",
    output_file="path/to/output.csv"  # Optional, omit to skip saving
)

# Or process an existing DataFrame
import pandas as pd
df = pd.read_csv("my_snps.csv")
processed_df = analyst.process_dataframe(df, "Bacteroides uniformis")

# Save results manually if needed
processed_df.to_csv("custom_output.csv", index=False)

Input Format

SNPraefentia accepts both CSV (.csv) and Excel (.xlsx) files as input. The file should contain the following columns:

Column               Description                      Example
Evidence             Read depth information           A:10 C:5
Effect               Variant effect prediction        p.Ala123Gly
Gene                 Gene name                        geneA
Amino_Acid_Position  Amino acid position information  123/500

Additional columns are allowed and will be preserved in the output.
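
A quick header check before a run can catch a misnamed column early. The following sketch uses the standard csv module and the four column names from the table above. It assumes the names must match exactly (case and spelling), which is not stated in this document; missing_columns and check_csv are hypothetical helpers:

```python
import csv
from typing import Iterable, List

# Column names taken from the input format table.
REQUIRED_COLUMNS = ("Evidence", "Effect", "Gene", "Amino_Acid_Position")


def missing_columns(header: Iterable[str]) -> List[str]:
    """Return the required columns that are absent from a header row."""
    present = set(header)
    return [name for name in REQUIRED_COLUMNS if name not in present]


def check_csv(path: str) -> List[str]:
    """Read only the first row of a CSV file and report missing columns."""
    with open(path, newline="", encoding="utf-8") as handle:
        header = next(csv.reader(handle), [])
    return missing_columns(header)


if __name__ == "__main__":
    import sys

    missing = check_csv(sys.argv[1])
    if missing:
        print("Missing columns: " + ", ".join(missing))
    else:
        print("All required columns are present.")
```

For Excel input, the same missing_columns function could be applied to the column names read by another library.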

Output Format

SNPraefentia writes results in the format indicated by the output file extension (.csv or .xlsx). The following columns are added to the input data:

Column                   Description
Gene                     Gene name
UniProt_ID               UniProt identifier, if found
Total_Protein_Length     Total length of the protein
Bacterial_Specie         Species name
Taxonomic_ID             NCBI taxonomy ID
Normalized_Depth         Depth normalized to the range [0,1]
Amino_Acid_Impact_Score  Score based on physicochemical property changes
Domain_Position_Match    Whether the mutation is in a protein domain (1=yes, 0=no)
Final_Priority_Score     Priority score as a percentage (0-100%)
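
A common next step is to rank variants by Final_Priority_Score. The sketch below does this for a CSV result file with the standard csv module. Whether the score is written as a bare number or with a trailing percent sign is not specified here, so parse_score accepts both as an assumption; parse_score, top_variants, and load_results are hypothetical helpers:

```python
import csv
from typing import Dict, List


def parse_score(raw: str) -> float:
    """Convert a score such as '87.5' or '87.5%' to a float."""
    return float(raw.strip().rstrip("%"))


def top_variants(rows: List[Dict[str, str]], n: int = 10) -> List[Dict[str, str]]:
    """Return the n rows with the highest Final_Priority_Score."""
    # Rows with an empty or non-numeric score raise ValueError here.
    ranked = sorted(
        rows,
        key=lambda row: parse_score(row["Final_Priority_Score"]),
        reverse=True,
    )
    return ranked[:n]


def load_results(path: str) -> List[Dict[str, str]]:
    """Load a CSV result file into a list of row dictionaries."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


if __name__ == "__main__":
    import sys

    for row in top_variants(load_results(sys.argv[1]), n=5):
        print(row["Gene"], row["Final_Priority_Score"])
```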

Configuration

UniProt Parameters

  • uniprot_tolerance: When matching genes to UniProt entries, this parameter controls how close the protein length must be to the expected length (default: 50 amino acids)
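
To make the effect of the tolerance concrete, here is one way a length-based match could work. This is an illustrative sketch, not SNPraefentia's implementation: the inclusive comparison, the choice of the closest candidate, and the names within_tolerance and closest_match are all assumptions, and the accession strings in the usage example are placeholders:

```python
from typing import Dict, Optional


def within_tolerance(candidate_length: int, expected_length: int, tolerance: int = 50) -> bool:
    """True if the two lengths differ by at most `tolerance` amino acids."""
    return abs(candidate_length - expected_length) <= tolerance


def closest_match(
    candidates: Dict[str, int],
    expected_length: int,
    tolerance: int = 50,
) -> Optional[str]:
    """Pick the candidate whose length is nearest to the expected length.

    `candidates` maps an identifier to a protein length. Returns None when
    no candidate falls inside the tolerance window.
    """
    in_range = {
        ident: length
        for ident, length in candidates.items()
        if within_tolerance(length, expected_length, tolerance)
    }
    if not in_range:
        return None
    return min(in_range, key=lambda ident: abs(in_range[ident] - expected_length))


if __name__ == "__main__":
    # Placeholder identifiers and lengths, for illustration only.
    example = {"ACC_A": 480, "ACC_B": 560, "ACC_C": 505}
    print(closest_match(example, expected_length=500))
```

Under these assumptions, a smaller tolerance rejects more candidates and a larger one accepts entries whose length is further from the expected value.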

Logging Options

SNPraefentia provides three verbosity levels:

  • Normal (default): Shows INFO level messages (major processing steps)
  • Verbose (--verbose): Shows DEBUG level messages (detailed information)
  • Quiet (--quiet): Shows only ERROR level messages (problems only)

You can also save logs to a file with --log-file path/to/logfile.log.
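
For scripts that wrap the Python API, the three verbosity levels map naturally onto the standard logging module. The sketch below shows one such mapping. The logger name snpraefentia_example, the helper names, and the rule that quiet takes precedence over verbose are assumptions of this example rather than documented behaviour:

```python
import logging
from typing import Optional


def resolve_level(verbose: bool = False, quiet: bool = False) -> int:
    """Map the verbosity switches to a logging level."""
    if quiet:  # assumption: quiet wins if both switches are set
        return logging.ERROR
    if verbose:
        return logging.DEBUG
    return logging.INFO


def configure_logging(
    verbose: bool = False,
    quiet: bool = False,
    log_file: Optional[str] = None,
) -> logging.Logger:
    """Return a logger that writes to the console and optionally to a file."""
    logger = logging.getLogger("snpraefentia_example")
    logger.setLevel(resolve_level(verbose, quiet))
    logger.handlers.clear()  # avoid duplicate output on repeated calls

    handlers = [logging.StreamHandler()]
    if log_file:
        handlers.append(logging.FileHandler(log_file))

    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    for handler in handlers:
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = configure_logging(verbose=True)
    log.debug("shown only in verbose mode")
    log.info("shown in normal and verbose mode")
    log.error("always shown")
```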

Examples

Detailed Logging

snpraefentia --input snps.csv --specie "Klebsiella pneumoniae" --output results.csv --verbose --log-file snpraefentia_run.log

Running Tests

The package's functions are tested with Python's built-in unittest framework.

To run all tests:

python -m unittest discover -s snpraefentia/tests -v

Troubleshooting

Common Issues

Missing NCBI Taxonomy Database

Error: NCBITaxa not initialized

Solution: Run the following command to download the database:

python -c "from ete3 import NCBITaxa; ncbi = NCBITaxa()"

Species Not Found

Warning: No taxonomy ID found for [species]

Solution: Check the spelling of your species name. Use the scientific name (genus and species).
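
Some lookup failures come from stray whitespace or capitalization rather than from a misspelling. The sketch below tidies a name into the usual "Genus species" form before it is passed to SNPraefentia. It is a hypothetical pre-processing helper, not part of the package, and it assumes a plain two-word binomial made of letters only; it cannot detect an actual misspelling:

```python
from typing import Optional


def normalize_species(name: str) -> Optional[str]:
    """Return the name as 'Genus species', or None if it is not a binomial.

    Assumption of this sketch: exactly two words, letters only. Names with
    abbreviations ('E. coli') or strain designations are rejected.
    """
    parts = name.split()
    if len(parts) != 2:
        return None
    genus, epithet = parts
    if not (genus.isalpha() and epithet.isalpha()):
        return None
    return f"{genus.capitalize()} {epithet.lower()}"


if __name__ == "__main__":
    for raw in ["  bacteroides   Uniformis ", "Klebsiella", "E. coli"]:
        print(repr(raw), "->", normalize_species(raw))
```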

License

SNPraefentia is licensed under the Apache License 2.0 - see the LICENSE file for details.

Patent Notice

The SNP scoring algorithms and methodologies implemented in this software are protected by patent rights owned by the SNPraefentia Authors. While the source code is available under the Apache License 2.0, usage of the scoring algorithms may require a separate patent license, particularly for commercial applications.

For patent licensing inquiries, please contact the authors.

See the NOTICE file for additional details regarding copyright and patent notices.

Credits

SNPraefentia was developed by Nadeem Khan and Muhammad Muneeb Nasir at the Metagenomics Discovery Lab (MDL) at SINES, NUST.

We thank the following professionals for their extensive assistance in the development of this package:

Citation

If you use SNPraefentia in your research, please cite:

Khan, N., & Nasir, M. M. (2025). SNPraefentia: A Comprehensive Tool for SNP Prioritization in Bacterial Genomes. 
GitHub repository: https://github.com/muneebdev7/SNPraefentia
Version 2.0.0

For questions, feature requests, or bug reports, please open an issue on GitHub.
