D-Sites: Hybrid TFBS predictor (PWM + DNA shape + RF)

D-Sites: Hybrid TFBS Predictor for Bacterial Genomes

Python 3.8+ | License: MIT

D-Sites is a computational tool for predicting transcription factor binding sites (TFBS) in bacterial genomes. It is a hybrid method that combines position weight matrix (PWM) scoring, DNA shape features, and Random Forest classification.
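
To make the PWM component concrete, the following is a minimal, standard-library sketch of log-odds PWM scoring over a sequence. It is not the D-Sites implementation: the function names, the pseudocount, the uniform background, and the toy binding sites are all assumptions chosen for illustration, and the DNA shape and Random Forest stages are not shown.

```python
import math

BASES = "ACGT"

def build_log_odds_pwm(sites, pseudocount=1.0, background=None):
    """Build a log-odds PWM from aligned binding sites of equal length.

    Hypothetical helper for illustration; uniform background by default.
    """
    background = background or {b: 0.25 for b in BASES}
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    pwm = []
    for pos in range(width):
        # Start every base at the pseudocount so no probability is zero.
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos].upper()] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / background[b]) for b in BASES})
    return pwm

def scan(sequence, pwm):
    """Return (start, score) for every forward-strand window of PWM width."""
    width = len(pwm)
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing ambiguous bases such as N
        score = sum(pwm[i][base] for i, base in enumerate(window))
        hits.append((start, score))
    return hits

# Made-up sites and sequence, for illustration only.
toy_sites = ["TTGATC", "TTGACC", "TTGATA", "TTGATC"]
pwm = build_log_odds_pwm(toy_sites)
hits = scan("GGGTTGATCAAA", pwm)
best_start, best_score = max(hits, key=lambda h: h[1])
```

In a hybrid predictor, a score like `best_score` would typically be one feature among several passed to the classifier; a real scan would also cover the reverse strand.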

🚀 Quick Start

Installation

# Clone the repository and install the dependencies
git clone https://github.com/yourusername/dsites.git
cd dsites
pip install -r requirements.txt

Basic Prediction

python src/D-Sites.py --fasta examples/AmrZ/genome.fasta \
                     --gff examples/AmrZ/annotation.gff \
                     --motif examples/AmrZ/motif.meme \
                     --gene AmrZ \
                     --genome_accession NC_002516.2
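
If you prefer to drive the pipeline from Python, for example to loop over several inputs, the command above can be assembled as an argument list. This is only a sketch: `build_dsites_command` and `run_dsites` are hypothetical helper names, and the only flags used are the ones shown in the command above.

```python
import subprocess
import sys

def build_dsites_command(fasta, gff, motif, gene, genome_accession,
                         script="src/D-Sites.py"):
    """Assemble the argument list for the basic prediction command."""
    return [
        sys.executable, script,
        "--fasta", fasta,
        "--gff", gff,
        "--motif", motif,
        "--gene", gene,
        "--genome_accession", genome_accession,
    ]

def run_dsites(cmd):
    """Run the command from the repository root; raises if it exits non-zero."""
    return subprocess.run(cmd, check=True)

cmd = build_dsites_command(
    fasta="examples/AmrZ/genome.fasta",
    gff="examples/AmrZ/annotation.gff",
    motif="examples/AmrZ/motif.meme",
    gene="AmrZ",
    genome_accession="NC_002516.2",
)
```

Calling `run_dsites(cmd)` from the repository root then launches the same prediction as the shell command. Passing a list rather than a single string avoids shell quoting problems with paths that contain spaces.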

Run Benchmarking

# Comprehensive benchmarking
python scripts/fullbench.py

# FNR-specific analysis
python scripts/fimo_fnr.py

# Generate validation plots
python scripts/generate_pr_curves.py
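
For readers unfamiliar with precision-recall curves, the sketch below shows how the points of such a curve are commonly derived from ranked predictions. It is an independent illustration, not the contents of scripts/generate_pr_curves.py; the function name, the input format, and the toy scores are assumptions.

```python
def precision_recall_points(scored_labels, n_positives=None):
    """Compute (recall, precision) after each prediction, best score first.

    scored_labels: iterable of (score, is_true_site) pairs.
    n_positives: total number of known true sites; defaults to the number
    of true sites present in scored_labels.
    """
    ranked = sorted(scored_labels, key=lambda pair: pair[0], reverse=True)
    if n_positives is None:
        n_positives = sum(1 for _, is_true in ranked if is_true)
    if n_positives == 0:
        raise ValueError("at least one true site is required")
    points = []
    true_positives = 0
    for rank, (_, is_true) in enumerate(ranked, start=1):
        if is_true:
            true_positives += 1
        # Recall: share of true sites found so far.
        # Precision: share of predictions so far that are true sites.
        points.append((true_positives / n_positives, true_positives / rank))
    return points

# Made-up predictions: (score, overlaps a validated site?)
toy = [(0.9, True), (0.8, False), (0.7, True), (0.4, False)]
points = precision_recall_points(toy)
```

This simple version treats tied scores in input order. Setting `n_positives` to the full number of validated sites matters when some true sites receive no prediction at all, since otherwise recall is overstated.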

📊 Available Scripts

  • src/D-Sites.py: Main prediction pipeline
  • scripts/fullbench.py: Comprehensive performance evaluation
  • scripts/comprehensive_validation.py: Validation across all TFs
  • scripts/fimo_fnr.py: FNR-specific FIMO comparison
  • scripts/generate_pr_curves.py: Precision-Recall curve generation
  • scripts/generate_enrichment_plot.py: Promoter enrichment analysis
  • scripts/master_analysis.py: Master analysis script

🧪 Validation Datasets

Complete validation data for four transcription factors:

  • AmrZ: Pseudomonas aeruginosa PAO1
  • GlxR: Corynebacterium glutamicum R
  • CodY: Bacillus anthracis Sterne
  • FNR: Salmonella enterica serovar Typhimurium

📈 Performance

D-Sites demonstrates:

  • Up to 9.3× higher recall than FIMO
  • 3–4× higher precision in top predictions
  • 3.02–3.42× enrichment in promoter regions
  • 68.1% validation success for the FNR regulon
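
As a reading aid for the ratios above, here is a small sketch of how a fold enrichment and a recall ratio can be computed. The definitions are assumptions (the exact definitions used by D-Sites are not stated here), the function names are hypothetical, and the numbers are made up so the arithmetic is easy to follow; they are not D-Sites results.

```python
def fold_enrichment(hits_in_promoters, total_hits, promoter_bp, genome_bp):
    """Observed share of hits in promoters divided by the share expected
    if hits were spread uniformly over the genome (assumed definition)."""
    observed = hits_in_promoters / total_hits
    expected = promoter_bp / genome_bp
    return observed / expected

def recall(true_sites_recovered, total_true_sites):
    """Fraction of known binding sites that the tool recovered."""
    return true_sites_recovered / total_true_sites

# Made-up numbers, for illustration only.
enrichment = fold_enrichment(
    hits_in_promoters=75, total_hits=100,
    promoter_bp=250_000, genome_bp=1_000_000,
)
recall_ratio = recall(40, 80) / recall(10, 80)
```

With these toy inputs, 75% of hits fall in promoters that cover 25% of the genome, giving a 3× enrichment, and a tool recovering 40 of 80 known sites has 4× the recall of one recovering 10.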

📝 Citation

If you use D-Sites in your research, please cite:

Pankaj et al. (2025). D-Sites: A computationally efficient tool for predicting protein binding sites in bacterial genomes. Journal Name, Volume, Pages.

📄 License

MIT License - see LICENSE for details.

💬 Contact

For questions and support, please open an issue on GitHub or contact ft.pank@gmail.com.
