
D-Sites: Hybrid TFBS predictor (PWM + DNA shape + RF)


D-Sites: Hybrid TFBS Predictor for Bacterial Genomes

Python 3.8+ License: MIT

D-Sites is a computational tool for predicting transcription factor binding sites (TFBS) in bacterial genomes. It is a hybrid approach that combines position weight matrix (PWM) scoring, DNA shape features, and Random Forest classification.
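To make the PWM component concrete, here is a minimal sketch of log-odds PWM scoring over a sliding window. It illustrates the general technique only and is not taken from the D-Sites source: the function names, the pseudocount of 0.5, the uniform background of 0.25, and the toy counts are all assumptions made for the example.

```python
import math

BASES = "ACGT"

def counts_to_pwm(counts, pseudocount=0.5, background=0.25):
    """Convert per-position base counts into log2-odds weights.

    counts: list of dicts, one per motif position,
            e.g. {"A": 8, "C": 0, "G": 1, "T": 1}.
    pseudocount and background are illustrative choices, not D-Sites settings.
    """
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + pseudocount * len(BASES)
        pwm.append({
            b: math.log2(((column.get(b, 0) + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm

def score_window(pwm, window):
    """Sum the log-odds weights for one window as long as the motif."""
    return sum(col[base] for col, base in zip(pwm, window))

def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring forward-strand window.

    Assumes the sequence contains only A, C, G, T and is at least as long
    as the motif.
    """
    width = len(pwm)
    hits = [
        (score_window(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    ]
    return max(hits)

# Toy 4-bp motif that strongly prefers "TGCA" (hypothetical counts).
toy_counts = [
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 0, "C": 10, "G": 0, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
]
toy_pwm = counts_to_pwm(toy_counts)
score, start = best_hit(toy_pwm, "AAAATGCAAAA")  # best window starts at index 4
```

A hybrid predictor would typically feed a score like this into the classifier alongside other features rather than thresholding it directly.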

🚀 Quick Start

Installation

git clone https://github.com/pankaj357/D-Sites.git
cd D-Sites
pip install -r requirements.txt

Basic Prediction

Minimal Command

python src/D-Sites.py --fasta <genome.fasta> \
                     --gff <annotation.gff> \
                     --motif <motif_file> \
                     --gene <TF_name> \
                     --genome_accession <accession_id>

Complete Example

python src/D-Sites.py \
    --fasta <path_to_genome.fasta> \
    --gff <path_to_annotation.gff> \
    --motif <path_to_motif_file> \
    --gene <TF_NAME> \
    --genome_accession <GENOME_ACCESSION> \
    --outdir results \
    --n_trees 300 \
    --neg_ratio 5 \
    --prob_cutoff 0.5 \
    --pad 10 \
    --seed 42 \
    --batch 10000 \
    --up 300 \
    --down 50 \
    --auto_cutoff
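When D-Sites is run from a pipeline, it can help to assemble the command programmatically. The sketch below builds an argument list for `subprocess.run` using only the flags shown above. The helper name `build_dsites_command` and the placeholder file names are hypothetical, and the script path `src/D-Sites.py` assumes the working directory is the repository root.

```python
import subprocess
import sys

def build_dsites_command(fasta, gff, motif, gene, genome_accession,
                         auto_cutoff=False, **options):
    """Assemble an argv list for src/D-Sites.py from the documented flags.

    Keyword options map directly to flags, e.g. n_trees=300 -> --n_trees 300.
    This helper is hypothetical and not part of D-Sites.
    """
    cmd = [
        sys.executable, "src/D-Sites.py",
        "--fasta", str(fasta),
        "--gff", str(gff),
        "--motif", str(motif),
        "--gene", str(gene),
        "--genome_accession", str(genome_accession),
    ]
    for name, value in options.items():
        cmd += [f"--{name}", str(value)]
    if auto_cutoff:
        # Passed as a bare switch, as in the complete example.
        cmd.append("--auto_cutoff")
    return cmd

# Placeholder inputs; substitute real paths and identifiers.
cmd = build_dsites_command(
    "genome.fasta", "annotation.gff", "motif.jaspar",
    gene="TF_NAME", genome_accession="GENOME_ACCESSION",
    outdir="results", n_trees=300, seed=42,
)
print(" ".join(cmd))

# To actually run it from the repository root:
# subprocess.run(cmd, check=True)
```

Building the command as a list avoids shell quoting problems with paths that contain spaces.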

Command Breakdown

Required Arguments

--fasta: Genome FASTA file path

--gff: Genome annotation file (GFF3 format)

--motif: TF motif file (JASPAR or MEME format)

--gene: Transcription factor name

--genome_accession: Genome accession ID
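For readers unfamiliar with the motif input, the sketch below parses a single matrix in the bracketed JASPAR text format into per-base count lists. It is an illustration of the file layout, not D-Sites' own parser: the function name `parse_jaspar` and the toy matrix (with the made-up ID `MX0001`) are assumptions, and the tool may read motifs differently.

```python
import re

def parse_jaspar(text):
    """Parse one JASPAR-format matrix into (motif_id, name, counts).

    counts maps each base to a list of per-position counts.
    Hypothetical helper for illustration only.
    """
    motif_id, name, counts = None, None, {}
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header: ">ID NAME" (the name is optional in this sketch).
            fields = line[1:].split(None, 1)
            motif_id = fields[0]
            name = fields[1].strip() if len(fields) > 1 else None
        else:
            # Row: a base letter followed by bracketed counts.
            base = line[0].upper()
            counts[base] = [
                float(x) for x in re.findall(r"\d+(?:\.\d+)?", line[1:])
            ]
    widths = {len(v) for v in counts.values()}
    if set(counts) != set("ACGT") or len(widths) != 1:
        raise ValueError("expected four rows (A, C, G, T) of equal length")
    return motif_id, name, counts

# Toy matrix with a made-up identifier.
example = """\
>MX0001 toyTF
A  [ 4 19  0  0 ]
C  [16  0 20  0 ]
G  [ 0  1  0 20 ]
T  [ 0  0  0  0 ]
"""
motif_id, name, counts = parse_jaspar(example)
width = len(counts["A"])
```

The row check at the end catches truncated or malformed files early, before any scoring is attempted.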

Optional Arguments with Defaults

--outdir results: Output directory

--n_trees 300: Number of Random Forest trees

--neg_ratio 5: Ratio of negative to positive examples

--prob_cutoff 0.5: Probability cutoff for predictions

--pad 10: Window padding around known sites

--seed 42: Random seed

--batch 10000: Batch size for processing

--up 300: Upstream promoter size (bp)

--down 50: Downstream promoter size (bp)
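The `--up` and `--down` options can be read as defining a promoter window around each gene start. The sketch below shows one common, strand-aware way to compute such a window from GFF-style coordinates (1-based, inclusive). Whether D-Sites anchors and clips the window exactly this way is an assumption, and `promoter_window` is a hypothetical helper.

```python
def promoter_window(start, end, strand, up=300, down=50, genome_length=None):
    """Return (win_start, win_end), 1-based inclusive, around the gene's 5' end.

    start/end are GFF coordinates (start <= end regardless of strand).
    The window spans `up` bases upstream and `down` bases into the gene.
    Circular genomes are not handled; the window is simply clipped.
    """
    if strand == "+":
        win_start, win_end = start - up, start + down - 1
    elif strand == "-":
        # On the minus strand the 5' end is the GFF `end` coordinate,
        # and "upstream" means higher coordinates.
        win_start, win_end = end - down + 1, end + up
    else:
        raise ValueError(f"unknown strand: {strand!r}")
    win_start = max(1, win_start)  # clip at the sequence start
    if genome_length is not None:
        win_end = min(genome_length, win_end)  # clip at the sequence end
    return win_start, win_end

plus = promoter_window(1000, 1900, "+")     # (700, 1049)
minus = promoter_window(1000, 1900, "-")    # (1851, 2200)
clipped = promoter_window(100, 900, "+")    # (1, 149)
```

With the defaults, each unclipped window is 350 bases long: 300 upstream plus 50 downstream of the gene start.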

📈 Performance

D-Sites demonstrates:

  • 3-4× higher precision in top predictions
  • 3.02-3.42× enrichment in promoter regions
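Metrics of this kind are usually computed as precision among the top-k ranked predictions and as fold enrichment of predictions in promoter regions over a uniform genome-wide expectation. The sketch below shows both calculations under those assumed definitions. The exact definitions and baselines behind the figures above are not stated here, and the function names and toy numbers are hypothetical.

```python
def precision_at_k(ranked_labels, k):
    """Fraction of true sites among the k highest-scoring predictions.

    ranked_labels: 1/0 (or True/False) labels sorted by descending score.
    """
    top = ranked_labels[:k]
    if not top:
        raise ValueError("no predictions to evaluate")
    return sum(top) / len(top)

def fold_enrichment(hits_in_promoters, total_hits, promoter_bp, genome_bp):
    """Observed fraction of hits in promoters divided by the fraction
    expected if hits were spread uniformly over the genome."""
    observed = hits_in_promoters / total_hits
    expected = promoter_bp / genome_bp
    return observed / expected

# Toy numbers for illustration only (not D-Sites results).
p5 = precision_at_k([1, 1, 0, 1, 0, 0, 0, 1], k=5)   # 3 of the top 5 are true
fold = fold_enrichment(hits_in_promoters=50, total_hits=100,
                       promoter_bp=1_000_000, genome_bp=4_000_000)
```

In the toy case, half of the hits fall in promoters that cover a quarter of the genome, which gives a two-fold enrichment.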

📝 Citation

If you use D-Sites in your research, please cite:

Pankaj et al. (2025). D-Sites: A hybrid machine-learning framework for prediction of transcription factor binding sites in bacterial genomes. Information Sciences (under review), 2024.

📄 License

MIT License - see LICENSE for details.

💬 Contact

For questions and support, contact:
