
BoltzNet


boltznet

BoltzNet is a biophysically designed neural network that learns a quantitative model of TF-DNA binding energy from ChIP-Seq data. BoltzNet mirrors a quantitative biophysical model and provides directly interpretable predictions genome-wide at nucleotide resolution. We have performed ChIP-Seq mapping of genome-wide DNA binding for 139 E. coli TFs. From these data we have generated BoltzNet models for 124 TFs.
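As a rough illustration of the kind of biophysical model the network mirrors, the sketch below scores every window of a DNA sequence with a position-specific energy matrix and converts each energy into a two-state Boltzmann occupancy, 1 / (1 + exp(E - mu)), with energies in units of kT. This is a generic textbook illustration, not BoltzNet's actual architecture or learned parameters (those are described in the publication). The energy matrix, the chemical-potential term mu, and the function names are all hypothetical.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq: str) -> np.ndarray:
    """Encode a DNA string as an (L, 4) one-hot array in A, C, G, T order (A/C/G/T only)."""
    idx = np.array([BASES.index(b) for b in seq.upper()])
    return np.eye(4)[idx]

def window_energies(seq: str, energy_matrix: np.ndarray) -> np.ndarray:
    """Sum per-base energies over every window of width W; energy_matrix has shape (W, 4)."""
    x = one_hot(seq)
    w = energy_matrix.shape[0]
    n = x.shape[0] - w + 1
    return np.array([np.sum(x[i:i + w] * energy_matrix) for i in range(n)])

def occupancy(energies: np.ndarray, mu: float = 0.0) -> np.ndarray:
    """Two-state Boltzmann occupancy; lower (more negative) energy means tighter binding here."""
    return 1.0 / (1.0 + np.exp(energies - mu))

# Hypothetical 3-bp energy matrix (kT units): -2 for the preferred base "GAT", 0 otherwise
emat = np.zeros((3, 4))
for pos, base in enumerate("GAT"):
    emat[pos, BASES.index(base)] = -2.0

e = window_energies("CCGATCC", emat)
p = occupancy(e)
print(int(np.argmax(p)))  # 2, the window starting at "GAT"
```

The real package reports predictions on both strands; scoring the reverse complement with the same matrix would be the natural extension of this sketch.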

The BoltzNet models are described in our publication and on the companion website:

https://boltznet.bu.edu

This Python package provides a high-level object interface for downloading pretrained models, running predictions on DNA sequences, and visualizing the results.

Installation

Create a conda environment and activate it. Then run:

pip install boltznet
boltznet-init

This installs the package and downloads the available models to the package cache directory. To run the self-tests, run:

boltznet-selftest

This builds a model on all TFs, runs predictions on a set of E. coli promoter sequences, and then generates and saves a plot as selftest_pdhR,pdhR-aceE-aceF-lpd.png. In effect, it runs the example code in Usage below.
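If boltznet-init or boltznet-selftest fails, a quick first check is whether pip actually installed the package into the active conda environment. The sketch below uses only the standard library (importlib.metadata, Python 3.8+); the helper name installed_version is hypothetical and is not part of boltznet.

```python
from importlib import metadata
from typing import Optional

def installed_version(dist_name: str = "boltznet") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None

version = installed_version()
if version is None:
    print("boltznet is not installed in this environment; run: pip install boltznet")
else:
    print(f"boltznet {version} is installed; next run: boltznet-init")
```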

Usage

from boltznet import boltznet_tf

####################################
# create a tfmodel on all TFs that have been loaded into the package cache
####################################
tfmodel=boltznet_tf.create()

####################################################
# load sequences from a FASTA file and run predictions
# Returns a NumPy array of predictions at each position on both
# strands of each sequence for all TFs
#
# The NumPy array has shape:
# (nseqs,2,seqlen,numtfs)
# - nseqs: number of sequences
# - 2: forward and reverse strands
# - seqlen: length of each sequence
# - numtfs: number of TF models
####################################################
fa_name='test.fa'
y=tfmodel(fastafile=fa_name)

####################################################
# load annotations for the sequences for plotting
####################################################
gff_name='ecoli.gff'
tfmodel.loadGff(gff_name)

####################################################
# Plot the predictions for sequences by sequence index or sequence name pattern.
# The call below plots sequence number 76 as well as any sequences whose
# names contain chaC or pdhR, but never plots the same sequence twice.
# If savefilename is None, plots are shown in a window.
# If savefilename is given, plots are saved as savefilename_<seqid>.png
####################################################
tfmodel.plotPrediction(inds=[76], seqnames=['chaC', 'pdhR'], model_names=None,
                       seqlogo=False, baseseq=False, maxN=3, savefilename='test')
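The returned array can be post-processed with plain NumPy. The sketch below uses a randomly generated stand-in array with the documented (nseqs, 2, seqlen, numtfs) layout, because the real values depend on the downloaded models; the sequence names and TF names are hypothetical, and treating larger values as stronger predicted binding is an assumption. It also mimics the selection rule described for plotPrediction (explicit indices plus name substrings, each sequence at most once); select_sequences is a hypothetical helper, and the package's own implementation may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for y = tfmodel(fastafile=...), shape (nseqs, 2, seqlen, numtfs)
seq_names = ["chaC_promoter", "pdhR-aceE-aceF-lpd", "lacZ_promoter", "pdhR_upstream"]
tf_names = ["pdhR", "crp", "fnr"]
nseqs, seqlen, numtfs = len(seq_names), 200, len(tf_names)
y = rng.normal(size=(nseqs, 2, seqlen, numtfs))

def select_sequences(names, inds=(), patterns=()):
    """Return indices chosen by position or by name substring, without duplicates, in first-seen order."""
    chosen = []
    for i in inds:
        if i not in chosen:
            chosen.append(i)
    for pat in patterns:
        for i, name in enumerate(names):
            if pat in name and i not in chosen:
                chosen.append(i)
    return chosen

# Collapse the strand axis by keeping the larger value at each position -> (nseqs, seqlen, numtfs)
both_strands = y.max(axis=1)

# Position of the highest value for every (sequence, TF) pair -> (nseqs, numtfs)
peak_pos = both_strands.argmax(axis=1)

# Index 0 plus any name containing "chaC" or "pdhR" (the README example uses index 76)
for i in select_sequences(seq_names, inds=[0], patterns=["chaC", "pdhR"]):
    for t, tf in enumerate(tf_names):
        print(f"{seq_names[i]}\t{tf}\tpeak at {peak_pos[i, t]}\tvalue {both_strands[i, peak_pos[i, t], t]:.2f}")
```

Here sequence 0 matches both inds=[0] and the "chaC" pattern but is reported only once, and sequences 1 and 3 are picked up by the "pdhR" pattern.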

Test data

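The package comes bundled with two data files that can be used for testing: promoters.fa (E. coli promoter sequences in FASTA format) and ecoli.gff (sequence annotations in GFF format).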

You can retrieve and use these data files with code like the following:

from importlib import resources
import boltznet.testdata as testdata_pkg

fa_name=resources.files(testdata_pkg).joinpath('promoters.fa')

gff_name=resources.files(testdata_pkg).joinpath('ecoli.gff')
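To choose values for inds or seqnames in plotPrediction, it can help to list the sequence names in the bundled FASTA file along with their positions. The sketch below is a minimal standard-library FASTA reader; read_fasta is a hypothetical helper name, and it assumes the package numbers sequences in file order starting from 0, which is an assumption rather than documented behavior.

```python
from typing import List, Tuple

def read_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (name, sequence) pairs; the name is the header up to the first whitespace."""
    records: List[Tuple[str, str]] = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            header = line[1:].strip()
            name, chunks = (header.split()[0] if header else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records

# With the bundled file (requires boltznet to be installed):
# records = read_fasta(fa_name.read_text())
example = ">seqA some description\nACGT\nACGT\n>seqB\nTTGA\n"
for idx, (name, seq) in enumerate(read_fasta(example)):
    print(idx, name, len(seq))
```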

Citation

The code for BoltzNet is freely available for academic use. BoltzNet can be used by molecular biologists seeking to quantitatively predict TF binding, by synthetic biologists seeking to predictively engineer new regulatory interactions, and by computational biologists seeking to develop biophysically motivated bioinformatic tools.

  • Lally, Patrick, Gómez-Romero, Laura, Tierrafría, Víctor H., Aquino, Patricia, Rioualen, Claire, Zhang, Xiaoman, Kim, Sunyoung, Baniulyte, Gabriele, Plitnick, Jonathan, Smith, Carol, Babu, Mohan, Collado-Vides, Julio, Wade, Joseph, Galagan, James E. (2025) Predictive Biophysical Neural Network Modeling of a Compendium of in vivo Transcription Factor DNA Binding Profiles for Escherichia coli. Nature Communications

