CLIPNET is an ensembled convolutional neural network for predicting transcription initiation from DNA sequence at single nucleotide resolution.

CLIPNET

CLIPNET (Convolutionally Learned, Initiation-Predicting NETwork) is an ensembled convolutional neural network that predicts transcription initiation from DNA sequence at single-nucleotide resolution. We describe CLIPNET in our preprint on bioRxiv. This repository contains code for working with CLIPNET: generating predictions, computing feature interpretations, and performing in silico mutagenesis scans. To reproduce the figures in our paper, please see the clipnet_paper GitHub repo.

PyTorch reimplementation and port

Code to port the TensorFlow models to PyTorch is available as part of PersonalBPNet, which also includes a from-scratch reimplementation of CLIPNET in PyTorch with a context length of 2114 bp. The instructions below are for working with CLIPNET in TensorFlow.

CODE REFACTORING NOTICE

I have significantly altered the structure of this code base since its original release with the preprint. The new CLIPNET package should be considerably easier to use (pip installable, with a clearer CLI and API). To access the code as it was prior to this refactoring, please check out the (unmaintained) deprecated branch.

Installation

To install CLIPNET, we recommend creating an isolated environment. For example, with conda/mamba:

mamba create -n clipnet -c conda-forge python~=3.9
mamba activate clipnet

Then install with pip:

# From PyPI
pip install clipnet
# Or from source:
pip install git+https://github.com/Danko-Lab/clipnet.git

You may need to configure your CUDA/cudatoolkit/cudnn paths to get GPU support working. See the TensorFlow documentation for more information.

Download models

Pretrained CLIPNET models are available on Zenodo.

mkdir -p clipnet_models/
for fold in {1..9};
do wget https://zenodo.org/records/10408623/files/fold_${fold}.h5 -P clipnet_models/;
done

Alternatively, they can be accessed via HuggingFace.
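For environments without wget, the shell loop above can be approximated in Python. This is a minimal sketch using only the standard library; the helper names model_urls and download_models are hypothetical and not part of the clipnet package. The Zenodo URL pattern and the nine folds are taken from the loop above.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Base URL taken from the wget loop in this README.
ZENODO_BASE = "https://zenodo.org/records/10408623/files"


def model_urls(n_folds=9):
    """Build the download URL for each model fold (fold_1.h5 ... fold_9.h5)."""
    return [f"{ZENODO_BASE}/fold_{fold}.h5" for fold in range(1, n_folds + 1)]


def download_models(out_dir="clipnet_models"):
    """Hypothetical helper: fetch every fold into out_dir, skipping existing files."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for url in model_urls():
        dest = out / url.rsplit("/", 1)[-1]
        if not dest.exists():
            urlretrieve(url, str(dest))
    return sorted(out.glob("fold_*.h5"))
```

Calling download_models() would then populate clipnet_models/ in the same layout the CLI examples below expect.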

Usage

Input data

CLIPNET was trained on a population-scale PRO-cap dataset derived from human lymphoblastoid cell lines, matched with individualized genome sequences (1kGP). CLIPNET accepts 1000 bp sequences as input and imputes PRO-cap coverage (RPM) in the center 500 bp.
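To make the window geometry concrete, the following sketch computes the 1000 bp input window and the central 500 bp output window around a chosen position. The function name clipnet_windows and the 0-based, half-open coordinate convention are assumptions made for illustration; they are not taken from the clipnet package.

```python
def clipnet_windows(center, input_len=1000, output_len=500):
    """Return ((in_start, in_end), (out_start, out_end)) around `center`.

    Coordinates are 0-based and half-open (an assumed convention).
    """
    in_start = center - input_len // 2
    in_end = in_start + input_len
    # The output covers the middle of the input, so trim equally on both sides.
    pad = (input_len - output_len) // 2
    return (in_start, in_end), (in_start + pad, in_end - pad)


input_window, output_window = clipnet_windows(10_000)
```

With the default lengths, 250 bp on each side of the input provides sequence context only and receives no prediction.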

CLIPNET can either work on haploid reference sequences (e.g. hg38) or on individualized sequences (e.g. 1kGP). When constructing individualized sequences, we made two major simplifications: (1) We considered only SNPs and (2) we used unphased SNP genotypes.

We encode sequences using a "two-hot" encoding. That is, we encoded each individual nucleotide at a given position using a one-hot encoding scheme, then represented the unphased diploid sequence as the sum of the two one-hot encoded nucleotides at each position. The sequence "AYCR" (= A(C/T)C(A/G)), for example, would be encoded as: [[2, 0, 0, 0], [0, 1, 0, 1], [0, 2, 0, 0], [1, 0, 1, 0]].
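The encoding above can be sketched in a few lines of NumPy. This is an illustrative reimplementation, not the package's own code (the package provides clipnet.utils.get_twohot_fasta_sequences for reading fasta files). The column order A, C, G, T follows the "AYCR" example; encoding unrecognized characters such as N as all zeros is an assumption.

```python
import numpy as np

# One-hot rows in A, C, G, T column order (matches the "AYCR" example).
ONEHOT = {"A": [1, 0, 0, 0], "C": [0, 1, 0, 0], "G": [0, 0, 1, 0], "T": [0, 0, 0, 1]}

# Each supported symbol expands to its two (unphased) alleles.
IUPAC = {
    "A": "AA", "C": "CC", "G": "GG", "T": "TT",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
}


def twohot_encode(seq):
    """Encode a sequence as the sum of two one-hot vectors per position."""
    rows = []
    for base in seq.upper():
        alleles = IUPAC.get(base)
        if alleles is None:
            # Assumption: anything else (e.g. N) is encoded as all zeros.
            rows.append([0, 0, 0, 0])
        else:
            first, second = ONEHOT[alleles[0]], ONEHOT[alleles[1]]
            rows.append([a + b for a, b in zip(first, second)])
    return np.array(rows, dtype=np.int8)


encoded = twohot_encode("AYCR")
```

Every recognized position sums to 2, so a homozygous base contributes a single 2 and a heterozygous base contributes two 1s.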

Colab examples

Google Colab examples for analyzing and applying CLIPNET are available through the following links:

Basic tutorial illustrating how to generate predictions and attributions with the models.
Variant effect prediction and interpretation (TODO)
MPRA optimization/design (TODO)

Command line interface

CLIPNET can be accessed via a CLI:

clipnet -h

Predictions

The predict command can be used to generate predictions:

clipnet predict -f data/test.fa -o data/test_predictions.npz -m clipnet_models/ -v

The -m flag should be used to specify either a path to the directory containing the CLIPNET models (in which case the averaged predictions across all model replicates will be returned) or a specific model path (in which case only the predictions of that model will be returned).

To input individualized sequences, heterozygous positions should be represented using the IUPAC ambiguity codes R (A/G), Y (C/T), S (C/G), W (A/T), K (G/T), M (A/C).
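As an illustration of how such a sequence might be built, the sketch below writes unphased SNP genotypes into a reference sequence using the ambiguity codes listed above. The function names and the input format (a dict from 0-based position to a pair of alleles) are hypothetical; the package does not necessarily construct sequences this way.

```python
# Ambiguity codes for the six possible heterozygous SNP genotypes.
IUPAC_HET = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def genotype_to_iupac(allele1, allele2):
    """Map an unphased SNP genotype to a single symbol (allele order is irrelevant)."""
    a, b = allele1.upper(), allele2.upper()
    if a == b:
        return a  # homozygous: keep the plain nucleotide
    return IUPAC_HET[frozenset((a, b))]


def individualize(ref_seq, snps):
    """Hypothetical helper: `snps` maps 0-based position -> (allele1, allele2)."""
    seq = list(ref_seq.upper())
    for pos, (a1, a2) in snps.items():
        seq[pos] = genotype_to_iupac(a1, a2)
    return "".join(seq)


example = individualize("ATCA", {1: ("C", "T"), 3: ("G", "A")})
```

Here example is the "AYCR" sequence used in the two-hot encoding example under Input data.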

The output npz file will contain two arrays. The first output ("arr_0", "profile") is a length 1000 vector (500 plus strand concatenated with 500 minus strand) representing the predicted base-resolution profile/shape of initiation. The second output ("arr_1", "quantity") represents the total PRO-cap quantity on both strands.

To generate actual predicted tracks, the profile prediction should be rescaled by the quantity prediction. For example:

import numpy as np

f = np.load("data/test_predictions.npz") 
profile = f["arr_0"]
quantity = f["arr_1"]
profile_scaled = (profile / np.sum(profile, axis=1)[:, None]) * quantity
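The rescaled profile can then be split into per-strand tracks, since the first 500 values are the plus strand and the last 500 the minus strand. The sketch below uses random stand-in arrays instead of a real prediction file so that it runs on its own. The quantity shape is an assumption, so it is reshaped to a column to work whether it is stored as (n,) or (n, 1).

```python
import numpy as np

# Synthetic stand-ins for f["arr_0"] and f["arr_1"] (not real predictions).
rng = np.random.default_rng(0)
n = 3
profile = rng.random((n, 1000))
quantity = rng.random((n, 1)) * 100  # assumed shape; may be (n,) in practice

# Normalize each profile to sum to 1, then scale by the predicted quantity.
profile_scaled = profile / profile.sum(axis=1, keepdims=True) * quantity.reshape(-1, 1)

# First half is the plus strand, second half the minus strand.
plus = profile_scaled[:, :500]
minus = profile_scaled[:, 500:]
```

After rescaling, each row of profile_scaled sums to the corresponding quantity, so the two strand tracks together account for the total predicted signal.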

Attributions

CLIPNET uses DeepSHAP to generate attributions. To generate DeepSHAP scores, use the attribute command. This command takes a fasta file containing 1000 bp records and outputs DeepSHAP attributions and, optionally, one-hot encoded sequences. Please note that both the attributions and the one-hot encodings (ohe) are saved length-last for compatibility with tfmodisco-lite.

Two attribution modes can be set with -a/--attribution_type: profile and quantity. The profile mode calculates interpretations for the profile node of the model (using the profile metric proposed in BPNet), while the quantity mode calculates interpretations for the quantity node of the model.

clipnet attribute \
    -f data/test.fa.gz \
    -o data/test_quantity_shap.npz \
    -m clipnet_models/ \
    -a quantity \
    -v

# -c may be needed to avoid precision errors.

Note that while CLIPNET accepts two-hot encoded sequences to accommodate heterozygous positions, attributions are much more interpretable when using a haploid/fully homozygous genome, so we recommend avoiding heterozygous positions for attributions. Also note that these are actual contribution scores, as opposed to hypothetical contribution scores. Specifically, scores for non-reference nucleotides are set to zero. To return attribution scores for all nucleotides, use the -y flag.
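The relationship between the two kinds of score can be illustrated with a small NumPy sketch: masking hypothetical scores with the one-hot encoded sequence zeroes every nucleotide that is not present, which leaves the actual scores. The arrays here are random stand-ins, and the (records, 4, length) layout is an assumed reading of "length last"; this shows only the distinction, not how the package computes attributions.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len = 8

# Random haploid sequence as nucleotide indices (0=A, 1=C, 2=G, 3=T).
idx = rng.integers(0, 4, size=seq_len)

# One-hot encoding in an assumed (records, 4, length) layout.
ohe = np.zeros((1, 4, seq_len))
ohe[0, idx, np.arange(seq_len)] = 1.0

# Stand-in hypothetical scores: one value for every possible nucleotide.
hypothetical = rng.normal(size=(1, 4, seq_len))

# Actual scores keep only the nucleotide present at each position.
actual = hypothetical * ohe

# Collapsing the nucleotide axis gives one score per position.
per_position = actual.sum(axis=1)
```

Because only one nucleotide per position survives the mask, per_position holds exactly the score of the observed base at each site.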

Discovering epistatic motifs

CLIPNET supports epistasis analyses using Deep Feature Interaction Maps (DFIM). Please note that this is a custom reimplementation of DFIM using DeepSHAP as the attribution backend, as the original DFIM package is unmaintained and difficult to install. DFIM scores can be calculated for a given fasta file using the epistasis command:

clipnet epistasis \
    -f data/test.fa \
    -o data/test_dfim_profile.npz \
    -m clipnet_models/ \
    -s 250 -e 750 \
    -a profile \
    -v

Please note DFIM scores don't properly account for things like global epistasis/nonlinearity, which can cause misleading interpretations. For a more robust (but time-consuming) method for estimating interaction effects, see SQUID.

Genomic in silico mutagenesis scans

To generate genomic in silico mutagenesis scans, use the ism_shuffle command. This command takes a fasta file containing 1000 bp records and outputs an npz file containing the ISM shuffle results (corr_ism_shuffle and logfc_ism_shuffle) for each record. For example:

clipnet ism_shuffle -f data/test.fa -o data/test_ism.npz -m clipnet_models/ -v

API usage

Individual CLIPNET models can be loaded directly with TensorFlow:

import tensorflow as tf

nn = tf.keras.models.load_model("clipnet_models/fold_1.h5", compile=False)

The model ensemble is constructed by averaging track and quantity outputs across all 9 model folds. To make this easy, we've provided a simple API in the clipnet.clipnet.CLIPNET class for doing this. Moreover, to make reading fasta files into the correct format easier, we've provided the helper function clipnet.utils.get_twohot_fasta_sequences. For example:

from clipnet.clipnet import CLIPNET
from clipnet.utils import get_twohot_fasta_sequences

nn = CLIPNET()
ensemble = nn.construct_ensemble("clipnet_models/")
seqs = get_twohot_fasta_sequences("data/test.fa")

predictions = ensemble.predict(seqs)
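Conceptually, the ensemble averages each output across the model folds. The sketch below shows that averaging step on random stand-in arrays so that it runs without the model files; the function name average_ensemble and the array shapes are assumptions for illustration, not the internals of clipnet.clipnet.CLIPNET.

```python
import numpy as np


def average_ensemble(fold_outputs):
    """Average (profile, quantity) pairs across folds.

    `fold_outputs` is a list with one (profile, quantity) tuple per model fold.
    """
    profiles = np.stack([profile for profile, _ in fold_outputs])
    quantities = np.stack([quantity for _, quantity in fold_outputs])
    return profiles.mean(axis=0), quantities.mean(axis=0)


# Stand-in outputs for 9 folds and 2 input sequences (not real predictions).
rng = np.random.default_rng(0)
fake = [(rng.random((2, 1000)), rng.random((2, 1))) for _ in range(9)]
mean_profile, mean_quantity = average_ensemble(fake)
```

The averaged arrays keep the per-model shapes, so downstream code such as the profile rescaling shown under Predictions applies unchanged.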


Release history

0.2.2 (this version): May 20, 2026
0.2.1: May 20, 2026

