Testing with PCA projected Concept Activation Vectors

TPCAV

This repository contains code to compute TPCAV (Testing with PCA projected Concept Activation Vectors) on deep learning models. TPCAV extends the original TCAV method: it uses PCA to reduce the dimensionality of the activations at a selected intermediate layer before computing Concept Activation Vectors (CAVs), which improves the consistency of the results.
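As a rough illustration of the idea described above (not the package's actual implementation), the sketch below projects synthetic "activations" onto a few principal components and then fits a linear classifier in that reduced space, taking its normalized weight vector as the CAV. The data, the number of components `k`, and the least-squares classifier are all assumptions chosen to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for flattened layer activations (assumption: in practice
# these would be collected from an intermediate layer of a trained model).
n, d, k = 200, 50, 5                 # samples per class, activation dim, PCA components
concept_acts = rng.normal(size=(n, d))
concept_acts[:, 0] += 3.0            # concept examples are shifted along one direction
control_acts = rng.normal(size=(n, d))

X = np.vstack([concept_acts, control_acts])
y = np.concatenate([np.ones(n), -np.ones(n)])   # +1 = concept, -1 = control

# Step 1: PCA via SVD of the centered activations, keeping the top-k components.
mean = X.mean(axis=0)
_, _, vt = np.linalg.svd(X - mean, full_matrices=False)
components = vt[:k]                  # shape (k, d)
Z = (X - mean) @ components.T        # projected activations, shape (2n, k)

# Step 2: fit a linear classifier in PCA space (least squares with a bias term).
Zb = np.hstack([Z, np.ones((2 * n, 1))])
w, *_ = np.linalg.lstsq(Zb, y, rcond=None)

# The CAV is the unit-norm weight vector, expressed in PCA coordinates.
cav = w[:-1] / np.linalg.norm(w[:-1])
accuracy = np.mean(np.sign(Zb @ w) == y)
print(f"CAV shape: {cav.shape}, training accuracy: {accuracy:.2f}")
```

Because the CAV lives in the low-dimensional PCA space, it is estimated from far fewer parameters than a CAV fitted on the raw activations, which is one plausible reason the projection helps consistency.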

For more technical details, please see our manuscript on bioRxiv: TPCAV: Interpreting deep learning genomics models via concept attribution.

Installation

pip install tpcav

Quick start

tpcav only works with PyTorch models. If your model is built with another library, port it to PyTorch first. For TensorFlow models, you can use tf2onnx and onnx2pytorch for the conversion.

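import torch
# `helper` is used throughout this example; importing it from the tpcav package is an assumption
from tpcav import helper, run_tpcav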

#==================== Prepare Model and Data transform function ================================
class DummyModelSeq(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = torch.nn.Linear(1024, 1)
        self.layer2 = torch.nn.Linear(4, 1)

    def forward(self, seq):
        y_hat = self.layer1(seq)
        y_hat = y_hat.squeeze(-1)
        y_hat = self.layer2(y_hat)
        return y_hat

# By default, every concept extracts fasta sequences and bigwig signals from the given regions.
# Supply your own transformation function to build the inputs your model expects.
# Here we transform the sequences into one-hot encoded DNA sequences.
def transform_fasta_to_one_hot_seq(seq, chrom):
    # `seq` is a list of fasta sequences
    # `chrom` is a numpy array of bigwig signals of shape [-1, # bigwigs, len] (unused here)
    # The function must return a tuple of inputs, even if there is only one input
    return (helper.fasta_to_one_hot_sequences(seq),)

#==================== Construct concepts ================================
motif_path = "data/motif-clustering-v2.1beta_consensus_pwms.test.meme" # motif file in meme format for constructing motif concepts
bed_seq_concept = "data/hg38_rmsk.head500k.bed" # a bed file to supply concepts described by a set of regions, format [chrom, start, end, label, concept_name]
genome_fasta = "data/hg38.analysisSet.fa"
model = DummyModelSeq() # load the model
layer_name = "layer1"   # name of the layer to be interpreted, you should be able to retrieve the layer object by getattr(model, layer_name)

# concept_fscores_dataframe: fscores of each concept
# motif_cav_trainers: each trainer contains the CAV weights of motifs inserted a different number of times
# bed_cav_trainer: trainer containing the CAV weights of the sequence concepts provided in the bed file
concept_fscores_dataframe, motif_cav_trainers, bed_cav_trainer = run_tpcav(
    model=model,
    layer_name=layer_name,
    meme_motif_file=motif_path,
    genome_fasta=genome_fasta,
    num_motif_insertions=[4, 8],
    bed_seq_file=bed_seq_concept, 
    output_dir="test_run_tpcav_output/",
    input_transform_func=transform_fasta_to_one_hot_seq)

#==================== Compute layer attributions of target testing regions ================================
# retrieve the tpcav model
tpcav_model = bed_cav_trainer.tpcav

# create input regions and baseline regions for attribution
random_regions_1 = helper.random_regions_dataframe(genome_fasta + ".fai", 1024, 100, seed=1)
random_regions_2 = helper.random_regions_dataframe(genome_fasta + ".fai", 1024, 100, seed=2)
# create iterators that yield one-hot encoded sequences from the region dataframes
def pack_data_iters(df):
    seq_fasta_iter = helper.dataframe_to_fasta_iter(df, genome_fasta, batch_size=8)
    seq_one_hot_iter = (helper.fasta_to_one_hot_sequences(seq_fasta) for seq_fasta in seq_fasta_iter)
    # zip wraps each batch in a 1-tuple, since inputs are always passed as a tuple
    return zip(seq_one_hot_iter)
# compute layer attributions given the iterators of testing regions and control regions
attributions = tpcav_model.layer_attributions(pack_data_iters(random_regions_1), pack_data_iters(random_regions_2))["attributions"]
# compute TPCAV scores for all concepts
# bed_cav_trainer is used here, so the scores cover the concepts provided in the bed file
bed_cav_trainer.tpcav_score_all_concepts_log_ratio(attributions)
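
The quick start relies on one-hot encoded DNA sequences as model input. For readers who want to write their own transformation function, here is a minimal sketch of such an encoding. `one_hot_encode` is a hypothetical stand-in written for this example, not the package's `helper.fasta_to_one_hot_sequences`; the A/C/G/T channel order, the `[batch, 4, len]` layout, and the all-zero encoding of unknown bases are assumptions.

```python
import numpy as np

def one_hot_encode(seqs, alphabet="ACGT"):
    """Hypothetical helper: encode equal-length DNA strings as a [batch, 4, len] array."""
    index = {base: i for i, base in enumerate(alphabet)}
    out = np.zeros((len(seqs), len(alphabet), len(seqs[0])), dtype=np.float32)
    for b, seq in enumerate(seqs):
        for pos, base in enumerate(seq.upper()):
            channel = index.get(base)
            if channel is not None:      # unknown bases such as N stay all-zero
                out[b, channel, pos] = 1.0
    return out

batch = one_hot_encode(["ACGT", "nnac"])
inputs = (batch,)                        # a transform function returns a tuple of inputs
print(batch.shape)
```

With this layout, a layer such as `torch.nn.Linear(1024, 1)` in the dummy model acts on the last (sequence length) axis, which is why the example regions are 1024 bases long.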

Detailed Usage

For detailed usage, please refer to this Jupyter notebook.

If you run into a problem, please open an issue (strongly preferred) or contact Jianyu Yang.
