Testing with PCA projected Concept Activation Vectors

Maintainers

yztxwd

Project description

TPCAV

This repository contains code to compute TPCAV (Testing with PCA projected Concept Activation Vectors) on deep learning models. TPCAV extends the original TCAV method by using PCA to reduce the dimensionality of the activations at a selected intermediate layer before computing Concept Activation Vectors (CAVs), which improves the consistency of the results.
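The core idea can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the tpcav implementation: PCA is fit by SVD on the pooled concept and control activations, and the CAV is taken as the normalized weight vector of a logistic-regression classifier trained by plain gradient descent (the package itself trains scikit-learn `SGDClassifier` models, per the `p` option in the Quick start). The function names and the synthetic activations are hypothetical.

```python
import numpy as np


def fit_pca(acts, n_components):
    """Fit PCA on an [n_samples, n_features] activation matrix via SVD."""
    mean = acts.mean(axis=0)
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(acts - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(acts, mean, components):
    """Project activations onto the retained principal axes."""
    return (acts - mean) @ components.T


def train_cav(concept, control, lr=0.1, epochs=500):
    """Logistic regression (concept=1, control=0); the unit weight vector is the CAV."""
    x = np.vstack([concept, control])
    y = np.concatenate([np.ones(len(concept)), np.zeros(len(control))])
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # predicted P(concept)
        err = p - y
        w -= lr * x.T @ err / len(y)
        b -= lr * err.mean()
    return w / np.linalg.norm(w)


# Hypothetical activations: the concept shifts feature 0 by +3.
rng = np.random.default_rng(0)
control_acts = rng.normal(size=(200, 50))
concept_acts = rng.normal(size=(200, 50))
concept_acts[:, 0] += 3.0

mean, components = fit_pca(np.vstack([concept_acts, control_acts]), n_components=10)
cav_pca = train_cav(project(concept_acts, mean, components),
                    project(control_acts, mean, components))
# Map the CAV back to the original activation space to inspect it.
cav_full = components.T @ cav_pca
print(round(float(cav_full[0]), 2))
```

The back-projected CAV should point mostly along feature 0, the direction that separates the two sets. Fitting PCA on the pooled concept and control activations is just one reasonable choice for this sketch; which activations tpcav uses to fit its PCA is not stated in this README.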

For more technical details, please see our bioRxiv manuscript, "TPCAV: Interpreting deep learning genomics models via concept attribution".

When should I use TPCAV?

TPCAV is a global feature attribution method that can be applied to any model, provided that a set of examples is available to represent each concept of interest. It is input-agnostic: it can operate on raw inputs, engineered features, or tokenized representations, so it also applies to foundation models.

Typical concepts in genomics include:

Transcription factor motifs
Cis-regulatory regions
DNA repeats

The same framework naturally extends to other domains, such as protein structure prediction, transcriptomics, or any field with a well-established knowledge base, once appropriate concept sets are defined.
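To illustrate how a motif can be turned into a concept set, the sketch below plants a consensus sequence a fixed number of times into random background DNA, in the spirit of the `num_motif_insertions` option used in the Quick start. It is a hypothetical stand-in, not tpcav's own concept builder: the uniform background, the non-overlapping grid placement, and the function names are assumptions. The example motif is the AP-1 consensus TGACTCA.

```python
import random


def insert_motif(background, motif, n_insertions, rng):
    """Return a copy of `background` with `motif` planted at n non-overlapping positions."""
    seq = list(background)
    slots = len(seq) // len(motif)  # candidate start positions on a motif-sized grid
    if n_insertions > slots:
        raise ValueError("sequence too short for that many insertions")
    for slot in rng.sample(range(slots), n_insertions):
        start = slot * len(motif)
        seq[start:start + len(motif)] = motif
    return "".join(seq)


def motif_concept_set(motif, n_insertions, n_examples=100, length=1024, seed=0):
    """Concept examples: uniform random backgrounds, each carrying the motif."""
    rng = random.Random(seed)
    examples = []
    for _ in range(n_examples):
        background = "".join(rng.choice("ACGT") for _ in range(length))
        examples.append(insert_motif(background, motif, n_insertions, rng))
    return examples


concept = motif_concept_set("TGACTCA", n_insertions=4, n_examples=5)
print(len(concept), len(concept[0]))
```

A matching control set in this sketch would be the same kind of random backgrounds without any insertions.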

Installation

pip install tpcav

Detailed Usage

For more detailed usage, including more flexible ways to define concepts, please refer to this Jupyter notebook.

Quick start

tpcav only works with PyTorch models. If your model is built with another library, port it to PyTorch first; for TensorFlow models, you can use tf2onnx and onnx2pytorch for the conversion.

import torch
from tpcav import run_tpcav
from tpcav import helper  # assumed import path for the `helper` utilities used below

#==================== Prepare Model and Data transform function ================================
class DummyModelSeq(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = torch.nn.Linear(1024, 1)
        self.layer2 = torch.nn.Linear(4, 1)

    def forward(self, seq):
        y_hat = self.layer1(seq)
        y_hat = y_hat.squeeze(-1)
        y_hat = self.layer2(y_hat)
        return y_hat

# By default, each concept extracts FASTA sequences and bigWig signals from its regions
# Supply your own transformation function to turn them into the inputs your model expects
# Here we convert them into one-hot encoded DNA sequences
def transform_fasta_to_one_hot_seq(seq, chrom):
    # `seq` is a list of fasta sequences
    # `chrom` is a numpy array of bigWig signals with shape [batch, n_bigwigs, length] (unused here)
    return (helper.fasta_to_one_hot_sequences(seq),) # it has to return a tuple of inputs, even if there is only one input

#==================== Construct concepts ================================
motif_path = "data/motif-clustering-v2.1beta_consensus_pwms.test.meme" # motif file in meme format for constructing motif concepts
bed_seq_concept = "data/hg38_rmsk.head500k.bed" # a bed file to supply concepts described by a set of regions, format [chrom, start, end, label, concept_name]
genome_fasta = "data/hg38.analysisSet.fa"
model = DummyModelSeq() # load the model
layer_name = "layer1"   # name of the layer to be interpreted, you should be able to retrieve the layer object by getattr(model, layer_name)

# concept_fscores_dataframe: F-scores of each concept
# motif_cav_trainers: one trainer per entry in num_motif_insertions, each holding the CAV weights of the motif concepts inserted that many times
# bed_cav_trainer: a trainer holding the CAV weights of the sequence concepts provided in the BED file
concept_fscores_dataframe, motif_cav_trainers, bed_cav_trainer = run_tpcav(
    model=model,
    layer_name=layer_name,
    motif_file=motif_path,
    motif_file_fmt='meme',  # motif file format: 'meme' or 'consensus' (a tab-delimited file with columns [motif_name, consensus_sequence])
    genome_fasta=genome_fasta,
    num_motif_insertions=[4, 8],
    bed_seq_file=bed_seq_concept, 
    output_dir="test_run_tpcav_output/",
    input_transform_func=transform_fasta_to_one_hot_seq,
    p=4)  # number of SGDClassifier models trained concurrently; increase it if you have spare CPU cores, as it speeds up training significantly

#==================== Compute layer attributions of target testing regions ================================
# retrieve the tpcav model
tpcav_model = bed_cav_trainer.tpcav

# create input regions and baseline regions for attribution
random_regions_1 = helper.random_regions_dataframe(genome_fasta + ".fai", 1024, 100, seed=1)
random_regions_2 = helper.random_regions_dataframe(genome_fasta + ".fai", 1024, 100, seed=2)
# create iterators to yield one-hot encoded sequences from the region dataframes
def pack_data_iters(df):
    seq_fasta_iter = helper.dataframe_to_fasta_iter(df, genome_fasta, batch_size=8)
    seq_one_hot_iter = (helper.fasta_to_one_hot_sequences(seq_fasta) for seq_fasta in seq_fasta_iter)
    return zip(seq_one_hot_iter)  # yield each batch as a 1-tuple, matching the tuple-of-inputs convention
# compute layer attributions given the iterators of testing regions and control regions
attributions = tpcav_model.layer_attributions(pack_data_iters(random_regions_1), pack_data_iters(random_regions_2))["attributions"]
# compute TPCAV scores for every concept in bed_cav_trainer (the concepts provided in the BED file)
bed_cav_trainer.tpcav_score_all_concepts_log_ratio(attributions)

# save the trainers for future use
torch.save(motif_cav_trainers, "motif_cav_trainers.pt")
torch.save(bed_cav_trainer, "bed_cav_trainer.pt")
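The `helper` functions above are not documented in this README. As a rough picture of the input layout `DummyModelSeq` expects, the sketch below one-hot encodes sequences into a `[batch, 4, length]` array, which is what `layer1 = Linear(1024, 1)` needs, because `torch.nn.Linear` acts on the last dimension. The A/C/G/T channel order, the all-zero encoding for `N`, and the function name are assumptions, not necessarily what `helper.fasta_to_one_hot_sequences` does.

```python
import numpy as np

# Assumed channel order; tpcav's helper may use a different convention.
BASE_TO_CHANNEL = {"A": 0, "C": 1, "G": 2, "T": 3}


def one_hot_sequences(seqs):
    """Encode equal-length DNA strings as a float32 array of shape [batch, 4, length]."""
    length = len(seqs[0])
    out = np.zeros((len(seqs), 4, length), dtype=np.float32)
    for i, seq in enumerate(seqs):
        if len(seq) != length:
            raise ValueError("all sequences must have the same length")
        for j, base in enumerate(seq.upper()):
            channel = BASE_TO_CHANNEL.get(base)
            if channel is not None:  # N and other ambiguity codes stay all-zero
                out[i, channel, j] = 1.0
    return out


batch = one_hot_sequences(["ACGTN" * 2, "TTTTTGGGGG"])
print(batch.shape)  # (2, 4, 10)
```

With 1024 bp regions this layout gives `[batch, 4, 1024]`: `layer1` maps it to `[batch, 4, 1]`, `squeeze(-1)` yields `[batch, 4]`, and `layer2` produces one output per sequence.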

Output

The results of TPCAV are stored in a CavTrainer object, which contains the F-score of each concept, the corresponding concept activation vector (CAV), and the model object decorated with TPCAV parameters and functions. Continuing the Quick start example:

cav_trainer = motif_cav_trainers[0]  # the first motif CAV trainer, corresponding to the first entry in num_motif_insertions
# retrieve F-scores
cav_trainer.cav_fscores
# retrieve CAVs
cav_trainer.cav_weights
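This README does not spell out how `cav_fscores` is computed. Assuming it is the usual F1 score of the concept-vs-control linear classifier on held-out examples (an assumption), it could be computed from a CAV and a bias as below; the function name, the bias term, and the toy data are hypothetical.

```python
import numpy as np


def cav_f1(cav, bias, concept_test, control_test):
    """F1 of the linear rule `x @ cav + bias > 0`, with concept as the positive class."""
    x = np.vstack([concept_test, control_test])
    y_true = np.concatenate([np.ones(len(concept_test), dtype=bool),
                             np.zeros(len(control_test), dtype=bool)])
    y_pred = x @ cav + bias > 0
    tp = np.sum(y_pred & y_true)
    fp = np.sum(y_pred & ~y_true)
    fn = np.sum(~y_pred & y_true)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


concept_test = np.array([[2.0, 0.1], [1.5, -0.3], [-0.2, 0.4]])
control_test = np.array([[-1.0, 0.2], [-2.0, 0.0], [0.5, 0.1]])
score = cav_f1(np.array([1.0, 0.0]), 0.0, concept_test, control_test)
print(score)  # 2 TP, 1 FP, 1 FN, so F1 = 2/3
```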

You can also retrieve the model decorated with TPCAV parameters:

tpcav_model = cav_trainer.tpcav

Then use it to compute attributions for new inputs:

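# compute layer attributions for new inputs, then compute new TPCAV scores
# target_batches / baseline_batches: iterators of input tuples, e.g. built with pack_data_iters() from the Quick start
attrs = tpcav_model.layer_attributions(target_batches, baseline_batches)["attributions"]
cav_trainer.tpcav_score_all_concepts_log_ratio(attrs)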
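For intuition about what a concept score summarizes: the original TCAV score is the fraction of target examples whose layer gradient has a positive dot product with the CAV. The sketch below applies that counting rule to any per-example layer vectors (gradients or attributions) expressed in the same space as the CAV. How `tpcav_score_all_concepts_log_ratio` turns this into a log ratio is not described in this README, so the sketch stops at the classic fraction; the function name and toy arrays are hypothetical.

```python
import numpy as np


def tcav_fraction(layer_vectors, cav):
    """Fraction of examples whose [n_examples, dim] layer vector points along the CAV."""
    directional = layer_vectors @ cav  # one directional derivative / projection per example
    return float(np.mean(directional > 0))


vectors = np.array([[0.5, 1.0], [-0.2, 0.3], [1.0, -2.0], [-1.0, -0.1]])
cav = np.array([1.0, 0.0])
print(tcav_fraction(vectors, cav))  # 0.5
```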

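# input attributions
# note: one-shot iterators (such as those returned by pack_data_iters) are exhausted after use; recreate them before each call
input_attrs = tpcav_model.input_attributions(target_batches, baseline_batches, multiply_by_inputs=True)
# or concept-specific input attributions (the parts explained by the provided concept CAVs);
# concept_name is a placeholder for one of the concepts in cav_trainer
input_attrs = tpcav_model.input_attributions(target_batches, baseline_batches, multiply_by_inputs=True, cavs_list=[cav_trainer.cav_weights[concept_name]])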
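The `multiply_by_inputs` flag shares its name with an option of Captum's attribution methods, where `True` multiplies the final scores by `(inputs - baselines)`. Assuming tpcav's input attributions follow an integrated-gradients-style scheme (an assumption; this README does not say), the NumPy sketch below shows what that scaling provides: the attributions approximately sum to `f(x) - f(baseline)`. The toy model, its analytic gradient, and the function names are hypothetical.

```python
import numpy as np

W = np.array([0.5, -1.0, 2.0])  # hypothetical toy model parameters


def f(x):
    """Toy differentiable model: tanh of a linear score."""
    return np.tanh(x @ W)


def grad_f(x):
    """Analytic gradient of f with respect to x."""
    return (1.0 - np.tanh(x @ W) ** 2) * W


def integrated_gradients(x, baseline, steps=200, multiply_by_inputs=True):
    """Midpoint Riemann approximation of integrated gradients along the straight path."""
    alphas = (np.arange(steps) + 0.5) / steps
    avg_grad = np.mean([grad_f(baseline + a * (x - baseline)) for a in alphas], axis=0)
    return avg_grad * (x - baseline) if multiply_by_inputs else avg_grad


x = np.array([1.0, 0.5, 0.25])
baseline = np.zeros(3)
attrs = integrated_gradients(x, baseline)
print(attrs.sum(), f(x) - f(baseline))  # approximately equal (completeness)
```

Without the multiplication, the result is only the path-averaged gradient, which does not carry this completeness property.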

If you run into a problem, please open a GitHub issue (preferred) or contact Jianyu Yang.

Project details

Release history

0.9.1 (Apr 9, 2026)
0.8.6 (Apr 9, 2026)
0.8.5 (Apr 4, 2026)
0.8.4 (Apr 2, 2026)
0.8.3 (Apr 2, 2026)
0.8.2 (Apr 2, 2026)
0.8.1 (Apr 2, 2026)
0.7.6 (Apr 1, 2026)
0.7.5 (Mar 31, 2026)
0.7.4 (Mar 31, 2026)
0.7.3 (Mar 4, 2026)
0.7.2 (Mar 4, 2026)
0.7.1 (Feb 27, 2026)
0.7.0 (Feb 27, 2026), this version
0.6.3 (Feb 25, 2026)
0.6.2 (Feb 25, 2026)
0.6.1 (Feb 20, 2026)
0.6.0 (Feb 19, 2026)
0.5.0 (Feb 19, 2026)
0.4.8 (Feb 16, 2026)
0.4.7 (Feb 13, 2026)
0.4.6 (Feb 13, 2026)
0.4.5 (Feb 13, 2026)
0.4.4 (Feb 12, 2026)
0.4.3 (Feb 12, 2026)
0.4.2 (Feb 12, 2026)
0.4.1 (Feb 11, 2026)
0.4.0 (Feb 10, 2026)
0.3.6 (Feb 10, 2026)
0.3.5 (Feb 9, 2026)
0.3.4 (Feb 9, 2026)
0.3.3 (Feb 8, 2026)
0.3.2 (Feb 8, 2026)
0.3.1 (Feb 8, 2026)
0.3.0 (Feb 7, 2026)
0.2.9 (Feb 6, 2026)
0.2.8 (Feb 6, 2026)
0.2.7 (Feb 6, 2026)
0.2.6 (Feb 5, 2026)
0.2.5 (Feb 5, 2026)
0.2.4 (Feb 5, 2026)
0.2.3 (Feb 4, 2026)
0.2.2 (Feb 4, 2026)
0.2.1 (Feb 4, 2026)
0.2.0 (Feb 4, 2026)
0.1.0 (Jan 8, 2026)

Download files

Source Distribution

tpcav-0.7.0.tar.gz (42.9 kB)

Uploaded Feb 27, 2026, Source

Built Distribution

tpcav-0.7.0-py3-none-any.whl (75.6 kB)

Uploaded Feb 27, 2026, Python 3

File details

Details for the file tpcav-0.7.0.tar.gz.

File metadata

Download URL: tpcav-0.7.0.tar.gz
Upload date: Feb 27, 2026
Size: 42.9 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for tpcav-0.7.0.tar.gz
Algorithm	Hash digest
SHA256	`c5b828febff4ce80582f68c1932b4ab49b3997839413fffe64d1ed484d00ee57`
MD5	`4acac588114c44c5c56c79b80ea0c1e5`
BLAKE2b-256	`301579d228a2e83edae74cc6399c42f36cd2cece6671d9f09ed2fab58ac26ca6`

Provenance

The following attestation bundles were made for tpcav-0.7.0.tar.gz:

Publisher: publish-pypi.yml on seqcode/TPCAV

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: tpcav-0.7.0.tar.gz
- Subject digest: c5b828febff4ce80582f68c1932b4ab49b3997839413fffe64d1ed484d00ee57
- Sigstore transparency entry: 1001406834
- Sigstore integration time: Feb 27, 2026
Source repository:
- Permalink: seqcode/TPCAV@0c333ad740251cede67a4d737f0750c05be15178
- Branch / Tag: refs/tags/0.7.0
- Owner: https://github.com/seqcode
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-pypi.yml@0c333ad740251cede67a4d737f0750c05be15178
- Trigger Event: release

File details

Details for the file tpcav-0.7.0-py3-none-any.whl.

File metadata

Download URL: tpcav-0.7.0-py3-none-any.whl
Upload date: Feb 27, 2026
Size: 75.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for tpcav-0.7.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d784ffb8c8298f8cb09653d3c29b61247ce6a305d562186c46a1ec87b3a9d0b9`
MD5	`b6444f83eecee92214e522e49e620cb3`
BLAKE2b-256	`98f2b84a8766fc05dbcd5a53df362fea0494452777f283e7ccb68b7f268e9ca7`

Provenance

The following attestation bundles were made for tpcav-0.7.0-py3-none-any.whl:

Publisher: publish-pypi.yml on seqcode/TPCAV

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: tpcav-0.7.0-py3-none-any.whl
- Subject digest: d784ffb8c8298f8cb09653d3c29b61247ce6a305d562186c46a1ec87b3a9d0b9
- Sigstore transparency entry: 1001406867
- Sigstore integration time: Feb 27, 2026
Source repository:
- Permalink: seqcode/TPCAV@0c333ad740251cede67a4d737f0750c05be15178
- Branch / Tag: refs/tags/0.7.0
- Owner: https://github.com/seqcode
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-pypi.yml@0c333ad740251cede67a4d737f0750c05be15178
- Trigger Event: release
