
Testing with PCA projected Concept Activation Vectors


TPCAV (Testing with PCA projected Concept Activation Vectors)

Analysis pipeline for TPCAV

Dependencies

You can use your own environment for the model. In addition, you need to install the following packages:

  • captum 0.7
  • seqchromloader 0.8.5
  • scikit-learn 1.5.2

Workflow

  1. Not every saved PyTorch model stores its computation graph, so you need to add functions that tell the scripts how to get the activations of the intermediate layer and how to continue the forward pass from there.

    You need to insert your own code in the following places.

    • Model class definition in models.py

      • First copy your class definition into Model_Class in the script. It already has several predefined methods; you need to fill in the following two:
        • forward_until_select_layer: takes your model input and runs the forward pass up to the layer on which you want to compute TPCAV scores
        • resume_forward_from_select_layer: starts from the activations of the selected layer and runs the forward pass through to the end
      • The following functions are also required for the TPCAV computation; do not change them:
        • forward_from_start: calls forward_until_select_layer and resume_forward_from_select_layer to do a full forward pass
        • forward_from_projected_and_residual: takes the PCA-projected activations and the unexplained residual and completes the forward pass
        • project_avs_to_pca: handles the PCA projection
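The split into PCA-projected activations and an unexplained residual can be illustrated with a small NumPy sketch. This is not the package's implementation: the array names, the number of components, and the use of a plain SVD-based PCA are assumptions made for illustration. It shows that once activations are decomposed into a projection onto the leading components plus a residual, the two parts can be recombined into the original activations, which is what makes it possible to resume the forward pass from them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical flattened activations of the selected layer: (n_samples, n_features)
activations = rng.normal(size=(50, 8))

# Fit PCA via SVD on mean-centered activations (assumption: plain, unwhitened PCA)
mean = activations.mean(axis=0)
centered = activations - mean
_, _, vt = np.linalg.svd(centered, full_matrices=False)
n_components = 3  # illustrative choice, not a package default
components = vt[:n_components]  # (n_components, n_features), orthonormal rows

# Coordinates in PCA space, and the part the kept components do not explain
projected = centered @ components.T           # (n_samples, n_components)
residual = centered - projected @ components  # (n_samples, n_features)

# Recombining both parts recovers the original activations
reconstructed = projected @ components + residual + mean
max_error = float(np.abs(reconstructed - activations).max())
```

Because the residual is orthogonal to the kept components, a change made in the projected coordinates does not alter the residual part.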

      NOTE: You can modify the final output tensor to explain a specific transformation of your output. For example, you can take a weighted sum of a base-pair-resolution signal prediction to emphasize high-signal regions.
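As a rough sketch of what the two user-supplied functions might look like, the toy model below splits its forward pass at one layer. Everything except the two function names and the class name is hypothetical: the layers, the sequence-only input signature, and the softmax-weighted sum used as the output transformation are assumptions for illustration. The real Model_Class in models.py also carries the predefined functions listed above, which are omitted here.

```python
import torch
from torch import nn


class Model_Class(nn.Module):
    """Toy sequence model; the layers are placeholders, not the package's model."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv1d(4, 8, kernel_size=5, padding=2)
        self.head = nn.Conv1d(8, 1, kernel_size=1)

    def forward_until_select_layer(self, seq):
        # Run the input up to the layer whose activations will be analyzed
        return torch.relu(self.conv(seq))  # (batch, 8, len)

    def resume_forward_from_select_layer(self, activations):
        # Continue from the selected layer's activations to the final output
        signal = self.head(activations).squeeze(1)  # (batch, len), per-base prediction
        # Assumed output transformation: softmax-weighted sum over positions,
        # which gives more weight to high-signal positions
        weights = torch.softmax(signal, dim=-1)
        return (weights * signal).sum(dim=-1)  # (batch,)

    def forward(self, seq):
        # Full pass composed from the two halves
        return self.resume_forward_from_select_layer(
            self.forward_until_select_layer(seq)
        )


model = Model_Class().eval()
seq = torch.randn(2, 4, 100)  # random stand-in for a one-hot batch
with torch.no_grad():
    acts = model.forward_until_select_layer(seq)
    out_split = model.resume_forward_from_select_layer(acts)
    out_full = model(seq)
```

Running the two halves in sequence should give the same output as the full forward pass; checking this on your own model is a quick way to confirm the split was made at the right place.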

    • Function load_model in utils.py

      • Initialize the model and load the saved parameters in load_model, then return the model instance.

      NOTE: You need to use your own model class definition from models.py, because the pipeline needs the functions defined in it.
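A load_model function along these lines is one possibility. The signature (a single weights path), the stand-in Model_Class, and the assumption that the checkpoint is a plain state_dict are all hypothetical; adapt them to how your model was saved.

```python
import os
import tempfile

import torch
from torch import nn


class Model_Class(nn.Module):
    """Stand-in for the class you defined in models.py."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 1)


def load_model(weights_path):
    # Hypothetical signature: the real load_model may take different arguments
    model = Model_Class()
    # Assumes the checkpoint is a state_dict saved with torch.save
    state_dict = torch.load(weights_path, map_location="cpu")
    model.load_state_dict(state_dict)
    model.eval()  # disable dropout / batch-norm updates for analysis
    return model


# Round trip through a temporary weights file to show the function in use
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "weights.pt")
    reference = Model_Class()
    torch.save(reference.state_dict(), path)
    loaded = load_model(path)
```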

    • Function seq_transform_fn in utils.py

      • By default, the dataloader provides a one-hot encoded DNA array of shape (batch_size, 4, len), with channels in the order [A, C, G, T]. If your model takes a different kind of input, modify seq_transform_fn to transform it.

    • Function chrom_transform_fn in utils.py

      • By default, the dataloader provides a signal array from bigwig files of shape (batch_size, # bigwigs, len). If your model takes a different kind of chromatin input, modify chrom_transform_fn to transform it. If your model is sequence-only, leave it returning None.
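The input layout and the two transform hooks can be sketched as follows. The one_hot helper, the single-argument signatures, the (batch, len, 4) target layout, and the all-zero encoding of non-ACGT bases are assumptions for illustration, not the dataloader's documented behavior. NumPy is used to keep the example small; if the dataloader yields PyTorch tensors, the equivalent of the transpose would be `seq.permute(0, 2, 1)`.

```python
import numpy as np

BASES = "ACGT"  # channel order stated above: [A, C, G, T]


def one_hot(seqs):
    """Encode equal-length DNA strings as an array of shape (batch_size, 4, len)."""
    out = np.zeros((len(seqs), 4, len(seqs[0])), dtype=np.float32)
    for i, s in enumerate(seqs):
        for j, base in enumerate(s.upper()):
            if base in BASES:  # assumption: other symbols such as N stay all-zero
                out[i, BASES.index(base), j] = 1.0
    return out


def seq_transform_fn(seq):
    # Hypothetical case: the model expects (batch_size, len, 4) instead
    return np.transpose(seq, (0, 2, 1))


def chrom_transform_fn(chrom):
    # Sequence-only model: no chromatin input is passed on
    return None


batch = one_hot(["ACGTA", "TTNAC"])
transformed = seq_transform_fn(batch)
```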
  2. Compute CAVs on your model. Example command:

srun -n1 -c8 --gres=gpu:1 --mem=128G python scripts/run_tcav_sgd_pca.py \
  cavs_test 1024 data/hg19.fa data/hg19.fa.fai \
  --meme-motifs data/motif-clustering-v2.1beta_consensus_pwms.test.meme \
  --bed-chrom-concepts data/ENCODE_DNase_peaks.bed
  3. Then compute the layer attributions. Example command:
srun -n1 -c8 --gres=gpu:1 --mem=128G \
  python scripts/compute_layer_attrs_only.py cavs_test/tpcav_model.pt \
  data/ChIPseq.H1-hESC.MAX.conservative.all.shuf1k.narrowPeak \
  1024 data/hg19.fa data/hg19.fa.fai cavs_test/test 
  4. Run the Jupyter notebook to generate a summary of your results:
papermill -f scripts/compute_tcav_v2_pwm.example.yaml scripts/compute_tcav_v2_pwm.py.ipynb cavs_test/tcav_report.py.ipynb

Download files


Source Distribution

tpcav-0.1.0.tar.gz (22.1 kB)

Uploaded Source

Built Distribution


tpcav-0.1.0-py3-none-any.whl (21.0 kB)

Uploaded Python 3

File details

Details for the file tpcav-0.1.0.tar.gz.

File metadata

  • Download URL: tpcav-0.1.0.tar.gz
  • Upload date:
  • Size: 22.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for tpcav-0.1.0.tar.gz
Algorithm Hash digest
SHA256 99ed835658f894427bed0709cb33dcc7d079ba5ddc0a6561991d5bb5afb12e23
MD5 093bc844f81c94695a6dc22b4d238873
BLAKE2b-256 2b8ab180780e5216b3d237ee4c5dd776829d790a5bf7656aadc6813dc016dcb2


File details

Details for the file tpcav-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: tpcav-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 21.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for tpcav-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 2d931da6635d7eb69414ebece67c77a926d3713bb9bd444b1ab4638f20989185
MD5 c4f5b6d1ce7176baee79b62a5acba8dc
BLAKE2b-256 df51d9aa54b31d75bebd4aea2dc1b938b39244239ba39fcc21bde2c2dd0b6feb

