SEAM: Meta-explanations for interpreting sequence-based deep learning models

SEAM: systematic explanation of attribution-based mechanisms for regulatory genomics

This repository contains the Python implementation of SEAM (Systematic Explanation of Attribution-based Mechanisms), an AI interpretation framework that systematically investigates how mutations reshape regulatory mechanisms. For an extended discussion of this approach and its applications, please refer to our manuscript, which we presented at the ICLR 2025 GEM Workshop:

  • Seitz, E.E., McCandlish, D.M., Kinney, J.B., and Koo P.K. Decoding the Mechanistic Impact of Genetic Variation on Regulatory Sequences with Deep Learning. Workshop on Generative and Experimental Perspectives for Biomolecular Design, International Conference on Learning Representations, April 15, 2025. https://openreview.net/forum?id=PtjMeyHcTt

A bioRxiv preprint is also in preparation.


Installation:

With Anaconda sourced, create a new environment via the command line:

conda create --name seam

Next, activate this environment via conda activate seam, and install SEAM from PyPI:

pip install seam-nn

Finally, when you are done using the environment, always exit via conda deactivate.
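To confirm that the install landed in the active environment, one option is to query the package metadata from Python. The following is a minimal sketch using only the standard library; the helper name installed_version is hypothetical and is not part of SEAM.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# "seam-nn" is the distribution name used in the pip command above.
version = installed_version("seam-nn")
print("seam-nn:", version if version else "not installed in this environment")
```

If this reports that the package is not installed, the pip command most likely ran against a different Python than the one in the seam environment.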

Notes

SEAM has been tested on macOS and Linux operating systems. Typical installation time on a standard desktop computer is less than 1 minute.

If you have any issues installing SEAM, please see:

For issues installing SQUID, the package used for sequence generation and inference, please see:

Older DNNs that require inference via TensorFlow 1.x or related packages may conflict with SEAM defaults. In that case, users will need to run SEAM piecewise within separate environments:

  1. A TensorFlow 1.x environment for generating the in silico sequence-function-mechanism dataset
  2. A TensorFlow 2.x environment for applying SEAM to explain the in silico sequence-function-mechanism dataset
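One way to hand the dataset from the first environment to the second is to write the arrays to disk with NumPy and reload them later. The sketch below assumes this file-based handoff; the function names, array keys, and file name are hypothetical, and the random arrays are stand-ins for real model outputs.

```python
import os
import tempfile

import numpy as np


def save_dataset(path, x_mut, y_mut, attributions):
    """Run in the first environment: write the three arrays to one .npz file."""
    np.savez_compressed(path, x_mut=x_mut, y_mut=y_mut, attributions=attributions)


def load_dataset(path):
    """Run in the second environment: read the arrays back in the same order."""
    with np.load(path) as data:
        return data["x_mut"], data["y_mut"], data["attributions"]


# Toy stand-in data. RandomState is used because it is also available in the
# older NumPy versions that TensorFlow 1.x environments tend to pin.
rng = np.random.RandomState(0)
N, L, A = 8, 10, 4
x_mut = np.eye(A, dtype=np.float32)[rng.randint(0, A, size=(N, L))]  # (N, L, A)
y_mut = rng.normal(size=(N, 1)).astype(np.float32)                   # (N, 1)
attributions = rng.normal(size=(N, L, A)).astype(np.float32)         # (N, L, A)

path = os.path.join(tempfile.mkdtemp(), "seam_dataset.npz")  # hypothetical file name
save_dataset(path, x_mut, y_mut, attributions)
x2, y2, a2 = load_dataset(path)
print(x2.shape, y2.shape, a2.shape)
```

Because only NumPy arrays cross the boundary, the second environment does not need the original model or its TensorFlow version.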

Usage and Requirements:

SEAM provides a unified interface for mechanistic interpretation of sequence-based deep learning models.

The framework takes as input a sequence-based oracle (e.g., a genomic DNN) and requires four key components to perform analysis:

  1. Sequence Library (numpy.ndarray): One-hot encoded sequences of shape (N, L, A), where:

    • N: Number of sequences
    • L: Sequence length
    • A: Number of features (e.g., 4 for DNA nucleotides)
  2. Predictions/Measurements (numpy.ndarray): Experimental or model-derived values of shape (N, 1), corresponding to each sequence's functional output.

  3. Attribution Maps (numpy.ndarray): Mechanistic importance scores of shape (N, L, A), quantifying the contribution of each position-feature pair to the sequence's function. These can be generated using various attribution methods, such as those provided by SEAM's Attributer module (described below).

  4. Clustering/Embedding (either):

    • Hierarchical clustering linkage matrix (e.g., from scipy.cluster.hierarchy.linkage)
    • Dimensionality reduction embedding of shape (N, Z), where Z is the number of dimensions in the embedded space
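As an illustration of the array layouts listed above, the following sketch builds toy inputs with NumPy and checks that their shapes agree. The helpers one_hot and check_inputs are hypothetical and not part of SEAM; the prediction, attribution, and embedding values are placeholders. Only the embedding option is checked here, not a linkage matrix.

```python
import numpy as np

ALPHABET = "ACGT"


def one_hot(seq, alphabet=ALPHABET):
    """One-hot encode a string into an (L, A) float array."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    x = np.zeros((len(seq), len(alphabet)), dtype=np.float32)
    for pos, ch in enumerate(seq):
        x[pos, index[ch]] = 1.0
    return x


def check_inputs(x, y, attributions, embedding=None):
    """Raise ValueError if the shapes disagree with the layout described above."""
    if x.ndim != 3:
        raise ValueError("sequence library must have shape (N, L, A)")
    n = x.shape[0]
    if y.shape != (n, 1):
        raise ValueError("predictions must have shape (N, 1)")
    if attributions.shape != x.shape:
        raise ValueError("attribution maps must have shape (N, L, A)")
    if embedding is not None and (embedding.ndim != 2 or embedding.shape[0] != n):
        raise ValueError("embedding must have shape (N, Z)")
    return x.shape  # (N, L, A)


seqs = ["ACGTAC", "ACGTAA", "TCGTAC"]
x = np.stack([one_hot(s) for s in seqs])   # sequence library, (3, 6, 4)
y = np.array([[0.9], [0.4], [0.7]])        # placeholder predictions, (3, 1)
attributions = np.zeros_like(x)            # placeholder attribution maps
embedding = np.zeros((len(seqs), 2))       # placeholder embedding with Z = 2
print(check_inputs(x, y, attributions, embedding))  # (3, 6, 4)
```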

These required files can be generated either externally or using SEAM's specialized modules (described below). Once provided, SEAM applies a meta-explanation approach to interpret the sequence-function-mechanism dataset, deciphering the determinants of mechanistic variation in regulatory sequences.

For detailed examples of how to generate these requirements using SEAM's modules and apply the analysis pipeline to reproduce key findings from our main manuscript, see the Examples section at the end of this document.

SEAM Modules:

SEAM's analysis pipeline is implemented through several specialized modules that work together:

  • Mutagenizer (from SQUID): Generates in silico sequence libraries through various mutagenesis strategies, including local, global, optimized, and complete libraries (supporting all combinatorial mutations up to a specified order). Features GPU-acceleration and batch processing for efficient sequence generation.

  • Compiler: Standardizes sequence analysis by converting one-hot encoded sequences to string format and computing associated metrics. Compiles sequences and functional properties into a DataFrame, with support for metrics such as Hamming distances and global importance analysis scores. Implements GPU-accelerated sequence conversion and vectorized operations.

  • Attributer: Computes attribution maps that quantify the base-wise contribution to regulatory activity. SEAM provides GPU-accelerated implementations of Saliency Maps, IntGrad, SmoothGrad, and ISM. DeepSHAP is not yet optimized for efficient batch processing across the sequence library. Examples for incorporating DeepSHAP using external scripts are provided in the examples folder.

  • Clusterer: Computes mechanistic clusters and embeddings from attribution maps to identify distinct regulatory mechanisms. Supports hierarchical clustering (GPU-optimized), K-means, and DBSCAN algorithms, with optional dimensionality reduction (UMAP, t-SNE, PCA) for complementary interpretability.

  • MetaExplainer: The core SEAM module that integrates results to identify and interpret mechanistic patterns. Generates cluster-averaged attribution maps (shape: (L, A) for each cluster) and the Mechanism Summary Matrix (MSM), a DataFrame containing position-wise statistics (entropy, consensus matches, reference mismatches) for each cluster. Also implements background separation and provides visualization tools for sequence logos, attribution logos, and cluster statistics, with support for both PWM-based and enrichment-based analysis. Features GPU acceleration with CPU fallbacks.

  • Identifier: Analyzes cluster-averaged attribution maps in conjunction with the MSM to identify precise locations of motifs and their epistatic interactions.
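To make two of the MetaExplainer quantities concrete, the sketch below computes cluster-averaged attribution maps and per-position Shannon entropy with plain NumPy. It is an illustration of the underlying arithmetic under simple assumptions, not SEAM's implementation; the function names, toy sequences, and cluster labels are hypothetical.

```python
import numpy as np


def cluster_average_maps(attributions, labels):
    """Average (N, L, A) attribution maps within each cluster into (L, A) arrays."""
    return {int(k): attributions[labels == k].mean(axis=0) for k in np.unique(labels)}


def positionwise_entropy(x_onehot):
    """Shannon entropy in bits at each position of one-hot sequences (N, L, A)."""
    freqs = x_onehot.mean(axis=0)            # per-position base frequencies, (L, A)
    safe = np.where(freqs > 0, freqs, 1.0)   # log2(1) = 0, so empty bins contribute 0
    return -(freqs * np.log2(safe)).sum(axis=1)


# Four toy sequences as base indices over the alphabet ACGT.
idx = np.array([[0, 1, 2],    # ACG
                [0, 1, 3],    # ACT
                [3, 1, 2],    # TCG
                [3, 2, 2]])   # TGG
x = np.eye(4)[idx]                                             # (4, 3, 4)
attributions = np.random.default_rng(0).normal(size=x.shape)   # placeholder maps
labels = np.array([0, 0, 1, 1])                                # hypothetical clusters

avg_maps = cluster_average_maps(attributions, labels)
entropy = {int(k): positionwise_entropy(x[labels == k]) for k in np.unique(labels)}
for k in avg_maps:
    print(k, avg_maps[k].shape, np.round(entropy[k], 3))
```

In this toy case the sequences in cluster 0 differ only at the last position, so the entropy there is 1 bit and 0 elsewhere. Low-entropy positions within a cluster are the ones that the cluster's sequences share.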

Module Relationships:

SEAM's modules form an integrated pipeline where outputs from earlier modules feed into subsequent analysis. The Mutagenizer generates sequences that are processed by the Compiler and Attributer. These attribution maps are then clustered by the Clusterer, with results from Mutagenizer, Compiler and Attributer integrated by the MetaExplainer to characterize each SEAM-derived mechanism. The Identifier module then analyzes these MetaExplainer outputs to pinpoint specific regulatory elements and their interactions.
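The data flow described above can be sketched end to end on toy data. The example below does not use the SEAM or SQUID APIs: mutagenize, toy_oracle, and ism_attributions are hypothetical stand-ins written with NumPy, the oracle is a simple additive model rather than a DNN, and the choices of Ward linkage and three clusters are arbitrary assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

A = 4          # alphabet size (DNA)
L = 8          # sequence length
N = 50         # library size
rng = np.random.default_rng(0)
WEIGHTS = rng.normal(size=(L, A))  # parameters of the toy additive oracle


def mutagenize(ref_idx, n_seqs, mut_rate, rng):
    """Random library: each position changes to a different base with prob. mut_rate."""
    lib = np.tile(ref_idx, (n_seqs, 1))
    mask = rng.random(lib.shape) < mut_rate
    shifts = rng.integers(1, A, size=lib.shape)   # shift of 1..A-1 never maps to itself
    lib[mask] = (lib[mask] + shifts[mask]) % A
    lib[0] = ref_idx                              # keep the reference as the first entry
    return np.eye(A, dtype=np.float32)[lib]       # (n_seqs, L, A)


def toy_oracle(x):
    """Stand-in for a sequence-based DNN: returns predictions of shape (N, 1)."""
    return (x * WEIGHTS).sum(axis=(1, 2))[:, None]


def ism_attributions(x, oracle):
    """In silico mutagenesis: prediction change for each possible single substitution."""
    base = oracle(x)[:, 0]
    maps = np.zeros_like(x)
    for pos in range(x.shape[1]):
        for b in range(x.shape[2]):
            x_sub = x.copy()
            x_sub[:, pos, :] = 0.0
            x_sub[:, pos, b] = 1.0
            maps[:, pos, b] = oracle(x_sub)[:, 0] - base
    return maps


ref_idx = rng.integers(0, A, size=L)
x = mutagenize(ref_idx, N, mut_rate=0.2, rng=rng)      # sequence library
y = toy_oracle(x)                                      # predictions
maps = ism_attributions(x, toy_oracle)                 # attribution maps
Z = linkage(maps.reshape(N, -1), method="ward")        # cluster the flattened maps
labels = fcluster(Z, t=3, criterion="maxclust")        # cut the tree into <= 3 clusters
print(x.shape, y.shape, maps.shape, Z.shape, np.unique(labels))
```

The four arrays produced here (library, predictions, attribution maps, and linkage matrix) correspond to the four inputs listed under Usage and Requirements.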

Examples

Google Colab examples for applying SEAM on previously-published deep learning models are available at the links below.

Note: Due to memory requirements for calculating distance matrices, Colab Pro may be required for examples using hierarchical clustering with their current settings.
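For a rough sense of scale, a pairwise distance matrix over N attribution maps has N(N-1)/2 unique entries. The sketch below estimates the memory this takes when the distances are stored in condensed form as 64-bit floats, which is what scipy.spatial.distance.pdist returns. The library sizes are hypothetical and are not taken from the examples.

```python
def condensed_distance_bytes(n_seqs, itemsize=8):
    """Bytes needed for N*(N-1)/2 pairwise distances at itemsize bytes each."""
    return n_seqs * (n_seqs - 1) // 2 * itemsize


# Hypothetical library sizes, chosen only to show how the cost grows.
for n in (10_000, 50_000, 100_000):
    gib = condensed_distance_bytes(n) / 2**30
    print(f"N = {n:>7,}: {gib:6.1f} GiB")
```

The cost grows quadratically with N: a library of 100,000 sequences needs about 37 GiB for the distances alone, before any other arrays are counted.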

Python script examples are provided in the examples folder for locally running SEAM and exporting outputs to file. Some of these examples include models that are not compatible with the latest libraries supported by Google Colab, including:

Additional dependencies may be required for these Python examples; any such dependencies are outlined at the top of each script.

SEAM Interactive Interpretability Tool:

Interactive interpretability tools are currently under development and will be available in a future release. These tools will provide a graphical user interface (GUI) for dynamically interpreting SEAM results, allowing users to explore and analyze pre-computed inputs from the example scripts above.

Citation:

If this code is useful in your work, please cite our paper.

bibtex TODO

License:

Copyright (C) 2023–2025 Evan Seitz, David McCandlish, Justin Kinney, Peter Koo

The software, code sample and their documentation made available on this website could include technical or other mistakes, inaccuracies or typographical errors. We may make changes to the software or documentation made available on its web site at any time without prior notice. We assume no responsibility for errors or omissions in the software or documentation available from its web site. For further details, please see the LICENSE file.
