
TF-MInDi: Transcription Factor Motifs and Instances Discovery


TF-MINDI: Transcription Factor Motif Instance Neighborhood Decomposition and Interpretation


TF-MINDI is a Python package for analyzing transcription factor binding patterns from deep learning model attribution scores. It identifies and clusters sequence motifs from contribution scores, maps them to DNA-binding domains, and provides comprehensive visualization tools for regulatory genomics analysis.

Getting Started

Please refer to the documentation for detailed tutorials and examples, in particular the API documentation and the Tutorials.

Key Features

  • Seqlet Extraction: Identifies important sequence regions from contribution scores using recursive seqlet calling from tangermeme
  • Motif Similarity Analysis: Compares extracted seqlets to known motif databases using TomTom
  • Clustering & Dimensionality Reduction: Groups similar seqlets using Leiden clustering and t-SNE visualization
  • DNA-Binding Domain Annotation: Maps seqlet clusters to transcription factor families
  • Pattern Generation: Creates consensus motifs from clustered seqlets with alignment
  • Comprehensive Visualization: Region-level contribution plots, t-SNE embeddings, motif logos, and heatmaps
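
To make the term "seqlet" concrete, the sketch below finds contiguous stretches of positions whose contribution magnitude exceeds a cutoff. This is a deliberately simplified stand-in and not the recursive seqlet calling from tangermeme that TF-MINDI uses; the function name `threshold_regions`, its parameters, and the example scores are all hypothetical.

```python
def threshold_regions(scores, threshold, min_length=1):
    """Return half-open (start, end) intervals where |score| > threshold.

    Toy illustration only; not the recursive seqlet algorithm.
    """
    regions = []
    start = None
    for pos, value in enumerate(scores):
        if abs(value) > threshold:
            if start is None:
                start = pos  # a new candidate region opens here
        elif start is not None:
            if pos - start >= min_length:
                regions.append((start, pos))
            start = None  # the region is closed
    # Close a region that runs to the end of the sequence.
    if start is not None and len(scores) - start >= min_length:
        regions.append((start, len(scores)))
    return regions


# Made-up per-position contribution scores for a 10 bp region.
per_position = [0.0, 0.01, 0.3, 0.4, 0.2, 0.0, 0.02, -0.5, -0.6, 0.0]
print(threshold_regions(per_position, threshold=0.1, min_length=2))
```

Both positive and negative contributions count here because the comparison uses the absolute value; a real seqlet caller makes more careful statistical decisions about where a region starts and ends.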

Installation

tfmindi is compatible with Python versions 3.10 through 3.12.
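
As a quick check before installing, the supported range stated above can be tested from the interpreter. This is a minimal sketch; the helper `is_supported_python` is hypothetical and not part of tfmindi.

```python
import sys

# Supported range taken from the sentence above: Python 3.10 through 3.12.
MIN_SUPPORTED = (3, 10)
MAX_SUPPORTED = (3, 12)


def is_supported_python(version=None):
    """Return True if (major, minor) lies within the supported range.

    Hypothetical helper for illustration; not part of the tfmindi API.
    """
    major, minor = (version or sys.version_info)[:2]
    return MIN_SUPPORTED <= (major, minor) <= MAX_SUPPORTED


if __name__ == "__main__":
    current = f"{sys.version_info.major}.{sys.version_info.minor}"
    print(f"Python {current} supported: {is_supported_python()}")
```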

CPU Version (Default)

pip install tfmindi

GPU-Accelerated Version (Recommended for large datasets)

# Requires a CUDA-compatible GPU (CUDA 12.x)
# The quotes keep shells such as zsh from interpreting the square brackets
pip install "tfmindi[gpu]"

The GPU version provides significant speedups for:

  • PCA computation
  • Neighborhood graph construction
  • t-SNE embedding
  • Leiden clustering

We're still working on making tfmindi as GPU-compatible as possible. If tfmindi can't find the GPU, try importing rapids_singlecell directly in Python and check which errors are raised. You might have to set LD_LIBRARY_PATH explicitly for cuml, as described here.
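
The import check suggested above can be wrapped in a few lines so that the underlying error is printed rather than swallowed. This is a sketch only; `diagnose_import` is a hypothetical helper, not a tfmindi function.

```python
import importlib


def diagnose_import(module_name="rapids_singlecell"):
    """Try to import a module and report the outcome as (ok, message).

    Hypothetical helper for illustration; not part of the tfmindi API.
    """
    try:
        importlib.import_module(module_name)
    except Exception as exc:  # ImportError, or a lower-level library loading error
        return False, f"{type(exc).__name__}: {exc}"
    return True, f"{module_name} imported successfully"


if __name__ == "__main__":
    ok, message = diagnose_import()
    print(message)
```

If the message points to a shared library that cannot be loaded, that suggests the LD_LIBRARY_PATH adjustment mentioned above is worth trying.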

Quick Start

TF-MINDI follows a scanpy-inspired workflow:

  1. Preprocessing (tm.pp): Extract seqlets, calculate motif similarities, and create an AnnData object
  2. Tools (tm.tl): Cluster seqlets and create consensus patterns
  3. Plotting (tm.pl): Visualize results

import tfmindi as tm

# Optional: Check GPU availability and set backend
print(f"GPU available: {tm.is_gpu_available()}")
print(f"Current backend: {tm.get_backend()}")
# tm.set_backend('gpu')  # Force GPU backend
# tm.set_backend('cpu')  # Swap back to CPU backend

# Extract seqlets from contribution scores
# contrib_scores and one_hot_sequences are arrays you supply yourself
seqlets_df, seqlet_matrices = tm.pp.extract_seqlets(
    contrib=contrib_scores,  # (n_examples, 4, length)
    oh=one_hot_sequences,    # (n_examples, 4, length)
    threshold=0.05,
)

# Calculate motif similarity
motif_collection = tm.load_motif_collection(
    tm.fetch_motif_collection()
)
similarity_matrix = tm.pp.calculate_motif_similarity(
    seqlet_matrices,
    motif_collection,
    chunk_size=10000
)

# Create AnnData object for analysis
adata = tm.pp.create_seqlet_adata(
    similarity_matrix,
    seqlets_df,
    seqlet_matrices=seqlet_matrices,
    oh_sequences=one_hot_sequences,
    contrib_scores=contrib_scores,
    motif_collection=motif_collection
)

# Cluster seqlets and annotate with DNA-binding domains
tm.tl.cluster_seqlets(adata, resolution=3.0)

# Generate consensus logos for each cluster
patterns = tm.tl.create_patterns(adata)

# Visualize results
tm.pl.tsne(adata, color_by="cluster_dbd")
tm.pl.region_contributions(adata, example_idx=0)
tm.pl.dbd_heatmap(adata)
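
The workflow above assumes that `contrib_scores` and `one_hot_sequences` already exist with shape (n_examples, 4, length). The sketch below shows one way to build arrays of that shape with NumPy. The helper `one_hot_encode`, the A/C/G/T channel order, and the float32 dtype are assumptions made for illustration, and the random contribution scores are placeholders rather than real attributions.

```python
import numpy as np

# Channel order is an assumption; use the order your model was trained with.
ALPHABET = "ACGT"


def one_hot_encode(sequences):
    """Encode equal-length DNA strings as a (n_examples, 4, length) array.

    Characters outside ACGT (for example N) are left as all-zero columns.
    Hypothetical helper for illustration; not part of the tfmindi API.
    """
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")
    encoded = np.zeros((len(sequences), len(ALPHABET), length), dtype=np.float32)
    for i, seq in enumerate(sequences):
        for pos, base in enumerate(seq.upper()):
            channel = ALPHABET.find(base)
            if channel >= 0:
                encoded[i, channel, pos] = 1.0
    return encoded


one_hot_sequences = one_hot_encode(["ACGTAC", "TTGNCA"])

# Placeholder attributions with the same shape. Real scores come from an
# attribution method applied to your model, not from random numbers.
rng = np.random.default_rng(0)
contrib_scores = rng.normal(size=one_hot_sequences.shape).astype(np.float32)

print(one_hot_sequences.shape, contrib_scores.shape)
```

Both arrays share the same leading dimension and sequence length, which is what the shape comments in the workflow call for.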

Release Notes

See the changelog.

Contact

If you found a bug, please use the issue tracker.

Citation

De Winter S. et al. (2026). System-wide extraction of cis-regulatory rules from sequence-to-function models in human neural development. bioRxiv. https://doi.org/10.64898/2026.01.14.699402
