difflayers: Diffusion-Augmented Hopfield Networks

These details have not been verified by PyPI

Project links

Project description

difflayers — Diffusion-Augmented Hopfield Networks

difflayers is a PyTorch library that extends modern continuous Hopfield networks with graph-based Laplacian diffusion, turning associative memory layers into structure-aware retrievers. At its core sits the Diffusion-Augmented Hopfield Network (DAHN) — a drop-in upgrade to standard Hopfield attention that pre-smooths patterns over a learned kNN graph before every association step, suppressing spurious retrievals and sharpening metastable energy minima.

The library ships the full original Hopfield layer suite (Hopfield, HopfieldPooling, HopfieldLayer) plus the DAHN extensions (DiffusedHopfield, four diffusion operators, a graph-construction pipeline, and a dynamical memory engine) — all under a single, clean API.

Background
What DAHN Adds
Architecture Overview
Installation
Quick Start
Core Modules
Diffusion Modes
DiffusionConfig Reference
Graph Pipeline
Advanced Usage
Transformer Integration
Example Notebooks
Running Experiments
API Reference
Complexity Guide
Background Paper
Disclaimer
License

Background

Modern Hopfield networks with continuous states were introduced in Ramsauer et al. (2020), where it was shown that the transformer attention mechanism is exactly the update rule of a continuous Hopfield network. This re-framing unlocks exponential storage capacity, single-step convergence, and a clean energy-based interpretation of deep attention.

The energy function of a continuous Hopfield network is:

$$E = -\text{lse}(\beta, X \xi) + \frac{1}{2}\xi^T \xi + \frac{1}{\beta}\log N + \frac{1}{2}M^2$$

where $\text{lse}(\beta, z) = \frac{1}{\beta}\log\sum_i e^{\beta z_i}$ is the log-sum-exp, $\xi$ is the state pattern (query), $X$ are the stored patterns (keys), $\beta$ is the inverse temperature, and $N$, $M$ are dimensional constants.

Energy minimization via one synchronous update yields the familiar softmax attention:

$$\xi^{\text{new}} = X^\top \text{softmax}(\beta X \xi)$$

The network can store exponentially many patterns (in the dimension $d$), converges in one update step, and has exponentially small retrieval errors — properties not shared by classical binary Hopfield networks.

Three classes of fixed points (energy minima) arise naturally:

Fixed-point type	Regime	Behaviour
Global averaging	Low $\beta$	Retrieves a weighted average of all patterns
Metastable states	Medium $\beta$	Retrieves a subset of patterns — analogous to multi-head attention
Single-pattern storage	High $\beta$	Sharply retrieves one stored pattern

What DAHN Adds

Standard Hopfield attention treats every stored pattern as equally reachable from any query. In high-noise or high-density memory scenarios, the attention distribution spreads over spurious neighbours, degrading retrieval accuracy.

DAHN addresses this by building a $k$-nearest-neighbour graph over the pattern set and pre-smoothing patterns with the graph Laplacian before every association step. The dynamics loop is:

$$\text{for } t = 1, \ldots, T:$$ $$K' = \underbrace{(I - \eta L)}_{\text{diffusion}} K, \quad Q' = (I - \eta L) Q \quad \text{(optional)}$$ $$\text{output} = \text{softmax}(\beta , Q' {K'}^\top) , V$$

where $L$ is the (optionally symmetric-normalized) graph Laplacian of the kNN similarity graph over $K$, and $\eta$ is the diffusion strength. This smoothing:

Clusters related patterns before retrieval, reducing inter-cluster interference
Sharpens metastable energy minima, improving single-pattern retrieval accuracy under noise
Preserves the Hopfield energy landscape (diffusion decreases the energy, never creates new spurious minima)
Scales gracefully: with FactoredDiffusion and sparse adjacency the full loop costs $O(kNd)$ per step

Architecture Overview

difflayers/
│
├── __init__.py              # Public API — 18 exported names
│
├── activation.py            # HopfieldCore  (multi-head Hopfield attention kernel)
├── functional.py            # hopfield_core_forward  (low-level functional API)
├── transformer.py           # HopfieldEncoderLayer, HopfieldDecoderLayer
│
├── diffused_attention.py    # DiffusedHopfield  ← DAHN entry point
├── diffusion.py             # DiffusionOperator ABC + 4 concrete strategies
│                            #   SimpleDiffusion, IterativeDiffusion,
│                            #   SpectralDiffusion, FactoredDiffusion
├── dynamics_engine.py       # DiffusionConfig, GraphCache, DynamicsEngine,
│                            #   EnergyTracker
├── attention_operator.py    # AttentionOperator (dense / graph-constrained)
│
├── graph/
│   ├── build_graph.py       # build_similarity_matrix, build_knn_graph
│   ├── laplacian.py         # compute_laplacian, compute_normalized_laplacian
│   ├── builder.py           # GraphBuilder  (fluent graph-construction API)
│   └── laplacian_builder.py # LaplacianBuilder
│
└── auxiliary/
    └── data.py              # LookupTableDataset

Installation

From PyPI (recommended)

pip install difflayers

From source

git clone https://github.com/Prigoistic/mha-layers.git
cd mha-layers
pip install -e .

Dependencies

Package	Minimum version
Python	3.8
PyTorch	1.9.0
NumPy	1.20.0
SciPy	1.7.0

For the example notebooks, install the extra requirements:

pip install -r examples/requirements.txt

Quick Start

import torch
from difflayers import Hopfield, HopfieldPooling, HopfieldLayer, DiffusedHopfield

# ------------------------------------------------------------------
# 1. Standard Hopfield attention  (query x stored-pattern lookup)
# ------------------------------------------------------------------
hopfield = Hopfield(input_size=64, num_heads=4, batch_first=True)

queries     = torch.randn(8, 10, 64)   # (batch, query_len, d)
stored      = torch.randn(8, 50, 64)   # (batch, memory_size, d)
projections = torch.randn(8, 50, 64)   # (batch, memory_size, d)

output = hopfield((stored, queries, projections))
# output: (8, 10, 64)

# ------------------------------------------------------------------
# 2. Hopfield pooling  (sequence -> fixed-size embedding)
# ------------------------------------------------------------------
pooling  = HopfieldPooling(input_size=64, num_heads=1, batch_first=True)
sequence = torch.randn(8, 100, 64)
pooled   = pooling(sequence)
# pooled: (8, 1, 64)  — one trained state-pattern queries over the sequence

# ------------------------------------------------------------------
# 3. Hopfield lookup  (static trainable memory)
# ------------------------------------------------------------------
lookup = HopfieldLayer(input_size=64, num_pattern_repetitions=32)
query  = torch.randn(8, 10, 64)
result = lookup(query)
# result: (8, 10, 64)

# ------------------------------------------------------------------
# 4. DiffusedHopfield  (graph-diffusion augmented retrieval)
# ------------------------------------------------------------------
dh = DiffusedHopfield(
    input_size=64,
    num_heads=4,
    batch_first=True,
    eta=0.1,                    # diffusion strength eta
    k_neighbors=8,              # kNN graph degree
    diffusion_mode="factored",  # O(kNd) — fastest
    diffusion_steps=3,          # T iterations of diffuse -> attend
    diffuse_key=True,           # smooth stored patterns
    diffuse_query=False,        # optionally also smooth queries
)
output = dh((stored, queries, projections))
# output: (8, 10, 64)  — same shape, sharper retrieval

Core Modules

Hopfield

The base continuous Hopfield attention layer. A direct PyTorch-compatible re-implementation of multi-head attention whose weights are derived from the Hopfield energy update rule rather than learned linear projections.

from difflayers import Hopfield

hopfield = Hopfield(
    input_size=128,            # depth of state (query) patterns
    hidden_size=64,            # depth of the association (Hopfield) space
    output_size=128,           # depth of the output projection
    num_heads=8,               # parallel association heads
    scaling=None,              # beta; auto-set to 1/sqrt(head_dim) if None
    update_steps_max=0,        # 0 = one synchronous update (default/recommended)
    update_steps_eps=1e-4,     # convergence threshold for iterative updates
    normalize_stored_pattern=True,   # LayerNorm on keys
    normalize_state_pattern=True,    # LayerNorm on queries
    batch_first=True,
    dropout=0.1,
)

Key parameters:

Parameter	Type	Default	Description
`input_size`	`int`	`None`	Feature depth of state (query) patterns
`hidden_size`	`int`	`None`	Hopfield association space depth; defaults to `input_size`
`output_size`	`int`	`None`	Output projection depth; defaults to `input_size`
`num_heads`	`int`	`1`	Parallel association heads
`scaling`	`float`	`None`	Inverse temperature beta; `None` => 1/sqrt(d_head)
`update_steps_max`	`int`	`0`	Max synchronous update iterations (`None` = run to convergence)
`batch_first`	`bool`	`True`	Input layout: `(batch, seq, d)` when `True`, `(seq, batch, d)` when `False`
`stored_pattern_as_static`	`bool`	`False`	Freeze stored patterns (no gradient through keys)
`disable_out_projection`	`bool`	`False`	Skip the final linear projection (useful for retrieval tasks)

HopfieldPooling

Replaces traditional pooling (mean, max, attention-based) with a Hopfield-energy-based alternative. A single trainable state pattern acts as the query and computes softmax weights over the input sequence, producing a fixed-size summary vector regardless of input length.

from difflayers import HopfieldPooling

pooling = HopfieldPooling(
    input_size=128,
    num_heads=4,
    batch_first=True,
    dropout=0.1,
)

# Collapse a variable-length sequence to a single vector
sequence = torch.randn(batch, seq_len, 128)
pooled   = pooling(sequence)   # (batch, 1, 128)

Useful anywhere you need a permutation-invariant sequence summarisation — bag-of-words classification, set encoding, immune repertoire profiling, etc.

HopfieldLayer

A trainable, input-independent lookup table. One or more stored patterns and their projections are learned parameters; given a query, the layer retrieves the most energy-aligned stored vector — acting like a content-addressable memory with learned slots.

from difflayers import HopfieldLayer

lookup = HopfieldLayer(
    input_size=128,
    num_pattern_repetitions=64,  # number of learned memory slots
    batch_first=True,
)

query  = torch.randn(batch, seq_len, 128)
result = lookup(query)   # (batch, seq_len, 128)

This is distinct from Hopfield in that the memory contents are learned parameters, not runtime inputs — suitable for slot-attention, prototype networks, or any scenario where memory is fixed at training time.

DiffusedHopfield

The DAHN module. A full drop-in replacement for Hopfield that augments the association with a graph-diffusion pre-processing step. Internally it builds a kNN cosine-similarity graph over the stored patterns, constructs the graph Laplacian, and runs a configurable diffusion-attention loop.

from difflayers import DiffusedHopfield

dh = DiffusedHopfield(
    # --- All standard Hopfield arguments are accepted ---
    input_size=128,
    num_heads=4,
    batch_first=True,
    scaling=1.0,

    # --- DAHN-specific arguments ---
    eta=0.1,                       # diffusion strength eta in (0, 0.5)
    k_neighbors=8,                 # kNN graph degree
    diffusion_mode="factored",     # "factored" | "simple" | "iterative" | "spectral"
    diffusion_steps=3,             # T (ignored by "simple"; used by iterative/spectral)
    use_normalized_laplacian=True, # symmetric-normalised L (recommended)
    diffuse_key=True,              # smooth stored patterns (keys)
    diffuse_query=False,           # optionally smooth query patterns too
    use_sparse=False,              # sparse adjacency for O(kN) memory
    use_logit_diffusion=False,     # also smooth post-softmax attention weights
    logit_eta=None,                # eta for logit diffusion; defaults to eta
    adaptive_eta=False,            # scale eta by attention entropy at runtime
    cache_graph=True,              # reuse graph across forward passes
    energy_stop_tol=0.0,           # early-stop on |Delta E| < tol (0 = disabled)
)

The forward signature is identical to Hopfield:

output = dh((stored_patterns, state_patterns, pattern_projections))
# or with masking
output = dh((stored_patterns, state_patterns, pattern_projections),
            stored_pattern_padding_mask=mask)

Diffusion Modes

Four diffusion strategies are available, trading off speed, memory, and smoothing quality:

`"factored"` (default — recommended)

x' = (1 - eta * deg) * x  +  eta * W @ x

Never forms the full Laplacian matrix. Stores only the sparse adjacency W and degree vector deg. Each step costs O(kNd) in time and O(kN) in memory. Best for large N and sparse graphs.

`"simple"`

x' = (I - eta * L) @ x

One explicit Euler step of heat diffusion. Forms D = I - eta*L once and applies it. Cost: O(N^2 * d) per step.

`"iterative"`

x' = (I - eta * L)^T @ x

Applies the same operator D repeatedly for T steps (diffusion_steps). Provides deeper smoothing at the cost of T * O(N^2 * d). Includes a numerical guard against divergence.

`"spectral"`

x' = U @ diag(exp(-eta * lambda)) @ U.T @ x

Exact heat-kernel diffusion via eigendecomposition of L. Precomputes U and lambda once (O(N^3)), then applies the diagonal filter in O(N^2) per call. Most accurate smoothing; not suitable for large N.

Mode	Precompute	Per-step	Memory	Best for
`factored`	O(N^2) build kNN	O(kNd)	O(kN)	Large N, production
`simple`	O(N^2) build D	O(N^2 d)	O(N^2)	Moderate N, one-shot
`iterative`	O(N^2) build D	O(T * N^2 d)	O(N^2)	Deep smoothing
`spectral`	O(N^3) eigen	O(N^2)	O(N^2)	Small N, exact kernel

DiffusionConfig Reference

DiffusionConfig is a frozen dataclass that bundles all diffusion hyperparameters. You can pass one explicitly to DiffusedHopfield, or let the constructor build it from keyword arguments.

from difflayers import DiffusionConfig

cfg = DiffusionConfig(
    eta=0.1,
    beta=1.0,
    steps=3,
    diffusion_mode="factored",
    attention_mode="dense",        # "dense" | "graph"
    k_neighbors=5,
    use_normalized_laplacian=True,
    use_sparse=False,
    diffuse_key=True,
    diffuse_query=False,
    use_logit_diffusion=False,
    logit_eta=None,
    adaptive_eta=False,
    adaptive_temperature=5.0,
    adaptive_threshold=1.0,
    cache_graph=True,
    energy_stop_tol=0.0,
)

Field	Type	Default	Description
`eta`	`float`	`0.1`	Diffusion strength. For normalised L use eta < 0.5
`beta`	`float`	`1.0`	Hopfield scaling / inverse temperature
`steps`	`int`	`3`	Number of diffuse->attend iterations
`diffusion_mode`	`str`	`"factored"`	One of `"factored"`, `"simple"`, `"iterative"`, `"spectral"`
`attention_mode`	`str`	`"dense"`	`"dense"` (full O(N^2)) or `"graph"` (kNN-constrained O(kN))
`k_neighbors`	`int`	`5`	Number of nearest neighbours in the similarity graph
`use_normalized_laplacian`	`bool`	`True`	Symmetric-normalised L; eigenvalues in [0, 2]
`use_sparse`	`bool`	`False`	Store adjacency as `sparse_coo` for O(kN) memory
`diffuse_key`	`bool`	`True`	Smooth stored patterns (keys) before attention
`diffuse_query`	`bool`	`False`	Smooth state patterns (queries) before attention
`use_logit_diffusion`	`bool`	`False`	Smooth post-softmax attention weights over the key graph
`logit_eta`	`float\|None`	`None`	Separate eta for logit diffusion; falls back to `eta`
`adaptive_eta`	`bool`	`False`	Scale eta by attention entropy (high-entropy -> more diffusion)
`cache_graph`	`bool`	`True`	Re-use built graph across forward passes
`energy_stop_tol`	`float`	`0.0`	Early-stop if abs(Delta E) < tol per step; 0 disables

Graph Pipeline

The graph pipeline under difflayers.graph can be used standalone to build Laplacians for any downstream use:

import torch
from difflayers.graph.build_graph import build_similarity_matrix, build_knn_graph
from difflayers.graph.laplacian import compute_laplacian, compute_normalized_laplacian
from difflayers.graph.builder import GraphBuilder

# --- Manual pipeline ---
X = torch.randn(100, 64)                       # 100 patterns, 64-dim

S = build_similarity_matrix(X)                 # (100, 100) cosine similarity
A = build_knn_graph(S, k=8, as_sparse=False)   # (100, 100) symmetric kNN adjacency
L = compute_normalized_laplacian(A)            # (100, 100) symmetric-normalised Laplacian

# --- Fluent builder API ---
graph = (
    GraphBuilder(X)
    .cosine_similarity()
    .knn(k=8, sparse=True)
    .normalized_laplacian()
    .build()
)
# graph.L    — Laplacian
# graph.W    — adjacency
# graph.deg  — degree vector

build_similarity_matrix(X) Computes pairwise cosine similarities, clamps negatives to zero, and zeros the diagonal (no self-loops). Complexity: O(N^2 d).

build_knn_graph(S, k, as_sparse) Sparsifies the similarity matrix by keeping only the top-k neighbours per node, then symmetrises. When as_sparse=True, returns torch.sparse_coo_tensor for O(kN) downstream products.

compute_laplacian(A) Unnormalised Laplacian L = D - A, where D = diag(A * 1). Eigenvalues in [0, d_max].

compute_normalized_laplacian(A) Symmetric normalised Laplacian L_sym = D^{-1/2} (D - A) D^{-1/2}. Eigenvalues in [0, 2]. Isolated nodes handled safely. Recommended for diffusion because the eigenvalue bound makes stable eta input-independent.

Advanced Usage

Static retrieval (no learned projections)

Useful for direct content-addressable memory benchmarks:

model = DiffusedHopfield(
    input_size=None,
    stored_pattern_as_static=True,
    state_pattern_as_static=True,
    pattern_projection_as_static=True,
    disable_out_projection=True,
    normalize_stored_pattern=False,
    normalize_state_pattern=False,
    normalize_pattern_projection=False,
    normalize_stored_pattern_affine=False,
    normalize_state_pattern_affine=False,
    normalize_pattern_projection_affine=False,
    batch_first=True,
    scaling=4.0,
    eta=0.15,
    k_neighbors=10,
    diffusion_mode="iterative",
    diffusion_steps=5,
    diffuse_key=True,
)

Ablation: diffuse only queries, only keys, or both

# Only diffuse keys (strongest effect; default)
dh_k    = DiffusedHopfield(input_size=64, diffuse_key=True,  diffuse_query=False, eta=0.1)

# Only diffuse queries (useful when queries are noisy)
dh_q    = DiffusedHopfield(input_size=64, diffuse_key=False, diffuse_query=True,  eta=0.1)

# Diffuse both
dh_both = DiffusedHopfield(input_size=64, diffuse_key=True,  diffuse_query=True,  eta=0.1)

Logit-level diffusion

Smooth the post-softmax attention weights over the key graph:

dh = DiffusedHopfield(
    input_size=64,
    diffuse_key=True,
    use_logit_diffusion=True,
    logit_eta=0.05,   # usually smaller than pattern-level eta
)

Adaptive diffusion strength

Scale eta automatically by attention entropy — high-entropy (uncertain) distributions receive more smoothing:

dh = DiffusedHopfield(
    input_size=64,
    adaptive_eta=True,
    eta=0.2,               # maximum eta
    adaptive_temperature=5.0,
    adaptive_threshold=1.0,  # entropy midpoint for sigmoid gate
)

DynamicsEngine + EnergyTracker (low-level API)

from difflayers import DiffusionConfig, DynamicsEngine, EnergyTracker, GraphCache
from difflayers.diffusion import FactoredDiffusion
from difflayers.attention_operator import AttentionOperator

cfg = DiffusionConfig(eta=0.1, steps=5, k_neighbors=8)

# Build graph once
cache  = GraphCache(cfg)
graph  = cache.get(patterns)   # builds kNN + Laplacian; cached on repeated calls

# Build operators
diffusion_op = FactoredDiffusion(graph.W, graph.deg, cfg.eta)
attn_op      = AttentionOperator(beta=cfg.beta, mode=cfg.attention_mode)

# Run the dynamics loop
engine  = DynamicsEngine(diffusion_op, attn_op, cfg)
tracker = EnergyTracker(enabled=True)

Q_out, K_out = engine.run(Q, K, V, tracker=tracker)

print(tracker.energies)   # list of Hopfield energy per step

Transformer Integration

difflayers provides Hopfield-based encoder and decoder layers that slot directly into standard transformer architectures:

from difflayers import HopfieldEncoderLayer, HopfieldDecoderLayer
import torch.nn as nn

encoder = nn.TransformerEncoder(
    encoder_layer=HopfieldEncoderLayer(
        d_model=512,
        nhead=8,
        dim_feedforward=2048,
        dropout=0.1,
        batch_first=True,
    ),
    num_layers=6,
)

decoder = nn.TransformerDecoder(
    decoder_layer=HopfieldDecoderLayer(
        d_model=512,
        nhead=8,
        dim_feedforward=2048,
        dropout=0.1,
        batch_first=True,
    ),
    num_layers=6,
)

HopfieldEncoderLayer and HopfieldDecoderLayer are direct drop-in replacements for PyTorch's built-in transformer layers, with the attention kernel replaced by the Hopfield update rule.

Example Notebooks

The examples/ directory contains three fully worked demonstrations. Install dependencies first:

pip install -r examples/requirements.txt

Bit Pattern Set

A binary classification task in the Multiple Instance Learning (MIL) setting. Each bag contains bit-pattern instances (sequences of 0s and 1s); positive bags have specific class-defining patterns injected that are absent in negative bags. The notebook shows that Hopfield, HopfieldPooling, and HopfieldLayer all learn to filter bags for the discriminative patterns with high accuracy, even as bag size and noise increase.

Latch Sequence Set

A long-term dependency task. A sequence begins with symbol A or B; after a variable delay, the model must output the corresponding symbol. The Hopfield layer concentrates attention sharply on the first position of the sequence, capturing the dependency without positional encoding.

Attention-based Deep MIL (MNIST Bags)

A canonical MIL benchmark from Ilse & Tomczak (2018). Each bag is a collection of 28x28 MNIST images; a bag is positive if it contains a target digit, negative otherwise. The notebook benchmarks Hopfield-based pooling against classic attention-MIL and demonstrates strong accuracy even with large bag sizes.

Running Experiments

All experiments are in src/experiments/ and write results to results/.

# Full ablation study (diffuse Q only / K only / both vs. none)
python -m src.experiments.ablation

# Benchmark diffusion modes (factored, simple, iterative, spectral)
python -m src.experiments.benchmark

# Noise robustness sweep
python -m src.experiments.noise_robustness

# Steps sweep (T = 1 ... 10)
python -m src.experiments.steps_sweep

# Mode comparison (standard Hopfield vs. DiffusedHopfield)
python -m src.experiments.mode_comparison

# Logit vs. feature-level diffusion comparison
python -m src.experiments.logit_vs_feature

# Attention head analysis
python -m src.experiments.attention_analysis

API Reference

All public names exported from difflayers:

Name	Type	Description
`Hopfield`	`nn.Module`	Base continuous Hopfield attention layer
`HopfieldPooling`	`nn.Module`	Hopfield-based pooling with a trainable query
`HopfieldLayer`	`nn.Module`	Trainable static-memory lookup layer
`HopfieldCore`	`nn.Module`	Low-level multi-head Hopfield kernel
`DiffusedHopfield`	`nn.Module`	DAHN: graph-diffusion augmented Hopfield
`HopfieldEncoderLayer`	`nn.Module`	Transformer encoder layer with Hopfield attention
`HopfieldDecoderLayer`	`nn.Module`	Transformer decoder layer with Hopfield attention
`DiffusionOperator`	`ABC`	Abstract base for diffusion strategies
`SimpleDiffusion`	`DiffusionOperator`	One-step explicit Euler diffusion
`IterativeDiffusion`	`DiffusionOperator`	T-step iterative diffusion
`SpectralDiffusion`	`DiffusionOperator`	Exact heat-kernel via eigendecomposition
`FactoredDiffusion`	`DiffusionOperator`	Laplacian-free O(kNd) factored form
`apply_diffusion`	`function`	Functional API for a single diffusion call
`DiffusionConfig`	`dataclass`	Unified serialisable config for DAHN
`GraphCache`	`class`	Builds and caches the kNN graph + Laplacian
`DynamicsEngine`	`class`	Orchestrates the diffuse->attend loop
`EnergyTracker`	`class`	Per-step Hopfield energy logging + early-stop
`GraphBuilder`	`class`	Fluent graph-construction API

Complexity Guide

Operation	Time	Memory	Notes
Build similarity matrix	O(N^2 d)	O(N^2)	`build_similarity_matrix`
Build kNN graph (dense)	O(N^2)	O(N^2)	`build_knn_graph`
Build kNN graph (sparse)	O(N^2)	O(kN)	`as_sparse=True`
Laplacian (dense)	O(N^2)	O(N^2)
`FactoredDiffusion` step	O(kNd)	O(kN)	Recommended for large N
`SimpleDiffusion` step	O(N^2 d)	O(N^2)
`IterativeDiffusion` T steps	O(T N^2 d)	O(N^2)
`SpectralDiffusion` precompute	O(N^3)	O(N^2)	Eigendecomposition
`SpectralDiffusion` apply	O(N^2)	O(N^2)	Per forward pass
Dense Hopfield attention	O(N^2 d)	O(N^2)	`attention_mode="dense"`
Graph-constrained attention	O(kNd)	O(kN)	`attention_mode="graph"`
Full DAHN (factored + dense)	O(T kNd + N^2 d)	O(N^2)	Typical configuration
Full DAHN (factored + graph)	O(T kNd)	O(kN)	Fully sparse end-to-end

N = number of patterns, d = feature dimension, k = kNN degree, T = diffusion steps.

Background Paper

The Hopfield attention foundation is described in:

Hopfield Networks is All You Need Hubert Ramsauer, Bernhard Schaefl, Johannes Lehner, Philipp Seidl, Michael Widrich, Lukas Gruber, Markus Holzleitner, Milena Pavlovic, Geir Kjetil Sandve, Victor Greiff, David Kreil, Michael Kopp, Gunter Klambauer, Johannes Brandstetter, Sepp Hochreiter ICLR 2021 — arxiv.org/abs/2008.02217

A detailed companion blog post covering the theoretical background is available at ml-jku.github.io/hopfield-layers.

Disclaimer

Parts of this implementation are based on PyTorch v1.6.0 and extended for the Hopfield/DAHN setting:

Module	Based on
`difflayers/activation.py` — `HopfieldCore`	`torch.nn.MultiheadAttention`
`difflayers/functional.py` — `hopfield_core_forward`	`torch.nn.functional.multi_head_attention_forward`
`difflayers/transformer.py` — `HopfieldEncoderLayer`	`torch.nn.TransformerEncoderLayer`
`difflayers/transformer.py` — `HopfieldDecoderLayer`	`torch.nn.TransformerDecoderLayer`

License

BSD-style license — see LICENSE.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.1.1

May 24, 2026

0.1.0

May 24, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

difflayers-0.1.1.tar.gz (59.3 kB view details)

Uploaded May 24, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

difflayers-0.1.1-py3-none-any.whl (54.6 kB view details)

Uploaded May 24, 2026 Python 3

File details

Details for the file difflayers-0.1.1.tar.gz.

File metadata

Download URL: difflayers-0.1.1.tar.gz
Upload date: May 24, 2026
Size: 59.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for difflayers-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`36f940000ccc12bf73763eac41d582e2fff23349338adf77662b6c42b01750a1`
MD5	`cf99241ce9fb8b95e9a16871186af5e0`
BLAKE2b-256	`9e0a485a7050bac14d8d047647b0800be6a3dd336e8a1d9b69e4b0c94814d783`

See more details on using hashes here.

File details

Details for the file difflayers-0.1.1-py3-none-any.whl.

File metadata

Download URL: difflayers-0.1.1-py3-none-any.whl
Upload date: May 24, 2026
Size: 54.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for difflayers-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a5b9e1a86a01cf73de53a76d5211b22161f165e97033ee6e60facbc7a0a96820`
MD5	`2e7be48bd71ae3ae9272abb079430a28`
BLAKE2b-256	`5ddfe496f6f030b71f63e4e9fd164ba504add427cd6c63e2977457b934506f1c`

See more details on using hashes here.

difflayers 0.1.1

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

difflayers — Diffusion-Augmented Hopfield Networks

Table of Contents

Background

What DAHN Adds

Architecture Overview

Installation

From PyPI (recommended)

From source

Dependencies

Quick Start

Core Modules

Hopfield

HopfieldPooling

HopfieldLayer

DiffusedHopfield

Diffusion Modes

"factored" (default — recommended)

"simple"

"iterative"

"spectral"

DiffusionConfig Reference

Graph Pipeline

Advanced Usage

Static retrieval (no learned projections)

Ablation: diffuse only queries, only keys, or both

Logit-level diffusion

Adaptive diffusion strength

DynamicsEngine + EnergyTracker (low-level API)

Transformer Integration

Example Notebooks

Bit Pattern Set

Latch Sequence Set

Attention-based Deep MIL (MNIST Bags)

Running Experiments

API Reference

Complexity Guide

Background Paper

Disclaimer

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

`"factored"` (default — recommended)

`"simple"`

`"iterative"`

`"spectral"`