
difflayers: Diffusion-Augmented Hopfield Networks


difflayers — Diffusion-Augmented Hopfield Networks


difflayers is a PyTorch library that extends modern continuous Hopfield networks with graph-based Laplacian diffusion, turning associative memory layers into structure-aware retrievers. At its core sits the Diffusion-Augmented Hopfield Network (DAHN), a drop-in upgrade to standard Hopfield attention that pre-smooths patterns over a kNN similarity graph built from the stored patterns before every association step, with the aim of suppressing spurious retrievals and sharpening metastable energy minima.

The library ships the full original Hopfield layer suite (Hopfield, HopfieldPooling, HopfieldLayer) plus the DAHN extensions (DiffusedHopfield, four diffusion operators, a graph-construction pipeline, and a dynamical memory engine) — all under a single, clean API.


Table of Contents

  1. Background
  2. What DAHN Adds
  3. Architecture Overview
  4. Installation
  5. Quick Start
  6. Core Modules
  7. Diffusion Modes
  8. DiffusionConfig Reference
  9. Graph Pipeline
  10. Advanced Usage
  11. Transformer Integration
  12. Example Notebooks
  13. Running Experiments
  14. API Reference
  15. Complexity Guide
  16. Background Paper
  17. Disclaimer
  18. License

Background

Modern Hopfield networks with continuous states were introduced in Ramsauer et al. (2020), where it was shown that the transformer attention mechanism is exactly the update rule of a continuous Hopfield network. This re-framing unlocks exponential storage capacity, single-step convergence, and a clean energy-based interpretation of deep attention.

The energy function of a continuous Hopfield network is:

$$E = -\text{lse}(\beta, X \xi) + \frac{1}{2}\xi^T \xi + \frac{1}{\beta}\log N + \frac{1}{2}M^2$$

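where $\text{lse}(\beta, z) = \frac{1}{\beta}\log\sum_i e^{\beta z_i}$ is the log-sum-exp, $\xi \in \mathbb{R}^d$ is the state pattern (query), $X \in \mathbb{R}^{N \times d}$ holds the stored patterns (keys) as rows, and $\beta$ is the inverse temperature. $N$ is the number of stored patterns and $M = \max_i \lVert x_i \rVert$ is the largest stored-pattern norm; the last two terms are constants that do not depend on $\xi$.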

Energy minimization via one synchronous update yields the familiar softmax attention:

$$\xi^{\text{new}} = X^\top \text{softmax}(\beta X \xi)$$

The network can store exponentially many patterns (in the dimension $d$), converges in one update step, and has exponentially small retrieval errors — properties not shared by classical binary Hopfield networks.
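The energy and update rule above can be checked numerically. The following is a minimal NumPy sketch, independent of difflayers: the helper names (`lse`, `energy`, `hopfield_update`) are illustrative, and the random patterns and noise level are arbitrary choices. It applies one update to a noisy copy of a stored pattern and records the energy before and after.

```python
import numpy as np

def lse(beta, z):
    """Log-sum-exp with inverse temperature beta (numerically stable)."""
    z_max = np.max(beta * z)
    return (z_max + np.log(np.sum(np.exp(beta * z - z_max)))) / beta

def energy(X, xi, beta):
    """Continuous Hopfield energy for stored patterns X (N, d) and state xi (d,)."""
    N = X.shape[0]
    M = np.max(np.linalg.norm(X, axis=1))  # largest stored-pattern norm
    return -lse(beta, X @ xi) + 0.5 * xi @ xi + np.log(N) / beta + 0.5 * M**2

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def hopfield_update(X, xi, beta):
    """One synchronous update: xi_new = X^T softmax(beta * X xi)."""
    return X.T @ softmax(beta * (X @ xi))

rng = np.random.default_rng(0)
X = rng.standard_normal((10, 64))            # 10 stored patterns, d = 64
xi = X[0] + 0.3 * rng.standard_normal(64)    # noisy copy of pattern 0
beta = 1.0

xi_new = hopfield_update(X, xi, beta)
retrieved = int(np.argmax(X @ xi_new))       # index of the best-matching pattern
e_before = energy(X, xi, beta)
e_after = energy(X, xi_new, beta)
```

With well-separated random patterns like these, a single update lands very close to the clean stored pattern, and the energy does not increase.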

Three classes of fixed points (energy minima) arise naturally:

| Fixed-point type | Regime | Behaviour |
|---|---|---|
| Global averaging | Low $\beta$ | Retrieves a weighted average of all patterns |
| Metastable states | Medium $\beta$ | Retrieves a subset of patterns, analogous to multi-head attention |
| Single-pattern storage | High $\beta$ | Sharply retrieves one stored pattern |
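The role of $\beta$ in these regimes can be illustrated by the entropy of the softmax weights. This is a small NumPy sketch with arbitrary random unit-norm patterns; the specific $\beta$ values are illustrative and not thresholds defined by the library.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def entropy(p):
    """Shannon entropy in nats, skipping exact zeros."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 16))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm stored patterns
xi = X[3]                                       # query equal to pattern 3

betas = [0.0, 1.0, 8.0, 64.0]
entropies = [entropy(softmax(b * (X @ xi))) for b in betas]
```

At $\beta = 0$ the weights are uniform (entropy $\log N$, global averaging); as $\beta$ grows the entropy falls towards zero and the weights concentrate on a single pattern.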

What DAHN Adds

Standard Hopfield attention treats every stored pattern as equally reachable from any query. In high-noise or high-density memory scenarios, the attention distribution spreads over spurious neighbours, degrading retrieval accuracy.

DAHN addresses this by building a $k$-nearest-neighbour graph over the pattern set and pre-smoothing patterns with the graph Laplacian before every association step. The dynamics loop is:

$$\text{for } t = 1, \ldots, T:$$

$$K' = \underbrace{(I - \eta L)}_{\text{diffusion}}\, K, \qquad Q' = (I - \eta L)\, Q \quad \text{(optional)}$$

$$\text{output} = \text{softmax}\!\left(\beta\, Q' {K'}^\top\right) V$$

where $L$ is the (optionally symmetric-normalized) graph Laplacian of the kNN similarity graph over $K$, and $\eta$ is the diffusion strength. This smoothing:

  • Clusters related patterns before retrieval, reducing inter-cluster interference
  • Sharpens metastable energy minima, improving single-pattern retrieval accuracy under noise
  • Preserves the Hopfield energy landscape (diffusion decreases the energy, never creates new spurious minima)
  • Scales gracefully: with FactoredDiffusion and sparse adjacency the full loop costs $O(kNd)$ per step
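The smoothing step can be sketched in plain NumPy. This is not the library's implementation: the adjacency is built from known cluster labels instead of a kNN search, keys double as values, and the loop is read as "smooth the keys T times, then attend once". All helper names are illustrative.

```python
import numpy as np

def softmax_rows(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def normalized_laplacian(A):
    """L_sym = I - D^{-1/2} A D^{-1/2}; assumes every node has degree > 0."""
    inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return np.eye(len(A)) - inv_sqrt[:, None] * A * inv_sqrt[None, :]

def diffused_attention(Q, K, V, L, eta=0.1, beta=1.0, steps=3):
    """Smooth keys with (I - eta L) for `steps` rounds, then attend once."""
    D = np.eye(len(K)) - eta * L
    K_s = K.copy()
    for _ in range(steps):
        K_s = D @ K_s
    return softmax_rows(beta * Q @ K_s.T) @ V, K_s

def cluster_spread(P, labels):
    """Sum of distances of each pattern to its cluster mean."""
    return sum(
        np.linalg.norm(P[labels == c] - P[labels == c].mean(axis=0), axis=1).sum()
        for c in np.unique(labels)
    )

rng = np.random.default_rng(0)
centres = rng.standard_normal((2, 16))
labels = np.repeat([0, 1], 5)                        # two clusters of five keys
K = centres[labels] + 0.3 * rng.standard_normal((10, 16))
Q = centres + 0.3 * rng.standard_normal((2, 16))     # one noisy query per cluster

# Assumed graph: connect keys that share a cluster (stand-in for the kNN step)
A = (labels[:, None] == labels[None, :]).astype(float)
np.fill_diagonal(A, 0.0)
L = normalized_laplacian(A)

out, K_s = diffused_attention(Q, K, K, L, eta=0.1, beta=1.0, steps=3)
spread_before = cluster_spread(K, labels)
spread_after = cluster_spread(K_s, labels)
```

In this toy setting each diffusion step pulls every key towards its cluster mean, so the within-cluster spread shrinks while the cluster means stay where they were.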

Architecture Overview

difflayers/
│
├── __init__.py              # Public API — 18 exported names
│
├── activation.py            # HopfieldCore  (multi-head Hopfield attention kernel)
├── functional.py            # hopfield_core_forward  (low-level functional API)
├── transformer.py           # HopfieldEncoderLayer, HopfieldDecoderLayer
│
├── diffused_attention.py    # DiffusedHopfield  ← DAHN entry point
├── diffusion.py             # DiffusionOperator ABC + 4 concrete strategies
│                            #   SimpleDiffusion, IterativeDiffusion,
│                            #   SpectralDiffusion, FactoredDiffusion
├── dynamics_engine.py       # DiffusionConfig, GraphCache, DynamicsEngine,
│                            #   EnergyTracker
├── attention_operator.py    # AttentionOperator (dense / graph-constrained)
│
├── graph/
│   ├── build_graph.py       # build_similarity_matrix, build_knn_graph
│   ├── laplacian.py         # compute_laplacian, compute_normalized_laplacian
│   ├── builder.py           # GraphBuilder  (fluent graph-construction API)
│   └── laplacian_builder.py # LaplacianBuilder
│
└── auxiliary/
    └── data.py              # LookupTableDataset

Installation

From PyPI (recommended)

pip install difflayers

From source

git clone https://github.com/Prigoistic/mha-layers.git
cd mha-layers
pip install -e .

Dependencies

| Package | Minimum version |
|---|---|
| Python | 3.8 |
| PyTorch | 1.9.0 |
| NumPy | 1.20.0 |
| SciPy | 1.7.0 |

For the example notebooks, install the extra requirements:

pip install -r examples/requirements.txt

Quick Start

import torch
from difflayers import Hopfield, HopfieldPooling, HopfieldLayer, DiffusedHopfield

# ------------------------------------------------------------------
# 1. Standard Hopfield attention  (query x stored-pattern lookup)
# ------------------------------------------------------------------
hopfield = Hopfield(input_size=64, num_heads=4, batch_first=True)

queries     = torch.randn(8, 10, 64)   # (batch, query_len, d)
stored      = torch.randn(8, 50, 64)   # (batch, memory_size, d)
projections = torch.randn(8, 50, 64)   # (batch, memory_size, d)

output = hopfield((stored, queries, projections))
# output: (8, 10, 64)

# ------------------------------------------------------------------
# 2. Hopfield pooling  (sequence -> fixed-size embedding)
# ------------------------------------------------------------------
pooling  = HopfieldPooling(input_size=64, num_heads=1, batch_first=True)
sequence = torch.randn(8, 100, 64)
pooled   = pooling(sequence)
# pooled: (8, 1, 64)  — one trained state-pattern queries over the sequence

# ------------------------------------------------------------------
# 3. Hopfield lookup  (static trainable memory)
# ------------------------------------------------------------------
lookup = HopfieldLayer(input_size=64, num_pattern_repetitions=32)
query  = torch.randn(8, 10, 64)
result = lookup(query)
# result: (8, 10, 64)

# ------------------------------------------------------------------
# 4. DiffusedHopfield  (graph-diffusion augmented retrieval)
# ------------------------------------------------------------------
dh = DiffusedHopfield(
    input_size=64,
    num_heads=4,
    batch_first=True,
    eta=0.1,                    # diffusion strength eta
    k_neighbors=8,              # kNN graph degree
    diffusion_mode="factored",  # O(kNd) — fastest
    diffusion_steps=3,          # T iterations of diffuse -> attend
    diffuse_key=True,           # smooth stored patterns
    diffuse_query=False,        # optionally also smooth queries
)
output = dh((stored, queries, projections))
# output: (8, 10, 64)  — same shape, sharper retrieval

Core Modules

Hopfield

The base continuous Hopfield attention layer. A direct PyTorch-compatible re-implementation of multi-head attention whose weights are derived from the Hopfield energy update rule rather than learned linear projections.

from difflayers import Hopfield

hopfield = Hopfield(
    input_size=128,            # depth of state (query) patterns
    hidden_size=64,            # depth of the association (Hopfield) space
    output_size=128,           # depth of the output projection
    num_heads=8,               # parallel association heads
    scaling=None,              # beta; auto-set to 1/sqrt(head_dim) if None
    update_steps_max=0,        # 0 = one synchronous update (default/recommended)
    update_steps_eps=1e-4,     # convergence threshold for iterative updates
    normalize_stored_pattern=True,   # LayerNorm on keys
    normalize_state_pattern=True,    # LayerNorm on queries
    batch_first=True,
    dropout=0.1,
)

Key parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| input_size | int | None | Feature depth of state (query) patterns |
| hidden_size | int | None | Hopfield association space depth; defaults to input_size |
| output_size | int | None | Output projection depth; defaults to input_size |
| num_heads | int | 1 | Parallel association heads |
| scaling | float | None | Inverse temperature beta; None => 1/sqrt(d_head) |
| update_steps_max | int | 0 | Max synchronous update iterations (None = run to convergence) |
| batch_first | bool | True | Input layout: (batch, seq, d) when True, (seq, batch, d) when False |
| stored_pattern_as_static | bool | False | Freeze stored patterns (no gradient through keys) |
| disable_out_projection | bool | False | Skip the final linear projection (useful for retrieval tasks) |

HopfieldPooling

Replaces traditional pooling (mean, max, attention-based) with a Hopfield-energy-based alternative. A single trainable state pattern acts as the query and computes softmax weights over the input sequence, producing a fixed-size summary vector regardless of input length.

from difflayers import HopfieldPooling

pooling = HopfieldPooling(
    input_size=128,
    num_heads=4,
    batch_first=True,
    dropout=0.1,
)

import torch

batch, seq_len = 8, 100        # example sizes

# Collapse a variable-length sequence to a single vector
sequence = torch.randn(batch, seq_len, 128)
pooled   = pooling(sequence)   # (batch, 1, 128)

Useful anywhere you need a permutation-invariant sequence summarisation — bag-of-words classification, set encoding, immune repertoire profiling, etc.
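The pooling idea can be sketched without the library. This NumPy version omits the learned projections and multiple heads of the real layer; `hopfield_pool` and the fixed random query `q` are illustrative stand-ins for the trainable state pattern.

```python
import numpy as np

def hopfield_pool(X, q, beta=1.0):
    """Pool a sequence X (seq_len, d) into one vector using a single query q (d,)."""
    logits = beta * (X @ q)
    w = np.exp(logits - logits.max())
    w /= w.sum()                      # softmax weights over sequence positions
    return w @ X                      # weighted average, shape (d,)

rng = np.random.default_rng(0)
d = 32
beta = 1.0 / np.sqrt(d)
q = rng.standard_normal(d)            # stands in for the trainable state pattern
X = rng.standard_normal((100, d))

pooled = hopfield_pool(X, q, beta)
shuffled = hopfield_pool(X[rng.permutation(100)], q, beta)   # reordered input
short = hopfield_pool(X[:7], q, beta)                        # shorter sequence
```

Reordering the sequence leaves the result unchanged, and the output size does not depend on the sequence length.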


HopfieldLayer

A trainable, input-independent lookup table. One or more stored patterns and their projections are learned parameters; given a query, the layer retrieves the most energy-aligned stored vector — acting like a content-addressable memory with learned slots.

from difflayers import HopfieldLayer

lookup = HopfieldLayer(
    input_size=128,
    num_pattern_repetitions=64,  # number of learned memory slots
    batch_first=True,
)

import torch

batch, seq_len = 8, 10   # example sizes
query  = torch.randn(batch, seq_len, 128)
result = lookup(query)   # (batch, seq_len, 128)

This is distinct from Hopfield in that the memory contents are learned parameters, not runtime inputs — suitable for slot-attention, prototype networks, or any scenario where memory is fixed at training time.


DiffusedHopfield

The DAHN module. A full drop-in replacement for Hopfield that augments the association with a graph-diffusion pre-processing step. Internally it builds a kNN cosine-similarity graph over the stored patterns, constructs the graph Laplacian, and runs a configurable diffusion-attention loop.

from difflayers import DiffusedHopfield

dh = DiffusedHopfield(
    # --- All standard Hopfield arguments are accepted ---
    input_size=128,
    num_heads=4,
    batch_first=True,
    scaling=1.0,

    # --- DAHN-specific arguments ---
    eta=0.1,                       # diffusion strength eta in (0, 0.5)
    k_neighbors=8,                 # kNN graph degree
    diffusion_mode="factored",     # "factored" | "simple" | "iterative" | "spectral"
    diffusion_steps=3,             # T (ignored by "simple"; used by iterative/spectral)
    use_normalized_laplacian=True, # symmetric-normalised L (recommended)
    diffuse_key=True,              # smooth stored patterns (keys)
    diffuse_query=False,           # optionally smooth query patterns too
    use_sparse=False,              # sparse adjacency for O(kN) memory
    use_logit_diffusion=False,     # also smooth post-softmax attention weights
    logit_eta=None,                # eta for logit diffusion; defaults to eta
    adaptive_eta=False,            # scale eta by attention entropy at runtime
    cache_graph=True,              # reuse graph across forward passes
    energy_stop_tol=0.0,           # early-stop on |Delta E| < tol (0 = disabled)
)

The forward signature is identical to Hopfield:

output = dh((stored_patterns, state_patterns, pattern_projections))
# or with masking
output = dh((stored_patterns, state_patterns, pattern_projections),
            stored_pattern_padding_mask=mask)

Diffusion Modes

Four diffusion strategies are available, trading off speed, memory, and smoothing quality:

"factored" (default — recommended)

x' = (1 - eta * deg) * x  +  eta * W @ x

Never forms the full Laplacian matrix. Stores only the sparse adjacency W and degree vector deg. Each step costs O(kNd) in time and O(kN) in memory. Best for large N and sparse graphs.
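The factored update is algebraically the same as one Euler step with the unnormalised Laplacian L = D - W, since (I - eta (D - W)) x = (1 - eta deg) x + eta W x. The NumPy check below uses a small random graph; it illustrates the identity only and says nothing about how the library treats the normalised case.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, eta = 6, 4, 0.05

# Random symmetric non-negative adjacency without self-loops
W = np.triu(rng.random((N, N)), 1)
W = W + W.T
deg = W.sum(axis=1)

x = rng.standard_normal((N, d))

# Dense form: build L explicitly and apply (I - eta L)
L = np.diag(deg) - W
dense = (np.eye(N) - eta * L) @ x

# Factored form: only W and the degree vector are needed
factored = (1.0 - eta * deg)[:, None] * x + eta * (W @ x)
```

With a sparse W holding about k nonzeros per row, the product `W @ x` is the only step that touches the graph, which is where the O(kNd) cost comes from.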

"simple"

x' = (I - eta * L) @ x

One explicit Euler step of heat diffusion. Forms D = I - eta*L once and applies it. Cost: O(N^2 * d) per step.

"iterative"

x' = (I - eta * L)^T @ x    # ^T is the T-th matrix power (T = diffusion_steps), not a transpose

Applies the same operator D repeatedly for T steps (diffusion_steps). Provides deeper smoothing at the cost of T * O(N^2 * d). Includes a numerical guard against divergence.

"spectral"

x' = U @ diag(exp(-eta * lambda)) @ U.T @ x

Exact heat-kernel diffusion via eigendecomposition of L. Precomputes U and lambda once (O(N^3)), then applies the filter to an N x d pattern matrix in O(N^2 d) per call. Most accurate smoothing; not suitable for large N.
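The relation between Euler steps and the heat kernel can be checked on a tiny graph. In this NumPy sketch the Euler variant uses step size eta / T so that the total diffusion time stays at eta; that choice is made here so the two operators are comparable and is not necessarily how the "iterative" mode is parameterised.

```python
import numpy as np

def heat_kernel(L, eta):
    """exp(-eta L) via eigendecomposition of a symmetric Laplacian."""
    lam, U = np.linalg.eigh(L)
    return (U * np.exp(-eta * lam)) @ U.T   # scales column j of U by exp(-eta * lam_j)

def euler_steps(L, eta, T):
    """T explicit Euler steps of size eta / T."""
    D = np.eye(len(L)) - (eta / T) * L
    return np.linalg.matrix_power(D, T)

# Path graph on 5 nodes and its symmetric-normalised Laplacian
A = np.diag(np.ones(4), 1)
A = A + A.T
deg = A.sum(axis=1)
L = np.eye(5) - A / np.sqrt(np.outer(deg, deg))

exact = heat_kernel(L, eta=1.0)
errors = {T: float(np.abs(euler_steps(L, 1.0, T) - exact).max())
          for T in (1, 10, 100, 1000)}
```

The maximum entrywise error shrinks as T grows, which is the sense in which the spectral mode is the exact limit of many small Euler steps.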

| Mode | Precompute | Per-step | Memory | Best for |
|---|---|---|---|---|
| factored | O(N^2) build kNN | O(kNd) | O(kN) | Large N, production |
| simple | O(N^2) build D | O(N^2 d) | O(N^2) | Moderate N, one-shot |
| iterative | O(N^2) build D | O(T * N^2 d) | O(N^2) | Deep smoothing |
| spectral | O(N^3) eigen | O(N^2 d) | O(N^2) | Small N, exact kernel |

DiffusionConfig Reference

DiffusionConfig is a frozen dataclass that bundles all diffusion hyperparameters. You can pass one explicitly to DiffusedHopfield, or let the constructor build it from keyword arguments.

from difflayers import DiffusionConfig

cfg = DiffusionConfig(
    eta=0.1,
    beta=1.0,
    steps=3,
    diffusion_mode="factored",
    attention_mode="dense",        # "dense" | "graph"
    k_neighbors=5,
    use_normalized_laplacian=True,
    use_sparse=False,
    diffuse_key=True,
    diffuse_query=False,
    use_logit_diffusion=False,
    logit_eta=None,
    adaptive_eta=False,
    adaptive_temperature=5.0,
    adaptive_threshold=1.0,
    cache_graph=True,
    energy_stop_tol=0.0,
)

| Field | Type | Default | Description |
|---|---|---|---|
| eta | float | 0.1 | Diffusion strength. For normalised L use eta < 0.5 |
| beta | float | 1.0 | Hopfield scaling / inverse temperature |
| steps | int | 3 | Number of diffuse->attend iterations |
| diffusion_mode | str | "factored" | One of "factored", "simple", "iterative", "spectral" |
| attention_mode | str | "dense" | "dense" (full O(N^2)) or "graph" (kNN-constrained O(kN)) |
| k_neighbors | int | 5 | Number of nearest neighbours in the similarity graph |
| use_normalized_laplacian | bool | True | Symmetric-normalised L; eigenvalues in [0, 2] |
| use_sparse | bool | False | Store adjacency as sparse_coo for O(kN) memory |
| diffuse_key | bool | True | Smooth stored patterns (keys) before attention |
| diffuse_query | bool | False | Smooth state patterns (queries) before attention |
| use_logit_diffusion | bool | False | Smooth post-softmax attention weights over the key graph |
| logit_eta | float or None | None | Separate eta for logit diffusion; falls back to eta |
| adaptive_eta | bool | False | Scale eta by attention entropy (high-entropy -> more diffusion) |
| cache_graph | bool | True | Re-use built graph across forward passes |
| energy_stop_tol | float | 0.0 | Early-stop if abs(Delta E) < tol per step; 0 disables |
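Because the config is described as a frozen dataclass, fields cannot be reassigned after construction; variants are made by copying. The sketch below uses a hypothetical stand-in class (`SketchConfig`, with three of the fields listed above) and only the standard library, so it shows the general dataclass behaviour without relying on difflayers itself.

```python
from dataclasses import dataclass, replace, FrozenInstanceError

@dataclass(frozen=True)
class SketchConfig:
    """Hypothetical stand-in with three of the fields listed above."""
    eta: float = 0.1
    steps: int = 3
    diffusion_mode: str = "factored"

base = SketchConfig()

# Assigning to a field of a frozen dataclass raises FrozenInstanceError
try:
    base.eta = 0.2
    mutated = True
except FrozenInstanceError:
    mutated = False

# dataclasses.replace returns a modified copy and leaves `base` untouched
variant = replace(base, eta=0.2, diffusion_mode="spectral")
```

This pattern is convenient for sweeps: keep one base config and derive each experimental variant with `replace`.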

Graph Pipeline

The graph pipeline under difflayers.graph can be used standalone to build Laplacians for any downstream use:

import torch
from difflayers.graph.build_graph import build_similarity_matrix, build_knn_graph
from difflayers.graph.laplacian import compute_laplacian, compute_normalized_laplacian
from difflayers.graph.builder import GraphBuilder

# --- Manual pipeline ---
X = torch.randn(100, 64)                       # 100 patterns, 64-dim

S = build_similarity_matrix(X)                 # (100, 100) cosine similarity
A = build_knn_graph(S, k=8, as_sparse=False)   # (100, 100) symmetric kNN adjacency
L = compute_normalized_laplacian(A)            # (100, 100) symmetric-normalised Laplacian

# --- Fluent builder API ---
graph = (
    GraphBuilder(X)
    .cosine_similarity()
    .knn(k=8, sparse=True)
    .normalized_laplacian()
    .build()
)
# graph.L    — Laplacian
# graph.W    — adjacency
# graph.deg  — degree vector

build_similarity_matrix(X): Computes pairwise cosine similarities, clamps negatives to zero, and zeros the diagonal (no self-loops). Complexity: O(N^2 d).

build_knn_graph(S, k, as_sparse): Sparsifies the similarity matrix by keeping only the top-k neighbours per node, then symmetrises. When as_sparse=True, returns torch.sparse_coo_tensor for O(kN) downstream products.

compute_laplacian(A): Unnormalised Laplacian L = D - A, where D is the diagonal matrix of row sums of A (node degrees). Eigenvalues lie in [0, 2 d_max], where d_max is the largest degree.

compute_normalized_laplacian(A): Symmetric normalised Laplacian L_sym = D^{-1/2} (D - A) D^{-1/2}. Eigenvalues in [0, 2]. Isolated nodes handled safely. Recommended for diffusion because the fixed eigenvalue bound means one stable range of eta works for any input graph.
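The four steps can be reproduced in a few lines of NumPy. This is a sketch of the described behaviour, not the library code: the function names are illustrative, symmetrisation by elementwise maximum is an assumption (the text above only says the graph is symmetrised), and isolated nodes are given an all-zero Laplacian row.

```python
import numpy as np

def cosine_similarity_matrix(X):
    Xn = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
    S = np.clip(Xn @ Xn.T, 0.0, None)   # clamp negatives to zero
    np.fill_diagonal(S, 0.0)            # no self-loops
    return S

def knn_graph(S, k):
    """Keep the k largest similarities per row, then symmetrise with max."""
    N = len(S)
    idx = np.argsort(-S, axis=1)[:, :k]
    rows = np.repeat(np.arange(N), k)
    A = np.zeros_like(S)
    A[rows, idx.ravel()] = S[rows, idx.ravel()]
    return np.maximum(A, A.T)

def normalized_laplacian(A):
    """D^{-1/2} (D - A) D^{-1/2}, with zero rows for isolated nodes."""
    deg = A.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    nz = deg > 0
    inv_sqrt[nz] = 1.0 / np.sqrt(deg[nz])
    return inv_sqrt[:, None] * (np.diag(deg) - A) * inv_sqrt[None, :]

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 64))      # 100 patterns, 64-dim
A = knn_graph(cosine_similarity_matrix(X), k=8)
L = normalized_laplacian(A)

eigvals = np.linalg.eigvalsh(L)
step_eigs = 1.0 - 0.1 * eigvals         # spectrum of (I - eta L) for eta = 0.1
```

Since the eigenvalues of L_sym stay within [0, 2], the Euler operator I - eta L with eta = 0.1 has eigenvalues within [0.8, 1], so repeated application cannot blow up.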


Advanced Usage

Static retrieval (no learned projections)

Useful for direct content-addressable memory benchmarks:

model = DiffusedHopfield(
    input_size=None,
    stored_pattern_as_static=True,
    state_pattern_as_static=True,
    pattern_projection_as_static=True,
    disable_out_projection=True,
    normalize_stored_pattern=False,
    normalize_state_pattern=False,
    normalize_pattern_projection=False,
    normalize_stored_pattern_affine=False,
    normalize_state_pattern_affine=False,
    normalize_pattern_projection_affine=False,
    batch_first=True,
    scaling=4.0,
    eta=0.15,
    k_neighbors=10,
    diffusion_mode="iterative",
    diffusion_steps=5,
    diffuse_key=True,
)

Ablation: diffuse only queries, only keys, or both

# Only diffuse keys (strongest effect; default)
dh_k    = DiffusedHopfield(input_size=64, diffuse_key=True,  diffuse_query=False, eta=0.1)

# Only diffuse queries (useful when queries are noisy)
dh_q    = DiffusedHopfield(input_size=64, diffuse_key=False, diffuse_query=True,  eta=0.1)

# Diffuse both
dh_both = DiffusedHopfield(input_size=64, diffuse_key=True,  diffuse_query=True,  eta=0.1)

Logit-level diffusion

Smooth the post-softmax attention weights over the key graph:

dh = DiffusedHopfield(
    input_size=64,
    diffuse_key=True,
    use_logit_diffusion=True,
    logit_eta=0.05,   # usually smaller than pattern-level eta
)

Adaptive diffusion strength

Scale eta automatically by attention entropy — high-entropy (uncertain) distributions receive more smoothing:

dh = DiffusedHopfield(
    input_size=64,
    adaptive_eta=True,
    eta=0.2,               # maximum eta
    adaptive_temperature=5.0,
    adaptive_threshold=1.0,  # entropy midpoint for sigmoid gate
)
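One way such an entropy gate could look is sketched below. The exact formula used by the library is not given in this README, so the form eta_max * sigmoid(temperature * (H - threshold)) is an assumption suggested by the parameter names and the "sigmoid gate" comment; `adaptive_eta` here is a hypothetical helper, not the library function.

```python
import numpy as np

def attention_entropy(p):
    """Shannon entropy (nats) of one attention distribution."""
    p = np.clip(p, 1e-12, 1.0)
    return float(-(p * np.log(p)).sum())

def adaptive_eta(p, eta_max=0.2, temperature=5.0, threshold=1.0):
    """Hypothetical gate: eta_max * sigmoid(temperature * (H(p) - threshold))."""
    h = attention_entropy(p)
    return eta_max / (1.0 + np.exp(-temperature * (h - threshold)))

peaked = np.array([0.97, 0.01, 0.01, 0.01])   # confident attention
uniform = np.full(4, 0.25)                    # maximally uncertain attention

eta_peaked = adaptive_eta(peaked)
eta_uniform = adaptive_eta(uniform)
```

Under this assumed gate, a confident distribution receives almost no diffusion while a uniform one receives close to the maximum eta.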

DynamicsEngine + EnergyTracker (low-level API)

from difflayers import DiffusionConfig, DynamicsEngine, EnergyTracker, GraphCache
from difflayers.diffusion import FactoredDiffusion
from difflayers.attention_operator import AttentionOperator

# Assumes `patterns`, `Q`, `K`, and `V` are existing tensors prepared by the caller
cfg = DiffusionConfig(eta=0.1, steps=5, k_neighbors=8)

# Build graph once
cache  = GraphCache(cfg)
graph  = cache.get(patterns)   # builds kNN + Laplacian; cached on repeated calls

# Build operators
diffusion_op = FactoredDiffusion(graph.W, graph.deg, cfg.eta)
attn_op      = AttentionOperator(beta=cfg.beta, mode=cfg.attention_mode)

# Run the dynamics loop
engine  = DynamicsEngine(diffusion_op, attn_op, cfg)
tracker = EnergyTracker(enabled=True)

Q_out, K_out = engine.run(Q, K, V, tracker=tracker)

print(tracker.energies)   # list of Hopfield energy per step

Transformer Integration

difflayers provides Hopfield-based encoder and decoder layers that slot directly into standard transformer architectures:

from difflayers import HopfieldEncoderLayer, HopfieldDecoderLayer
import torch.nn as nn

encoder = nn.TransformerEncoder(
    encoder_layer=HopfieldEncoderLayer(
        d_model=512,
        nhead=8,
        dim_feedforward=2048,
        dropout=0.1,
        batch_first=True,
    ),
    num_layers=6,
)

decoder = nn.TransformerDecoder(
    decoder_layer=HopfieldDecoderLayer(
        d_model=512,
        nhead=8,
        dim_feedforward=2048,
        dropout=0.1,
        batch_first=True,
    ),
    num_layers=6,
)

HopfieldEncoderLayer and HopfieldDecoderLayer are direct drop-in replacements for PyTorch's built-in transformer layers, with the attention kernel replaced by the Hopfield update rule.


Example Notebooks

The examples/ directory contains three fully worked demonstrations. Install dependencies first:

pip install -r examples/requirements.txt

Bit Pattern Set

A binary classification task in the Multiple Instance Learning (MIL) setting. Each bag contains bit-pattern instances (sequences of 0s and 1s); positive bags have specific class-defining patterns injected that are absent in negative bags. The notebook shows that Hopfield, HopfieldPooling, and HopfieldLayer all learn to filter bags for the discriminative patterns with high accuracy, even as bag size and noise increase.

Latch Sequence Set

A long-term dependency task. A sequence begins with symbol A or B; after a variable delay, the model must output the corresponding symbol. The Hopfield layer concentrates attention sharply on the first position of the sequence, capturing the dependency without positional encoding.

Attention-based Deep MIL (MNIST Bags)

A canonical MIL benchmark from Ilse et al. (2018). Each bag is a collection of 28x28 MNIST images; a bag is positive if it contains a target digit, negative otherwise. The notebook benchmarks Hopfield-based pooling against classic attention-MIL and demonstrates strong accuracy even with large bag sizes.


Running Experiments

All experiments are in src/experiments/ and write results to results/.

# Full ablation study (diffuse Q only / K only / both vs. none)
python -m src.experiments.ablation

# Benchmark diffusion modes (factored, simple, iterative, spectral)
python -m src.experiments.benchmark

# Noise robustness sweep
python -m src.experiments.noise_robustness

# Steps sweep (T = 1 ... 10)
python -m src.experiments.steps_sweep

# Mode comparison (standard Hopfield vs. DiffusedHopfield)
python -m src.experiments.mode_comparison

# Logit vs. feature-level diffusion comparison
python -m src.experiments.logit_vs_feature

# Attention head analysis
python -m src.experiments.attention_analysis

API Reference

All public names exported from difflayers:

| Name | Type | Description |
|---|---|---|
| Hopfield | nn.Module | Base continuous Hopfield attention layer |
| HopfieldPooling | nn.Module | Hopfield-based pooling with a trainable query |
| HopfieldLayer | nn.Module | Trainable static-memory lookup layer |
| HopfieldCore | nn.Module | Low-level multi-head Hopfield kernel |
| DiffusedHopfield | nn.Module | DAHN: graph-diffusion augmented Hopfield |
| HopfieldEncoderLayer | nn.Module | Transformer encoder layer with Hopfield attention |
| HopfieldDecoderLayer | nn.Module | Transformer decoder layer with Hopfield attention |
| DiffusionOperator | ABC | Abstract base for diffusion strategies |
| SimpleDiffusion | DiffusionOperator | One-step explicit Euler diffusion |
| IterativeDiffusion | DiffusionOperator | T-step iterative diffusion |
| SpectralDiffusion | DiffusionOperator | Exact heat-kernel via eigendecomposition |
| FactoredDiffusion | DiffusionOperator | Laplacian-free O(kNd) factored form |
| apply_diffusion | function | Functional API for a single diffusion call |
| DiffusionConfig | dataclass | Unified serialisable config for DAHN |
| GraphCache | class | Builds and caches the kNN graph + Laplacian |
| DynamicsEngine | class | Orchestrates the diffuse->attend loop |
| EnergyTracker | class | Per-step Hopfield energy logging + early-stop |
| GraphBuilder | class | Fluent graph-construction API |

Complexity Guide

| Operation | Time | Memory | Notes |
|---|---|---|---|
| Build similarity matrix | O(N^2 d) | O(N^2) | build_similarity_matrix |
| Build kNN graph (dense) | O(N^2) | O(N^2) | build_knn_graph |
| Build kNN graph (sparse) | O(N^2) | O(kN) | as_sparse=True |
| Laplacian (dense) | O(N^2) | O(N^2) | |
| FactoredDiffusion step | O(kNd) | O(kN) | Recommended for large N |
| SimpleDiffusion step | O(N^2 d) | O(N^2) | |
| IterativeDiffusion T steps | O(T N^2 d) | O(N^2) | |
| SpectralDiffusion precompute | O(N^3) | O(N^2) | Eigendecomposition |
| SpectralDiffusion apply | O(N^2 d) | O(N^2) | Per forward pass |
| Dense Hopfield attention | O(N^2 d) | O(N^2) | attention_mode="dense" |
| Graph-constrained attention | O(kNd) | O(kN) | attention_mode="graph" |
| Full DAHN (factored + dense) | O(T kNd + N^2 d) | O(N^2) | Typical configuration |
| Full DAHN (factored + graph) | O(T kNd) | O(kN) | Fully sparse end-to-end |

N = number of patterns, d = feature dimension, k = kNN degree, T = diffusion steps.
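To give the asymptotic entries a sense of scale, the short calculation below plugs example sizes into the leading terms from the table. Constants and lower-order terms are ignored, so the numbers are rough operation counts, not measured timings; `diffusion_cost` is an illustrative helper.

```python
def diffusion_cost(N, d, k, T):
    """Leading-order multiply-add counts implied by the table (constants ignored)."""
    return {
        "factored": T * k * N * d,        # T sparse products with ~k nonzeros per row
        "iterative": T * N * N * d,       # T dense (N x N) @ (N x d) products
        "spectral_precompute": N ** 3,    # one-off eigendecomposition
    }

costs = diffusion_cost(N=10_000, d=64, k=8, T=3)
speedup = costs["iterative"] / costs["factored"]   # ratio reduces to N / k
```

For these sizes the dense iterative mode needs N / k = 1250 times as many operations as the factored mode, which is why the factored form is suggested for large N.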


Background Paper

The Hopfield attention foundation is described in:

Hopfield Networks is All You Need
Hubert Ramsauer, Bernhard Schaefl, Johannes Lehner, Philipp Seidl, Michael Widrich, Lukas Gruber, Markus Holzleitner, Milena Pavlovic, Geir Kjetil Sandve, Victor Greiff, David Kreil, Michael Kopp, Gunter Klambauer, Johannes Brandstetter, Sepp Hochreiter
ICLR 2021
arxiv.org/abs/2008.02217

A detailed companion blog post covering the theoretical background is available at ml-jku.github.io/hopfield-layers.


Disclaimer

Parts of this implementation are based on PyTorch v1.6.0 and extended for the Hopfield/DAHN setting:

| Module | Class / function | Based on |
|---|---|---|
| difflayers/activation.py | HopfieldCore | torch.nn.MultiheadAttention |
| difflayers/functional.py | hopfield_core_forward | torch.nn.functional.multi_head_attention_forward |
| difflayers/transformer.py | HopfieldEncoderLayer | torch.nn.TransformerEncoderLayer |
| difflayers/transformer.py | HopfieldDecoderLayer | torch.nn.TransformerDecoderLayer |

License

BSD-style license — see LICENSE.
