Dynamic Sparse Attention with Landmark Tokens — High-performance Triton implementation

DSALT, Dynamic Sparse Attention with Landmark Tokens

See the GitHub repository (https://github.com/LeonardoCofone/dsalt-pytorch), which contains all the .md files mentioned in this document. The full feature catalog is in FEATURE.md.

A high-performance PyTorch library implementing DSALT (Dynamic Sparse Attention with Landmark Tokens), a sparse-attention transformer architecture built for efficient training with Triton kernels.

Published on PyPI: pip install dsalt

🚀 Key Features

Efficient Sparse Attention: Triton-accelerated kernels for GPU-optimized sparse causal self-attention
Dynamic Window Sizing: Adaptive local attention windows that grow with sequence position
Landmark Token Selection: Global landmark tokens selected via hybrid energy scoring
Mixed Precision Training: Full support for BF16/FP16 training with gradient scaling
Distributed Training: DDP (DistributedDataParallel) support for multi-GPU training
Production Ready: Complete training harness with checkpointing, logging, and validation

🛠️ Installation

Requirements

Python 3.8+
PyTorch 2.0+
CUDA 11.0+ (for GPU acceleration)
Triton 2.0+ (optional, for GPU kernels)

Install from PyPI

pip install dsalt

Install with Triton support

pip install dsalt[triton]

Install with Flash Attention fallback

pip install dsalt[flash-attn]
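
Because Triton and Flash Attention are optional extras, it can help to check which backends are importable before starting a run. The following is a minimal sketch using only the standard library. The helper name `available_backends` is hypothetical and not part of dsalt, and the sketch assumes the extras install the `triton` and `flash_attn` modules (the usual import names for those packages).

```python
import importlib.util


def available_backends():
    """Report which optional acceleration modules can be imported.

    Hypothetical helper, not part of dsalt. Assumes the extras install
    the `triton` and `flash_attn` top-level modules.
    """
    modules = ("triton", "flash_attn")
    # find_spec returns None when a top-level module is not installed.
    return {name: importlib.util.find_spec(name) is not None for name in modules}


print(available_backends())
```

A result of `False` for both entries suggests that only a plain PyTorch code path would be available, if the library provides one.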

Install from source

git clone https://github.com/LeonardoCofone/dsalt-pytorch.git
cd dsalt-pytorch
pip install -e .

Developer setup

pip install -r requirements-dev.txt

🚀 Quick Start

import torch
from dsalt.model import DSALTLMHeadModel

# Create a DSALT language model
model = DSALTLMHeadModel(
    vocab_size=32000,
    d_model=1024,
    n_layers=24,
    n_heads=16,
    n_min=32,      # Minimum window size
    n_max=512,     # Maximum window size
    k_lmk=64,      # Number of landmark tokens
)

# Forward pass
input_ids = torch.randint(0, 32000, (1, 1024))
logits = model(input_ids)
print(f"Output shape: {logits.shape}")  # [1, 1024, 32000]
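
To get a feel for what this configuration implies, the arithmetic below estimates the per-head dimension and an upper bound on the number of query-key pairs. It is a sketch under two assumptions that the text does not confirm: that `d_model` is split evenly across heads, and that each query attends to at most `n_max` local keys plus `k_lmk` landmarks.

```python
# Values taken from the Quick Start configuration.
seq_len, n_max, k_lmk = 1024, 512, 64
d_model, n_heads = 1024, 16

# Assumption: the usual even split of the model width across heads.
head_dim = d_model // n_heads

# Assumption: a query at position i sees at most n_max local keys plus
# k_lmk landmarks, and never more than its i + 1 causal candidates.
budget = n_max + k_lmk
sparse_pairs = sum(min(i + 1, budget) for i in range(seq_len))
dense_pairs = seq_len * (seq_len + 1) // 2  # full causal attention

print(head_dim)                               # 64
print(sparse_pairs, dense_pairs)              # 424224 524800
print(round(sparse_pairs / dense_pairs, 3))   # 0.808
```

At this sequence length the bound is loose because `n_max` is half the sequence. The gap widens as sequences grow, since the sparse bound scales linearly with length while dense causal attention scales quadratically.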

🏗️ Architecture

DSALT combines local causal windows with global landmark tokens:

Local Attention: Each token attends to a dynamic window of recent tokens
Landmark Selection: Top-k informative tokens selected globally via energy scoring
Sparse Computation: Only compute attention for relevant token pairs
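
The three ideas above can be sketched as a boolean attention mask. This pure-Python version is illustrative only and is not the Triton kernel. The function name `sparse_causal_mask` is hypothetical, and the window sizes and landmark positions are taken as inputs rather than computed the way DSALT computes them.

```python
def sparse_causal_mask(window_sizes, landmarks):
    """Return mask[i][j] = True when query i may attend to key j.

    window_sizes[i] is the local window length for query i (itself included).
    landmarks is a collection of globally selected key positions.
    Both are plain inputs here; how DSALT derives them is not modeled.
    """
    seq_len = len(window_sizes)
    landmark_set = set(landmarks)
    mask = [[False] * seq_len for _ in range(seq_len)]
    for i, width in enumerate(window_sizes):
        start = max(0, i - width + 1)
        for j in range(start, i + 1):   # local causal window
            mask[i][j] = True
        for j in landmark_set:          # global landmarks, kept causal
            if j <= i:
                mask[i][j] = True
    return mask


mask = sparse_causal_mask(window_sizes=[1, 2, 2, 2, 2, 2], landmarks=[0])
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
# x.....
# xx....
# xxx...
# x.xx..
# x..xx.
# x...xx
```

Each row keeps a short diagonal band (the local window) plus the first column (the landmark), which is the set of token pairs a sparse kernel would need to compute.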

Key Components

DSALTTransformer: Main transformer architecture
DSALTAttention: Multi-head sparse attention layer
WindowSizePredictor: Learned adaptive window sizing
HybridEnergyScorer: Landmark token selection
SparseAttentionKernel: Triton-accelerated attention computation
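
As an illustration of how a predicted window size might be kept within `n_min` and `n_max`, the sketch below maps a score in [0, 1] to a window length. The mapping and the function name `window_size` are assumptions made for this example; the actual WindowSizePredictor is learned and may behave differently.

```python
def window_size(position, score, n_min=32, n_max=512):
    """Map a predictor score in [0, 1] to a window length for one token.

    Hypothetical rule for illustration only; the real WindowSizePredictor
    is learned and may use a different mapping.
    """
    score = min(max(score, 0.0), 1.0)               # clamp the raw score
    size = n_min + round(score * (n_max - n_min))   # interpolate between the bounds
    return min(size, position + 1)                  # only position + 1 causal keys exist


print(window_size(position=10, score=0.9))     # 11
print(window_size(position=2000, score=0.0))   # 32
print(window_size(position=2000, score=1.0))   # 512
```

Under this rule, early tokens are limited by the amount of context available to them, so the effective window grows with position until the predicted size takes over.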

🎯 Training

Single GPU Training

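import torch
from torch.utils.data import DataLoader
from dsalt.training import DSALTTrainer

# `model` is the DSALTLMHeadModel created in the Quick Start.
# `train_dataloader` and `val_dataloader` are DataLoader instances built from your own dataset.
trainer = DSALTTrainer(
    model=model,
    train_loader=train_dataloader,
    val_loader=val_dataloader,
    lr=3e-4,
    total_steps=100000,
    save_dir="checkpoints",
    dtype=torch.bfloat16,   # BF16 mixed precision
)

trainer.train()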
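
The trainer takes data loaders, so the token stream has to be cut into fixed-length examples first. The sketch below shows one common way to do that for next-token prediction. The batch format that DSALTTrainer expects is not stated here, so the (inputs, targets) layout and the helper name `make_lm_examples` are assumptions.

```python
def make_lm_examples(token_ids, seq_len):
    """Split a token stream into (inputs, targets) pairs shifted by one position.

    Hypothetical helper; the batch layout DSALTTrainer expects may differ.
    """
    examples = []
    # Each example needs seq_len + 1 tokens: seq_len inputs plus the final target.
    for start in range(0, len(token_ids) - seq_len, seq_len):
        chunk = token_ids[start : start + seq_len + 1]
        examples.append((chunk[:-1], chunk[1:]))
    return examples


examples = make_lm_examples(list(range(10)), seq_len=4)
print(examples)
# [([0, 1, 2, 3], [1, 2, 3, 4]), ([4, 5, 6, 7], [5, 6, 7, 8])]
```

Trailing tokens that do not fill a complete example are dropped, which keeps every example the same length.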

Multi-GPU Distributed Training

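import torch.distributed as dist
from dsalt.training import DSALTTrainer

# Initialize the process group (a launcher such as torchrun sets the rank and world size)
dist.init_process_group(backend='nccl')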

trainer = DSALTTrainer(
    model=model,
    train_loader=train_dataloader,
    val_loader=val_dataloader,
    ddp=True,  # Enable DDP
    # ... other args
)
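
The snippet above calls `init_process_group` without explicit rank arguments, which works when a launcher such as `torchrun` has exported `RANK`, `LOCAL_RANK` and `WORLD_SIZE`. The sketch below reads those variables so a script can, for example, restrict logging to one process. The helper name `ddp_env` is hypothetical, and whether DSALTTrainer already handles this internally is not stated here.

```python
import os


def ddp_env(environ=os.environ):
    """Read the rank variables that torchrun exports, defaulting to a single process."""
    return {
        "rank": int(environ.get("RANK", 0)),
        "local_rank": int(environ.get("LOCAL_RANK", 0)),
        "world_size": int(environ.get("WORLD_SIZE", 1)),
    }


env = ddp_env()
is_main_process = env["rank"] == 0  # e.g. log or save checkpoints on one rank only
print(env, is_main_process)
```

A script using this would typically be started with something like `torchrun --nproc_per_node=4 train.py`, where `train.py` stands in for your own entry point.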

📚 API Reference

Core Classes

DSALTLMHeadModel: Language model wrapper with LM head
DSALTTransformer: Base transformer architecture
DSALTAttention: Sparse attention module
DSALTTrainer: Training harness

Kernel Functions

dsalt_attention(): Main sparse attention function
compute_hybrid_energy_scores(): Landmark scoring
select_landmarks(): Landmark selection
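
Landmark selection is described as a top-k choice over per-token scores. The sketch below shows that selection step in isolation, given scores that have already been computed. It is an illustrative stand-in and does not reflect the actual signature of `select_landmarks()` or how the hybrid energy scores are produced; the name `select_top_k` is hypothetical.

```python
import heapq


def select_top_k(scores, k):
    """Return the positions of the k highest scores, in ascending position order.

    Illustrative stand-in; not the signature of dsalt's select_landmarks().
    """
    k = min(k, len(scores))
    top = heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)
    return sorted(top)


scores = [0.1, 0.9, 0.3, 0.8, 0.2, 0.7]
print(select_top_k(scores, k=3))  # [1, 3, 5]
```

Returning positions in ascending order keeps the landmarks aligned with sequence order, which makes it easy to apply a causal check afterwards.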

🧪 Testing

Run the full test suite:

python tests/test.py

Run specific tests:

python tests/test_sparse_attn.py  # Attention kernels
python tests/test_dsalt_lm.py     # LM wrapper

📖 Citation

If you use DSALT in your research, please cite our paper:

@article{dsalt2026,
  title={Noise Accumulation and Rank Collapse in Dense Self-Attention: DSALT},
  author={Leonardo et al.},
  journal={Zenodo preprint},
  year={2026}
}

Paper: https://zenodo.org/records/19312827

🤝 Contributing

We welcome contributions! Please see our contributing guidelines.

📄 License

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.

🙏 Acknowledgments

Built on top of Triton for GPU kernels
Inspired by Flash Attention
Thanks to the PyTorch team for the excellent deep learning framework


