Lossless token sequence compression for large language models

Delta

Lossless Token Sequence Compression for Large Language Models

Delta is a production-ready lossless compression system that reduces the computational and economic cost of LLM inference by eliminating redundancy in input sequences before they reach the model. It replaces repeated multi-token patterns with compact meta-token references backed by a learnable dictionary format, achieving 30-60% compression on structured inputs while guaranteeing perfect reconstruction.

About

Triage

Triage is building the resiliency layer for AI systems. We expose deep observability into staging environments so coding agents can statically reason about codebases, predict runtime behavior, and remediate production issues autonomously. On the research side, we're filling latent space in frozen models to address known system weaknesses, building dynamic guardrails for real-time threat detection, and designing bureaucratic authorization flows that route sensitive operations to the right humans at the right time. The thesis behind everything we build: security should learn from every failure and compound over time, not reset with each new signature. We're compressing MTTR toward zero by treating every incident as training data for the next.

Why We Built Delta

Delta emerged from a pattern we kept seeing while building Triage and talking to teams deploying LLM-powered products: context-augmented generation produces massively redundant token sequences, and inference costs scale linearly with every one of them. The same boilerplate, the same structural patterns, the same retrieved chunks appear repeatedly across requests. LTSC (Lossless Token Sequence Compression) identifies and compresses these recurring patterns without semantic loss, targeting the specific redundancy profiles that coding agents, retrieval systems, and multi-turn conversations create. We built it as an open-source contribution because inference cost remains one of the most underappreciated bottlenecks to AI adoption, particularly for the agentic workflows where context windows balloon quickly. Solving it at the compression layer lets teams ship more capable agents without burning through API budgets or making architectural compromises to stay under token limits.

Contributors

This project was built by:

Nikhil Srivastava (University of California, Berkeley)
Omansh Bainsla (Georgia Tech)
Sahil Chatiwala (Georgia Tech)

Why Delta?

As context augmentation techniques become standard practice (retrieval-augmented generation, tool schemas, code repositories, policy documents, multi-turn conversations), input sequences increasingly contain repeated subsequences that consume context window budget and quadratic attention compute without contributing new information.

Delta addresses this by:

Compressing redundant patterns at the token level before inference
Maintaining a learnable dictionary format that models can understand with minimal fine-tuning
Guaranteeing lossless round-trip reconstruction

Key Features

Lossless Compression: Perfect round-trip reconstruction guaranteed via mathematical constraints
High Performance: Rust/WASM core with O(n log n) suffix array algorithms
Cross-Platform: Python library + TypeScript SDK for browsers, Node.js, Deno, and edge runtimes
Production Ready: Comprehensive test suites, type safety, structured logging
Multiple Discovery Strategies: Suffix array, BPE-style iterative, AST-aware (Python)
Optimal Selection: Greedy, weighted interval scheduling, beam search, or ILP solvers
ML Integration: Importance scoring, region-aware compression, quality prediction
Streaming Support: Handle arbitrarily large inputs with constant memory

Installation

Python

pip install delta-ltsc

# With optional dependencies
pip install "delta-ltsc[analysis]"   # ML analysis tools
pip install "delta-ltsc[training]"   # Fine-tuning utilities
pip install "delta-ltsc[mcp]"        # MCP server for AI assistants
pip install "delta-ltsc[all]"        # Everything

TypeScript/JavaScript

npm install @delta-ltsc/sdk

# Optional ML features
npm install @delta-ltsc/ml

Quick Start

Python

from delta import compress, decompress, CompressionConfig

# Compress a token sequence
tokens = [101, 2054, 2003, 1996, 4248, 102] * 20  # Repeated pattern
config = CompressionConfig(verify=True)
result = compress(tokens, config)

print(f"Original: {result.original_length} tokens")
print(f"Compressed: {result.compressed_length} tokens")
print(f"Ratio: {result.compressed_length / result.original_length:.1%}")

# Decompress (lossless)
restored = decompress(result.serialized_tokens, config)
assert restored == tokens

TypeScript

import { compress, decompress, initWasm } from '@delta-ltsc/sdk';

// Initialize WASM (required once)
await initWasm();

// Compress tokens
const tokens = [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3];
const result = await compress(tokens);

console.log(`Compressed: ${result.originalLength} → ${result.compressedLength} tokens`);
console.log(`Savings: ${((1 - result.compressionRatio) * 100).toFixed(1)}%`);

// Decompress (lossless)
const restored = await decompress(result.serializedTokens);

How It Works

Delta identifies repeated token subsequences and replaces them with meta-tokens, storing the mapping in a prefix dictionary:

Original:  [the, cat, sat, on, the, mat, the, cat, ran]
            ^^^^^^^^                     ^^^^^^^^
Compressed: [<Dict>, <MT_0>, <Len:2>, the, cat, </Dict>, <MT_0>, sat, on, the, mat, <MT_0>, ran]

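The compression format is designed to be learnable by transformer models with minimal fine-tuning. Note that the example above is illustrative only: it grows from 9 to 13 tokens, because a two-token pattern that occurs only twice does not pay for its dictionary entry. The compressibility constraint below rejects such patterns.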
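As a rough illustration of the format shown above, here is a toy sketch of the substitution and its inverse. It is not Delta's implementation: the helper names `toy_compress` and `toy_decompress` are hypothetical, tokens are plain strings, only one pattern is handled, and it assumes the input never contains the marker strings themselves.

```python
from typing import Dict, List

DICT_OPEN, DICT_CLOSE = "<Dict>", "</Dict>"


def toy_compress(tokens: List[str], pattern: List[str], meta: str = "<MT_0>") -> List[str]:
    """Replace non-overlapping occurrences of `pattern` with `meta`, scanning left to right."""
    body: List[str] = []
    i, n = 0, len(pattern)
    while i < len(tokens):
        if tokens[i:i + n] == pattern:
            body.append(meta)
            i += n
        else:
            body.append(tokens[i])
            i += 1
    # Prefix dictionary: meta-token, length marker, then the pattern itself.
    header = [DICT_OPEN, meta, f"<Len:{n}>", *pattern, DICT_CLOSE]
    return header + body


def toy_decompress(compressed: List[str]) -> List[str]:
    """Parse the prefix dictionary, then expand every meta-token in the body."""
    assert compressed[0] == DICT_OPEN
    table: Dict[str, List[str]] = {}
    i = 1
    while compressed[i] != DICT_CLOSE:
        meta = compressed[i]
        length = int(compressed[i + 1][len("<Len:"):-1])
        table[meta] = compressed[i + 2:i + 2 + length]
        i += 2 + length
    out: List[str] = []
    for tok in compressed[i + 1:]:
        out.extend(table.get(tok, [tok]))
    return out


tokens = "the cat sat on the mat the cat ran".split()
compressed = toy_compress(tokens, ["the", "cat"])
restored = toy_decompress(compressed)
print(compressed)
print(restored == tokens)
```

Because the dictionary travels with the sequence as a prefix, the decoder needs no external state to reconstruct the original tokens.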

Compressibility Constraint

A pattern is only compressed if it provides net savings:

length × count > 1 + length + count + overhead

This mathematical constraint ensures compression never increases sequence length.
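The inequality can be checked with a few lines of arithmetic. The sketch below reads the right-hand side as the cost after compression: one meta-token and `length` pattern tokens in the dictionary entry, `count` references in the body, plus any format overhead. That reading, the function names, and the `overhead` values used are assumptions for illustration, not values taken from Delta.

```python
def net_savings(length: int, count: int, overhead: int = 0) -> int:
    """Left side of the constraint minus the right side; positive means the pattern pays off."""
    original_cost = length * count
    compressed_cost = 1 + length + count + overhead
    return original_cost - compressed_cost


def is_compressible(length: int, count: int, overhead: int = 0) -> bool:
    """Strict inequality, as in the constraint: break-even patterns are not compressed."""
    return net_savings(length, count, overhead) > 0


# A 2-token pattern seen twice: 4 > 5 is false, so it is skipped.
print(net_savings(2, 2), is_compressible(2, 2))
# A 6-token pattern seen 20 times, with an assumed overhead of 1: 120 > 28.
print(net_savings(6, 20, overhead=1), is_compressible(6, 20, overhead=1))
```

Short patterns need many repetitions before they clear the bar, while long patterns pay off after only a few.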

Configuration

Python

from delta import CompressionConfig

config = CompressionConfig(
    # Pattern discovery
    min_subsequence_length=2,
    max_subsequence_length=8,
    discovery_mode="suffix-array",  # other options: "sliding-window", "bpe"

    # Selection algorithm
    selection_mode="greedy",        # other options: "optimal", "beam", "ilp"
    beam_width=8,
    
    # Hierarchical compression
    hierarchical_enabled=True,
    hierarchical_max_depth=3,
    
    # ML features (optional)
    use_importance_scoring=False,
    enable_adaptive_regions=False,
    
    # Verification
    verify=True,
)

TypeScript

const result = await compress(tokens, {
  minSubsequenceLength: 2,
  maxSubsequenceLength: 8,
  selectionMode: 'greedy',      // 'optimal' | 'beam'
  hierarchicalEnabled: true,
  hierarchicalMaxDepth: 3,
  verify: true,
});
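To give a feel for what the subsequence length bounds in the configuration control, here is a naive sliding-window counter of repeated subsequences. It is a stand-in for illustration only, not Delta's suffix-array discovery; the function name `count_repeated_subsequences` is hypothetical, and it counts overlapping occurrences, so the number of non-overlapping replacements can be lower.

```python
from collections import Counter
from typing import Dict, List, Tuple


def count_repeated_subsequences(
    tokens: List[int], min_len: int = 2, max_len: int = 8
) -> Dict[Tuple[int, ...], int]:
    """Count every window of min_len..max_len tokens and keep those seen more than once."""
    counts: Counter = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return {pattern: c for pattern, c in counts.items() if c > 1}


tokens = [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]
repeats = count_repeated_subsequences(tokens, min_len=2, max_len=3)
for pattern, c in sorted(repeats.items(), key=lambda kv: (-kv[1], kv[0])):
    print(pattern, c)
```

Raising the maximum length widens the candidate set at the cost of more windows to examine, which is one reason a suffix array is attractive for larger inputs.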

Selection Algorithms

Mode	Complexity	Description
`greedy`	O(n log n)	Fast, savings-density heuristic
`optimal`	O(n²)	Weighted interval scheduling via DP
`beam`	O(n × width)	Beam search with marginal savings
`ilp`	Exponential	Globally optimal (requires scipy)
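The weighted interval scheduling idea behind the `optimal` row can be sketched in its textbook form: each candidate occurrence is an interval with a savings value, and the goal is the non-overlapping subset with the largest total. The code below is a simplified illustration, not Delta's selection code. It assumes each interval carries an independent savings value, whereas real dictionary costs are shared across occurrences of the same pattern. The name `select_non_overlapping` is hypothetical.

```python
from bisect import bisect_right
from typing import List, Tuple

Interval = Tuple[int, int, int]  # (start, end_exclusive, savings)


def select_non_overlapping(intervals: List[Interval]) -> Tuple[int, List[Interval]]:
    """Return the maximum total savings and one set of intervals that achieves it."""
    ivs = sorted(intervals, key=lambda iv: iv[1])
    ends = [iv[1] for iv in ivs]
    best = [0] * (len(ivs) + 1)  # best[k]: optimum using only the first k intervals
    take = [False] * len(ivs)
    prev = [0] * len(ivs)
    for k, (start, _end, savings) in enumerate(ivs):
        # Number of earlier intervals that end at or before this one starts.
        p = bisect_right(ends, start, 0, k)
        prev[k] = p
        if best[p] + savings > best[k]:
            best[k + 1] = best[p] + savings
            take[k] = True
        else:
            best[k + 1] = best[k]
    # Walk back through the decisions to recover the chosen intervals.
    chosen: List[Interval] = []
    k = len(ivs)
    while k > 0:
        if take[k - 1]:
            chosen.append(ivs[k - 1])
            k = prev[k - 1]
        else:
            k -= 1
    chosen.reverse()
    return best[-1], chosen


total, chosen = select_non_overlapping([(0, 3, 2), (2, 5, 4), (5, 8, 4), (3, 6, 1)])
print(total, chosen)
```

A greedy pass can lock in an early interval that blocks a better combination later; the dynamic program avoids that by comparing "take" against "skip" at every step.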

Advanced Features

Static Dictionaries (TypeScript)

Use pre-built dictionaries for domain-specific content:

const result = await compress(pythonCodeTokens, {
  staticDictionary: 'python-v1',
});

Available: python-v1, typescript-v1, markdown-v1, json-v1, sql-v1

Streaming Compression

import { createStreamingCompressor } from '@delta-ltsc/sdk';

const compressor = await createStreamingCompressor();
for await (const chunk of tokenStream) {
  await compressor.addChunk(chunk);
}
const result = await compressor.finish();

Region-Aware Compression

from delta import detect_regions, filter_candidates_by_region

# Detect semantic regions (SYSTEM, USER, CONTEXT, CODE)
regions = detect_regions(tokens)

# Apply per-region compression limits
# (`candidates` is assumed to hold the patterns from the discovery step, not shown here)
filtered = filter_candidates_by_region(candidates, regions, tokens)

Quality Prediction

from delta import create_predictor

predictor = create_predictor(task_type="code")
prediction = predictor.predict(tokens, result)

if prediction.recommendation == "compress":
    # Safe to use compressed output
    pass

MCP Integration (AI Assistants)

Delta provides an MCP server for integration with AI coding assistants:

# Install with MCP support
pip install "delta-ltsc[mcp]"

# Run the server
delta-mcp

Configure in Cursor (~/.cursor/mcp.json) or Claude Desktop (claude_desktop_config.json). Prefer absolute paths, since GUI apps often don't inherit your shell PATH:

{
  "mcpServers": {
    "delta-ltsc": {
      "command": "/path/to/venv/bin/python",
      "args": ["-m", "delta.mcp"]
    }
  }
}

If you see spawn delta-mcp ENOENT, use an absolute path to your environment's python (or delta-mcp).
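One way to get the absolute interpreter path without typing it by hand is to run a short script inside the environment where delta-ltsc is installed. This is a convenience sketch using only the standard library; the helper name `build_mcp_config` is hypothetical and not part of Delta.

```python
import json
import shutil
import sys


def build_mcp_config(python_path: str = sys.executable) -> dict:
    """Build the mcpServers entry shown above, using an absolute interpreter path."""
    return {
        "mcpServers": {
            "delta-ltsc": {
                "command": python_path,
                "args": ["-m", "delta.mcp"],
            }
        }
    }


# Print a config block that can be pasted into the MCP config file.
print(json.dumps(build_mcp_config(), indent=2))
# Report where the delta-mcp entry point lives, or None if it is not on PATH.
print("delta-mcp:", shutil.which("delta-mcp"))
```

Run it with the same interpreter that has the package installed, since `sys.executable` reports whichever Python is executing the script.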

See MCP Documentation for a full setup guide and available tools.

Architecture

delta/                          # Python package
├── compressor.py               # Core compress/decompress API
├── config.py                   # Configuration dataclass
├── engine.py                   # Compression pipeline
├── discovery.py                # Pattern discovery algorithms
├── selection.py                # Pattern selection algorithms
├── adaptive.py                 # Region-aware compression
└── quality_predictor.py        # ML quality prediction

packages/
├── core/                       # Rust/WASM compression core
│   └── src/
│       ├── lib.rs              # WASM exports
│       ├── suffix_array.rs     # O(n log n) suffix array
│       ├── selection.rs        # Pattern selection
│       └── dictionary.rs       # Dictionary serialization
├── sdk/                        # TypeScript SDK
│   └── src/
│       ├── compress.ts         # High-level API
│       ├── streaming.ts        # Streaming support
│       └── worker.ts           # Worker thread support
└── ml/                         # Optional ML features
    └── src/
        ├── importance.ts       # Pattern importance scoring
        ├── quality.ts          # Quality prediction
        └── regions.ts          # Region detection

Benchmarks

# Python benchmarks
python benchmarks/ratio.py --tokens 8192 --runs 10
python benchmarks/latency.py --tokens 8192 --runs 10

# TypeScript benchmarks
cd packages/sdk && npm run benchmark

Typical results on structured inputs:

Compression ratio: 35-60% reduction
Latency: < 10ms for 8K tokens (WASM), < 50ms (Python)
Memory: O(n) with streaming support

Testing

Python

pytest                              # Run all tests
pytest --cov=delta --cov-report=html  # With coverage

TypeScript

cd packages/sdk
npm test                            # Unit tests
npm run test:browser                # Browser tests

Rust

cd packages/core
cargo test                          # Unit tests

Documentation

Design Intent - Motivation and objectives
Compression Format - Dictionary and body format specification
Architecture - System design overview
Algorithm Details - Discovery and selection algorithms
ML Integration - Importance scoring and quality prediction
API Reference - Complete API documentation
TypeScript SDK Guide - Getting started with the SDK
MCP Server - Integration with AI coding assistants

Citation

If you use Delta in your research, please cite:

@software{delta2026,
  title={Delta: Lossless Token Sequence Compression for Large Language Models},
  author={{Triage Sec}},
  year={2026},
  url={https://github.com/delta-ltsc/delta}
}

This work builds on foundational research in lossless token compression:

@article{harvill2024lossless,
  title={Lossless Token Sequence Compression via Meta-Tokens},
  author={Harvill, John and others},
  year={2024}
}

License

MIT License. See LICENSE for details.

Contributing

Contributions are welcome. Please ensure:

All tests pass (pytest for Python, npm test for TypeScript)
Code is formatted (ruff format for Python, prettier for TypeScript)
Type hints are complete (mypy delta/ for Python)
New features include tests and documentation

Acknowledgments

The foundational LTSC algorithm from Harvill et al. (2024)
The open-source community for feedback and contributions

This version: 0.3.1 (Jan 31, 2026)
