
Genome-based context compression for local LLMs


🧬 Helix Context

License: Apache 2.0 · Python 3.11+ · SIKE: 10/10 · Compression: 769x · Paper: Agentome

Genome-based context compression for local LLMs. Scale-Invariant Knowledge Engine (SIKE): 10/10 retrieval from 0.6B to 26B parameters.

Treats context like a genome instead of a flat text buffer. A 7,200-gene SQLite database (44MB raw knowledge) compresses to ~15K tokens of expressed context per turn โ€” a 769x inference compression ratio. Retrieval is perfectly scale-invariant: the same genome delivers 10/10 needle accuracy to qwen3:0.6b and Claude Opus alike. The Librarian does the work; the Reader just extracts.

📖 Quick glossary, if the biological metaphor is new to you: gene = one knowledge chunk (content + metadata) · genome = the full SQLite store · ribosome = small model that packs/ranks/splices context · promoter = retrieval tags · expression = selecting and formatting genes for one query · chromatin = gene accessibility tier (open / euchromatin / heterochromatin) · replication = packing conversations back into the genome.

  Client (Continue, Cursor, any OpenAI client)
         |
         v
  +--------------------------+
  |  Helix Proxy (FastAPI)   |  Port 11437
  |  /v1/chat/completions    |  OpenAI-compatible
  |                          |
  |  1. Extract query        |
  |  2. Express pipeline     |  <-- Genome (SQLite)
  |  3. Inject context       |  <-- Ribosome (CPU model)
  |  4. Forward to Ollama    |  --> localhost:11434
  |  5. Stream tee response  |
  |  6. Background replicate |
  +--------------------------+

Instead of stuffing your entire codebase into the prompt, Helix compresses it into a persistent SQLite genome and expresses only the relevant genes per turn. The model sees compressed context, not raw text. Conversations replicate back into the genome automatically, building institutional memory over time.

Benchmark Highlights

  • 🎯 10/10 needle retrieval from 0.6B to 26B parameters (43x range)
  • 🚀 769x inference compression (11.6M-token genome → 15K tokens expressed per turn)
  • 💎 Claude Haiku + Helix matches Opus: all three API tiers hit 10/10 accuracy
  • 🧠 A local 4B model beats blind Opus by 2.25x on domain-specific extraction

Test Corpus Composition

The benchmark genome is a real developer's working data, not a curated eval set. 65.8% of the corpus is pure noise โ€” game data, subtitles, blueprints โ€” and Helix still hits 10/10 on project-specific needles hidden in the remaining 34%.

| Source category | Genes | Tokens | % of tokens | Repo visibility |
|---|---|---|---|---|
| 🎮 Steam / game data (Hades subtitles, BeamNG configs, Dyson Sphere blueprints, Factorio saves) | 2,905 | ~7.7M | 65.8% | n/a |
| 🌐 SwiftWing21/BigEd: BigEd fleet (Education dir) | 2,405 | ~1.8M | 15.4% | public (private worktree ahead by 2 commits) |
| 🔒 CosmicTasha/CosmicTasha | 944 | ~1.6M | 13.9% | private |
| 🔒 Project Tally (private financial ledger; repo URL withheld) | 242 | ~0.2M | 2.0% | private |
| 🌐 SwiftWing21/helix-context (this repo) | 161 | ~0.1M | 1.2% | public |
| 🌐 SwiftWing21/scorerift: ScoreRift / two-brain-audit | 110 | ~0.1M | 0.7% | public |
| Unclassified / session memory | 497 | ~0.1M | 1.0% | n/a |
| Total | 7,264 | ~11.6M | 100% | |

Source breakdown (software only, excluding game noise):

  • ๐ŸŒ Public GitHub repos: ~2.0M tokens (50.0%) โ€” BigEd, helix-context, scorerift
  • ๐Ÿ”’ Private GitHub repos: ~1.8M tokens (45.6%) โ€” CosmicTasha, BookKeeper
  • ๐Ÿ”„ Unclassified / session memory: ~0.2M tokens (4.4%)

Signal-to-noise: only ~33% of the 11.6M-token corpus is relevant software knowledge. The other ~66% is game data the Agentome had to learn to ignore via chromatin state (HETEROCHROMATIN tier) and promoter-tag discrimination. The 10/10 retrieval holds despite the noise, and arguably because of it, since real-world retrieval systems have to survive mixed-domain corpora.

💡 How this table was measured: Claude (co-authoring this repo) had workspace access to the user's local project directories during the benchmark session, including private repos that never leave the machine. The genome file itself is gitignored; only aggregate counts and the benchmark queries are public. This demonstrates a real use case for Helix: your proprietary code participates in retrieval without being uploaded anywhere. Even the Education directory is split: the bulk lives in the public BigEd repo, with a private worktree ahead by 2 unreleased commits.

Database Storage Breakdown (post-VACUUM)

The on-disk genome.db is 523 MB for 7,264 genes (~46 MB of raw content). Why the ~12x gap between raw content and DB file? Because the genome isn't just storage: it's a 4-tier retrieval engine (promoter tags → FTS5 → SPLADE → ΣĒMA semantic), and each tier carries its own index.

| Component | Size | % of DB | Purpose |
|---|---|---|---|
| FTS5 posting lists (genes_fts_data) | 187.3 MB | 35.8% | Full-text inverted index for keyword retrieval |
| Raw content (gene.content) | 44.5 MB | 8.5% | Original source text, verbatim |
| SPLADE sparse index (splade_terms) | 35.7 MB | 6.8% | 1.73M term weights for lexical expansion |
| Ribosome complements (gene.complement) | 16.5 MB | 3.2% | Small-model compressed summaries (2.69x storage ratio) |
| Gene relations (NLI) | 6.6 MB | 1.3% | 108K typed logical relations between genes |
| Entity graph | 5.6 MB | 1.1% | 117K entity-to-gene edges for co-activation |
| Promoter index (retrieval tags) | 3.8 MB | 0.7% | 73,815 domain/entity tags across all genes |
| Codons + metadata JSON | 8.2 MB | 1.6% | Semantic tags, promoter JSON, epigenetics |
| ΣĒMA embeddings (20D vectors) | 0.34 MB | 0.1% | Semantic primes, 80 bytes per gene |
| Key-value facts (pre-extracted) | 1.4 MB | 0.3% | Pre-parsed key=value pairs for answer slate |
| Accounted payload subtotal | 310.0 MB | 59.3% | Actual data across all indexes |
| SQLite B-tree + page overhead | 212.7 MB | 40.7% | Index structure, not fragmentation |
| Total file size | 522.7 MB | 100% | |

💾 VACUUM impact: this table reflects the post-VACUUM state. Before VACUUM, the database was 752 MB; the extra 229 MB (30.4%) was free pages left over from thinning 11,529 genes down to 7,264 during tuning. SQLite keeps freed pages inside the file until a VACUUM reclaims them. The ~213 MB of "B-tree overhead" that remains is structural (page headers, cell pointers, interior nodes of the index B-trees) and is not reclaimable without changing the indexing strategy.

Observations:

  • FTS5 dominates storage (35.8% of the file). The full-text index holds position data for every token across all 7K genes; it's what enables the sub-5ms content queries that make the ~1s total retrieval latency possible.
  • Raw content is only 8.5% of the file. The rest is indexes. This is the expected tradeoff for a retrieval-optimized database vs a flat text archive.
  • Accounted payload is 310 MB (59.3%). The remaining 213 MB (40.7%) is legitimate B-tree structure overhead: page headers, cell pointers, and internal index nodes. SQLite can't compress this further without sacrificing query speed.
  • ΣĒMA embeddings are essentially free: 20 floats per gene (4 bytes each) = 80 bytes. A 1M-gene genome would cost only 80 MB for the semantic tier.
  • Inference cost is unchanged by DB size: the LLM only ever sees ~15K tokens per turn regardless of whether the genome is 50 MB or 50 GB.
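The 80-bytes-per-gene figure is simply 20 four-byte floats. A minimal sketch, assuming each ΣĒMA vector is stored as a packed little-endian float32 blob (the actual on-disk encoding Helix uses is not documented here, and `pack_sema` / `semantic_tier_bytes` are hypothetical helpers):

```python
import struct

SEMA_DIMS = 20  # dimensionality stated above; float32 encoding is an assumption


def pack_sema(vector: list[float]) -> bytes:
    """Serialize a 20-dimensional vector as little-endian float32 (4 bytes each)."""
    if len(vector) != SEMA_DIMS:
        raise ValueError(f"expected {SEMA_DIMS} dims, got {len(vector)}")
    return struct.pack(f"<{SEMA_DIMS}f", *vector)


def unpack_sema(blob: bytes) -> tuple[float, ...]:
    """Inverse of pack_sema."""
    return struct.unpack(f"<{SEMA_DIMS}f", blob)


def semantic_tier_bytes(gene_count: int) -> int:
    """Raw vector payload for a genome, excluding SQLite row and page overhead."""
    return gene_count * SEMA_DIMS * 4


blob = pack_sema([0.5] * SEMA_DIMS)
print(len(blob))                       # 80
print(semantic_tier_bytes(1_000_000))  # 80000000 (80 MB)
```

The real SQLite cost per gene is somewhat higher than the payload because every row also carries a record header and cell pointer.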

Compression summary:

| Metric | Ratio | Meaning |
|---|---|---|
| Storage (raw → complement) | 2.69x | How much the ribosome compresses each gene's summary |
| Expression (full corpus → single turn) | 776x | How much of the genome the LLM sees per query |
| DB file / raw content (post-VACUUM) | 11.76x | Index overhead for 4-tier retrieval |
| DB file / raw content (pre-VACUUM) | 16.90x | Includes free pages left by thinning |
| vs 128K-stuffed context | 8.5x fewer tokens | Baseline "dump everything" approach |
| vs chunked RAG (25K tokens) | 1.7x fewer tokens | Standard vector-search RAG |

The expression ratio (776x in this table) is the number that matters for cost and latency. Everything else is bookkeeping detail of how the Librarian files its books.
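Most of these ratios are plain division over figures quoted in this README. A quick sketch using the rounded inputs from the tables above, so the results agree with the published ratios to within rounding; the constant names are illustrative:

```python
# Figures quoted in this README: MB for storage, tokens for context sizes.
RAW_CONTENT_MB = 44.5
COMPLEMENT_MB = 16.5
DB_POST_VACUUM_MB = 522.7
DB_PRE_VACUUM_MB = 752.0
TOKENS_PER_TURN = 15_000   # 3K decoder prompt + 12K expressed genes
STUFFED_CONTEXT = 128_000
CHUNKED_RAG = 25_000


def ratio(numerator: float, denominator: float) -> float:
    """Compression or overhead ratio, rounded to two decimals."""
    return round(numerator / denominator, 2)


ratios = {
    "storage (raw -> complement)": ratio(RAW_CONTENT_MB, COMPLEMENT_MB),
    "db / raw (post-VACUUM)": ratio(DB_POST_VACUUM_MB, RAW_CONTENT_MB),
    "db / raw (pre-VACUUM)": ratio(DB_PRE_VACUUM_MB, RAW_CONTENT_MB),
    "vs 128K-stuffed": ratio(STUFFED_CONTEXT, TOKENS_PER_TURN),
    "vs chunked RAG": ratio(CHUNKED_RAG, TOKENS_PER_TURN),
}

for name, value in ratios.items():
    print(f"{name}: {value}x")
```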

Needle-in-a-haystack on this 7,264-gene genome (~46MB raw knowledge):

| Model | Params | VRAM | Retrieval | Accuracy |
|---|---|---|---|---|
| qwen3:0.6b | 0.6B | 0.5 GB | 10/10 | 2/10 |
| qwen3:1.7b | 1.7B | 1.4 GB | 10/10 | 3/10 |
| qwen3:4b | 4B | 2.5 GB | 10/10 | 9/10 |
| gemma4:e4b (MoE) | 8B / 4B active | 9.6 GB | 10/10 | 9/10 |
| qwen3:8b | 8B | 5.2 GB | 10/10 | 9/10 |
| gemma4:26b-a4b (MoE + DDR4 offload) | 26B / 4B active | 8 GB + 13 GB RAM | 10/10 | 6/10 |
| Claude Haiku + Helix | n/a | API | 10/10 | 10/10 |
| Claude Sonnet + Helix | n/a | API | 10/10 | 10/10 |
| Claude Opus + Helix | n/a | API | 10/10 | 10/10 |

Without Helix, the same Claude models score 3-4/10 (hand-curated reference only). The genome is a universal uplift: retrieval is 10/10 at every price tier and parameter count, while final answer accuracy still depends on the reader model. See docs/RESEARCH.md for the full SIKE analysis.

Quick Start

# Install from PyPI (beta)
pip install helix-context --pre

# Pull a small model for the ribosome (context codec)
ollama pull gemma4:e2b

# Start the proxy
helix
# or: python -m uvicorn helix_context.server:app --host 127.0.0.1 --port 11437

# Seed the genome with your own project files
python examples/seed_genome.py path/to/your/project/

# Check genome health
curl http://127.0.0.1:11437/stats

Point any OpenAI-compatible client at http://127.0.0.1:11437/v1 and start chatting. Context compression happens transparently.
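For example, a minimal standard-library client. This sketch relies only on what is stated above (the proxy speaks the OpenAI chat-completions format under /v1); the helper names are hypothetical and the model tag is just one of the benchmarked models:

```python
import json
import urllib.request

HELIX_BASE = "http://127.0.0.1:11437/v1"


def build_chat_request(prompt: str, model: str = "qwen3:4b") -> urllib.request.Request:
    """Build an OpenAI-style chat request aimed at the Helix proxy."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{HELIX_BASE}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, model: str = "qwen3:4b") -> str:
    """Send the request; needs the Helix proxy and Ollama to be running."""
    with urllib.request.urlopen(build_chat_request(prompt, model), timeout=120) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]
```

With the server up, `chat("What port does the Helix proxy use?")` returns the model's answer; Helix injects the expressed context on the way through, so the client code contains nothing Helix-specific beyond the base URL.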

What You'll See

After seeding the genome, /stats shows the state of your knowledge base:

$ curl -s http://127.0.0.1:11437/stats | jq
{
  "total_genes": 7264,
  "open": 7264,
  "compression_ratio": 2.69,
  "health": {
    "total_queries": 503,
    "avg_ellipticity": 0.62,
    "status_counts": {"aligned": 143, "sparse": 267, "denatured": 93}
  }
}

A /context query returns the expressed context window, exactly what gets injected into the downstream LLM:

$ curl -s http://127.0.0.1:11437/context \
    -H "Content-Type: application/json" \
    -d '{"query":"What port does the Helix proxy listen on?"}' | jq '.[0]'
{
  "name": "Helix Genome Context",
  "description": "12 genes expressed, 3.1x compression, health=aligned (Δε=0.66)",
  "content": "<expressed_context>\n<GENE src=\"helix-context/README.md\" facts=\"port=11437\">\n# Helix Context\n...",
  "context_health": {
    "ellipticity": 0.66,
    "coverage": 0.85,
    "density": 0.42,
    "freshness": 1.0,
    "genes_expressed": 12,
    "status": "aligned"
  }
}

A chat request through the proxy gets the context injected automatically; your client doesn't need to know Helix exists:

$ curl -s http://127.0.0.1:11437/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
      "model": "qwen3:4b",
      "messages": [{"role":"user","content":"What port does the Helix proxy use?"}]
    }' | jq -r '.choices[0].message.content'

The Helix proxy server listens on **port 11437**, as specified in helix.toml
under [server]. This is configured in the repository at helix-context/README.md.

The model answered from the retrieved genes, not from its training data, which doesn't contain your project.

How It Works

6-step expression pipeline per turn:

| Step | What | Cost | Blocking? |
|---|---|---|---|
| 1. Extract | Heuristic keyword extraction from query | 0 tokens | No |
| 2. Express | SQLite promoter lookup + synonym expansion + co-activation | 0 tokens | No |
| 3. Re-rank | Small CPU model scores candidates by relevance | ~300 tokens | Yes |
| 4. Splice | Small CPU model trims introns, keeps exons (batched) | ~600 tokens | Yes |
| 5. Assemble | Join spliced parts, enforce token budget, wrap in tags | 0 tokens | No |
| 6. Replicate | Pack query+response exchange back into genome | ~300 tokens | No (background) |
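Step 5 is the only purely mechanical stage. A minimal sketch, assuming genes arrive already re-ranked and spliced as (source, text) pairs and using a whitespace word count as a stand-in for a real tokenizer. The `<expressed_context>` and `<GENE src=...>` tags mirror the /context output shown earlier (minus attributes such as `facts`); `SplicedGene` and `assemble` are hypothetical names, not Helix's API:

```python
from dataclasses import dataclass
from html import escape


@dataclass
class SplicedGene:
    source: str  # e.g. "helix-context/README.md"
    text: str    # exon text kept by the splice step


def approx_tokens(text: str) -> int:
    """Crude token estimate: whitespace-separated words (a real tokenizer differs)."""
    return len(text.split())


def assemble(genes: list[SplicedGene], token_budget: int, max_genes: int = 12) -> str:
    """Join spliced genes in ranked order until the budget or gene cap is hit."""
    parts: list[str] = []
    used = 0
    for gene in genes[:max_genes]:
        cost = approx_tokens(gene.text)
        if used + cost > token_budget:
            break  # stop at the first gene that would overflow the budget
        parts.append(f'<GENE src="{escape(gene.source, quote=True)}">\n{gene.text}\n</GENE>')
        used += cost
    return "<expressed_context>\n" + "\n".join(parts) + "\n</expressed_context>"
```

Stopping at the first overflow keeps the output strictly in rank order; a variant that `continue`s instead would pack smaller, lower-ranked genes into leftover budget at the cost of ordering.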

Token budget:

  • 3k tokens: ribosome decoder prompt (fixed, tells the big model how to read codons)
  • 12k tokens: expressed context (dense XML gene format, 12 genes per turn)
  • 11M+ tokens: genome cold storage (SQLite, ~46MB raw on a mature project)

Compression metrics:

  • Storage: 2.7x (raw content → ribosome complements)
  • Expression: 769x (full genome → what the LLM sees per turn)
  • vs naive RAG at 25K tokens: 1.7x fewer tokens, 10/10 vs ~6/10 accuracy

Key Features

Context Health Monitor (Delta-Epsilon)

Every query computes a health signal measuring how well the genome served it:

{
  "context_health": {
    "ellipticity": 0.82,
    "coverage": 0.75,
    "density": 0.68,
    "freshness": 1.0,
    "genes_expressed": 3,
    "genes_available": 42,
    "status": "aligned"
  }
}
| Status | Ellipticity | Meaning |
|---|---|---|
| aligned | >= 0.7 | Genome is well-grounded, model is informed |
| sparse | >= 0.3 | Gaps exist, model may guess on some topics |
| stale | any | Expressed genes are outdated (low freshness) |
| denatured | < 0.3 | Context is unreliable, high hallucination risk |
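The status column reads as a small decision rule. A sketch using the ellipticity thresholds from the table; the README does not define "low freshness", so the 0.5 cutoff and the choice to check staleness first are assumptions, and `classify` is a hypothetical helper rather than Helix's own code:

```python
from dataclasses import dataclass

STALE_FRESHNESS = 0.5  # hypothetical cutoff; not specified by Helix


@dataclass
class ContextHealth:
    ellipticity: float
    coverage: float
    density: float
    freshness: float


def classify(health: ContextHealth) -> str:
    """Map a health signal onto the status labels in the table above."""
    if health.freshness < STALE_FRESHNESS:
        return "stale"  # applies at any ellipticity
    if health.ellipticity >= 0.7:
        return "aligned"
    if health.ellipticity >= 0.3:
        return "sparse"
    return "denatured"
```

With the sample signal above (ellipticity 0.82, freshness 1.0), `classify` returns "aligned".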

Horizontal Gene Transfer (HGT)

Export a genome and import it into another Helix instance:

# Export
python examples/hgt_transfer.py export -d "Project knowledge snapshot"

# Preview what an import would change
python examples/hgt_transfer.py diff genome_export.helix

# Import into another instance
python examples/hgt_transfer.py import genome_export.helix

Three merge strategies: skip_existing (safe default), overwrite, newest. Content-addressed gene IDs ensure deduplication across instances.
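Content addressing is what makes the merge idempotent: two instances that ingested the same text derive the same gene ID. A minimal sketch, assuming SHA-256 over the UTF-8 content and a hypothetical `updated_at` timestamp for the `newest` strategy; Helix's actual hashing, ID length, and schema may differ:

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Gene:
    content: str
    updated_at: float  # hypothetical last-modified timestamp

    @property
    def gene_id(self) -> str:
        # Same content gives the same ID on every instance, so re-imports deduplicate.
        return hashlib.sha256(self.content.encode("utf-8")).hexdigest()[:16]


def merge(local: dict[str, Gene], incoming: list[Gene], strategy: str = "skip_existing") -> dict[str, Gene]:
    """Merge imported genes into a local genome keyed by gene ID."""
    if strategy not in {"skip_existing", "overwrite", "newest"}:
        raise ValueError(f"unknown strategy: {strategy}")
    merged = dict(local)
    for gene in incoming:
        existing = merged.get(gene.gene_id)
        if existing is None or strategy == "overwrite":
            merged[gene.gene_id] = gene
        elif strategy == "newest" and gene.updated_at > existing.updated_at:
            merged[gene.gene_id] = gene
        # skip_existing: keep the local copy untouched
    return merged
```

Because the ID is derived from content in this sketch, a collision always means identical text; the strategies only decide whose metadata wins.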

Associative Memory

Genes that are frequently expressed together build co-activation links. When you query for topic A, the genome also pulls in topic B if they've been co-expressed before. This creates an organic associative memory that grows smarter over time.
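A minimal sketch of co-activation, assuming links are plain pair counts incremented each time two genes are expressed in the same turn, and followed once they reach a threshold. The class name, threshold, and expansion order are hypothetical; Helix's genome.py may weight or decay links differently:

```python
from collections import Counter
from itertools import combinations


class CoActivation:
    def __init__(self, min_count: int = 2):
        self.min_count = min_count
        self.pairs: Counter[tuple[str, str]] = Counter()

    def record(self, expressed: list[str]) -> None:
        """Strengthen a link for every pair of genes expressed together."""
        for a, b in combinations(sorted(set(expressed)), 2):
            self.pairs[(a, b)] += 1

    def neighbors(self, gene_id: str) -> set[str]:
        """Genes linked strongly enough to be pulled in alongside gene_id."""
        linked = set()
        for (a, b), count in self.pairs.items():
            if count >= self.min_count and gene_id in (a, b):
                linked.add(b if a == gene_id else a)
        return linked

    def expand(self, hits: list[str]) -> list[str]:
        """Append co-activated neighbors after the direct hits, without duplicates."""
        result = list(hits)
        for gene_id in hits:
            for other in sorted(self.neighbors(gene_id)):
                if other not in result:
                    result.append(other)
        return result
```

For example, after two turns that both expressed "auth" and "jwt", a query that only hits "auth" expands to `["auth", "jwt"]`.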

Tissue-Specific Expression (MoE + Small Models)

MoE models (Gemma 4) and sub-3.2B models can't reliably "look back" across a 15K context window. Helix auto-detects these architectures and switches to a tissue-specific expression mode inspired by how cell types selectively express genes from the same genome:

  1. Answer slate: pre-extracted key=value facts front-loaded in the first ~200 tokens, inside every sliding-window attention layer (Gemma 4's 5:1 SWA ratio means 5 of 6 layers only see 1,024-token windows).
  2. Relevance-first gene ordering: highest-scoring gene at position 0, not sorted by source sequence. This guarantees the best match lands inside every attention window.
  3. Think suppression: /no_think injection + temp=0 for small models that otherwise waste their output budget on reasoning loops.

Measured impact on gemma4:e4b:

| Mode | Retrieval | Accuracy |
|---|---|---|
| Standard expression | 10/10 | 5/10 |
| MoE tissue expression | 10/10 | 9/10 |

Dense models (qwen3 family) automatically use the standard expression path and are unaffected. Detection is per-request based on the downstream model name, so the same server can handle mixed clients.
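A sketch of per-request routing plus relevance-first ordering. The name-matching rule is an assumption for illustration: the README says detection uses the downstream model name but does not publish the rule. The 3.2B cutoff and the slate-first layout come from the list above; think suppression is left out. `uses_tissue_mode` and `build_context` are hypothetical names:

```python
import re
from dataclasses import dataclass


@dataclass
class ScoredGene:
    source: str
    text: str
    score: float


def uses_tissue_mode(model: str) -> bool:
    """Hypothetical detector: Gemma 4 (MoE), or a model tagged under 3.2B params."""
    name = model.lower()
    if name.startswith("gemma4"):
        return True
    match = re.search(r":(\d+(?:\.\d+)?)b\b", name)  # "qwen3:1.7b" -> 1.7
    return bool(match) and float(match.group(1)) < 3.2


def build_context(model: str, facts: dict[str, str], genes: list[ScoredGene]) -> str:
    if not uses_tissue_mode(model):
        # Standard path: keep genes in the order they arrive (source order here).
        return "\n\n".join(g.text for g in genes)
    # Tissue path: answer slate first, then genes by descending relevance.
    slate = "\n".join(f"{key}={value}" for key, value in facts.items())
    ranked = sorted(genes, key=lambda g: g.score, reverse=True)  # best match at position 0
    return slate + "\n\n" + "\n\n".join(g.text for g in ranked)
```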

Synonym Expansion

Configure lightweight query expansion in helix.toml:

[synonyms]
cache = ["redis", "ttl", "invalidation", "cdn"]
auth = ["jwt", "login", "security", "token"]

When a user asks about "cache", the genome also searches for "redis", "ttl", etc.

HTTP Endpoints

Core endpoints

| Endpoint | Method | Description |
|---|---|---|
| /v1/chat/completions | POST | OpenAI-compatible proxy (primary integration) |
| /ingest | POST | Ingest content into genome: {content, content_type, metadata?} |
| /context | POST | Query genome for context: {query} (Continue format) |
| /consolidate | POST | Distill session buffer into knowledge genes |
| /stats | GET | Genome metrics, compression ratio, health |
| /health | GET | Server status, ribosome model, gene count |
| /health/history | GET | Recent query health signals (?limit=N) |

Admin / maintenance endpoints

| Endpoint | Method | Description |
|---|---|---|
| /admin/refresh | POST | Reopen the genome connection to see external writes |
| /admin/vacuum | POST | Reclaim free SQLite pages after thinning (returns before/after size) |
| /admin/kv-backfill | POST | Run CPU regex KV extraction on genes missing key_values |
| /replicas | GET | List replica status (sync lag, paths) |
| /replicas/sync | POST | Force-sync all replicas from the master genome |
| /bridge/status | GET | Shared-memory bridge status (inbox, signals) |
| /bridge/collect | POST | Ingest pending files from the shared bridge inbox |
| /bridge/signal | POST | Write a named signal to the shared bridge |

Four operations that sound similar but do different things

These are the most confused operations in the admin surface. Know which one to reach for:

| Operation | What it does | When to use |
|---|---|---|
| checkpoint(mode) | Flush the WAL log into the main DB file. No file size change. | During/after bulk ingest, to guarantee data is durable before a crash. Runs automatically every 50 inserts. |
| refresh() / /admin/refresh | Close and reopen the long-lived DB connection so it picks up writes made by external processes. | After running a thinning script, ingest worker, or any out-of-band write. Cheap, non-destructive. |
| compact() | Scan every gene's source_id, mtime-check the file, and mark genes whose source changed as AGING. Does not delete or shrink anything. | Periodic source-staleness detection (runs automatically every compact_interval seconds). |
| vacuum() / /admin/vacuum | Rewrite the SQLite file to reclaim free pages left by previous deletions. Shrinks the file. | After large thinning operations. Blocking, so run it during maintenance windows only. Our 7.2K-gene genome reclaimed 229 MB (30%) on first VACUUM. |

Rule of thumb:

  • If you care about durability → checkpoint()
  • If you care about visibility (seeing external writes) → refresh()
  • If you care about staleness (detecting changed sources) → compact()
  • If you care about disk space → vacuum()
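Three of the four operations sit on standard SQLite behavior (compact() is Helix-specific). A standalone sketch with Python's sqlite3 module on a scratch database, not a real genome, showing what happens underneath; the table layout is invented, and Helix's own methods may use different checkpoint modes or extra bookkeeping:

```python
import sqlite3
import tempfile
from pathlib import Path


def demo(db_path: Path) -> tuple[int, int, int]:
    con = sqlite3.connect(db_path)
    con.execute("PRAGMA journal_mode=WAL")
    con.execute("CREATE TABLE genes (id INTEGER PRIMARY KEY, content TEXT)")
    con.executemany("INSERT INTO genes (content) VALUES (?)", [("x" * 2000,) for _ in range(2000)])
    con.commit()

    # checkpoint: copy WAL frames into the main file (durability, not size).
    busy, _, _ = con.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchone()

    # "Thinning": freed pages go onto the freelist; the file does not shrink.
    con.execute("DELETE FROM genes WHERE id > 500")
    con.commit()
    free_before = con.execute("PRAGMA freelist_count").fetchone()[0]

    # refresh: a new connection sees everything committed so far.
    con.close()
    con = sqlite3.connect(db_path)

    # vacuum: rebuild the file and drop the freelist (blocking).
    con.execute("VACUUM")
    free_after = con.execute("PRAGMA freelist_count").fetchone()[0]
    con.close()
    return busy, free_before, free_after


with tempfile.TemporaryDirectory() as tmp:
    busy, free_before, free_after = demo(Path(tmp) / "scratch.db")
    print(busy, free_before > 0, free_after)  # 0 True 0
```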

Continue IDE Integration

Add to ~/.continue/config.yaml:

models:
  - name: Helix (Local)
    provider: openai
    model: gemma4:e4b
    apiBase: http://127.0.0.1:11437/v1
    apiKey: EMPTY
    roles: [chat]
    defaultCompletionOptions:
      contextLength: 128000
      maxTokens: 4096

Use Chat mode (not Agent mode). Set contextLength high so Continue sends the full message; Helix handles compression downstream.

Python API

from pathlib import Path

from helix_context import HelixContextManager, load_config

config = load_config()
helix = HelixContextManager(config)

# Ingest content (Path.read_text closes the file, unlike a bare open().read())
helix.ingest("Your document text here", content_type="text")
helix.ingest(Path("src/main.py").read_text(encoding="utf-8"), content_type="code")

# Build context for a query
window = helix.build_context("How does auth work?")
print(window.expressed_context)
print(window.context_health.status)  # "aligned" / "sparse" / "stale" / "denatured"

# Learn from an exchange
helix.learn("How does auth work?", "JWT middleware validates tokens...")

# Export genome
from helix_context.hgt import export_genome
export_genome(helix.genome, "project.helix", description="Auth system knowledge")

ScoreRift Integration

Helix includes a bridge to ScoreRift for divergence-based context health monitoring:

from helix_context.integrations.scorerift import GenomeHealthProbe, cd_signal

# Probe genome health
probe = GenomeHealthProbe("http://127.0.0.1:11437")
report = probe.full_scan()

# Register as ScoreRift dimensions
# (`engine` is your existing ScoreRift engine instance, created elsewhere)
from helix_context.integrations.scorerift import make_genome_dimensions
engine.register_many(make_genome_dimensions())

# Feed divergence resolutions back into the genome
from helix_context.integrations.scorerift import resolution_to_gene
resolution_to_gene("security", auto_score=0.85, manual_score=1.0,
                   resolution="False positives in auth module scanner rules")

Configuration

All config in helix.toml:

[ribosome]
model = "gemma4:e4b"        # context codec for pack/re_rank/splice
backend = "ollama"          # or "deberta" for faster CPU-only ribosome
timeout = 30                # seconds before fallback
keep_alive = "30m"          # keep model loaded (eliminates swap latency)
warmup = true               # pre-load model on server start

[budget]
ribosome_tokens = 3000
expression_tokens = 12000   # 15K total per turn (decoder + expression)
max_genes_per_turn = 12
splice_aggressiveness = 0.3
decoder_mode = "condensed"  # full | condensed | minimal | none

[genome]
path = "genome.db"
cold_start_threshold = 10
replicas = ["C:/helix-cache/genome.db", "E:/helix-cache/genome.db"]
replica_sync_interval = 100

[ingestion]
backend = "cpu"             # "cpu" (spaCy+regex, fast) | "ollama" (LLM, slow)
splade_enabled = true       # SPLADE sparse expansion at index time
entity_graph = true         # entity-based co-activation links

[server]
host = "127.0.0.1"
port = 11437
upstream = "http://localhost:11434"

[synonyms]
cache = ["redis", "ttl", "invalidation", "cdn"]
auth = ["jwt", "login", "security", "token"]

Environment variables:

  • OLLAMA_KV_CACHE_TYPE=q4_0: INT4 KV cache quantization (recommended). q8_0 was tested but produced WORSE accuracy (it gave models more room to hallucinate in think mode). q4_0 is faster, more accurate, and uses less VRAM.
  • HELIX_CONFIG=/path/to/helix.toml: override the config file location

Testing

# Mock tests only (no Ollama needed, ~8s)
pytest tests/ -m "not live"

# Live tests (requires Ollama)
pytest tests/ -m live -v -s

# Full suite
pytest tests/ -v

Benchmarks

# Needle-in-a-haystack (single model)
HELIX_MODEL=qwen3:4b python benchmarks/bench_needle.py

# Full sweep across all local models
python benchmarks/bench_sweep.py

See docs/RESEARCH.md for full SIKE analysis and results across 7 local models + 3 Claude API tiers.

Architecture

| Module | Role |
|---|---|
| schemas.py | Gene, ContextWindow, ContextHealth, ChromatinState |
| codons.py | CodonChunker (text/code splitting) + CodonEncoder (serialization) |
| genome.py | SQLite genome with promoter-tag retrieval + co-activation |
| ribosome.py | Small-model codec: pack, re_rank, splice, replicate |
| context_manager.py | 6-step pipeline orchestrator + pending replication buffer |
| server.py | FastAPI proxy + standalone endpoints |
| config.py | TOML config loader with synonym map |
| hgt.py | Genome export/import (Horizontal Gene Transfer) |
| integrations/scorerift.py | CD spectroscope bridge to ScoreRift |

Origin

Built as a standalone package extracted from BigEd CC. Implements the "Ribosome Hypothesis" for local LLM context management.

License

Apache 2.0



Download files


Source Distribution

helix_context-0.2.0b2.tar.gz (26.6 MB)

Uploaded Source

Built Distribution


helix_context-0.2.0b2-py3-none-any.whl (108.5 kB)

Uploaded Python 3

File details

Details for the file helix_context-0.2.0b2.tar.gz.

File metadata

  • Download URL: helix_context-0.2.0b2.tar.gz
  • Size: 26.6 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for helix_context-0.2.0b2.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | c711f2c58387c523e919ab4e24bc3bc1e793590bf964e45d5b6ae1c85214f4e3 |
| MD5 | b4817267c13dad932becfc561526d978 |
| BLAKE2b-256 | 881638764501e72c78c115204b1dfcae6f8a37743145ef422b66b912e2e521ba |


Provenance

The following attestation bundles were made for helix_context-0.2.0b2.tar.gz:

Publisher: publish.yml on SwiftWing21/helix-context

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file helix_context-0.2.0b2-py3-none-any.whl.

File metadata

  • Download URL: helix_context-0.2.0b2-py3-none-any.whl
  • Size: 108.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for helix_context-0.2.0b2-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 62f8ed5356f6849013f4ced4493ff0d732544cbc3e85fc1e98aa69016feb9590 |
| MD5 | 61116d5d288cd87d9320a3ffda5be3d9 |
| BLAKE2b-256 | ee5efb099044c2d5df2a537adf766b647f3b5555e2bcb0f0f7672412dd81440e |


Provenance

The following attestation bundles were made for helix_context-0.2.0b2-py3-none-any.whl:

Publisher: publish.yml on SwiftWing21/helix-context

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
