Stateful Coherent Language Models - Transformers with persistent memory

SCLM: Stateful Coherent Language Models

SCLM is a PyTorch library for building language models with persistent latent state and multi-expert coherence mechanisms. Unlike standard transformers that process each sequence independently, SCLM maintains continuous memory across generation steps.

🎯 Key Features

| Feature | Description |
|---|---|
| Persistent State | Maintains latent state across generation with variance < 10⁻⁷ |
| Coherence Mechanism | Multi-expert system that promotes consistent representations |
| Edit Mode | Local modifications without global semantic drift |
| Drop-in Replacement | Compatible with standard transformer training pipelines |

📊 Experimental Results

| Metric | Result |
|---|---|
| State Persistence | variance < 10⁻⁷ ✅ |
| Coherence Preservation | 104.7% ✅ |
| Local Editing Drift | 0.3% ✅ |
| Entity Preservation | 100% ✅ |

🚀 Installation

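pip install saclm

Note: the distribution is published on PyPI as saclm, but the import name is sclm (for example, from sclm import SCLM).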

Or from source:

git clone https://github.com/Volgat/sclm.git
cd sclm
pip install -e .

📖 Quick Start

Basic Usage

import torch
from sclm import SCLM, SCLMConfig

# Create configuration
config = SCLMConfig(
    vocab_size=50257,
    n_layers=6,
    n_heads=8,
    d_model=512
)

# Create model
model = SCLM(config)

# Forward pass on random token IDs
input_ids = torch.randint(0, config.vocab_size, (1, 64))  # [batch=1, seq_len=64]
output = model(input_ids)

logits = output['logits']  # [batch, seq_len, vocab_size]
metrics = output['global_metrics']  # coherence, alignment, etc.

Text Generation

# Generate text
prompt = torch.tensor([[1, 2, 3, 4, 5]])  # Your tokenized prompt
generated = model.generate(
    prompt,
    max_new_tokens=100,
    temperature=0.8,
    top_k=50
)
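The temperature and top_k arguments follow the usual sampling conventions. To show roughly what they control, here is a minimal, framework-free sketch of standard top-k sampling with temperature over a single vector of logits. It is not SCLM's generate implementation, which may differ in detail, and the function name sample_top_k is hypothetical.

```python
import math
import random


def sample_top_k(logits, temperature=0.8, top_k=50, rng=random):
    """Standard top-k sampling with temperature (illustrative, not SCLM's code).

    logits: list of floats, one per vocabulary entry.
    Returns the index of the sampled token.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Keep only the top_k highest-scoring token indices.
    k = min(top_k, len(logits))
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Dividing by temperature < 1 sharpens the distribution; > 1 flattens it.
    scaled = [logits[i] / temperature for i in top]
    # Numerically stable softmax over the surviving candidates.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw one candidate index according to its probability.
    return rng.choices(top, weights=probs, k=1)[0]
```

For example, with top_k=2 and logits [2.0, 1.0, 0.5, -1.0], only indices 0 and 1 can be returned. Lowering the temperature makes index 0 more likely.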

Edit Mode (Key Feature!)

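# Tokenizer (an assumption: GPT-2's tokenizer, which matches vocab_size=50257)
from transformers import GPT2TokenizerFast
tokenizer = GPT2TokenizerFast.from_pretrained('gpt2')

# Process original text
original_ids = tokenizer.encode("The sword was blue.", return_tensors='pt')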
model.reset_state()
_ = model(original_ids)

# Freeze state
model.freeze_state()

# Process edited text - coherence preserved!
edited_ids = tokenizer.encode("The sword was red.", return_tensors='pt')
output = model(edited_ids, edit_mode=True)

# Check coherence preservation
print(f"Coherence: {output['global_metrics']['coherence']}")

# Unfreeze when done
model.unfreeze_state()
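The state stays frozen until unfreeze_state() is called, so an exception between the two calls would leave the model frozen. One way to guard against this is a small context manager. frozen_state is a hypothetical helper and is not part of the package. It relies only on the freeze_state and unfreeze_state methods used in the Edit Mode example.

```python
from contextlib import contextmanager


@contextmanager
def frozen_state(model):
    """Freeze the model's persistent state for the duration of a with-block.

    Hypothetical helper: assumes only model.freeze_state() and
    model.unfreeze_state(), as used in the Edit Mode example.
    """
    model.freeze_state()
    try:
        yield model
    finally:
        # Runs even if the edited forward pass raises.
        model.unfreeze_state()


# Usage (with a real SCLM model and edited_ids as in the example):
# with frozen_state(model):
#     output = model(edited_ids, edit_mode=True)
```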

๐Ÿ—๏ธ Architecture

SCLM introduces the EARCP layer (Encapsulation, Alignment, Revision, Coherence, Propagation), a five-stage pipeline integrated into the transformer blocks:

  Input Hidden States
           ↓
┌─────────────────────┐
│  E - Encapsulation  │  Create/update persistent state
└──────────┬──────────┘
           ↓
┌─────────────────────┐
│  A - Alignment      │  Measure hidden-state consistency
└──────────┬──────────┘
           ↓
┌─────────────────────┐
│  R - Revision       │  Correct semantic drift
└──────────┬──────────┘
           ↓
┌─────────────────────┐
│  C - Coherence      │  Multi-expert processing
└──────────┬──────────┘
           ↓
┌─────────────────────┐
│  P - Propagation    │  Inject state into deeper layers
└──────────┬──────────┘
           ↓
  Output Hidden States
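Read from top to bottom, the diagram describes a composition of stages. Each stage receives the hidden states and the persistent state produced by the stage before it. The sketch below makes that data flow explicit using plain Python lists. Only the ordering comes from the diagram. The following are illustrative assumptions, not SCLM's modules:

  • the toy stage bodies: an exponential moving average for E, pass-through placeholders for A, R and C, and additive injection for P
  • the function name earcp_forward

```python
from typing import Callable, List, Tuple

Vec = List[float]
Stage = Callable[[Vec, Vec], Tuple[Vec, Vec]]  # (hidden, state) -> (hidden, state)


def encapsulate(hidden: Vec, state: Vec) -> Tuple[Vec, Vec]:
    # Toy "E": blend 10% of the hidden vector into the persistent state.
    return hidden, [0.9 * s + 0.1 * h for s, h in zip(state, hidden)]


def passthrough(hidden: Vec, state: Vec) -> Tuple[Vec, Vec]:
    # Placeholder for "A", "R" and "C"; the real modules transform these.
    return hidden, state


def propagate(hidden: Vec, state: Vec) -> Tuple[Vec, Vec]:
    # Toy "P": inject the state back into the hidden vector.
    return [h + s for h, s in zip(hidden, state)], state


EARCP_STAGES: List[Stage] = [encapsulate, passthrough, passthrough, passthrough, propagate]


def earcp_forward(hidden: Vec, state: Vec, stages: List[Stage] = EARCP_STAGES) -> Tuple[Vec, Vec]:
    """Run the stages in E, A, R, C, P order, threading hidden and state through."""
    for stage in stages:
        hidden, state = stage(hidden, state)
    return hidden, state
```

For example, earcp_forward([1.0, 2.0], [0.0, 0.0]) returns approximately ([1.1, 2.2], [0.1, 0.2]). E moves a fraction of the hidden vector into the state, and P adds that state back before the output.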

Components

| Module | Purpose |
|---|---|
| EncapsulationModule | GRU-style state management |
| AlignmentModule | Cross-attention consistency |
| RevisionModule | Drift detection & correction |
| CoherenceModule | Multi-expert ensemble |
| PropagationModule | Layer-wise state injection |
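The Components table describes EncapsulationModule as GRU-style state management. The sketch below shows a standard GRU-style gated update on plain Python lists for readers unfamiliar with the term:

  • an update gate decides, per dimension, how much of the old state to keep and how much to replace with a candidate
  • a reset gate controls how much of the old state feeds into that candidate

To keep the example short, each gate uses one scalar weight per dimension (a diagonal simplification). SCLM's actual module presumably uses learned weight matrices, and we have not verified its exact equations. DiagonalGRUWeights and gru_style_update are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


@dataclass
class DiagonalGRUWeights:
    # One weight per state dimension (a diagonal simplification of the
    # full weight matrices a real GRU would learn).
    w_z: List[float]
    u_z: List[float]
    w_r: List[float]
    u_r: List[float]
    w_h: List[float]
    u_h: List[float]


def gru_style_update(state: List[float], x: List[float], p: DiagonalGRUWeights) -> List[float]:
    new_state = []
    for i, (h, xi) in enumerate(zip(state, x)):
        z = sigmoid(p.w_z[i] * xi + p.u_z[i] * h)             # update gate
        r = sigmoid(p.w_r[i] * xi + p.u_r[i] * h)             # reset gate
        cand = math.tanh(p.w_h[i] * xi + p.u_h[i] * (r * h))  # candidate state
        new_state.append((1.0 - z) * h + z * cand)            # gated blend
    return new_state
```

With all weights at zero, both gates sit at 0.5 and the candidate is 0, so each state dimension is halved. Strongly negative update-gate weights leave the state almost unchanged. This is the mechanism that lets a GRU-style state persist across steps.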

📝 Configuration Options

from dataclasses import dataclass

@dataclass
class SCLMConfig:
    # Model architecture
    vocab_size: int = 50257
    max_seq_length: int = 512
    n_layers: int = 6
    n_heads: int = 8
    d_model: int = 512
    d_ff: int = 2048
    dropout: float = 0.1

    # SCLM-specific
    latent_state_dim: int = 256    # Dimension of the persistent latent state
    n_coherence_heads: int = 4     # Number of coherence attention heads
    n_experts: int = 4             # Number of experts
    propagation_depth: int = 3     # Number of propagation adapters

    # EARCP parameters
    eta_s: float = 5.0             # Coherence sensitivity
    w_min: float = 0.05            # Minimum expert weight

    # Layer placement
    earcp_every_n_layers: int = 2  # Insert an EARCP layer every N layers
    use_global_earcp: bool = True  # Whether to add a global EARCP layer
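The config exposes eta_s (coherence sensitivity) and w_min (minimum expert weight) but does not say how they combine. One common pattern is consistent with those names, and we assume it here without confirming it against SCLM's source:

  1. take a softmax over per-expert coherence scores, sharpened by eta_s
  2. reserve w_min for every expert
  3. share the remaining weight mass according to that softmax

expert_weights and the example scores are hypothetical.

```python
import math
from typing import List


def expert_weights(scores: List[float], eta_s: float = 5.0, w_min: float = 0.05) -> List[float]:
    """Hypothetical coherence-based expert weighting (assumed, not SCLM's code).

    scores: one coherence score per expert (higher = more coherent).
    eta_s:  sensitivity; larger values concentrate weight on the best expert.
    w_min:  floor guaranteeing every expert keeps at least this much weight.
    """
    n = len(scores)
    if n * w_min > 1.0:
        raise ValueError("n_experts * w_min must not exceed 1")
    # Softmax of eta_s * score, shifted by the max for numerical stability.
    m = max(scores)
    exps = [math.exp(eta_s * (s - m)) for s in scores]
    total = sum(exps)
    soft = [e / total for e in exps]
    # Reserve w_min for every expert and split the remaining mass by softmax,
    # so each weight is >= w_min and the weights still sum to 1.
    return [w_min + (1.0 - n * w_min) * p for p in soft]
```

With the defaults (n_experts=4, w_min=0.05), at least 20% of the total weight is always spread evenly. Equal scores give each expert exactly 0.25.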

🔧 Pre-built Models

from sclm import create_sclm_small, create_sclm_medium, create_sclm_large

# ~45M parameters
model_small = create_sclm_small()

# ~125M parameters  
model_medium = create_sclm_medium()

# ~350M parameters
model_large = create_sclm_large()
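The quoted parameter counts can be sanity-checked with the usual back-of-the-envelope formula for a decoder-only transformer:

  • token embeddings: V·d
  • learned position embeddings: L·d
  • per layer: roughly 4d² for the attention projections plus 2·d·d_ff for the feed-forward network

The head count does not change the total. The sketch ignores biases, layer norms, and all EARCP modules, and it assumes a tied output head and learned positional embeddings. We have not confirmed that create_sclm_small() uses the SCLMConfig defaults, so treat the result only as a rough check. estimate_transformer_params is a hypothetical helper.

```python
def estimate_transformer_params(vocab_size: int = 50257, max_seq_length: int = 512,
                                n_layers: int = 6, d_model: int = 512, d_ff: int = 2048) -> int:
    """Rough parameter count for the standard transformer part of a model.

    Assumptions (not taken from SCLM's source): learned position embeddings,
    output head tied to the token embedding, no biases or LayerNorms, and no
    EARCP modules counted.
    """
    token_emb = vocab_size * d_model
    pos_emb = max_seq_length * d_model
    attention = 4 * d_model * d_model  # Q, K, V and output projections
    ffn = 2 * d_model * d_ff           # up- and down-projection
    return token_emb + pos_emb + n_layers * (attention + ffn)


print(f"{estimate_transformer_params() / 1e6:.1f}M")  # prints 44.9M
```

With the SCLMConfig defaults, this estimate comes to about 44.9M, consistent with the ~45M quoted for the small model. The EARCP layers would add to that figure.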

📊 Training Example

import torch
from sclm import SCLM, SCLMConfig

# Setup
device = 'cuda' if torch.cuda.is_available() else 'cpu'
config = SCLMConfig(vocab_size=50257)
model = SCLM(config).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
model.train()

# Training loop; `dataloader` is assumed to yield (input_ids, labels) tensor pairs
for batch in dataloader:
    input_ids, labels = batch
    input_ids, labels = input_ids.to(device), labels.to(device)

    # Reset the persistent state so it does not carry over between batches
    model.reset_state()

    # Forward
    output = model(input_ids, labels=labels)
    loss = output['loss']

    # Backward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Log metrics
    metrics = output['global_metrics']
    print(f"Loss: {loss.item():.4f}, Coherence: {metrics['coherence']:.4f}")

🧪 Knowledge Distillation

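import torch
import torch.nn.functional as F
from transformers import GPT2LMHeadModel
from sclm import SCLM, SCLMConfig

# Teacher model (GPT-2 uses the same 50257-token vocabulary, so logits align)
teacher = GPT2LMHeadModel.from_pretrained('gpt2-large')
teacher.eval()

# Student (SCLM)
config = SCLMConfig(vocab_size=50257)
student = SCLM(config)
student.train()
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-4)

# Distillation training
T = 2.0  # Temperature
alpha = 0.5  # Distillation weight

# `dataloader` yields (input_ids, labels) pairs, as in the training example
for batch in dataloader:
    input_ids, labels = batch

    # Student forward
    student.reset_state()
    student_out = student(input_ids, labels=labels)
    lm_loss = student_out['loss']

    # Teacher forward (no gradients needed)
    with torch.no_grad():
        teacher_logits = teacher(input_ids).logits

    # Distillation loss: KL between temperature-softened distributions, scaled by T^2
    student_soft = F.log_softmax(student_out['logits'] / T, dim=-1)
    teacher_soft = F.softmax(teacher_logits / T, dim=-1)
    distill_loss = F.kl_div(student_soft, teacher_soft, reduction='batchmean') * T * T

    # Combined loss and parameter update
    loss = (1 - alpha) * lm_loss + alpha * distill_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()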
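The distillation term is the KL divergence from the temperature-softened teacher distribution to the temperature-softened student distribution, scaled by T². This scaling keeps gradient magnitudes comparable as T changes, following Hinton et al.'s knowledge distillation. The framework-free sketch below computes the same quantity for a single token position, which can help when checking the PyTorch version by hand. Note that F.kl_div with reduction='batchmean' sums over positions and vocabulary and then divides by the batch size. distillation_loss and combined_loss are hypothetical helpers.

```python
import math
from typing import List


def softmax(logits: List[float], T: float) -> List[float]:
    # Temperature-scaled softmax, shifted by the max for numerical stability.
    m = max(logits)
    exps = [math.exp((x - m) / T) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: List[float], teacher_logits: List[float], T: float = 2.0) -> float:
    """KL(teacher_T || student_T) * T**2 for one token position."""
    p = softmax(teacher_logits, T)  # soft targets
    q = softmax(student_logits, T)  # student predictions
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)
    return kl * T * T


def combined_loss(lm_loss: float, distill: float, alpha: float = 0.5) -> float:
    # Same weighting as in the distillation loop: (1 - alpha) * LM + alpha * KD.
    return (1 - alpha) * lm_loss + alpha * distill
```

The loss is zero when student and teacher logits agree, and it grows as their softened distributions diverge.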

📈 Metrics

Access detailed metrics after a forward pass:

output = model(input_ids)

# Global EARCP metrics
global_metrics = output['global_metrics']
print(f"Coherence: {global_metrics['coherence']:.4f}")
print(f"Alignment: {global_metrics['alignment'].mean():.4f}")
print(f"Drift: {global_metrics['drift'].mean():.4f}")
print(f"State Norm: {global_metrics['state_norm']:.4f}")
print(f"Expert Weights: {global_metrics['weights']}")

# Per-block metrics
for i, block_metrics in enumerate(output['block_metrics']):
    print(f"Block {i}: coherence={block_metrics['coherence']:.4f}")

🔬 Research Applications

SCLM is designed for:

  • Long-form generation with consistent characters and facts
  • Document editing with local changes and global coherence
  • Multi-turn dialogue with persistent context
  • Story generation with entity tracking
  • Code generation with variable consistency

📄 Citation

@article{amega2025sclm,
  title={SCLM: Stateful Coherent Language Models},
  author={Amega, Mike},
  journal={arXiv preprint},
  year={2025},
  note={github.com/Volgat/sclm}
}

📜 License

Proprietary Community License. See LICENSE for details.

  • Community use: free for personal use, research, and small businesses (< $100k revenue).
  • Commercial use: a license is required for larger entities and for commercial SaaS products. See LICENSING.

🚀 Deployment

To publish a new version to PyPI:

  1. Update the version in setup.py.
  2. Create a new release on GitHub.
  3. The GitHub Action will automatically build and publish the package.

Note: this requires a PYPI_API_TOKEN secret in the repository settings.

🤝 Contributing

Contributions welcome! Please read our Contributing Guide.

📧 Contact

Download files

Source Distribution

saclm-1.0.0.tar.gz (22.3 kB)

Uploaded Source

Built Distribution

saclm-1.0.0-py3-none-any.whl (13.8 kB)

Uploaded Python 3

File details

Details for the file saclm-1.0.0.tar.gz.

File metadata

  • Download URL: saclm-1.0.0.tar.gz
  • Size: 22.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for saclm-1.0.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | b788f17190f3a43378f99a3e834fd6cd134cc6f08ff47606a9876737574b4956 |
| MD5 | 081dfe621fe8c50868765cb9ee9ddd9d |
| BLAKE2b-256 | b07850b424b036223ede7ea8e4aae377675bbe718225dfec35559925109d90f5 |

Provenance

The following attestation bundles were made for saclm-1.0.0.tar.gz:

Publisher: publish.yml on Volgat/sclm

File details

Details for the file saclm-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: saclm-1.0.0-py3-none-any.whl
  • Size: 13.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for saclm-1.0.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 81e6a65d75cba5710f9db84cade2f2d29f77dd0e621166b40711ce5d0a1724a9 |
| MD5 | 64935ed2e54383c84e08527facf69642 |
| BLAKE2b-256 | 742713b9b565afcee6a476bbae79655c21e8bd8895096958480418688a3e11d5 |

Provenance

The following attestation bundles were made for saclm-1.0.0-py3-none-any.whl:

Publisher: publish.yml on Volgat/sclm
