Steerling: An interpretable causal diffusion language model with concept steering

Project description

Steerling

An interpretable causal diffusion language model.

Steerling-8B combines masked diffusion language modeling with concept decomposition, enabling:

  • Generation: Non-autoregressive text generation via confidence-based unmasking
  • Attribution: Decompose predictions into known concept contributions
  • Steering: Intervene on concept activations to control generation
  • Embeddings: Extract hidden, composed, known, or unknown representations

Quick Start

pip install steerling

from steerling import SteerlingGenerator, GenerationConfig

generator = SteerlingGenerator.from_pretrained("guidelabs/steerling-8b")

text = generator.generate(
    "The key to understanding neural networks is",
    GenerationConfig(max_new_tokens=100, seed=42),
)
print(text)
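Generation is non-autoregressive: every position starts masked and is filled in over several steps in order of model confidence. The loop can be sketched in a few lines of numpy; the "model" below is a random stand-in, not Steerling's actual predictor, and the step schedule is illustrative:

```python
import numpy as np

MASK = -1  # sentinel for a masked position

def toy_model(tokens, rng):
    """Stand-in predictor: returns (predicted_token, confidence) per position."""
    n = len(tokens)
    preds = rng.integers(0, 100, size=n)  # fake token predictions
    conf = rng.random(n)                  # fake confidence scores
    return preds, conf

def unmask_by_confidence(length=8, steps=4, seed=0):
    rng = np.random.default_rng(seed)
    tokens = np.full(length, MASK)
    per_step = length // steps
    for _ in range(steps):
        preds, conf = toy_model(tokens, rng)
        conf = np.where(tokens == MASK, conf, -np.inf)  # only masked slots compete
        # commit the most confident predictions this step
        for pos in np.argsort(conf)[::-1][:per_step]:
            tokens[pos] = preds[pos]
    return tokens

out = unmask_by_confidence()
assert (out != MASK).all()  # every position is unmasked after all steps
```

The real model scores candidates with its own confidence measure and unmasks within block-causal order; only the overall iterate-and-commit shape is shown here.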

Model Details

Property          Value
Parameters        ~8B
Architecture      CausalDiffusionLM + Interpretable Concept Head
Context Length    4096
Vocabulary        100,281 (cl100k_base + specials)
Known Concepts    33,732
Unknown Concepts  101,196
GQA               32 heads, 4 KV heads
Precision         bfloat16
License           Apache 2.0

Architecture

Steerling uses block-causal attention (bidirectional within 64-token blocks, causal across blocks) with masked diffusion training. At inference, tokens are generated by iteratively unmasking positions in order of model confidence. The interpretable concept heads decompose transformer hidden states h into:

h → known_features + unk_hat + epsilon = composed
composed → lm_head → logits
  • known_features: Weighted sum of top-k learned concept embeddings
  • unk_hat: Residual features captured by a factorized unknown head
  • epsilon: Small correction term for reconstruction fidelity
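The decomposition above can be illustrated with a small numpy sketch. All dimensions, the top-k selection rule, and the low-rank "unknown head" here are toy stand-ins; the real model learns its concept embeddings and heads:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_known, k = 16, 64, 4                      # toy sizes, not the real 33,732 concepts

h = rng.standard_normal(d)                     # transformer hidden state
concepts = rng.standard_normal((n_known, d))   # known concept embeddings

# known_features: weighted sum of the top-k most activated concepts
acts = concepts @ h
topk = np.argsort(np.abs(acts))[-k:]
known_features = acts[topk] @ concepts[topk]

# unk_hat: a factorized (low-rank) stand-in for the unknown head
U = rng.standard_normal((d, 8))
V = rng.standard_normal((8, d))
unk_hat = (h - known_features) @ U @ V

# epsilon closes the reconstruction exactly in this sketch
epsilon = h - known_features - unk_hat
composed = known_features + unk_hat + epsilon
assert np.allclose(composed, h)                # composed then feeds the lm_head
```

Because attribution and steering operate on the known_features term, intervening on one concept's activation changes composed, and therefore the logits, in an interpretable way.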

Installation

# From PyPI
pip install steerling

# From source
git clone https://github.com/guidelabs/steerling.git
cd steerling
pip install -e ".[dev]"

# With evaluation support
pip install -e ".[all]"

FAQ

  • Where can I read more about the details of this architecture?
    You can read more about the architecture in these blog posts: Scaling Interpretable Models with 8B Parameters and Causal Diffusion Language Models. We will be releasing a more detailed technical report in a few months.

  • This is a base model, what about an instruction-tuned model?
    Stay tuned.

  • Is training code available?
    This release is inference-only. Training code is not included. If you're interested in training or fine-tuning, please reach out to Guide Labs.

  • What dataset did you train on?
We trained on an augmented version of the Nemotron-CC-HQ dataset, for a total of about 1.3 trillion tokens.

  • What is block-causal attention?
Standard causal attention only lets each token attend to previous tokens. Block-causal attention groups tokens into fixed-size blocks (e.g., 64 tokens) and allows bidirectional attention within each block, while maintaining causal ordering across blocks. This gives the model local bidirectional context while preserving the ability to generate sequentially. Refer to the post Causal Diffusion Language Models for more details.
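For a concrete picture, here is one way such a mask can be built (the block size and boolean-mask layout are illustrative, not Steerling's internal representation):

```python
import numpy as np

def block_causal_mask(seq_len: int, block_size: int = 64) -> np.ndarray:
    """True where position i may attend to position j: bidirectional
    inside a block, causal (current and earlier blocks only) across blocks."""
    blocks = np.arange(seq_len) // block_size
    # i attends to j iff j's block does not come after i's block
    return blocks[:, None] >= blocks[None, :]

m = block_causal_mask(6, block_size=2)
# positions 0 and 1 share a block, so 0 can attend forward to 1
assert m[0, 1] and not m[0, 2]
```

With block_size=1 this reduces to standard causal attention; with block_size=seq_len it becomes fully bidirectional.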

  • What are "known" and "unknown" concepts?
    The model decomposes its internal representations into two parts:

    • Known concepts (33,732): learned, supervised features that correspond to human-interpretable patterns.
    • Unknown concepts (101,196): capture the signal in the hidden representations that the known concepts don't explain.
    • Together they reconstruct the full hidden state up to a small error: hidden ≈ known_features + unknown_features + epsilon.
  • How do I find concept IDs for steering?
    The concept metadata is in concepts/complete_concept_info.csv (shipped with the HuggingFace model). Each row maps a concept ID to its description. Use positive values to amplify a concept and negative values to suppress it:

    config = GenerationConfig(steer_known={concept_id: 2.0})   # amplify
    config = GenerationConfig(steer_known={concept_id: -1.0})  # suppress
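One way to locate IDs is to search the metadata CSV directly. Note that the column names below (concept_id, description) are assumptions about the file layout; check the actual header row first:

```python
import csv

def find_concepts(path: str, keyword: str):
    """Return (concept_id, description) rows whose description mentions keyword.
    Column names 'concept_id' and 'description' are assumed, not verified."""
    hits = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            if keyword.lower() in row["description"].lower():
                hits.append((row["concept_id"], row["description"]))
    return hits

# e.g. find_concepts("concepts/complete_concept_info.csv", "sentiment")
```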
    
  • What GPU do I need?
    Steerling-8B in bfloat16 requires approximately 18GB VRAM. It fits on a single H100, A100 (40GB or 80GB), A6000 (48GB), or RTX 4090 (24GB). It does not fit on consumer GPUs with 16GB or less.

  • Can I fine-tune this model?
Yes, although fine-tuning code is not included in this package. Steerling is an inference-only release; if there is sufficient demand, we will support fine-tuning in a future release.

  • What tokenizer does Steerling-8B use?
    Steerling uses OpenAI's cl100k_base tokenizer (via tiktoken) with 4 additional special tokens: <|pad|>, <|bos|>, <|endofchunk|>, and <|mask|>, for a total vocabulary of 100,281 tokens.

  • Can I use this with the Hugging Face transformers library?
Not directly. Steerling uses a custom architecture (block-causal attention, concept heads) that isn't in the transformers library. Use the steerling package instead; it provides SteerlingGenerator.from_pretrained() with a similar interface.

  • How do I get training data attributions?
This release is a lightweight version of the pipeline, so it does not directly support training data attribution. We have provided notebooks for concept and feature attributions. If you're interested in training data attribution support, please reach out to Guide Labs.

License

The Steerling source code is released under the Apache License 2.0.

The model weights are provided for research and evaluation purposes. The weights were trained on datasets with varying license terms, including Nemotron-CC-HQ and Dolmino Mix. Some training data includes synthetic content generated by third-party models with their own license terms. We are currently reviewing the implications of these upstream licenses for downstream use of the model weights. Please check back for updates on the weight licensing terms.

For questions about commercial use of the model weights, contact us at info@guidelabs.ai
