
Q-Compass architecture — RL-grounded sequence mixing for text, vision, audio, and world modeling


Quatrix — Q-Compass Architecture

"Where transformers retrieve by similarity, Quatrix navigates by value."

Quatrix is a novel neural architecture that replaces standard multi-head attention with Q-Compass — a sequence mixing mechanism grounded in reinforcement learning theory rather than geometric similarity.

Built by Syed Abdur Rehman Ali (@Abd0r).

Paper: Q-Compass: Grounding Sequence Mixing in Reinforcement Learning Navigation — Zenodo, March 2026.


Core Idea: Q-Compass

Standard attention computes:

Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) @ V

Four projections (W_Q, W_K, W_V, W_O). Similarity-based routing — attends to what looks similar, retrieves a projected transform of it.

Q-Compass computes:

state  = x @ W_s          # "Where am I?"
action = x @ W_a          # "Where can I go?"
Q(s,a) = softmax(state @ action.T / sqrt(r))
output = W_o(Q(s,a) @ x)  # gather from x directly — no W_V

Three projections (W_s, W_a, W_o). Value-based routing — asks "in state s, how valuable is attending to position a?"

The key removal: No W_V. Content is gathered directly from x, unchanged. All routing intelligence lives in Q(s,a). This forces the model to learn precise navigation rather than compensating for imprecise attention with a learned content transform.
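The routing above can be sketched in PyTorch. This is a minimal single-head reimplementation for illustration, not the `quatrix` package's own code; the class name `QCompass` and the `causal` flag are assumptions made for the sketch. Setting `causal=False` gives the bidirectional variant used for encoding.

```python
import math
import torch
import torch.nn as nn

class QCompass(nn.Module):
    """Minimal single-head sketch of the Q-Compass mixer (illustrative,
    not the quatrix package's implementation). Three projections
    (W_s, W_a, W_o); content is gathered directly from x, with no W_V."""

    def __init__(self, hidden: int, rank: int, causal: bool = True):
        super().__init__()
        self.w_s = nn.Linear(hidden, rank, bias=False)    # "Where am I?"
        self.w_a = nn.Linear(hidden, rank, bias=False)    # "Where can I go?"
        self.w_o = nn.Linear(hidden, hidden, bias=False)  # output projection
        self.rank = rank
        self.causal = causal

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [B, L, H]
        state = self.w_s(x)                               # [B, L, r]
        action = self.w_a(x)                              # [B, L, r]
        scores = state @ action.transpose(-2, -1) / math.sqrt(self.rank)
        if self.causal:                                   # autoregressive variant
            L = x.size(1)
            mask = torch.triu(torch.ones(L, L, dtype=torch.bool,
                                         device=x.device), diagonal=1)
            scores = scores.masked_fill(mask, float("-inf"))
        q = scores.softmax(dim=-1)                        # Q(s, a): routing table
        return self.w_o(q @ x)                            # gather from x, no W_V
```

At `hidden=512, rank=64` this sketch has exactly the 327,680 attention-block parameters cited below (2 × 512 × 64 for W_s and W_a, plus 512 × 512 for W_o).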

At H=512, r=64: standard attention uses 1,048,576 parameters per layer. Q-Compass uses 327,680 — a 69% reduction in attention-block parameters.
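The parameter counts can be reproduced directly:

```python
H, r = 512, 64

# Standard attention: four H x H projections (W_Q, W_K, W_V, W_O)
std_params = 4 * H * H            # 1,048,576

# Q-Compass: two low-rank H x r projections (W_s, W_a) plus W_o (H x H)
qc_params = 2 * H * r + H * H     # 327,680

reduction = 1 - qc_params / std_params
print(std_params, qc_params, f"{reduction:.0%}")  # 1048576 327680 69%
```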

The same block — with or without a causal mask — handles both autoregressive text generation (Q-Compass) and bidirectional image encoding (Q-Compass-Bi). One mechanism, all modalities.


Architecture

QuatrixLM (language model)
├── Token + Positional Embeddings
├── N × QuatrixBlock
│   ├── LayerNorm → QCompass (causal) → residual
│   └── LayerNorm → FFN (GELU) → residual
├── LayerNorm
└── Output Head (tied to embeddings)

QuatrixVision (image encoder)
├── Conv2d patch embedding (16×16 patches → 196 patches per 224×224 image)
├── Positional embeddings
├── M × QCompassBi blocks (bidirectional, no causal mask)
├── LayerNorm
└── Linear projection → LM hidden dim
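The patch-count arithmetic in the vision tree can be checked with a plain `Conv2d`. A sketch under assumed sizes (hidden dim 512, taken from elsewhere on this page): a 16×16 kernel with stride 16 tiles a 224×224 image into a 14×14 grid, i.e. 196 non-overlapping patches, one token each.

```python
import torch
import torch.nn as nn

# Conv2d patch embedding: 16x16 patches, stride 16 -> (224/16)^2 = 196 patches
patchify = nn.Conv2d(in_channels=3, out_channels=512, kernel_size=16, stride=16)

img = torch.randn(1, 3, 224, 224)             # [B, C, H, W]
feat = patchify(img)                          # [1, 512, 14, 14]
tokens = feat.flatten(2).transpose(1, 2)      # [1, 196, 512] -- one token per patch
print(tokens.shape)                           # torch.Size([1, 196, 512])
```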

QuatrixAudio (audio encoder)
├── Mel-spectrogram patch embedding (16×16 freq×time patches)
├── Positional embeddings
├── 3 × QCompassBi blocks (bidirectional, no causal mask)
├── LayerNorm
└── Linear projection → LM hidden dim
  (audio tokens prepended to text tokens, same as vision)
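The audio path works the same way on the mel spectrogram. A sketch under the same assumed hidden dim (512) and the 80-mel, 3000-frame input shape used in the Quick Start below: 16×16 freq×time patches give a 5 × 187 grid (3000 // 16 = 187), i.e. 935 audio tokens.

```python
import torch
import torch.nn as nn

# Mel-spectrogram patch embedding: 16x16 freq x time patches, stride 16
patchify = nn.Conv2d(in_channels=1, out_channels=512, kernel_size=16, stride=16)

mel = torch.randn(1, 1, 80, 3000)                  # [B, 1, n_mels, time_frames]
tokens = patchify(mel).flatten(2).transpose(1, 2)  # [1, 935, 512]
```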

QuatrixWorld (world model plugin — wraps QuatrixLM)
├── StateEncoder: QCompassBi aggregates token sequence → state vector
├── ActionHead: predicts action distribution from state
├── TransitionModel: 4 × QCompassBi blocks, predicts s' = f(s, a)
└── RewardHead (optional): estimates scalar value for RL fine-tuning
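The residual layout shared by these trees (LayerNorm → mixer → residual, LayerNorm → FFN → residual) is a standard pre-norm block. A sketch with a generic `mixer` standing in for QCompass/QCompassBi; the 4× FFN expansion is an assumption, not a documented Quatrix setting.

```python
import torch
import torch.nn as nn

class PreNormBlock(nn.Module):
    """Sketch of the QuatrixBlock residual layout. `mixer` stands in for
    the QCompass / QCompassBi sequence mixer: any [B, L, H] -> [B, L, H]
    module works here."""

    def __init__(self, hidden: int, mixer: nn.Module, ffn_mult: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(hidden)
        self.mixer = mixer
        self.norm2 = nn.LayerNorm(hidden)
        self.ffn = nn.Sequential(                  # FFN (GELU)
            nn.Linear(hidden, ffn_mult * hidden),
            nn.GELU(),
            nn.Linear(ffn_mult * hidden, hidden),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.mixer(self.norm1(x))          # LayerNorm -> mixer -> residual
        x = x + self.ffn(self.norm2(x))            # LayerNorm -> FFN  -> residual
        return x

# smoke test with an identity mixer standing in for QCompass
block = PreNormBlock(64, nn.Identity())
y = block(torch.randn(2, 5, 64))
```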

Modality Support

| Modality | Module | Status |
|---|---|---|
| Text | QuatrixLM | Production |
| Vision | QuatrixVision | Production |
| Audio | QuatrixAudio | Production |
| World Model | QuatrixWorld | Production |

Quick Start

pip install quatrix
from quatrix import QuatrixLM, QuatrixConfig
import torch

# Text only
cfg = QuatrixConfig(vocab_size=50257, hidden_size=512, num_layers=7,
                    max_seq_len=5120, q_rank=64)
model = QuatrixLM(cfg)  # ~44M params
input_ids = torch.randint(0, 50257, (1, 10))
out = model(input_ids)
logits = out['logits']  # [B, L, vocab_size]

# Text + Vision
cfg = QuatrixConfig(vocab_size=50257, hidden_size=512, num_layers=7,
                    max_seq_len=5120, q_rank=64, use_vision=True)
model = QuatrixLM(cfg)  # ~50M params
pixel_values = torch.randn(1, 3, 224, 224)
out = model(input_ids, pixel_values=pixel_values)

# Text + Vision + Audio
cfg = QuatrixConfig(vocab_size=50257, hidden_size=512, num_layers=7,
                    max_seq_len=5120, q_rank=64, use_vision=True, use_audio=True)
model = QuatrixLM(cfg)
mel = torch.randn(1, 1, 80, 3000)  # [B, 1, n_mels, time_frames]
out = model(input_ids, pixel_values=pixel_values, mel=mel)

# World Model
from quatrix import WorldModel
cfg = QuatrixConfig(vocab_size=50257, hidden_size=512, num_layers=7,
                    max_seq_len=5120, q_rank=64, use_world_model=True)
model = QuatrixLM(cfg)
world = WorldModel(lm_hidden=512, action_dim=256)
hidden_states = model.get_hidden_states(input_ids)         # [B, L, H]
state, action_logits, next_state, reward = world(hidden_states)

Built-in training script

# Quick demo — downloads TinyShakespeare, trains on CPU/GPU
python -m quatrix.train

# Custom config
python -m quatrix.train --steps 2000 --hidden 512 --layers 7  # Berry-Q0 size
python -m quatrix.train --data myfile.txt                      # your own text

Berry-Q0 — First Quatrix Model

Berry-Q0 is the first model trained on the Quatrix architecture.

| Property | Value |
|---|---|
| Architecture | QuatrixLM + QuatrixVision |
| Parameters | ~50M (44M LM + 5.5M Vision + 0.4M projection) |
| Context | 5120 tokens |
| Modalities | Text + Image |
| Training hardware | Single RTX 4050 6GB laptop GPU |
| Text data | ~3.2M samples (web, math, code, reasoning, instruction, alignment) |
| Image data | ~550K image-text pairs (VQAv2, GQA, TextVQA, DocVQA, ScienceQA, CLEVR) |
| Status | GRPO reasoning training in progress |

Trained from scratch in three stages: pretraining on ~3.2M mixed text + image samples, supervised finetuning on instruction and reasoning data, and ongoing GRPO reasoning training (R1-style, math domain). Empirical results will be reported in a follow-up paper once training is complete.


Roadmap

| Model | Modalities | Status |
|---|---|---|
| Berry-Q0 | Text + Vision | GRPO training in progress |
| Berry-Q1 | Text + Vision + Audio + World Model | Future work |

Paper

If you use Quatrix or Q-Compass in your work, please cite:

Syed Abdur Rehman Ali. Q-Compass: Grounding Sequence Mixing in
Reinforcement Learning Navigation. Zenodo, March 2026.
https://zenodo.org/records/19104202

Author

Syed Abdur Rehman Ali

GitHub HuggingFace X


License

OpenRAIL-M — open use with behavioral restrictions (no military use, no mass surveillance). See LICENSE for details.
