Skip to main content

Modular transformer blocks built in PyTorch

Project description

Stackformer logo

image image image image image

Stackformer

Stackformer is a modular PyTorch library for building, training, and extending Transformer architectures for language and vision tasks.

Stackformer is currently in an early development stage. The goal is to provide a clean, reusable, and developer-friendly foundation for industrial prototyping and research experimentation.


Why Stackformer?

If you work on Transformers often, you need three things: reusable modules, understandable code, and flexibility to experiment quickly.

Stackformer focuses on exactly that:

  • Modular design: core building blocks are separated and easy to reuse.
  • Research-friendly: quickly try model variants and custom ideas.
  • Engineering-ready: typed package, practical trainer, and clear project structure.
  • Multi-domain: includes both NLP-oriented and vision-oriented transformer models.

Repository structure (for developers)

Stackformer/
├── stackformer/
│   ├── __init__.py
│   ├── generate.py
│   ├── trainer.py
│   ├── modules/
│   │   ├── Attention.py
│   │   ├── Feed_forward.py
│   │   ├── Normalization.py
│   │   └── position_embedding.py
│   ├── models/
│   │   ├── OpenAI.py
│   │   ├── Meta.py
│   │   ├── Google.py
│   │   └── Transformer.py
│   └── vision/
│       ├── vit.py
│       └── segformer.py
├── tests/
├── assets/
├── pyproject.toml
└── README.md

Features

1) Core Transformer modules

  • Attention variants (Multi_Head_Attention, Multi_query_Attention, Group_query_Attention, RoPE variants, KV cache helpers)
  • Feed-forward blocks (FF_ReLU, FF_GELU, FF_SwiGLU, and more)
  • Normalization layers (LayerNormalization, RMSNormalization)
  • Positional embeddings (absolute, sinusoidal, RoPE)

2) Model implementations

  • OpenAI-style: GPT family (GPT_1, GPT_2)
  • Meta-style: LLaMA family (llama_1, llama_2)
  • Google-style: Gemma family (gemma_1_2b, gemma_1_7b)
  • General transformer: baseline transformer

3) Vision

  • Vision Transformer (ViT)
  • SegFormer (SegFormerB0)

4) Utilities

  • Text generation helper (text_generate)
  • Training utility (Trainer) with optimizer/scheduler options, evaluation, checkpoints, and resume support

Installation guide

A) Standard install

pip install stackformer

B) With training extras

pip install "stackformer[train]"

C) For development and tests

pip install "stackformer[dev]"

D) Recommended virtual environment setup

python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -U pip
pip install stackformer

E) Conda setup

conda create -n stackformer python=3.10 -y
conda activate stackformer
pip install stackformer

Getting started

1) Quick model example (GPT-2 style)

import torch
from stackformer.models.OpenAI import GPT_2

model = GPT_2(
    vocab_size=128,
    num_layers=2,
    embed_dim=32,
    num_heads=4,
    seq_len=16,
    dropout=0.0,
    hidden_dim=64,
)

x = torch.randint(0, 128, (2, 16))
logits = model(x)
print(logits.shape)  # [2, 16, 128]

2) Using modules directly (attention + feed-forward)

import torch
from stackformer.modules.Attention import Multi_Head_Attention
from stackformer.modules.Feed_forward import FF_GELU

x = torch.randn(2, 16, 64)
attn = Multi_Head_Attention(embed_dim=64, num_heads=4, dropout=0.1)
ff = FF_GELU(embed_dim=64, hidden_dim=128, dropout=0.1)

h = attn(x)
out = ff(h)
print(out.shape)

3) Trainer usage (minimal workflow)

from torch.utils.data import TensorDataset
import torch
from stackformer.models.OpenAI import GPT_2
from stackformer.trainer import Trainer

# Dummy tokenized dataset
inputs = torch.randint(0, 128, (256, 16))
targets = torch.randint(0, 128, (256, 16))
train_ds = TensorDataset(inputs, targets)
eval_ds = TensorDataset(inputs[:64], targets[:64])

model = GPT_2(
    vocab_size=128,
    num_layers=2,
    embed_dim=32,
    num_heads=4,
    seq_len=16,
    dropout=0.0,
    hidden_dim=64,
)

trainer = Trainer(
    model=model,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    train_batch_size=8,
    eval_batch_size=8,
    vocab_size=128,
    output_dir="./outputs",
    num_epoch=1,
    lr=3e-4,
)

trainer.train()

4) Creating a custom model with Stackformer blocks

import torch
import torch.nn as nn
from stackformer.modules.Attention import Multi_Head_Attention
from stackformer.modules.Feed_forward import FF_SwiGLU
from stackformer.modules.Normalization import RMSNormalization

class TinyCustomTransformer(nn.Module):
    def __init__(self, embed_dim=64, num_heads=4, hidden_dim=128):
        super().__init__()
        self.norm1 = RMSNormalization(embed_dim)
        self.attn = Multi_Head_Attention(embed_dim=embed_dim, num_heads=num_heads, dropout=0.1)
        self.norm2 = RMSNormalization(embed_dim)
        self.ff = FF_SwiGLU(embed_dim=embed_dim, hidden_dim=hidden_dim, dropout=0.1)

    def forward(self, x):
        x = x + self.attn(self.norm1(x))
        x = x + self.ff(self.norm2(x))
        return x

model = TinyCustomTransformer()
y = model(torch.randn(2, 16, 64))
print(y.shape)

Communication & community

If you are using Stackformer in industry or research, open a discussion and share feedback—use-cases help shape the roadmap.


Roadmap

Planned improvements include:

  • Optimize existing model implementations for better speed and stability.
  • Add more transformer-based architectures.
  • Improve training optimization workflows.
  • Add monitoring features for training visibility and experiment tracking.

License

MIT License. See LICENSE.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

stackformer-0.1.7.tar.gz (34.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

stackformer-0.1.7-py3-none-any.whl (39.8 kB view details)

Uploaded Python 3

File details

Details for the file stackformer-0.1.7.tar.gz.

File metadata

  • Download URL: stackformer-0.1.7.tar.gz
  • Upload date:
  • Size: 34.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.10

File hashes

Hashes for stackformer-0.1.7.tar.gz
Algorithm Hash digest
SHA256 eb65045833f76e663adaf853d7b54da2c4e5216f3c856d6da53c8677b3f6f98b
MD5 823cdd7fbe6f305fa10d1fea40fc1fb0
BLAKE2b-256 cacf6021c095f594f72250b2f8445f2f916bc4dec6790667bf94db4873aa28e1

See more details on using hashes here.

File details

Details for the file stackformer-0.1.7-py3-none-any.whl.

File metadata

  • Download URL: stackformer-0.1.7-py3-none-any.whl
  • Upload date:
  • Size: 39.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.10

File hashes

Hashes for stackformer-0.1.7-py3-none-any.whl
Algorithm Hash digest
SHA256 cdd776aca477aed05f6d2390f35ef782cea9454a1facc9edb9704b8139b0d50c
MD5 37c7734593882929100c4606d5f3ea3a
BLAKE2b-256 b1d4ac7065fa16bb04f86f90d337e2df958a056ac44ab77c805ef267ea2ad845

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page