Modular transformer blocks built in PyTorch

Stackformer

Stackformer is a modular PyTorch library for building, training, and extending Transformer architectures for language and vision tasks.

Stackformer is at an early stage of development, so APIs may still change. The goal is to provide a clean, reusable, and developer-friendly foundation for industrial prototyping and research experimentation.

Why Stackformer?

If you work on Transformers often, you need three things: reusable modules, understandable code, and flexibility to experiment quickly.

Stackformer focuses on exactly that:

Modular design: core building blocks are separated and easy to reuse.
Research-friendly: quickly try model variants and custom ideas.
Engineering-ready: typed package, practical trainer, and clear project structure.
Multi-domain: includes both NLP-oriented and vision-oriented transformer models.

Repository structure (for developers)

Stackformer/
├── stackformer/
│   ├── __init__.py
│   ├── generate.py
│   ├── trainer.py
│   ├── modules/
│   │   ├── Attention.py
│   │   ├── Feed_forward.py
│   │   ├── Normalization.py
│   │   └── position_embedding.py
│   ├── models/
│   │   ├── OpenAI.py
│   │   ├── Meta.py
│   │   ├── Google.py
│   │   └── Transformer.py
│   └── vision/
│       ├── vit.py
│       └── segformer.py
├── tests/
├── assets/
├── pyproject.toml
└── README.md

Features

1) Core Transformer modules

Attention variants (Multi_Head_Attention, Multi_query_Attention, Group_query_Attention, RoPE variants, KV cache helpers)
Feed-forward blocks (FF_ReLU, FF_GELU, FF_SwiGLU, and more)
Normalization layers (LayerNormalization, RMSNormalization)
Positional embeddings (absolute, sinusoidal, RoPE)
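
The multi-head, multi-query, and grouped-query variants listed above differ mainly in how many key/value heads they keep. The sketch below illustrates that idea in plain PyTorch. It is not Stackformer's implementation: the function name `grouped_query_attention`, its arguments, and the tensor layout are assumptions made for illustration only.

```python
import torch
import torch.nn.functional as F


def grouped_query_attention(q, k, v):
    """Causal attention where several query heads share one key/value head.

    q:    [batch, num_q_heads, seq_len, head_dim]
    k, v: [batch, num_kv_heads, seq_len, head_dim]
    (Hypothetical helper, not part of the Stackformer API.)
    """
    num_q_heads, num_kv_heads = q.shape[1], k.shape[1]
    assert num_q_heads % num_kv_heads == 0, "query heads must split evenly into groups"
    group_size = num_q_heads // num_kv_heads

    # Repeat each key/value head so it lines up with its group of query heads.
    k = k.repeat_interleave(group_size, dim=1)
    v = v.repeat_interleave(group_size, dim=1)

    seq_len, head_dim = q.shape[-2], q.shape[-1]
    scores = (q @ k.transpose(-2, -1)) / head_dim ** 0.5

    # Causal mask: position t may only attend to positions <= t.
    future = torch.triu(
        torch.ones(seq_len, seq_len, dtype=torch.bool, device=q.device), diagonal=1
    )
    scores = scores.masked_fill(future, float("-inf"))
    return F.softmax(scores, dim=-1) @ v


batch, seq_len, head_dim, num_q_heads = 2, 16, 8, 4
q = torch.randn(batch, num_q_heads, seq_len, head_dim)

outputs = {}
for name, num_kv_heads in [("multi-head", 4), ("grouped-query", 2), ("multi-query", 1)]:
    k = torch.randn(batch, num_kv_heads, seq_len, head_dim)
    v = torch.randn(batch, num_kv_heads, seq_len, head_dim)
    outputs[name] = grouped_query_attention(q, k, v)
    print(name, tuple(outputs[name].shape))
```

With 4 key/value heads this behaves like ordinary multi-head attention, and with 1 it behaves like multi-query attention. The output shape is the same in all three cases; only the size of the key/value tensors (and so of any KV cache) changes.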

2) Model implementations

OpenAI-style: GPT family (GPT_1, GPT_2)
Meta-style: LLaMA family (llama_1, llama_2)
Google-style: Gemma family (gemma_1_2b, gemma_1_7b)
General transformer: baseline transformer

3) Vision

Vision Transformer (ViT)
SegFormer (SegFormerB0)

4) Utilities

Text generation helper (text_generate)
Training utility (Trainer) with optimizer/scheduler options, evaluation, checkpoints, and resume support
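
As a rough picture of what a text generation helper does, the sketch below runs a greedy decoding loop in plain PyTorch. It does not use or describe the signature of `text_generate`; `TinyLM` and `greedy_generate` are hypothetical stand-ins, and the only assumption about the model is that it maps token ids of shape [batch, seq] to logits of shape [batch, seq, vocab].

```python
import torch
import torch.nn as nn


class TinyLM(nn.Module):
    """Stand-in model (hypothetical): token ids [B, T] -> logits [B, T, vocab]."""

    def __init__(self, vocab_size=128, embed_dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.head = nn.Linear(embed_dim, vocab_size)

    def forward(self, idx):
        return self.head(self.embed(idx))


@torch.no_grad()
def greedy_generate(model, idx, max_new_tokens, seq_len):
    """Append the most likely next token, one step at a time."""
    model.eval()
    for _ in range(max_new_tokens):
        context = idx[:, -seq_len:]        # crop to the model's context window
        logits = model(context)[:, -1, :]  # logits for the last position only
        next_id = logits.argmax(dim=-1, keepdim=True)
        idx = torch.cat([idx, next_id], dim=1)
    return idx


torch.manual_seed(0)
model = TinyLM()
prompt = torch.randint(0, 128, (1, 4))
generated = greedy_generate(model, prompt, max_new_tokens=8, seq_len=16)
print(generated.shape)
```

Sampling strategies such as temperature or top-k would replace the `argmax` line; the rest of the loop stays the same.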

Installation guide

A) Standard install

pip install stackformer

B) With training extras

pip install "stackformer[train]"

C) For development and tests

pip install "stackformer[dev]"

D) Recommended virtual environment setup

python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -U pip
pip install stackformer

E) Conda setup

conda create -n stackformer python=3.10 -y
conda activate stackformer
pip install stackformer
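
After any of the install routes above, it can be useful to confirm which version ended up in the active environment. The snippet below is a small sketch using only the standard library; `installed_version` is a hypothetical helper name, not something Stackformer provides.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("stackformer")
print(found if found else "stackformer is not installed in this environment")
```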

Getting started

1) Quick model example (GPT-2 style)

import torch
from stackformer.models.OpenAI import GPT_2

model = GPT_2(
    vocab_size=128,
    num_layers=2,
    embed_dim=32,
    num_heads=4,
    seq_len=16,
    dropout=0.0,
    hidden_dim=64,
)

x = torch.randint(0, 128, (2, 16))
logits = model(x)
print(logits.shape)  # torch.Size([2, 16, 128]) -> [batch, seq_len, vocab_size]
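
Logits of shape [batch, seq_len, vocab_size] are usually turned into a training loss with next-token cross-entropy. The sketch below shows one common convention in plain PyTorch, using random logits as a stand-in for `model(x)`. Whether Stackformer's models or Trainer shift the targets this way is an assumption here, not something the project description states.

```python
import torch
import torch.nn.functional as F

vocab_size = 128
tokens = torch.randint(0, vocab_size, (2, 16))  # [batch, seq_len]
logits = torch.randn(2, 16, vocab_size)         # stand-in for model(tokens)

# Position t is scored against token t + 1, so drop the last
# prediction and the first token.
pred = logits[:, :-1, :]   # [batch, seq_len - 1, vocab_size]
target = tokens[:, 1:]     # [batch, seq_len - 1]

# cross_entropy expects [N, classes] and [N], so flatten batch and time.
loss = F.cross_entropy(pred.reshape(-1, vocab_size), target.reshape(-1))
print(loss.item())
```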

2) Using modules directly (attention + feed-forward)

import torch
from stackformer.modules.Attention import Multi_Head_Attention
from stackformer.modules.Feed_forward import FF_GELU

x = torch.randn(2, 16, 64)
attn = Multi_Head_Attention(embed_dim=64, num_heads=4, dropout=0.1)
ff = FF_GELU(embed_dim=64, hidden_dim=128, dropout=0.1)

h = attn(x)
out = ff(h)
print(out.shape)

3) Trainer usage (minimal workflow)

import torch
from torch.utils.data import TensorDataset
from stackformer.models.OpenAI import GPT_2
from stackformer.trainer import Trainer

# Dummy tokenized dataset: random token ids, so the loss values are not meaningful
inputs = torch.randint(0, 128, (256, 16))
targets = torch.randint(0, 128, (256, 16))
train_ds = TensorDataset(inputs, targets)
eval_ds = TensorDataset(inputs[:64], targets[:64])

model = GPT_2(
    vocab_size=128,
    num_layers=2,
    embed_dim=32,
    num_heads=4,
    seq_len=16,
    dropout=0.0,
    hidden_dim=64,
)

trainer = Trainer(
    model=model,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    train_batch_size=8,
    eval_batch_size=8,
    vocab_size=128,
    output_dir="./outputs",
    num_epoch=1,
    lr=3e-4,
)

trainer.train()
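
The Trainer is described as supporting checkpoints and resume. The sketch below shows the general save-and-resume pattern in plain PyTorch, to make clear what such a feature has to store. It is not the Trainer's actual checkpoint format: `save_checkpoint`, `load_checkpoint`, and the dictionary keys are hypothetical.

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(8, 8)  # stand-in for a real model
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)


def save_checkpoint(path, model, optimizer, epoch):
    """Store everything needed to continue training later."""
    torch.save(
        {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "epoch": epoch,
        },
        path,
    )


def load_checkpoint(path, model, optimizer):
    """Restore model and optimizer state; return the epoch to resume from."""
    state = torch.load(path, map_location="cpu")
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    return state["epoch"] + 1


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "checkpoint.pt")
    save_checkpoint(path, model, optimizer, epoch=0)
    start_epoch = load_checkpoint(path, model, optimizer)

print(start_epoch)  # 1
```

Saving the optimizer state alongside the weights matters for optimizers such as AdamW, which keep per-parameter moment estimates that would otherwise restart from zero.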

4) Creating a custom model with Stackformer blocks

import torch
import torch.nn as nn
from stackformer.modules.Attention import Multi_Head_Attention
from stackformer.modules.Feed_forward import FF_SwiGLU
from stackformer.modules.Normalization import RMSNormalization

class TinyCustomTransformer(nn.Module):
    def __init__(self, embed_dim=64, num_heads=4, hidden_dim=128):
        super().__init__()
        self.norm1 = RMSNormalization(embed_dim)
        self.attn = Multi_Head_Attention(embed_dim=embed_dim, num_heads=num_heads, dropout=0.1)
        self.norm2 = RMSNormalization(embed_dim)
        self.ff = FF_SwiGLU(embed_dim=embed_dim, hidden_dim=hidden_dim, dropout=0.1)

    def forward(self, x):
        x = x + self.attn(self.norm1(x))
        x = x + self.ff(self.norm2(x))
        return x

model = TinyCustomTransformer()
y = model(torch.randn(2, 16, 64))
print(y.shape)
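
A single block like the one above is usually repeated several times. The sketch below shows one way to stack pre-norm residual blocks with `nn.ModuleList`. To stay independent of Stackformer's exact signatures it uses only `torch.nn` layers as stand-ins; `StandInBlock` and `StackedEncoder` are hypothetical names.

```python
import torch
import torch.nn as nn


class StandInBlock(nn.Module):
    """Pre-norm residual block built from torch.nn layers only (stand-in)."""

    def __init__(self, embed_dim=64, num_heads=4, hidden_dim=128):
        super().__init__()
        self.norm1 = nn.LayerNorm(embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(embed_dim)
        self.ff = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, embed_dim),
        )

    def forward(self, x):
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out                    # residual around attention
        return x + self.ff(self.norm2(x))   # residual around feed-forward


class StackedEncoder(nn.Module):
    """Applies num_layers blocks in sequence, then a final normalization."""

    def __init__(self, num_layers=3, embed_dim=64, num_heads=4, hidden_dim=128):
        super().__init__()
        self.blocks = nn.ModuleList(
            [StandInBlock(embed_dim, num_heads, hidden_dim) for _ in range(num_layers)]
        )
        self.final_norm = nn.LayerNorm(embed_dim)

    def forward(self, x):
        for block in self.blocks:
            x = block(x)
        return self.final_norm(x)


stack = StackedEncoder(num_layers=3)
y = stack(torch.randn(2, 16, 64))
print(y.shape)
```

Because every block maps [batch, seq_len, embed_dim] to the same shape, the stand-in block could in principle be swapped for any block with that contract, such as the TinyCustomTransformer defined earlier.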

Communication & community

Issues (bugs/feature requests): https://github.com/Gurumurthy30/Stackformer/issues
Discussions (Q&A, ideas): https://github.com/Gurumurthy30/Stackformer/discussions
Releases: https://github.com/Gurumurthy30/Stackformer/releases
Maintainer GitHub: https://github.com/Gurumurthy30

If you are using Stackformer in industry or research, open a discussion and share feedback; real use cases help shape the roadmap.

Roadmap

Planned improvements include:

Optimize existing model implementations for better speed and stability.
Add more transformer-based architectures.
Improve training optimization workflows.
Add monitoring features for training visibility and experiment tracking.

License

MIT License. See LICENSE.

Release history

Version	Release date
0.1.9	Mar 12, 2026
0.1.8	Mar 9, 2026
0.1.7	Feb 24, 2026 (this version)
0.1.6	Sep 29, 2025
0.1.5	Aug 7, 2025
0.1.4	Aug 4, 2025
0.1.3	Jul 29, 2025
0.1.2	Jul 26, 2025
0.1.1	Jul 24, 2025
0.1.0	Jul 24, 2025

Download files

Source Distribution

stackformer-0.1.7.tar.gz (34.3 kB)
Uploaded Feb 24, 2026 Source

Built Distribution

stackformer-0.1.7-py3-none-any.whl (39.8 kB)
Uploaded Feb 24, 2026 Python 3

File details

Details for the file stackformer-0.1.7.tar.gz.

File metadata

Download URL: stackformer-0.1.7.tar.gz
Upload date: Feb 24, 2026
Size: 34.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.10

File hashes

Hashes for stackformer-0.1.7.tar.gz
Algorithm	Hash digest
SHA256	`eb65045833f76e663adaf853d7b54da2c4e5216f3c856d6da53c8677b3f6f98b`
MD5	`823cdd7fbe6f305fa10d1fea40fc1fb0`
BLAKE2b-256	`cacf6021c095f594f72250b2f8445f2f916bc4dec6790667bf94db4873aa28e1`
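
A downloaded archive can be checked against the SHA256 digest listed above. The sketch below uses only the standard library; `sha256_of_file` and `matches_published_hash` are hypothetical helper names, and the file path assumes the archive was saved to the current directory under its published name.

```python
import hashlib
import os


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


archive = "stackformer-0.1.7.tar.gz"  # assumed local download location
expected = "eb65045833f76e663adaf853d7b54da2c4e5216f3c856d6da53c8677b3f6f98b"

if os.path.exists(archive):
    print("hash matches:", matches_published_hash(archive, expected))
else:
    print(f"{archive} not found; download it first to verify")
```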

File details

Details for the file stackformer-0.1.7-py3-none-any.whl.

File metadata

Download URL: stackformer-0.1.7-py3-none-any.whl
Upload date: Feb 24, 2026
Size: 39.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.10

File hashes

Hashes for stackformer-0.1.7-py3-none-any.whl
Algorithm	Hash digest
SHA256	`cdd776aca477aed05f6d2390f35ef782cea9454a1facc9edb9704b8139b0d50c`
MD5	`37c7734593882929100c4606d5f3ea3a`
BLAKE2b-256	`b1d4ac7065fa16bb04f86f90d337e2df958a056ac44ab77c805ef267ea2ad845`
