Modular transformer blocks built in PyTorch
🧱 Stackformer
Stackformer is a modular transformer-building framework written entirely in PyTorch. It is designed primarily for experimentation, providing various transformer blocks such as attention mechanisms, normalization layers, feed-forward networks, and a simple model architecture. The project is a work-in-progress with plans for further enhancements and expansions.
📖 About Me
My name is Gurumurthy, and I am a final-year Bachelor of Engineering student from India. I created this library as a side project to showcase my skills and knowledge in deep learning and transformer architectures.
I am also open to collaborating with others on projects, both to share knowledge and to build connections.
🌟 Features
- Multiple attention mechanisms, including multi-head, grouped-query, linear, local, and KV-cache variants
- Token embedding via tiktoken
- Absolute and sinusoidal positional embeddings
- Normalization layers like LayerNorm and RMSNorm
- Several feed-forward network variants with activations such as ReLU, GELU, SiLU, LeakyReLU, and Sigmoid
- A simple GPT-style transformer model implementation
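To make two of the listed components concrete, here is a minimal sketch of RMSNorm and a sinusoidal position table in plain PyTorch. It is only an illustration of the general techniques: the names `RMSNorm` and `sinusoidal_positions`, their signatures, and the `eps` default are assumptions made for this example and are not taken from Stackformer's code.

```python
import math

import torch
import torch.nn as nn


class RMSNorm(nn.Module):
    """Root-mean-square normalization (illustrative, not Stackformer's class)."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # learnable per-feature gain

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Scale by the RMS over the last dimension; unlike LayerNorm,
        # the mean is not subtracted and there is no bias term.
        rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * x / rms


def sinusoidal_positions(seq_len: int, dim: int) -> torch.Tensor:
    """Return a fixed (seq_len, dim) table of sin/cos position encodings."""
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    position = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)  # (seq_len, 1)
    # Geometrically spaced frequencies, one per sin/cos pair.
    div_term = torch.exp(
        torch.arange(0, dim, 2, dtype=torch.float32) * (-math.log(10000.0) / dim)
    )
    table = torch.zeros(seq_len, dim)
    table[:, 0::2] = torch.sin(position * div_term)  # even feature indices
    table[:, 1::2] = torch.cos(position * div_term)  # odd feature indices
    return table


x = torch.randn(2, 8, 16)            # (batch, seq_len, dim)
x = x + sinusoidal_positions(8, 16)  # table broadcasts over the batch dimension
y = RMSNorm(16)(x)
print(y.shape)  # torch.Size([2, 8, 16])
```

Because the position table has no learnable parameters, it can be computed once and reused for any batch of the same length.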
📁 Project Structure
stackformer/
|-- modules/
|   |-- tokenizer.py            # Token embedding using tiktoken
|   |-- position_embedding.py   # Absolute and sinusoidal embeddings
|   |-- Attention.py            # Attention mechanisms
|   |-- Normalization.py        # LayerNorm and RMSNorm
|   `-- Feed_forward.py         # Feed-forward layers with various activations
|-- models/
|   `-- GPT_2.py                # GPT-style transformer stack model
`-- trainer.py                  # Training loop and utilities
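As a rough picture of how attention, normalization, and feed-forward pieces are typically combined into a GPT-style layer, the sketch below builds one pre-norm decoder block from stock `torch.nn` modules. It does not use or reproduce the contents of `GPT_2.py`; the class name `DecoderBlock` and the `hidden_mult` parameter are hypothetical and exist only for this example.

```python
import torch
import torch.nn as nn


class DecoderBlock(nn.Module):
    """One pre-norm, causally masked transformer block (illustrative only)."""

    def __init__(self, dim: int, num_heads: int, hidden_mult: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.ff = nn.Sequential(
            nn.Linear(dim, hidden_mult * dim),
            nn.GELU(),
            nn.Linear(hidden_mult * dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        seq_len = x.size(1)
        # Boolean mask: True marks positions a token is NOT allowed to attend to,
        # here every position to its right (the future).
        causal_mask = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device),
            diagonal=1,
        )
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=causal_mask, need_weights=False)
        x = x + attn_out                 # residual connection around attention
        x = x + self.ff(self.norm2(x))   # residual connection around feed-forward
        return x


block = DecoderBlock(dim=32, num_heads=4)
tokens = torch.randn(2, 10, 32)  # (batch, seq_len, dim)
out = block(tokens)
print(out.shape)  # torch.Size([2, 10, 32])
```

A full GPT-style model would stack several such blocks between an embedding layer and an output projection.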
💻 Installation
Clone the repository and install in development mode:
git clone https://github.com/Gurumurthy30/Stackformer
cd Stackformer
pip install -e .
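After installing, a quick way to confirm that the package is visible to the current interpreter is sketched below. It assumes the import name is `stackformer`, which is inferred from the top-level folder in the project structure rather than confirmed; the helper `is_installed` is hypothetical.

```python
import importlib.util


def is_installed(package: str = "stackformer") -> bool:
    """Return True if a top-level package can be located without importing it."""
    return importlib.util.find_spec(package) is not None


print("stackformer importable:", is_installed())
```

If this prints `False`, check that `pip install -e .` was run in the same environment as the interpreter you are using.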
🚀 Future Plans
Currently, I am working on improving and optimizing the existing components while fixing known bugs and issues. After stabilizing the current modules, I plan to add more advanced blocks like Mixture of Experts (MoE), mask handling, and other essential transformer components. Eventually, I will expand the library by developing more comprehensive model architectures.
File details
Details for the file stackformer-0.1.1.tar.gz.
File metadata
- Download URL: stackformer-0.1.1.tar.gz
- Upload date:
- Size: 14.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bc78c32c40b2dc476f85886e1fdca19dbaafbf0d24811c7f250f4d5d7f2c8153 |
| MD5 | 7358c03642b5201e1f67fc91f1c1eaeb |
| BLAKE2b-256 | 1d09e6df619a4dc3f0fd46225527c284c174ffa1f80acdd7aa9eedf1e1dc0d2e |
File details
Details for the file stackformer-0.1.1-py3-none-any.whl.
File metadata
- Download URL: stackformer-0.1.1-py3-none-any.whl
- Upload date:
- Size: 16.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 097c921c68d5c0057ac962311e00a2ed1e6f269a8c34acdca2acbc8c5d45027e |
| MD5 | f4dfeaee74605fa2e1db176648c58d0c |
| BLAKE2b-256 | 6fc09e19d18905821acea2a8b15113e460a59a782ff83a8be4dab123bc494310 |