
A PyTorch implementation of transformer-based language models including GPT architecture for pretraining and fine-tuning

This project has been archived by its maintainers. No new releases are expected.


Language Modeling using Transformers (LMT)


A PyTorch implementation of transformer-based language models including GPT architecture for pretraining and fine-tuning. This project is designed for educational and research purposes to help users understand how the attention mechanism and Transformer architecture work in Large Language Models (LLMs).

🚀 Features

  • GPT Architecture: Complete implementation of decoder-only transformer models
  • Attention Mechanisms: Multi-head self-attention with causal masking
  • Tokenization: Multiple tokenizer implementations (BPE, Naive)
  • Training Pipeline: Comprehensive trainer with pretraining and fine-tuning support
  • Educational Focus: Well-documented code for learning transformer internals
  • Modern Stack: Built with PyTorch 2.7+, Python 3.11+

📦 Installation

Prerequisites

  • Python 3.11 or 3.12
  • PyTorch 2.7+

Install from PyPI

pip install language-modeling-transformers

Install from GitHub

pip install git+https://github.com/michaelellis003/LMT.git

๐Ÿƒโ€โ™‚๏ธ Quick Start

Basic Model Usage

from lmt import GPT, ModelConfig
from lmt.models.config import ModelConfigPresets
import torch

# Create a small GPT model
config = ModelConfigPresets.small_gpt()
model = GPT(config)

# Generate some text
input_ids = torch.randint(0, config.vocab_size, (1, 10))
with torch.no_grad():
    logits = model(input_ids)
    print(f"Output shape: {logits.shape}")  # (1, 10, vocab_size)

Training a Model

from lmt import Trainer, GPT
from lmt.training import BaseTrainingConfig
from lmt.models.config import ModelConfigPresets

# Configure model and training
model_config = ModelConfigPresets.small_gpt()
training_config = BaseTrainingConfig(
    num_epochs=10,
    batch_size=4,
    learning_rate=1e-4
)

# Initialize model and trainer
model = GPT(model_config)
trainer = Trainer(
    model=model,
    train_loader=your_train_loader,
    val_loader=your_val_loader,
    config=training_config
)

# Start training
trainer.train()
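The `Trainer` above expects `your_train_loader` and `your_val_loader` to yield batches of token IDs. For pretraining, the usual way to build those is a sliding-window dataset: each window of tokens is paired with the same window shifted one position right, so the model learns next-token prediction. A minimal pure-Python sketch of that windowing logic (an illustrative helper, not part of lmt; in practice you would wrap the pairs in a PyTorch `Dataset` and `DataLoader`):

```python
def sliding_windows(token_ids, context_length, stride):
    """Split a token stream into (input, target) pairs for next-token
    prediction: each target is the input shifted one position right."""
    pairs = []
    for start in range(0, len(token_ids) - context_length, stride):
        inputs = token_ids[start : start + context_length]
        targets = token_ids[start + 1 : start + context_length + 1]
        pairs.append((inputs, targets))
    return pairs

tokens = list(range(10))  # stand-in for tokenized text
pairs = sliding_windows(tokens, context_length=4, stride=4)
print(pairs[0])  # ([0, 1, 2, 3], [1, 2, 3, 4])
```

Setting `stride` equal to `context_length` gives non-overlapping windows; a smaller stride produces overlapping windows and more training pairs at the cost of some redundancy.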

Using the Training Script

# Pretraining
python scripts/train.py --task pretraining --num_epochs 20 --batch_size 4

# Classification fine-tuning
python scripts/train.py --task classification --download_model --learning_rate 1e-5

📚 Documentation

Model Components

  • GPT: Main model class implementing decoder-only transformer
  • TransformerBlock: Individual transformer layer with attention and feed-forward
  • MultiHeadAttention: Multi-head self-attention mechanism
  • CausalAttention: Attention with causal masking for autoregressive generation
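The causal masking mentioned above is a lower-triangular constraint: query position i may only attend to key positions j ≤ i, so the model can never look at future tokens during autoregressive generation. A plain-Python illustration of the mask's shape (lmt's actual implementation operates on PyTorch tensors; this is just the concept):

```python
def causal_mask(seq_len):
    """True where attention is allowed: query i may look at keys j <= i."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]

for row in causal_mask(4):
    print(["x" if ok else "." for ok in row])
# ['x', '.', '.', '.']
# ['x', 'x', '.', '.']
# ['x', 'x', 'x', '.']
# ['x', 'x', 'x', 'x']
```

In practice the disallowed positions are set to -inf before the softmax so their attention weights become zero.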

Tokenizers

  • BPETokenizer: Byte-Pair Encoding tokenizer
  • NaiveTokenizer: Simple character-level tokenizer
  • BaseTokenizer: Abstract base class for custom tokenizers
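To illustrate what the simplest of these does: a character-level tokenizer maps each distinct character to an integer ID and back, so encode followed by decode is a lossless round-trip. A toy sketch of the idea (the class name and methods here are illustrative, not lmt's API):

```python
class CharTokenizer:
    """Toy character-level tokenizer: one ID per distinct character."""

    def __init__(self, text):
        vocab = sorted(set(text))
        self.char_to_id = {ch: i for i, ch in enumerate(vocab)}
        self.id_to_char = {i: ch for i, ch in enumerate(vocab)}

    def encode(self, text):
        return [self.char_to_id[ch] for ch in text]

    def decode(self, ids):
        return "".join(self.id_to_char[i] for i in ids)

tok = CharTokenizer("hello world")
ids = tok.encode("hello")
assert tok.decode(ids) == "hello"  # round-trip
```

BPE improves on this by iteratively merging the most frequent adjacent pairs into multi-character tokens, trading a larger vocabulary for shorter sequences.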

Training

  • Trainer: Main training orchestrator with support for pretraining and fine-tuning
  • BaseTrainingConfig: Configuration class for training parameters
  • Custom datasets and dataloaders: Support for various text datasets

๐Ÿ—‚๏ธ Project Structure

src/lmt/
├── __init__.py              # Main package exports
├── models/                  # Model architectures
│   ├── gpt/                # GPT implementation
│   ├── config.py           # Model configuration
│   └── utils.py            # Model utilities
├── layers/                  # Neural network layers
│   ├── attention/          # Attention mechanisms
│   └── transformers/       # Transformer blocks
├── tokenizer/              # Tokenization implementations
├── training/               # Training pipeline
└── generate.py             # Text generation utilities

scripts/
├── train.py                # Main training script
└── utils.py                # Training utilities

tests/                      # Comprehensive test suite
notebooks/                  # Educational Jupyter notebooks
docs/                       # Sphinx documentation
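generate.py is listed above as the text-generation entry point. Whatever the implementation details, the core of autoregressive generation is a loop that feeds the growing sequence back into the model and appends a next token; the simplest strategy is greedy decoding, which always takes the argmax. A sketch of that loop with a stand-in model (hypothetical names, not lmt's API):

```python
def greedy_generate(next_token_logits, input_ids, max_new_tokens):
    """Autoregressive greedy decoding: at each step, append the argmax
    of the model's logits for the next position."""
    ids = list(input_ids)
    for _ in range(max_new_tokens):
        logits = next_token_logits(ids)  # model forward pass
        next_id = max(range(len(logits)), key=logits.__getitem__)
        ids.append(next_id)
    return ids

def dummy_model(ids, vocab_size=5):
    """Stand-in 'model' that always favors token (last_id + 1) % vocab_size."""
    logits = [0.0] * vocab_size
    logits[(ids[-1] + 1) % vocab_size] = 1.0
    return logits

print(greedy_generate(dummy_model, [0], 4))  # [0, 1, 2, 3, 4]
```

Real generation code typically adds temperature scaling or top-k sampling in place of the argmax to produce more varied text.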

📊 Examples and Notebooks

Explore the interactive notebooks in the notebooks/ directory:

  • attention.ipynb: Understanding attention mechanisms
  • pretraining_gpt.ipynb: GPT pretraining walkthrough
  • tokenizer.ipynb: Tokenization techniques

🔧 Configuration

Model Configuration

from lmt.models.config import ModelConfig

config = ModelConfig(
    vocab_size=50257,
    embed_dim=768,
    context_length=1024,
    num_layers=12,
    num_heads=12,
    dropout=0.1
)

Training Configuration

from lmt.training.config import BaseTrainingConfig

training_config = BaseTrainingConfig(
    num_epochs=10,
    batch_size=8,
    learning_rate=3e-4,
    weight_decay=0.1,
    print_every=100,
    eval_every=500
)

📄 License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.

Download files

Source distribution: language_modeling_transformers-0.2.8.tar.gz (24.5 kB)
Built distribution: language_modeling_transformers-0.2.8-py3-none-any.whl (37.5 kB)
