Composable neural network components for building models in PyTorch.
composennent
Composennent provides modular, reusable building blocks for constructing transformer-based models. Train GPT, BERT, and other architectures with minimal code.
Features
- 🧩 Modular Components: Encoder, Decoder, Attention blocks that compose together
- 🚀 Built-in Training: Pre-training and fine-tuning with a single method call
- 📝 Multiple Architectures: GPT, BERT, Seq2Seq support out of the box
- 🔧 Tokenizer Support: WordPiece and SentencePiece tokenizers included
- ⚡ Mixed Precision: Automatic mixed precision (AMP) support
- 🎯 Instruction Tuning: Fine-tune models on instruction datasets (Alpaca format)
Installation
pip install composennent
For tokenizer support:
pip install "composennent[tokenizers]"
For development:
pip install "composennent[dev]"
Quick Start
Pre-train a GPT Model
import torch
from composennent.models import GPT
from composennent.nlp.tokenizers import SentencePieceTokenizer
# Create model
model = GPT(
    vocab_size=32000,
    latent_dim=512,
    num_heads=8,
    num_layers=6,
    max_seq_len=512,
)
# Load tokenizer
tokenizer = SentencePieceTokenizer.from_pretrained("tokenizer.model")
# Pre-train
texts = ["Your training data here...", ...]
model.pretrain(
    texts=texts,
    tokenizer=tokenizer,
    epochs=3,
    batch_size=16,
    device="cuda",
)
# Save
model.save("my_model.pt")
Fine-tune on Instructions
# Load pre-trained model
model = GPT.load("my_model.pt", device="cuda")
# Instruction data (Alpaca format)
instruction_data = [
    {
        "instruction": "What is the capital of France?",
        "input": "",
        "output": "The capital of France is Paris.",
    },
    # ... more examples
]
# Fine-tune
model.fine_tune(
    data=instruction_data,
    tokenizer=tokenizer,
    epochs=2,
    lr=5e-5,
    mask_prompt=True,  # Only compute loss on outputs
)
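The `mask_prompt=True` comment says the loss is computed only on outputs. The sketch below illustrates that idea in plain Python. It assumes the widely used Alpaca prompt template and a toy whitespace tokenizer; `format_alpaca`, `toy_encode`, `build_labels`, and `IGNORE_INDEX` are hypothetical names for illustration, not composennent's API, and the library's actual template and masking code may differ. The value -100 is the default `ignore_index` of PyTorch's `CrossEntropyLoss`.

```python
from typing import Dict, List, Tuple

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def format_alpaca(example: Dict[str, str]) -> Tuple[str, str]:
    """Split an Alpaca-style record into (prompt, response) strings.

    The template is an assumption; the "input" section is skipped when empty.
    """
    prompt = f"### Instruction:\n{example['instruction']}\n\n"
    if example.get("input"):
        prompt += f"### Input:\n{example['input']}\n\n"
    prompt += "### Response:\n"
    return prompt, example["output"]


def toy_encode(text: str, vocab: Dict[str, int]) -> List[int]:
    """Toy whitespace tokenizer: each unseen word gets the next free id."""
    return [vocab.setdefault(word, len(vocab)) for word in text.split()]


def build_labels(
    prompt_ids: List[int], response_ids: List[int]
) -> Tuple[List[int], List[int]]:
    """Concatenate prompt and response; hide prompt positions from the loss."""
    input_ids = prompt_ids + response_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels


vocab: Dict[str, int] = {}
example = {
    "instruction": "What is the capital of France?",
    "input": "",
    "output": "The capital of France is Paris.",
}
prompt, response = format_alpaca(example)
prompt_ids = toy_encode(prompt, vocab)
response_ids = toy_encode(response, vocab)
input_ids, labels = build_labels(prompt_ids, response_ids)

# Prompt positions carry IGNORE_INDEX; response positions carry real token ids.
print(labels)
```

With masking of this kind, the model is not penalized for failing to reproduce the instruction text, only for errors in the response.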
Generate Text
prompt = tokenizer.encode("What is")
generated = model.generate(
    input_ids=prompt,
    max_length=100,
    temperature=0.8,
)
print(tokenizer.decode(generated[0].tolist()))
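The `temperature` argument conventionally rescales the model's logits before sampling: values below 1 sharpen the distribution, values above 1 flatten it. The following is a minimal plain-Python sketch of that standard technique. Whether composennent's `generate` works exactly this way is an assumption, and `softmax_with_temperature` and `sample_token` are hypothetical helper names.

```python
import math
import random
from typing import List, Optional


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max to avoid overflow in exp()
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(
    logits: List[float], temperature: float, rng: Optional[random.Random] = None
) -> int:
    """Draw one token index from the temperature-scaled distribution."""
    rng = rng or random.Random()
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.1]
sharp = softmax_with_temperature(logits, 0.8)  # more mass on the top token
flat = softmax_with_temperature(logits, 1.5)   # mass spread more evenly
print(sharp, flat)
print(sample_token(logits, 0.8, random.Random(0)))
```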
Modules
| Module | Description |
|---|---|
| composennent.modules | Core building blocks (Encoder, Decoder, Block) |
| composennent.modules.attention | Attention mechanisms and masks |
| composennent.models | GPT, BERT, and other transformer models |
| composennent.nlp.tokenizers | WordPiece and SentencePiece tokenizers |
| composennent.trainer | Training utilities and trainer classes |
| composennent.modules.experts | Mixture of Experts components |
| composennent.vision | Vision transformer components |
| composennent.utils | Utility functions |
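As background for the attention masks mentioned in the table, here is a minimal plain-Python sketch of the causal mask that GPT-style decoders rely on. It does not use composennent's mask API; `causal_mask` and `apply_mask` are hypothetical names. In PyTorch the same pattern is commonly built with `torch.tril(torch.ones(n, n))`.

```python
from typing import List


def causal_mask(seq_len: int) -> List[List[bool]]:
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def apply_mask(scores: List[List[float]], mask: List[List[bool]]) -> List[List[float]]:
    """Set disallowed scores to -inf so a later softmax gives them zero weight."""
    return [
        [score if allowed else float("-inf") for score, allowed in zip(row, mask_row)]
        for row, mask_row in zip(scores, mask)
    ]


mask = causal_mask(4)
for row in mask:
    print(" ".join("1" if allowed else "0" for allowed in row))
# prints:
# 1 0 0 0
# 1 1 0 0
# 1 1 1 0
# 1 1 1 1
```

Each row is a query position; the lower-triangular shape means a token never sees tokens that come after it.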
Training API
For more control over training, use the trainer classes directly:
from composennent.trainer import CausalLMTrainer, train
# Option 1: Use the train() convenience function
train(model, texts, tokenizer, model_type="causal_lm", epochs=5)
# Option 2: Use trainer class directly
trainer = CausalLMTrainer(model, tokenizer, device="cuda")
trainer.train(texts, epochs=5, batch_size=16)
trainer.save_checkpoint("checkpoint.pt")
Available trainers:
- CausalLMTrainer - GPT-style next-token prediction
- MaskedLMTrainer - BERT-style masked language modeling
- Seq2SeqTrainer - Encoder-decoder models
- MultiTaskTrainer - Multi-task learning (MLM + NSP)
- CustomTrainer - Custom loss functions
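The first two objectives differ mainly in how training targets are derived from a token sequence. The sketch below shows both in plain Python. It is an illustration of the general techniques, not the trainers' actual code: `MASK_ID`, the masking probability, and the function names are assumptions, and the simplified masking always substitutes the mask token (BERT's original recipe also keeps or randomizes some selected tokens).

```python
import random
from typing import List, Optional, Tuple

IGNORE_INDEX = -100  # positions with this label are skipped by the loss
MASK_ID = 0          # hypothetical id of the [MASK] token


def causal_lm_pair(token_ids: List[int]) -> Tuple[List[int], List[int]]:
    """Next-token prediction: the label at position t is the token at t + 1."""
    return token_ids[:-1], token_ids[1:]


def masked_lm_pair(
    token_ids: List[int],
    mask_prob: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """Masked LM: hide a random subset of tokens and predict only those."""
    rng = rng or random.Random()
    inputs: List[int] = []
    labels: List[int] = []
    for tok in token_ids:
        if rng.random() < mask_prob:
            inputs.append(MASK_ID)       # model sees the mask token
            labels.append(tok)           # and must recover the original
        else:
            inputs.append(tok)
            labels.append(IGNORE_INDEX)  # unmasked positions add no loss
    return inputs, labels


tokens = [11, 12, 13, 14, 15]
x, y = causal_lm_pair(tokens)
# x == [11, 12, 13, 14], y == [12, 13, 14, 15]
mx, my = masked_lm_pair(tokens, mask_prob=0.5, rng=random.Random(0))
print(x, y)
print(mx, my)
```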
Requirements
- Python >= 3.8
- PyTorch >= 2.0.0
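These minimums can be checked before installing. The script below is a small sketch using only the standard library; `parse_version` and `check_requirements` are hypothetical helpers, and the parser handles only the leading numeric part of a version string (local suffixes such as `+cu118` are ignored).

```python
import sys
from typing import List, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse leading numeric components, e.g. '2.1.0+cu118' -> (2, 1, 0)."""
    core = version.split("+")[0]
    parts: List[int] = []
    for piece in core.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    while len(parts) < 3:  # pad so '2.1' compares like '2.1.0'
        parts.append(0)
    return tuple(parts)


def check_requirements() -> List[str]:
    """Return a list of unmet requirements (empty when everything is fine)."""
    problems: List[str] = []
    if sys.version_info < (3, 8):
        problems.append("Python >= 3.8 is required")
        return problems  # importlib.metadata below needs Python 3.8+
    from importlib import metadata

    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        problems.append("PyTorch is not installed")
    else:
        if parse_version(torch_version) < (2, 0, 0):
            problems.append(f"PyTorch >= 2.0.0 is required, found {torch_version}")
    return problems


for problem in check_requirements():
    print(problem)
```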
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Install dev dependencies (pip install -e ".[dev]")
- Run tests (pytest)
- Run formatters (black . && ruff check .)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
License
MIT License - see LICENSE for details.