An advanced, from-scratch NLP framework for training and deploying modern transformer models.
Zenith NLP Framework
A Framework for Advanced Natural Language Processing
ZenithNLP is an advanced, from-scratch NLP framework built with PyTorch for training, fine-tuning, and deploying modern transformer-based models. It serves as a comprehensive toolkit for NLP practitioners and researchers, featuring a modular architecture and a full suite of MLOps capabilities.
📜 Table of Contents
- ✨ Features
- 🚀 Getting Started
- 📖 Tutorial: Training a Text Classifier
- 🏛️ Framework Architecture
- 🤝 Contributing
- 📄 License
✨ Features
- State-of-the-Art Model Architectures: From-scratch implementations of:
- BERT (Encoder-only) for tasks like classification and NER.
- GPT (Decoder-only) for causal language modeling and text generation.
- Seq2SeqTransformer (Encoder-Decoder) for translation and summarization.
- Advanced Training Techniques:
- Parameter-Efficient Fine-Tuning (PEFT): Integrated LoRA (Low-Rank Adaptation) for efficient fine-tuning of large models.
- Distributed Training: Support for multi-GPU training using PyTorch's DistributedDataParallel.
- Advanced Optimization: Includes learning rate scheduling with warm-up and gradient clipping.
- Full MLOps Pipeline:
- Configuration Management: Powered by Hydra, allowing for flexible and reproducible experiments through YAML files.
- Experiment Tracking: Integrated with MLflow to log parameters, metrics, and model artifacts automatically.
- Containerization: Fully containerized with Docker and Docker Compose for reproducible environments and easy deployment of the MLflow UI.
- Continuous Integration: Automated testing pipeline with GitHub Actions and pytest.
- Flexible API for Deployment:
- A ready-to-use FastAPI server that can dynamically load and serve any model trained with the framework.
- Custom Core Components:
- A trainable Byte-Pair Encoding (BPE) Tokenizer built from scratch.
- Modular implementations of MultiHeadAttention, PositionalEncoding, and other core transformer building blocks.
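To give a feel for what a trainable BPE tokenizer does, here is a minimal, standard-library sketch of the classic merge-learning loop. It is not the framework's tokenizer: the function names (`get_pair_counts`, `merge_pair`, `train_bpe`) are hypothetical, and details such as whitespace pre-tokenization are simplifying assumptions.

```python
from collections import Counter


def get_pair_counts(words):
    """Count adjacent symbol pairs across all words, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs


def merge_pair(pair, words):
    """Return a new word table where every occurrence of `pair` is one symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def train_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a list of strings.

    Assumption: words are split on whitespace and start as character sequences.
    """
    word_freqs = Counter(word for line in corpus for word in line.split())
    words = {tuple(word): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(words)
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        words = merge_pair(best, words)
    return merges


if __name__ == "__main__":
    print(train_bpe(["ab ab ab", "abc"], num_merges=5))
```

Each learned merge becomes a vocabulary entry. Encoding new text would replay the merges in the order they were learned.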
🚀 Getting Started
1. Installation (from PyPI)
Note: Once published, you will be able to install the framework directly from PyPI.
pip install zenith-nlp-framework
2. Local Development Setup
# 1. Clone the repository
git clone https://github.com/cattolatte/zenith-nlp-framework.git
cd zenith-nlp-framework
# 2. Create and activate a virtual environment
python3 -m venv venv
source venv/bin/activate
# 3. Install all dependencies
pip install -r requirements.txt
# 4. Install the project in editable mode
pip install -e .
📖 Tutorial: Training a Text Classifier
This framework is designed for flexibility. Here’s how you can train your own text classification model.
1. Prepare Your Data and Configs
Place your training data (e.g., my_data.csv) in a local data/ directory. Use the configs/ directory as a template. You can modify config.yaml or create a new one to point to your data file and adjust model/training parameters.
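If you do not have a dataset yet, a small script like the following can produce a CSV to experiment with. This is only a sketch: the `text` and `label` column names and the helper functions are assumptions for illustration, so check the data loader and `config.yaml` for the column names the framework actually expects.

```python
import csv
from pathlib import Path


def write_classification_csv(rows, path):
    """Write (text, label) pairs to a CSV file with a header row."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create data/ if missing
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["text", "label"])  # assumed column names
        writer.writerows(rows)
    return path


def read_classification_csv(path):
    """Read the file back as a list of (text, int label) tuples."""
    with Path(path).open(newline="", encoding="utf-8") as f:
        return [(row["text"], int(row["label"])) for row in csv.DictReader(f)]


if __name__ == "__main__":
    samples = [("great movie, loved it", 1), ("terrible plot", 0)]
    out = write_classification_csv(samples, "data/my_data.csv")
    print(read_classification_csv(out))
```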
2. Run Training
Run the text classification task script. All parameters are managed by the Hydra configuration files in the configs/ directory.
# Run with default settings from the config files
python3 -m my_nlp_framework.tasks.text_classification
You can easily override any parameter from the command line:
# Train for more epochs with a different learning rate
python3 -m my_nlp_framework.tasks.text_classification training.epochs=10 training.learning_rate=0.0005
# Train with LoRA enabled
python3 -m my_nlp_framework.tasks.text_classification model.use_lora=True model.lora_rank=8
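The `model.lora_rank=8` override sets the rank of the low-rank update. As a rough illustration of why a small rank keeps fine-tuning cheap, the sketch below counts trainable parameters for a single linear layer whose frozen weight has shape `d_out x d_in` and whose LoRA update is the product of a `d_out x rank` and a `rank x d_in` matrix. The 768-dimensional layer is an assumed size, and `lora_param_counts` is a hypothetical helper, not part of the framework.

```python
def lora_param_counts(d_in, d_out, rank):
    """Compare full fine-tuning with LoRA for one linear layer (bias ignored).

    Returns (full_params, lora_params, lora_fraction).
    """
    full = d_in * d_out            # every entry of the frozen weight W
    lora = rank * (d_in + d_out)   # A is rank x d_in, B is d_out x rank
    return full, lora, lora / full


if __name__ == "__main__":
    full, lora, frac = lora_param_counts(d_in=768, d_out=768, rank=8)
    print(f"full: {full}, LoRA: {lora}, fraction trainable: {frac:.2%}")
```

Under these assumptions the adapter trains 12,288 parameters instead of 589,824 for that layer, which is about 2% of the original count. Doubling the rank doubles the adapter size.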
3. Track Experiments with MLflow
Before training, launch the MLflow UI so you can follow your experiments in real time. The docker-compose.yml file is pre-configured for this.
# Start the MLflow server in the background
docker-compose up -d
Navigate to http://localhost:5000 in your browser to view the MLflow dashboard.
🌐 Serving Your Model via API
Once you have a trained model (.pth file) and tokenizer (.json file), you can easily deploy it with the built-in FastAPI server.
python3 -m my_nlp_framework.inference.api \
--model-path /path/to/your/trained_model.pth \
--tokenizer-path /path/to/your/tokenizer.json \
--vocab-size 10000 \
--num-classes 2
The API will be served at http://localhost:8000, with interactive documentation for testing at http://localhost:8000/docs.
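Once the server is running, you can also call it from a script. The client below is a sketch using only the standard library. The `/predict` route and the `{"text": ...}` payload are assumptions made for illustration, so confirm the real route and request schema on the `/docs` page before using it.

```python
import json
import urllib.request


def build_predict_request(text, base_url="http://localhost:8000"):
    """Build a JSON POST request for a single input text."""
    payload = json.dumps({"text": text}).encode("utf-8")  # assumed schema
    return urllib.request.Request(
        url=f"{base_url}/predict",  # hypothetical route; check /docs
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(text, base_url="http://localhost:8000", timeout=10):
    """Send the request and return the decoded JSON response."""
    request = build_predict_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires the API server to be running locally.
    print(predict("This framework is easy to use."))
```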
🐳 Running with Docker
You can also run the entire training process inside a Docker container, which gives you the same environment on every machine.
# 1. Build the Docker image
docker build -t zenith-nlp-framework:latest .
# 2. Run a task (mounting your local data directory)
docker run --rm -v "$(pwd)/data":/app/data zenith-nlp-framework:latest \
python -m my_nlp_framework.tasks.text_classification
🏛️ Framework Architecture
This framework is organized into several key modules:
- src/my_nlp_framework/core: Contains the fundamental building blocks like attention mechanisms, LoRA layers, and tokenizers.
- src/my_nlp_framework/models: Defines high-level model architectures like BERT and GPT.
- src/my_nlp_framework/data: Includes flexible data loaders.
- src/my_nlp_framework/training: A powerful, centralized training engine with advanced features.
- src/my_nlp_framework/tasks: Example scripts that show how to use the framework to solve end-to-end problems.
- src/my_nlp_framework/inference: Code for deploying and serving trained models.
- configs/: Centralized YAML configuration files for Hydra.
- tests/: Unit and integration tests for the framework.
🤝 Contributing
Contributions are welcome! Please feel free to submit a pull request or open an issue.
📄 License
This project is licensed under the MIT License. See the LICENSE file for details.