t2t-tuner
Convenient Text-to-Text Training for Transformers
pip install t2t-tuner
Requires PyTorch: either follow PyTorch installation instructions or use a PyTorch container.
Features
- Easy training for text-to-text generation tasks
- Training methods/features:
  - Supervised fine-tuning
  - Gradient checkpointing
  - Model parallelism
  - Soft prompt tuning (based on this paper)
- Freeze encoder/decoder/embeddings
- Print model summary
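To illustrate the idea behind soft prompt tuning (not this library's implementation), here is a minimal NumPy sketch: a small matrix of trainable "virtual token" embeddings is prepended to the frozen token embeddings of every input sequence, and only that matrix is updated during training. All sizes and names below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, prompt_len = 8, 3

# Frozen stand-in for the pretrained model's token embedding table
token_embeddings = rng.normal(size=(100, d_model))

# The only trainable parameters in soft prompt tuning: a small matrix of
# "virtual token" embeddings prepended to every input sequence
soft_prompt = rng.normal(size=(prompt_len, d_model))

input_ids = [5, 17, 42]
inputs = token_embeddings[input_ids]               # (3, d_model), frozen
augmented = np.concatenate([soft_prompt, inputs])  # (prompt_len + 3, d_model)
```

During training, gradients flow only into `soft_prompt`, so the pretrained weights stay untouched and the learned prompt is tiny compared to the model.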
- Based on the wonderful HuggingFace Transformers library. Tested on T5-based models; in theory, it should also work with other models that support AutoModelForSeq2SeqLM
This work is based on HuggingFace's run_translation.py script for text-to-text generation tasks. I wanted a more convenient interface for training and inference, along with access to features like gradient checkpointing and model parallelism to fit larger models - these are already in the HuggingFace library but not exposed in the script. I also added some features I wanted (prompt tuning, model summary) and wrapped it all as a library that can be pip installed.
Examples
Simple snippet:
import t2t
trainer_arguments = t2t.TrainerArguments(model_name_or_path="t5-small",
                                         train_file=YOUR_DATASET)
trainer = t2t.Trainer(arguments=trainer_arguments)
# train without validation
trainer.train(valid=False)
For more concrete examples, check out the example notebooks.
Data format:
{"translation": {"s": "TEXT", "t": "LABEL"}}
- The data is in JSON Lines format, following the original HuggingFace script: each example is one line
- Define the source and target IDs in TrainingArguments.source_id and TrainingArguments.target_id (defaults to s and t)
- Include the prefix in the data file, or define the prefix to prepend to the text in TrainingArguments.prefix
- Example notebook for data preprocessing from CSV file
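As a minimal sketch of producing a training file in the format above, the snippet below writes one JSON object per line with the default s/t source and target IDs. The file name, prefix, and example sentences are placeholders:

```python
import json

# Hypothetical examples; "s"/"t" are the default source/target IDs,
# and the task prefix is included directly in the source text
examples = [
    {"translation": {"s": "translate English to German: Hello", "t": "Hallo"}},
    {"translation": {"s": "translate English to German: Thank you", "t": "Danke"}},
]

# One JSON object per line, as expected by the JSON Lines format
with open("train.json", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```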
Training Large Models
Using this library, you can fine-tune the T5 11b checkpoints quite easily with the following settings:
- Batch size 1 + gradient accumulation to reach whatever effective batch size you need
  - Batch size of 8 is possible with gradient checkpointing, but it doesn't improve speed
- About 128GB of VRAM: 8x 16GB or 4x 32GB GPUs (such as V100)
- FP32 (no need for mixed precision)
  - FP16 would actually be better, but the pretrained T5 checkpoints don't play well with FP16, as the existing activations are too large (see the tracking GitHub issue)
Note that depending on your system, the loading time for the checkpoint (46GB) can be quite long.
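The batch-size arithmetic above can be made concrete: with model parallelism the sharded 11b model behaves as a single device, so the effective batch size is the per-step batch size times the number of gradient accumulation steps. The step count below is a hypothetical value, not a recommendation:

```python
# Effective batch size with batch size 1 + gradient accumulation
# (model-parallel case: one model replica sharded across the GPUs)
per_device_batch_size = 1
gradient_accumulation_steps = 32  # hypothetical; pick whatever you need

effective_batch_size = per_device_batch_size * gradient_accumulation_steps
```

Gradients are summed over the accumulation steps before each optimizer update, so memory stays at the batch-size-1 level while the update sees the larger effective batch.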
Development
Building Package
python3 -m pip install --upgrade build twine
python3 -m build
python3 -m twine upload dist/*
Disclaimers
This library was developed as a personal project for my own use. Please feel free to fork it or use it for your own purposes as well. I take no responsibility for any mishaps that occur as a result of using this library.