A comprehensive implementation of the cycle-consistency training paradigm, extending the Hugging Face Transformers Trainer API to accommodate arbitrary combinations of generative models.
Cycleformers
A Python library for efficient cycle-consistency training of transformer models. Cycleformers simplifies iterative back-translation and supports both causal and seq2seq architectures. It also implements Multi-Adapter Cycle-Consistency Training (MACCT), which trains LoRA adapters on a frozen base model so that 7.5x larger models can be trained in the same memory footprint.
Features
- 🤗 Seamless integration with Hugging Face Transformers
- 🚀 PEFT/LoRA support for memory-efficient training
- 🤖 Compatible with both causal and seq2seq models
- 🔥 Optimized for various hardware configurations
Quick Tour
Installation
pip install cycleformers
Training
The CycleTrainer class extends, and significantly redesigns, the 🤗 Transformers Trainer. It abstracts away the specifics of training while remaining configurable. Both seq2seq and causal architectures are supported, and each can be trained via PEFT adapter swapping for memory-efficient configurations. Check the docs for usage details and examples.
To train using two identical models, the following sample code can be used along with two datasets:
from transformers import AutoModelForCausalLM, AutoTokenizer
from cycleformers import CycleTrainer, CycleTrainingArguments

model = AutoModelForCausalLM.from_pretrained("gpt2", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
args = CycleTrainingArguments(output_dir="gpt2-cct")

# dataset_A and dataset_B must be defined beforehand
trainer = CycleTrainer(
    args,
    models=model,
    tokenizers=tokenizer,
    train_dataset_A=dataset_A,
    train_dataset_B=dataset_B,
)
trainer.train()
Any two models can be combined for fully customisable training (🚧 currently both models must be seq2seq or both must be causal):
from transformers import AutoModelForCausalLM, AutoModelForSeq2SeqLM, AutoTokenizer

# NOTE: this pairs a causal model with a seq2seq model, which falls under the 🚧 restriction above
model_A = AutoModelForCausalLM.from_pretrained("gpt2", device_map="auto")
model_B = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base", device_map="auto")
tokenizer_A = AutoTokenizer.from_pretrained("gpt2")
# Load the tokenizer from the same checkpoint as model_B
tokenizer_B = AutoTokenizer.from_pretrained("google/flan-t5-base")

trainer = CycleTrainer(
    args,
    models={
        "A": model_A,
        "B": model_B,
    },
    tokenizers={
        "A": tokenizer_A,
        "B": tokenizer_B,
    },
    train_dataset_A=dataset_A,
    train_dataset_B=dataset_B,
)
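For intuition, the data flow of iterative back-translation can be sketched without any model at all. The snippet below is a toy illustration under stated assumptions, not Cycleformers' implementation: `cycle_step`, `make_train_fn`, and the string-based "models" are hypothetical stand-ins, and the "loss" is just the fraction of targets that were not reconstructed.

```python
from typing import Callable, List, Tuple


def cycle_step(
    real_batch: List[str],
    generator: Callable[[str], str],
    train_fn: Callable[[List[Tuple[str, str]]], float],
) -> float:
    """One half-cycle: generate synthetic inputs, then train to reconstruct the real text."""
    # 1. The generating model translates real text into the other domain.
    synthetic = [generator(text) for text in real_batch]
    # 2. The other model is trained on (synthetic input -> real target) pairs.
    pairs = list(zip(synthetic, real_batch))
    return train_fn(pairs)


def make_train_fn(model: Callable[[str], str]) -> Callable[[List[Tuple[str, str]]], float]:
    """Hypothetical stand-in for a gradient step: returns a reconstruction error rate."""
    def train_fn(pairs: List[Tuple[str, str]]) -> float:
        errors = sum(model(src) != tgt for src, tgt in pairs)
        return errors / len(pairs)
    return train_fn


# Toy "models": domain A is lowercase text, domain B is uppercase text.
a_to_b = str.upper
b_to_a = str.lower

batch_A = ["hello world", "cycle training"]
batch_B = ["GOOD MORNING", "NO PARALLEL DATA"]

# A -> synthetic B, then train the B->A model to recover the real A text.
loss_b_to_a = cycle_step(batch_A, a_to_b, make_train_fn(b_to_a))
# B -> synthetic A, then train the A->B model to recover the real B text.
loss_a_to_b = cycle_step(batch_B, b_to_a, make_train_fn(a_to_b))
print(loss_b_to_a, loss_a_to_b)
```

Each model alternates between generating synthetic inputs for the other and being trained to reconstruct real data, which is why two datasets are passed to the trainer.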
Multi-Adapter Cycle-Consistency Training (MACCT)
The CycleTrainer class is also set up to accept a single base model and train two PEFT adapters on top of it, switching between them to emulate the two-model setup. This allows 7.5x larger models to be trained in the same memory footprint:
from peft import LoraConfig

# r, lora_alpha, target_modules and bias are LoRA settings, so LoraConfig is used here
peft_config = LoraConfig(
    task_type="CAUSAL_LM",
    r=16,
    lora_alpha=32,
    target_modules="all-linear",
    inference_mode=False,
    bias="none",
)
args = CycleTrainingArguments(output_dir="gpt2-macct")
trainer = CycleTrainer(
    args,
    model=model,
    tokenizer=tokenizer,
    peft_configs=peft_config,  # or a dict with "A" and "B" keys, as above
)
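As a rough illustration of why two adapters are cheap compared with two full models, the sketch below counts the parameters LoRA adds to a single linear layer: a rank-r update adds r * (d_in + d_out) weights. The 768 x 768 layer shape is a hypothetical example, and the helper names are not part of Cycleformers. This does not reproduce the 7.5x figure, which also depends on factors such as optimizer state and precision.

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Weights LoRA adds to one d_in x d_out linear layer (A: r x d_in, B: d_out x r)."""
    return r * (d_in + d_out)


def full_params(d_in: int, d_out: int) -> int:
    """Weights in the frozen linear layer itself (bias ignored)."""
    return d_in * d_out


d_in = d_out = 768  # hypothetical square projection
r = 16              # same rank as in the config above

per_adapter = lora_params(d_in, d_out, r)
two_adapters = 2 * per_adapter  # one adapter per translation direction
ratio = two_adapters / full_params(d_in, d_out)

print(f"per adapter: {per_adapter}, both adapters: {two_adapters}, "
      f"fraction of frozen layer: {ratio:.4f}")
```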
Citing
If you use Cycleformers in your research, please cite:
A citation will be added once a Zenodo or paper reference is available.
File details
Details for the file cycleformers-0.1.0.tar.gz.
File metadata
- Download URL: cycleformers-0.1.0.tar.gz
- Upload date:
- Size: 15.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.1.1 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 23f5985ab08eba95b7106d7d13c7ff3f9f7bd08655564981df63744725dfafb7 |
| MD5 | 13ac53c911b6ed1b5feb37e9c966bfad |
| BLAKE2b-256 | 5987bfc7a537a4834d2977d8eab37f710b8c06062e3ee364ef20ccbc3e8a779c |
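The digests listed above can be checked against a downloaded file with the standard library. This is a minimal sketch: `file_digests` and `matches_sha256` are hypothetical helper names, and the file path in the usage comment assumes the archive was saved to the current directory.

```python
import hashlib
from pathlib import Path


def file_digests(path: str) -> dict:
    """Return the SHA256 and BLAKE2b-256 hex digests of a file."""
    data = Path(path).read_bytes()
    return {
        "sha256": hashlib.sha256(data).hexdigest(),
        # digest_size=32 bytes gives the 256-bit BLAKE2b variant
        "blake2b-256": hashlib.blake2b(data, digest_size=32).hexdigest(),
    }


def matches_sha256(path: str, expected: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string."""
    return file_digests(path)["sha256"] == expected.strip().lower()


# Usage (assumes the sdist is in the current directory):
# matches_sha256(
#     "cycleformers-0.1.0.tar.gz",
#     "23f5985ab08eba95b7106d7d13c7ff3f9f7bd08655564981df63744725dfafb7",
# )
```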
Provenance
The following attestation bundles were made for cycleformers-0.1.0.tar.gz:
Publisher: release.yml on wrmthorne/cycleformers
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: cycleformers-0.1.0.tar.gz
  - Subject digest: 23f5985ab08eba95b7106d7d13c7ff3f9f7bd08655564981df63744725dfafb7
- Sigstore transparency entry: 154030375
- Sigstore integration time:
- Permalink: wrmthorne/cycleformers@a2ff666c5e3505c42a4a50de70cca9116d68af10
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/wrmthorne
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@a2ff666c5e3505c42a4a50de70cca9116d68af10
- Trigger Event: push
File details
Details for the file cycleformers-0.1.0-py3-none-any.whl.
File metadata
- Download URL: cycleformers-0.1.0-py3-none-any.whl
- Upload date:
- Size: 16.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.1.1 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cd05e82ac0654133d0580603bce064ad455ec6a8918a788a42d47e3b9dfa1074 |
| MD5 | 1ee2ea12d648d6c98aa5b098ea3d95aa |
| BLAKE2b-256 | b309615d7e84df1f24527bdd9499ff3ca3f8f163a95c1c13b1fc00c88f6f3011 |
Provenance
The following attestation bundles were made for cycleformers-0.1.0-py3-none-any.whl:
Publisher: release.yml on wrmthorne/cycleformers
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: cycleformers-0.1.0-py3-none-any.whl
  - Subject digest: cd05e82ac0654133d0580603bce064ad455ec6a8918a788a42d47e3b9dfa1074
- Sigstore transparency entry: 154030376
- Sigstore integration time:
- Permalink: wrmthorne/cycleformers@a2ff666c5e3505c42a4a50de70cca9116d68af10
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/wrmthorne
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@a2ff666c5e3505c42a4a50de70cca9116d68af10
- Trigger Event: push