A comprehensive implementation of the cycle-consistency training paradigm, extending the Hugging Face Transformers Trainer API to accommodate arbitrary combinations of generative models.

Project description

Cycleformers

Python License: CC BY 4.0

A Python library for efficient cycle-consistency training of transformer models. Cycleformers simplifies iterative back-translation and supports both causal and seq2seq architectures. It also implements Multi-Adapter Cycle-Consistency Training (MACCT), which trains LoRA adapters on a single frozen base model, allowing 7.5x larger models to be trained in the same memory footprint.

Features

  • 🤗 Seamless integration with Hugging Face Transformers
  • 🚀 PEFT/LoRA support for memory-efficient training
  • 🤖 Compatible with both causal and seq2seq models
  • 🔥 Optimized for various hardware configurations

Quick Tour

Installation

pip install cycleformers

Training

The CycleTrainer class extends, and significantly redesigns, the 🤗 Transformers Trainer. It abstracts away the specifics of training while remaining configurable. Both seq2seq and causal architectures are supported, and each can be trained via PEFT adapter swapping for memory-efficient configurations. Check the docs for usage details and examples.
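For orientation, the data flow of one half-cycle can be sketched in plain Python. This is a toy illustration of the general cycle-consistency idea, not the CycleTrainer implementation: the `Translator` type, `cycle_step`, and `exact_match_rate` are hypothetical names, and simple string functions stand in for the two generative models.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for a generative model: maps text in one domain
# to text in the other. Real models would be neural networks.
Translator = Callable[[str], str]


def cycle_step(
    batch_a: List[str],
    a_to_b: Translator,
    b_to_a: Translator,
) -> List[Tuple[str, str, str]]:
    """Run one half-cycle: A -> synthetic B -> reconstructed A.

    Returns (synthetic_input, target, reconstruction) triples. In real
    training, a reconstruction loss against `target` would update only
    `b_to_a`, while `a_to_b` generates without gradients.
    """
    triples = []
    for original in batch_a:
        synthetic_b = a_to_b(original)        # generation step
        reconstruction = b_to_a(synthetic_b)  # model being trained
        triples.append((synthetic_b, original, reconstruction))
    return triples


def exact_match_rate(triples: List[Tuple[str, str, str]]) -> float:
    """Toy stand-in for a loss: fraction of exact reconstructions."""
    if not triples:
        return 0.0
    hits = sum(1 for _, target, recon in triples if recon == target)
    return hits / len(triples)


# Toy "models": upper-casing and lower-casing invert each other on
# lowercase input, so the cycle reconstructs the original text.
to_upper: Translator = str.upper
to_lower: Translator = str.lower

triples = cycle_step(["hello world", "cycle training"], to_upper, to_lower)
score = exact_match_rate(triples)
print(triples[0], score)
```

A full cycle repeats the same step in the other direction, starting from a batch of domain B text, so each model is trained on synthetic inputs produced by the other.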

To train with two identical models, use the following sample code along with two datasets:

from transformers import AutoModelForCausalLM, AutoTokenizer
from cycleformers import CycleTrainer, CycleTrainingArguments

model = AutoModelForCausalLM.from_pretrained("gpt2", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# dataset_A and dataset_B are the two datasets, prepared beforehand
args = CycleTrainingArguments(output_dir="gpt2-cct")
trainer = CycleTrainer(
    args,
    models=model,
    tokenizers=tokenizer,
    train_dataset_A=dataset_A,
    train_dataset_B=dataset_B,
)
trainer.train()
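The two datasets do not need to be aligned with each other, since cycle-consistency training works from unpaired text. The sketch below builds two small unpaired collections in plain Python. The `"text"` field name and the `build_unpaired` helper are assumptions made for illustration; check the docs for the columns CycleTrainer actually expects.

```python
import random
from typing import Dict, List, Tuple

Record = Dict[str, str]


def build_unpaired(
    texts_a: List[str],
    texts_b: List[str],
    seed: int = 0,
) -> Tuple[List[Record], List[Record]]:
    """Wrap two independent text collections as lists of records.

    Each side is shuffled separately, so no row in A corresponds to
    any particular row in B. The "text" key is a hypothetical choice.
    """
    rng = random.Random(seed)
    dataset_a = [{"text": t} for t in texts_a]
    dataset_b = [{"text": t} for t in texts_b]
    rng.shuffle(dataset_a)
    rng.shuffle(dataset_b)
    return dataset_a, dataset_b


# The two sides may differ in size and content.
dataset_A, dataset_B = build_unpaired(
    ["the cat sat", "a dog barked", "birds sing"],
    ["le chat dort", "il pleut"],
)
print(len(dataset_A), len(dataset_B))
```

Lists of records like these can be wrapped with `datasets.Dataset.from_list` if a 🤗 Datasets object is needed.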

Any two models (🚧 currently both seq2seq or both causal) can be combined for completely customisable training:

from transformers import AutoModelForCausalLM, AutoModelForSeq2SeqLM, AutoTokenizer

# Note: per the 🚧 notice above, mixing a causal and a seq2seq model
# as shown here may not be supported yet.
model_A = AutoModelForCausalLM.from_pretrained("gpt2", device_map="auto")
model_B = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base", device_map="auto")
tokenizer_A = AutoTokenizer.from_pretrained("gpt2")
tokenizer_B = AutoTokenizer.from_pretrained("google/flan-t5-base")  # match model_B's checkpoint

trainer = CycleTrainer(
    args,
    models={
        "A": model_A,
        "B": model_B,
    },
    tokenizers={
        "A": tokenizer_A,
        "B": tokenizer_B,
    },
    train_dataset_A=dataset_A,
    train_dataset_B=dataset_B,
)

Multi-Adapter Cycle-Consistency Training (MACCT)

The CycleTrainer class is also set up to accept a single base model and train two PEFT adapters on top of it, switching between them to emulate the two-model setup. This allows 7.5x larger models to be trained in the same memory footprint:

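from peft import LoraConfig

# LoraConfig (not the PeftConfig base class) accepts the LoRA-specific
# fields r, lora_alpha, target_modules and bias.
peft_config = LoraConfig(
    task_type="CAUSAL_LM",
    r=16,
    lora_alpha=32,
    target_modules="all-linear",
    inference_mode=False,
    bias="none",
)

args = CycleTrainingArguments(output_dir="gpt2-macct")
trainer = CycleTrainer(
    args,
    model = model,
    tokenizer = tokenizer,
    peft_configs = peft_config  # or a dict with separate configs keyed "A" and "B"
)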
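To give a feel for why sharing one frozen base is cheaper, the sketch below counts weights for a single linear layer using the standard LoRA factorisation, where a rank-r update to a d_out x d_in matrix adds r * (d_in + d_out) parameters. The 768 x 768 layer size is only an illustrative assumption, and the function names are hypothetical. This arithmetic does not reproduce the 7.5x figure, which also depends on gradients, optimizer state, activations, and precision.

```python
def full_weight_count(d_in: int, d_out: int) -> int:
    """Number of weights in a dense d_out x d_in linear layer (no bias)."""
    return d_in * d_out


def lora_weight_count(d_in: int, d_out: int, r: int) -> int:
    """Extra weights added by a rank-r LoRA update B @ A.

    A has shape (r, d_in) and B has shape (d_out, r).
    """
    return r * (d_in + d_out)


D_IN = D_OUT = 768  # illustrative layer size (assumption)
RANK = 16           # matches r=16 in the config above

full = full_weight_count(D_IN, D_OUT)
lora = lora_weight_count(D_IN, D_OUT, RANK)

# Two separate models each hold their own copy of the layer.
two_models = 2 * full
# MACCT-style setup: one shared frozen copy plus two small adapters.
shared_base_two_adapters = full + 2 * lora

print(f"full layer:           {full:>9,}")
print(f"one LoRA adapter:     {lora:>9,}")
print(f"two full models:      {two_models:>9,}")
print(f"base + two adapters:  {shared_base_two_adapters:>9,}")
```

Only the adapter weights are trainable in the shared-base setup, so gradients and optimizer state are needed for the two small adapters rather than for two full copies of the layer.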

Citing

If you use Cycleformers in your research, please cite:

Citation details will be added once a Zenodo or paper reference is available.
