General purpose model trainer for PyTorch that is more flexible than it should be, by 🐸Coqui.

Project description

👟 Trainer

An opinionated, general-purpose model trainer for PyTorch with a simple code base. Fork of the original, unmaintained repository. New PyPI package: coqui-tts-trainer.

Installation

From PyPI:

pip install coqui-tts-trainer

From Github:

git clone https://github.com/idiap/coqui-ai-Trainer
cd coqui-ai-Trainer
pip install -e .

Implementing a model

Subclass TrainerModel and overload the functions you need.
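
A minimal sketch of such a subclass; the train_step/eval_step signatures below are assumptions modeled on the MNIST example, so check TrainerModel for the exact abstract interface:

from torch import nn
from trainer import TrainerModel

class MyModel(TrainerModel):
    # Sketch only: method names assumed from the MNIST example.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))

    def forward(self, x):
        return self.net(x)

    def train_step(self, batch, criterion):
        x, y = batch
        logits = self(x.view(x.size(0), -1))
        loss = criterion(logits, y)
        # Return model outputs and a dict of losses, like the GAN example below.
        return {"model_outputs": logits}, {"loss": loss}

    def eval_step(self, batch, criterion):
        return self.train_step(batch, criterion)

    def get_criterion(self):
        return nn.CrossEntropyLoss()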

Training a model with auto-optimization

See the MNIST example.
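
The overall pattern looks roughly like this (a sketch modeled on the MNIST example; the TrainerConfig field names and the exact Trainer signature are assumptions to verify against your installed version):

import os
from trainer import Trainer, TrainerArgs, TrainerConfig

# Assuming MyModel is a TrainerModel subclass like the sketch above.
config = TrainerConfig(epochs=5, batch_size=64, print_step=25)
model = MyModel()

trainer = Trainer(
    TrainerArgs(),
    config,
    output_path=os.getcwd(),
    model=model,
)
trainer.fit()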

Training a model with advanced optimization

With 👟 Trainer you can define the whole optimization cycle as you want, as in the GAN example below. It enables more under-the-hood control and flexibility for advanced training loops.

You just have to use the scaled_backward() function to handle mixed precision training.

...

def optimize(self, batch, trainer):
    imgs, _ = batch

    # sample noise
    z = torch.randn(imgs.shape[0], 100)
    z = z.type_as(imgs)

    # train discriminator
    imgs_gen = self.generator(z)
    logits = self.discriminator(imgs_gen.detach())
    fake = torch.zeros(imgs.size(0), 1)
    fake = fake.type_as(imgs)
    loss_fake = trainer.criterion(logits, fake)

    valid = torch.ones(imgs.size(0), 1)
    valid = valid.type_as(imgs)
    logits = self.discriminator(imgs)
    loss_real = trainer.criterion(logits, valid)
    loss_disc = (loss_real + loss_fake) / 2

    # step discriminator
    self.scaled_backward(loss_disc, None, trainer)

    if trainer.total_steps_done % trainer.grad_accum_steps == 0:
        trainer.optimizer[0].step()
        trainer.optimizer[0].zero_grad()

    # train generator
    imgs_gen = self.generator(z)

    valid = torch.ones(imgs.size(0), 1)
    valid = valid.type_as(imgs)

    logits = self.discriminator(imgs_gen)
    loss_gen = trainer.criterion(logits, valid)

    # step generator
    self.scaled_backward(loss_gen, None, trainer)
    if trainer.total_steps_done % trainer.grad_accum_steps == 0:
        trainer.optimizer[1].step()
        trainer.optimizer[1].zero_grad()
    return {"model_outputs": logits}, {"loss_gen": loss_gen, "loss_disc": loss_disc}

...

See the GAN training example with Gradient Accumulation

Training with Batch Size Finder

See the test script here for training with the batch size finder.

The batch size finder starts at a default batch size (2048, but it can also be user defined) and searches for the largest batch size that fits on your hardware; expect it to run multiple trainings until it finds one. To use it, call trainer.fit_with_largest_batch_size(starting_batch_size=2048) instead of trainer.fit(), where starting_batch_size is the batch size you want to start the search with. This is very useful if you want to use as much GPU memory as possible.
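
For example:

# Instead of trainer.fit(), let the finder search for the largest batch size that fits:
trainer.fit_with_largest_batch_size(starting_batch_size=2048)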

Training with DDP

$ python -m trainer.distribute --script path/to/your/train.py --gpus "0,1"

We don't use .spawn() to initiate multi-GPU training since it comes with certain limitations.

  • Everything must be picklable.
  • .spawn() trains the model in subprocesses and the model in the main process is not updated.
  • DataLoader with N processes gets really slow when N is large.

Training with Accelerate

Setting use_accelerate in TrainingArgs to True will enable training with Accelerate.
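
A sketch of enabling it, assuming the argument class is the TrainerArgs used in the package's examples (verify the exact class and field names in your installed version):

from trainer import TrainerArgs

# Hands the training loop over to Hugging Face Accelerate; everything else stays the same.
args = TrainerArgs(use_accelerate=True)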

You can also use it for multi-gpu or distributed training.

CUDA_VISIBLE_DEVICES="0,1,2" accelerate launch --multi_gpu --num_processes 3 train_recipe_autoregressive_prompt.py

See the Accelerate docs.

Adding a callback

👟 Trainer supports callbacks to customize your runs. You can either define callbacks in your model implementation or pass them explicitly to the Trainer.

Please check trainer.utils.callbacks to see available callbacks.
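
For the model-side option, a rough sketch, assuming the Trainer picks up hook methods defined directly on the model (the hook name below is an assumption; trainer.utils.callbacks lists the actual ones):

from trainer import TrainerModel

class MyModel(TrainerModel):
    ...

    # Hook name assumed for illustration; see trainer.utils.callbacks for the real set.
    def on_epoch_start(self, trainer):
        print(f" > Steps done so far: {trainer.total_steps_done}")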

Here is how you provide an explicit callback (e.g. for weight reinitialization) to a 👟 Trainer object:

def my_callback(trainer):
    print(" > My callback was called.")

trainer = Trainer(..., callbacks={"on_init_end": my_callback})
trainer.fit()

Profiling example

  • Create the torch profiler as you like and pass it to the trainer.
    import torch
    profiler = torch.profiler.profile(
        activities=[
            torch.profiler.ProfilerActivity.CPU,
            torch.profiler.ProfilerActivity.CUDA,
        ],
        schedule=torch.profiler.schedule(wait=1, warmup=1, active=3, repeat=2),
        on_trace_ready=torch.profiler.tensorboard_trace_handler("./profiler/"),
        record_shapes=True,
        profile_memory=True,
        with_stack=True,
    )
    prof = trainer.profile_fit(profiler, epochs=1, small_run=64)

  • Run TensorBoard.
    tensorboard --logdir="./profiler/"
    

Supported Experiment Loggers

To add a new logger, you must subclass BaseDashboardLogger and overload its functions.
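
A very rough sketch of a custom logger; the import path and method names below are illustrative assumptions, so check BaseDashboardLogger for the actual abstract methods you must implement:

# Import path assumed; adjust to wherever BaseDashboardLogger lives in your version.
from trainer.logging.base_dash_logger import BaseDashboardLogger

class PrintLogger(BaseDashboardLogger):
    # Method names below are placeholders, not the verified interface.
    def add_scalar(self, title, value, step):
        print(f"[{step}] {title}: {value}")

    def flush(self):
        pass

    def finish(self):
        pass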


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

coqui_tts_trainer-0.3.1.tar.gz (50.3 kB)

Uploaded Source

Built Distribution

coqui_tts_trainer-0.3.1-py3-none-any.whl (57.2 kB)

Uploaded Python 3

File details

Details for the file coqui_tts_trainer-0.3.1.tar.gz.

File metadata

  • Download URL: coqui_tts_trainer-0.3.1.tar.gz
  • Upload date:
  • Size: 50.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for coqui_tts_trainer-0.3.1.tar.gz
  • SHA256: ca32abaf43febb4012a6a0c61e265b1635f91455acbce17fd34a2b5eae3af28c
  • MD5: 313a30a519861a85ebbfa036dc165871
  • BLAKE2b-256: e9f868315f71c420382873a8b2bd47b0c3113213f51bf4e932d56c9aed659b80

See more details on using hashes here.

Provenance

The following attestation bundles were made for coqui_tts_trainer-0.3.1.tar.gz:

Publisher: pypi-release.yml on idiap/coqui-ai-Trainer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file coqui_tts_trainer-0.3.1-py3-none-any.whl.

File metadata

File hashes

Hashes for coqui_tts_trainer-0.3.1-py3-none-any.whl
  • SHA256: eba2449b1c7e6a1fb7454608595949dbe4c1a409bd0d23e1fe93163637600392
  • MD5: 0e7358340fab04aa30c818ce4973030d
  • BLAKE2b-256: 1579d08e2b3974448bbfede88d7d3c81b67896ccaf8ed77ef10d4ddb8e7d194f

See more details on using hashes here.

Provenance

The following attestation bundles were made for coqui_tts_trainer-0.3.1-py3-none-any.whl:

Publisher: pypi-release.yml on idiap/coqui-ai-Trainer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
