
Accelerator Module and Trainer based on the Accelerate library for simple distributed training, inspired by PyTorch Lightning.

Project description

AcceleratorModule

Module based on Accelerate 🤗 for distributed training across multiple GPUs, with a focus on readability and ease of customizing experiments. We also integrate modified versions of the DataCollators from the Transformers library, so that standard Hugging Face tokenizers integrate with different environments.

NOTE: Some features might not be tested and could cause problems. Feel free to open an issue or send a PR to fix any problem found.

AcceleratorModule will take care of the heavy lifting of distributed training on many GPUs. Accelerate is quite simple, and it has many advantages over PyTorch Lightning, mainly because it doesn't abstract away the low-level part of the training loop, so you can customize it however you want. The main idea of this little project is to have a standard way to do distributed training. This module lets you:

  • Define the logic involved for training and validation.
  • Define the logic to calculate different metrics in a simple, concise manner.
  • Save checkpoints to recover training progress.
  • Apply early stopping based on the best average value of any metric.
  • Define the hyperparameters in a simple YAML file or HyperParameters object.
  • Visualize training progress using any supported tracker.
  • Control how often checkpointing, evaluation, logging, model saving, etc. happen.
  • Easily set up a reproducible experimental environment by calling the set_seed function (see the sketch after this list).
  • And more.
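For example, seeding an experiment could look like the following minimal sketch (it assumes set_seed is exposed at the top level of accmt; accelerate.utils.set_seed is the underlying equivalent):

from accmt import set_seed  # assumed top-level export

set_seed(42)  # fix the relevant RNGs so results are reproducible across runs and processes

# ... then define your AcceleratorModule, Trainer and datasets as usual.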

Installation

AcceleratorModule is available via pip:

pip install accmt

Documentation

Check out the documentation at https://acceleratormodule.readthedocs.io.

Module Structure

Import AcceleratorModule:

from accmt import AcceleratorModule

The AcceleratorModule class has 2 main methods:

  • training_step: Defines the training logic.
  • validation_step: Defines the validation logic.

The structure looks like this:

class ExampleModule(AcceleratorModule):
    def __init__(self):
        self.model = ...

    def training_step(self, batch):
        x, y = batch
        # ...
        return train_loss

    def validation_step(self, batch):
        x, y = batch
        # ...
        return {
            "loss": val_loss,
            # any other metric...
        }

More information about module structure here.
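For concreteness, here is a minimal sketch that fills in the skeleton above with a simple PyTorch regression model; the model, loss and metric names are placeholders, not part of the library:

import torch.nn as nn

from accmt import AcceleratorModule


class RegressionModule(AcceleratorModule):
    def __init__(self):
        # 'self.model' holds the model that ACCMT will prepare for distributed training.
        self.model = nn.Linear(10, 1)
        self.criterion = nn.MSELoss()

    def training_step(self, batch):
        x, y = batch
        train_loss = self.criterion(self.model(x), y)
        return train_loss  # a scalar tensor

    def validation_step(self, batch):
        x, y = batch
        val_loss = self.criterion(self.model(x), y)
        return {"loss": val_loss}  # add any other metrics to this dict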

To train this Module, you need a Trainer class:

from accmt import Trainer, HyperParameters

trainer = Trainer(
    #hps_config="hps_config.yaml",  # <--- can also be a YAML file.
    hps_config=HyperParameters(epochs=2),
    model_path="model_folder",
    # ... other arguments
)

More information about trainer here.

HPS config file

This is a YAML file containing hyperparameters for your training. The structure looks like the following:

hps:
  epochs: 40
  batch_size: 35
  optimizer:
    type: AdamW
    lr: 1e-3
    weight_decay: 1e-3
  scheduler:
    type: OneCycleLR
    max_lr: 1e-3

An optimizer is required, while a scheduler is optional (omit the scheduler block if you don't want one).
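For instance, since the scheduler block is optional, a configuration equivalent to the one above but without a scheduler would presumably look like this:

hps:
  epochs: 40
  batch_size: 35
  optimizer:
    type: AdamW
    lr: 1e-3
    weight_decay: 1e-3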

Available optimizer types are the following:

Optimizer     Source
Adam          PyTorch
Adadelta      PyTorch
Adagrad       PyTorch
Adamax        PyTorch
AdamW         PyTorch
Adafactor     HuggingFace
ASGD          PyTorch
LBFGS         PyTorch
NAdam         PyTorch
RAdam         PyTorch
RMSprop       PyTorch
Rprop         PyTorch
SGD           PyTorch
SparseAdam    PyTorch

Available scheduler types are the following:

Scheduler                            Source
StepLR                               PyTorch
LinearLR                             PyTorch
ExponentialLR                        PyTorch
CosineAnnealingLR                    PyTorch
CyclicLR                             PyTorch
OneCycleLR                           PyTorch
CosineAnnealingWarmRestarts          PyTorch
CosineWithWarmup                     HuggingFace
Constant                             HuggingFace
ConstantWithWarmup                   HuggingFace
CosineWithHardRestartsWithWarmup     HuggingFace
InverseSQRT                          HuggingFace
LinearWithWarmup                     HuggingFace
PolynomialDecayWithWarmup            HuggingFace

Finally, we can train our model using the .fit() function, providing our AcceleratorModule and the train and validation datasets (standard PyTorch Dataset objects):

trainer.fit(module, train_dataset, val_dataset)

More information about HPS config file here.
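Putting the pieces together, a minimal end-to-end script might look like the following sketch; the dataset class is a placeholder, and only the Trainer/.fit() usage mirrors the examples above:

from torch.utils.data import Dataset

from accmt import Trainer, HyperParameters


class MyDataset(Dataset):
    # Placeholder dataset returning (x, y) pairs.
    def __init__(self, data):
        self.data = data

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]


module = ExampleModule()        # the AcceleratorModule defined earlier
train_dataset = MyDataset(...)  # your training data
val_dataset = MyDataset(...)    # your validation data

trainer = Trainer(
    hps_config=HyperParameters(epochs=2),
    model_path="model_folder",
)
trainer.fit(module, train_dataset, val_dataset)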

Run

To run training, you can use the accmt command-line utility (a wrapper around Accelerate 🤗):

accmt launch -N=8 --strat=deepspeed-2-bf16 train.py

This will run on 8 GPUs with DeepSpeed ZeRO stage 2 and bfloat16 mixed precision. If the -N argument is not specified, accmt will launch one process per GPU detected in your system. Also, if --strat is not specified, the default strategy is DDP with no mixed precision.

You can use any Accelerate configuration that you want 🤗 (DDP, FSDP or DeepSpeed). For more strategies, check:

accmt strats  # --ddp | --fsdp | --deepspeed    <--- optional filters.

NOTE: You can also use accelerate command-line utilities instead.
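For reference, an equivalent launch with Accelerate's own CLI might look like this (the exact flags depend on your Accelerate configuration; --num_processes and --mixed_precision are standard accelerate launch options):

accelerate launch --num_processes=8 --mixed_precision=bf16 train.py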

More information about command-line utilities here.

Checkpointing

Checkpointing is a default process in ACCMT, and it's customizable with some parameters in the Trainer constructor:

trainer = Trainer(
    # ... Other parameters.
    checkpoint_every="2ep", # Checkpoint every N epochs, in this case, every 2 epochs.
    resume=True # Whether you want to resume from checkpoint (True), or start from scratch (False).
    # if not specified (None), resuming will be done automatically.
)

Save model

Model saving is an integrated feature of ACCMT. You can enable it by specifying a directory where the model will be saved.

You can also save the model in 2 different default modes:

  • best_valid_loss: Saves the model whenever the validation loss is the best (default if not specified).
  • best_train_loss: Saves the model whenever the train loss is the best.

Or the following format:

  • best_{METRIC}: If you're using a specific metric to save the model, specify it after 'best_' (e.g. 'best_accuracy'). NOTE: the 'best_' prefix is optional.

You can also activate model saving only when a metric falls below or rises above a specific value:

trainer = Trainer(...)
trainer.register_model_saving("accuracy", saving_above=0.2)

Gradient Accumulation

When training models, larger batch sizes are often more stable than small ones, but this comes at a cost in VRAM. One way to avoid this is to accumulate gradients over N steps. This way, we simulate larger batch sizes without increasing VRAM usage.

trainer = Trainer(..., grad_accumulation_steps=2)
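As a rough worked example (assuming the usual gradient accumulation semantics): with batch_size: 35 from the HPS config above and grad_accumulation_steps=2, gradients from 35 × 2 = 70 samples per process contribute to each optimizer step, while peak VRAM stays at the level of a batch of 35.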

Logging training progress

Logging training progress is enabled by default in ACCMT, as it is essential to track how well our experiments are doing and to decide when we can stop training.

There are only 3 parameters to change for this (in the Trainer constructor), as shown in the sketch after this list:

  • track_with: Specify the tracker you want to use. The only available option (for now) is "mlflow".
  • logging_dir: Specifies a logging directory (default is "logs"). This can be a directory path or a URL.
  • log_every: Log every N number of steps (default is 1).
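A minimal sketch using these three parameters (the tracker name and directory below are just examples):

trainer = Trainer(
    # ... Other parameters.
    track_with="mlflow",   # only supported tracker for now
    logging_dir="logs",    # directory path or URL
    log_every=10,          # log every 10 steps
)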

Collate Functions

You can implement your own collate function by overriding collate_fn from AcceleratorModule:

class ExampleModule(AcceleratorModule):
    # Rest of the code...

    def collate_fn(self, batch: list):
        # Your collate function logic here.

        return batch # Output taken in training and validation steps.

There is another, simpler way to add collators that I will keep building on in the future: using one of the DataCollators built into this library.

At the moment, there are 3 collators directly inspired by the transformers library (with some modifications, like recursive approaches to iterate over different arguments in the __getitem__ function of the Dataset):

  • DataCollatorForSeq2Seq: Adds efficient padding when dealing with sequence-to-sequence problems.
  • DataCollatorForLongestSequence: Adds efficient padding for a batch.
  • DataCollatorForLanguageModeling: Implements Masked Language Modeling (MLM) task.

Example:

from accmt import Trainer, DataCollatorForSeq2Seq

tokenizer = ... # a tokenizer from 'transformers' library.

trainer = Trainer(
    hps_config="hps_config.yaml",
    model_path="dummy_model",
    collate_fn=DataCollatorForSeq2Seq(tokenizer)
)

Teacher-Student support

A Teacher-Student approach lets you mimic the behaviour of a bigger model (teacher) with a smaller model (student). This is a method for model distillation, useful for saving computational resources and accelerating inference.

To load teacher and student models, we can do the following in the module constructor:

class TeacherStudentExampleModule(AcceleratorModule):
    def __init__(self):
        self.teacher = ... # teacher model
        self.model = ...   # student model

During training, the teacher model will only provide outputs, and will not have its parameters updated.

NOTE: In order to successfully load models into hardware, we must use self.teacher for the teacher model and self.model for the student model.
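As an illustration, a distillation training_step might look like the following sketch; the soft-label KL loss and temperature are a common distillation recipe, not something prescribed by ACCMT:

import torch
import torch.nn.functional as F

from accmt import AcceleratorModule


class TeacherStudentExampleModule(AcceleratorModule):
    def __init__(self):
        self.teacher = ...       # teacher model (frozen, only provides outputs)
        self.model = ...         # student model (parameters get updated)
        self.temperature = 2.0   # softens the distributions being matched

    def training_step(self, batch):
        x, _ = batch
        with torch.no_grad():
            teacher_logits = self.teacher(x)  # no gradients flow to the teacher
        student_logits = self.model(x)

        # KL divergence between softened teacher and student distributions.
        loss = F.kl_div(
            F.log_softmax(student_logits / self.temperature, dim=-1),
            F.softmax(teacher_logits / self.temperature, dim=-1),
            reduction="batchmean",
        ) * (self.temperature ** 2)
        return loss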

Notes

I will continue to update this repository to add more features over time. If you want to contribute to this little project, feel free to make a PR 🤗.

