dytr: Dynamic Transformer for Multi-Task Learning with Continual Learning Support

Project description

dytr - Dynamic Transformer Library

dytr is a flexible PyTorch library for multi-task learning with dynamic transformer architectures. Train multiple tasks sequentially or simultaneously while preserving performance on previous tasks through built-in continual learning techniques. It also supports fine-tuning and modifying pretrained models such as BERT.

Build dynamic transformers that learn multiple tasks.

Why dytr?

  • 🎯 Multi-Task Ready - Train classification, generation, and sequence tasks in one model
  • 🧠 Never Forgets - Built-in EWC and experience replay reduce catastrophic forgetting
  • 🔧 No Black Box - Full control over the architecture, so you can understand every component
  • ⚡ Lightweight - Pure PyTorch, minimal dependencies
  • 📦 Pretrained Support - Load BERT, RoBERTa, and more as your encoder backbone and fine-tune it on multiple tasks

Installation

pip install dytr

Quick Start

from dytr import DynamicTransformer, ModelConfig, TaskConfig, TrainingStrategy, Trainer, SingleDatasetProcessing
import pandas as pd

# 1. Configure your transformer
config = ModelConfig(
    embed_dim=256,
    num_layers=6,
    num_heads=8,
    max_seq_len=256
)

# 2. Create the model
model = DynamicTransformer(config)

# 3. Load and preprocess the data
train_data = pd.DataFrame({
    'text': ['Great movie!', 'Terrible film.', 'Amazing acting!', 'Boring plot.'],
    'label': [1, 0, 1, 0]
})
train_dataset = SingleDatasetProcessing(
    df=train_data,
    tokenizer=model.tokenizer,
    max_len=128,
    task_name="sentiment_analysis",
    strategy=TrainingStrategy.SENTENCE_CLASSIFICATION,
    text_column="text",
    label_column="label"
)

# 4. Define a task
task = TaskConfig(
    task_name="sentiment_analysis",
    training_strategy=TrainingStrategy.SENTENCE_CLASSIFICATION,
    num_labels=2,  # two classes: 0 and 1
)
# model.add_task(task) is not required; the task is added automatically during training

# 5. Initialize the trainer and train
trainer = Trainer(model, config, exp_dir="./experiments")
train_datasets = {"sentiment_analysis": (train_dataset, TrainingStrategy.SENTENCE_CLASSIFICATION)}
# For multi-task training, pass several tasks and one dataset entry per task.
# The third argument holds the validation datasets (empty here).
model = trainer.train([task], train_datasets, {})

# 6. Generate predictions
result = model.generate("This product is amazing!", task_name="sentiment_analysis")
print(f"Prediction: {result['prediction']}")

# 7. Save the entire multi-task model
model.save_model("multi_task_model.pt")

# 8. Load the model
loaded_model = DynamicTransformer.load_model("multi_task_model.pt")

Core Capabilities

Multiple Training Strategies

| Strategy | Purpose | Use Case |
|---|---|---|
| Causal LM | Autoregressive text generation | Chatbots, content creation |
| Seq2Seq | Input to output transformation | Translation, summarization |
| Sentence Classification | Document-level categorization | Sentiment, topic detection |
| Token Classification | Token-level labeling | Named entity recognition, POS tagging |
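
To make the four strategies concrete, the sketch below shows the kind of target each one trains against for a single toy sentence. It is a plain-Python illustration of the general concepts, not dytr's internal data format. The whitespace tokenizer, the tag names, and the helper `causal_lm_pairs` are assumptions made for this example.

```python
# Illustrative only: toy targets for each training strategy.
# Whitespace tokenization and all label values are assumptions for this sketch.

def causal_lm_pairs(tokens):
    """Pair each token with the token that follows it (shift-by-one targets)."""
    return list(zip(tokens[:-1], tokens[1:]))

tokens = "Alice visited Paris".split()

examples = {
    # Causal LM: predict token t+1 from the tokens up to t.
    "causal_lm": causal_lm_pairs(tokens),
    # Seq2Seq: a whole input sequence maps to a whole output sequence.
    "seq2seq": {"source": tokens, "target": "Alice besuchte Paris".split()},
    # Sentence classification: one label for the whole text.
    "sentence_classification": {"text": tokens, "label": 1},
    # Token classification: one label per token (NER-style tags here).
    "token_classification": list(zip(tokens, ["B-PER", "O", "B-LOC"])),
}

for name, example in examples.items():
    print(name, "->", example)
```

The main practical difference is the shape of the target: a shifted copy of the input, a second sequence, a single label, or one label per token.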

Continual Learning

Train tasks sequentially without losing previous knowledge:

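from dytr import DynamicTransformer, ModelConfig, Trainer

config = ModelConfig(
    use_ewc=True,              # Protect important weights
    use_replay=True,           # Replay old samples
    use_task_adapters=True,    # Task-specific modules
    ewc_lambda=1000.0,
    replay_buffer_size=2000
)

model = DynamicTransformer(config)
trainer = Trainer(model, config, exp_dir="./experiments")

# Train tasks one after another.
# task_list, train_data, and val_data are placeholders: a list of TaskConfig
# objects and the dataset dicts passed to trainer.train, as in the Quick Start.
for task in task_list:
    model.add_task(task)
    trainer.train([task], train_data, val_data)
    # Previous tasks remain accurate
    # The trainer handles EWC and the replay buffer automatically, but with a
    # pretrained model you should add the samples yourself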
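
The two mechanisms enabled above can be sketched in plain Python. The snippet shows the standard EWC quadratic penalty, (lambda / 2) * sum_i F_i * (theta_i - theta_old_i)^2, and a fixed-size replay buffer filled by reservoir sampling. Both are generic textbook formulations, and dytr's internals may differ. The names `ewc_penalty` and `ReplayBuffer` are hypothetical, and reservoir sampling is an assumed buffer policy.

```python
import random

def ewc_penalty(params, old_params, fisher, ewc_lambda):
    """Standard EWC penalty: (lambda / 2) * sum_i F_i * (theta_i - theta_old_i)^2."""
    return 0.5 * ewc_lambda * sum(
        f * (p - p_old) ** 2 for p, p_old, f in zip(params, old_params, fisher)
    )

class ReplayBuffer:
    """Fixed-size buffer using reservoir sampling (an assumed policy, not dytr's)."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.samples = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, sample):
        self.seen += 1
        if len(self.samples) < self.capacity:
            self.samples.append(sample)
        else:
            # Each sample seen so far stays in the buffer with
            # probability capacity / seen.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.samples[j] = sample

    def sample(self, k):
        return self.rng.sample(self.samples, min(k, len(self.samples)))

# Toy numbers: only the second weight moved (by 0.5), and it has the
# larger Fisher importance, so it alone contributes to the penalty.
penalty = ewc_penalty(
    params=[1.0, 2.5], old_params=[1.0, 2.0], fisher=[0.1, 0.4], ewc_lambda=1000.0
)
print(penalty)

buffer = ReplayBuffer(capacity=3)
for i in range(10):
    buffer.add(f"old-task sample {i}")
print(buffer.sample(2))
```

In a training loop, the penalty would be added to the new task's loss, and batches drawn from the buffer would be mixed into the new task's batches.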

Pretrained Encoders

Load powerful encoders as your backbone and extend them with tasks:

import pandas as pd
from dytr import (
    ModelConfig,
    PretrainedModelLoader,
    SingleDatasetProcessing,
    TaskConfig,
    Trainer,
    TrainingStrategy,
)

model_name = 'prajjwal1/bert-tiny'
loader = PretrainedModelLoader()
config = ModelConfig(
    tokenizer_name=model_name,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    special_tokens={},
    use_task_adapters=False,
    use_ewc=True,
    use_replay=True,
    use_rotary_embedding=False,
    training_from_scratch=False,
)

# Load pretrained BERT as your encoder
model = loader.load_pretrained(model_name, config)

# Now add your own tasks - the model is fully dytr compatible
class_train = pd.DataFrame(
    {
        "text": [
            "Great product!",
            "Poor quality.",
            "Excellent service!",
            "Very disappointed.",
            "Highly recommended!",
        ],
        "label": [1, 0, 1, 0, 1],
    }
)
classification_task = TaskConfig(
    task_name="sentiment",
    training_strategy=TrainingStrategy.SENTENCE_CLASSIFICATION,
    num_labels=2,
    text_column="text",
    label_column="label",
    max_length=128,
)
class_dataset = SingleDatasetProcessing(
    df=class_train,
    tokenizer=model.tokenizer,
    max_len=classification_task.max_length,
    task_name=classification_task.task_name,
    strategy=classification_task.training_strategy,
    num_labels=classification_task.num_labels,
    text_column=classification_task.text_column,
    label_column=classification_task.label_column,
)

# Causal LM task data (text generation)
lm_train = pd.DataFrame(
    {
        "text": [
            "The sun rises in the east.",
            "Cats are adorable animals.",
            "Machine learning is fascinating.",
            "Python is a great programming language.",
            "Deep learning powers modern AI.",
        ]
    }
)
lm_task = TaskConfig(
    task_name="text_generation",
    training_strategy=TrainingStrategy.CAUSAL_LM,
    max_length=256,
)
lm_dataset = SingleDatasetProcessing(
    df=lm_train,
    tokenizer=model.tokenizer,
    max_len=lm_task.max_length,
    task_name=lm_task.task_name,
    strategy=lm_task.training_strategy,
    text_column="text",
)

# One entry per task: task name -> (dataset, training strategy)
train_datasets = {
    classification_task.task_name: (class_dataset, classification_task.training_strategy),
    lm_task.task_name: (lm_dataset, lm_task.training_strategy),
}
val_datasets = {
    # Optional validation data, for example:
    # classification_task.task_name: (class_val_dataset, classification_task.training_strategy)
}

# Train the model on both tasks
print("\nTraining model...")
trainer = Trainer(model, config, exp_dir="./multi_task_experiments")
model = trainer.train([classification_task, lm_task], train_datasets, val_datasets)

# Test classification
test_texts = ["This is amazing!", "I hate this."]
for text in test_texts:
    result = model.generate(text, task_name="sentiment")
    sentiment = "POSITIVE" if result["prediction"] == 1 else "NEGATIVE"
    print(f"      {text} -> {sentiment}")

# Test generation
print("\n   Text generation test:")
prompt = "The future of technology"
generated = model.generate(prompt, task_name="text_generation", max_new_tokens=20)
print(f"      Prompt: {prompt}")
print(f"      Generated: {generated}")

# Further tasks can be added later, for example:
# model.add_task(ner_task)
# model.add_task(translation_task)

# Train, generate, and use just like any dytr model

Task-Specific Learning Rates

Different components learn at different speeds:

config = ModelConfig(
    learning_rate=3e-4,
    head_lr_mult=2.0,      # Task heads: fast adaptation
    decoder_lr_mult=0.5,   # Decoders: moderate
    shared_lr_mult=0.1     # Shared encoder: preserve knowledge
)
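
Assuming each multiplier scales the base `learning_rate` (the usual convention for settings like these, though not confirmed here for dytr), the effective rates for the configuration above work out to 6e-4 for task heads, 1.5e-4 for decoders, and 3e-5 for the shared encoder. The dictionary keys below are labels chosen for this sketch.

```python
# Assumption: effective rate = learning_rate * multiplier for each component.
base_lr = 3e-4
multipliers = {"task_heads": 2.0, "decoders": 0.5, "shared_encoder": 0.1}

effective_lr = {name: base_lr * mult for name, mult in multipliers.items()}

for name, lr in effective_lr.items():
    print(f"{name}: {lr:.1e}")
```

In PyTorch, settings like these are typically realized by passing one parameter group per component to the optimizer, each with its own `lr`.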

Architecture Overview

┌─────────────────────────────────────────┐
│         DynamicTransformer              │
├─────────────────────────────────────────┤
│  ┌──────────────────────────────────┐   │
│  │     Shared Encoder               │   │
│  │  (Pretrained or from scratch)    │   │
│  └──────────────────────────────────┘   │
│                  │                      │
│    ┌─────────────┼─────────────┐        │
│    ▼             ▼             ▼        │
│ ┌──────┐      ┌──────┐     ┌───────┐    │
│ │Task 1│      │Task 2│     │Task 3 │    │
│ │ Head │      │ Head │     │Decoder│    │
│ └──────┘      └──────┘     └───────┘    │
│    │             │             │        │
│    ▼             ▼             ▼        │
│Classification   NER       Generation    │
└─────────────────────────────────────────┘
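
The routing in the diagram can be illustrated with a toy, framework-free sketch: one shared encoder feeds whichever task head is requested, and heads can be registered at any time. Everything here (`ToyMultiTaskModel`, the word-count "encoder", the lambda heads) is hypothetical and stands in for real neural modules. It is not dytr's implementation.

```python
from typing import Callable, Dict, List

Features = List[float]

class ToyMultiTaskModel:
    """Shared encoder plus a registry of task-specific heads."""

    def __init__(self, encoder: Callable[[str], Features]):
        self.encoder = encoder
        self.heads: Dict[str, Callable[[Features], object]] = {}

    def add_task(self, task_name: str, head: Callable[[Features], object]) -> None:
        # Registering a head leaves the encoder and existing heads untouched.
        self.heads[task_name] = head

    def forward(self, text: str, task_name: str):
        features = self.encoder(text)            # shared computation
        return self.heads[task_name](features)   # task-specific computation

def toy_encoder(text: str) -> Features:
    # Stand-in for a transformer encoder: [number of words, number of "!"].
    return [float(len(text.split())), float(text.count("!"))]

model = ToyMultiTaskModel(toy_encoder)
model.add_task("excited", lambda f: int(f[1] > 0))  # classification-style head
model.add_task("length", lambda f: f[0])            # second head, added later

print(model.forward("Great movie!", task_name="excited"))
print(model.forward("Great movie!", task_name="length"))
```

Because every head consumes the same encoder output, adding a task only adds a head. Updates to the shared encoder, by contrast, affect every task, which is the case continual learning techniques are meant to handle.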

Who Should Use dytr?

| Audience | Why It Matters |
|---|---|
| Researchers | Customize every aspect of the transformer architecture; test continual learning algorithms with EWC and experience replay; experiment with multi-task architectures, task-specific learning rates, and adapters; analyze forgetting behavior across sequential tasks |
| Developers | Add new tasks without retraining from scratch; load pretrained models and extend them with your own tasks; build production-ready multi-task systems without complex dependencies |
| Students | Understand transformers from scratch with transparent, readable code; visualize the impact of hyperparameters on model size; learn multi-task learning concepts hands-on |
| Organizations | Deploy single models that handle multiple tasks efficiently; deploy lighter, faster inference systems; maintain knowledge across task updates with continual learning |

Key Differentiators

  • Full Transparency - No hidden complexity, understand every component
  • Continual Learning First - Built from the ground up for sequential task learning
  • Truly Dynamic - Add or remove tasks without retraining from scratch
  • Pure PyTorch - No heavy dependencies, easy to customize

Requirements

  • Python 3.8+
  • PyTorch 1.10+
  • NumPy
  • pandas
  • scikit-learn
  • tqdm
  • requests

Documentation

  • ModelConfig: Architecture, training, and continual learning parameters
  • TaskConfig: Dataset configuration, column mapping, task-specific settings
  • TrainingStrategy: Causal LM, Seq2Seq, Sentence Classification, Token Classification
  • PretrainedModelLoader: Load BERT, RoBERTa, DistilBERT, ALBERT as encoders

License

Apache License 2.0

Author

Dr. Akram Alsubari

Contributing

Contributions are welcome! Open issues or share your use cases.

Support and Contact

For questions, issues, or suggestions:

  • 📧 Email: akram.alsubari@outlook.com

  • 🔗 LinkedIn: https://www.linkedin.com/in/akram-alsubari/

  • 📱 Connect: Feel free to reach out for collaborations, research discussions, or feedback

  • 🎓 Research Interests: Natural Language Processing, Deep Learning, Transformers, Continual Learning, Multi-Task Learning, Large Language Models


Build once. Learn multiple tasks. Never forget.

Download files

Source Distribution

dytr-0.1.1.tar.gz (61.2 kB)

Uploaded Source

Built Distribution

dytr-0.1.1-py3-none-any.whl (73.5 kB)

Uploaded Python 3

File details

Details for the file dytr-0.1.1.tar.gz.

File metadata

  • Download URL: dytr-0.1.1.tar.gz
  • Upload date:
  • Size: 61.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.2

File hashes

Hashes for dytr-0.1.1.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 88fef4326de57a9b68f12e2b60b2c3a8e907a122e0bc8c31977a74b1a93ec24f |
| MD5 | 1eb979d717f5d82cd47c20f45dd05aed |
| BLAKE2b-256 | da14749c6fd83e8d4fb56482d67099ebf586e0319c91dbfe40e04f3837034367 |
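
A downloaded file can be checked against the SHA256 digest listed above with Python's standard `hashlib` module. The helper name `sha256_of` and the local file path are assumptions for this sketch.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for dytr-0.1.1.tar.gz.
EXPECTED_SHA256 = "88fef4326de57a9b68f12e2b60b2c3a8e907a122e0bc8c31977a74b1a93ec24f"

def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

archive = Path("dytr-0.1.1.tar.gz")  # assumed download location
if archive.exists():
    actual = sha256_of(archive)
    print("OK" if actual == EXPECTED_SHA256 else f"Mismatch: {actual}")
else:
    print(f"{archive} not found; download it first")
```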

File details

Details for the file dytr-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: dytr-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 73.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.2

File hashes

Hashes for dytr-0.1.1-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4183453c5f1cc9964ba4c952458f88d559922ba87301e3c5b29c7f4e5ccd82c8 |
| MD5 | 012e7a8326a73c06e7178c8561be896d |
| BLAKE2b-256 | fccd2fb1d84f333750f7c5c272d7c9e3da084cf179e66f35038ca88412f57214 |
