Inephany client library to use Paramorph Agents.
Paramorph Client Library
Paramorph is a client library that provides automated hyperparameter tuning for neural network training. It integrates seamlessly with Hugging Face Transformers and other PyTorch-based training frameworks to dynamically adjust learning rates, weight decay, and other hyperparameters during training.
Features
- Automated Hyperparameter Tuning: Dynamically adjusts learning rates, weight decay, and other optimizer parameters
- Hugging Face Integration: Built-in support for Hugging Face Transformers with minimal code changes
- Multi-Agent Architecture: Uses specialized agents for different parameter groups (embeddings, attention, linear layers, convolutions)
- Real-time Monitoring: Integrates with Weights & Biases for experiment tracking
- Flexible Configuration: Easy-to-use YAML configuration system
Installation
Prerequisites
- Python 3.12+
- PyTorch
- Hugging Face Transformers (for HF integration)
- [Optional] Weights & Biases account and API key (for experiment tracking and logging)
Setup
Paramorph depends on the libinephany package, which provides core utilities and data models. Installation instructions differ based on your use case:
Ensure that python3.12 and make are installed:
Ubuntu / Debian
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install python3.12 make
macOS with Homebrew
brew install python@3.12
brew install make
For Developers (Monorepo)
If you're working within the Inephany monorepo, the libinephany package is already available and will be installed into the venv created for this package when you run make install-dev.
For Clients (Standalone Installation)
Since libinephany is not yet published on PyPI, you'll need to build and install both libinephany and paramorph manually from source. Follow these steps:
- Create a new virtual environment:
  python3.12 -m venv myenv
- Activate the virtual environment:
  source myenv/bin/activate
- Install build tools:
  python -m pip install --upgrade pip setuptools build wheel
- Change into the libinephany directory.
- Build and install libinephany:
  python -m build
  pip install dist/libinephany-<version>-py3-none-any.whl
  Replace <version> with the actual version number of the built wheel.
- Change into the paramorph directory.
- Build and install paramorph:
  python -m build
  pip install dist/paramorph-<version>-py3-none-any.whl
  Replace <version> with the actual version number of the built wheel.
Note:
- If you update either package, repeat the build and install steps for the updated package.
Then generate an API key in the portal and export it:
export PARAMORPH_API_KEY=YOUR_API_KEY
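Before launching a long training run, it can help to confirm that the key is actually visible to the Python process. The following is a minimal sketch: the helper name `require_api_key` is hypothetical and not part of Paramorph; only the `PARAMORPH_API_KEY` variable name comes from the step above.

```python
import os


def require_api_key(env=None, name="PARAMORPH_API_KEY"):
    """Return the API key from the environment, or raise if it is missing or blank.

    Illustrative helper only; this is not a Paramorph function.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before starting training.")
    return key
```

Calling `require_api_key()` at the top of a training script fails fast with a readable message instead of surfacing an authentication problem partway through a run.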
Quick Start with Hugging Face Transformers
Here's a complete example of using Paramorph with a GPT-2 model:
First - with your venv active - ensure datasets is installed with:
python -m pip install datasets
Optional: Set up Weights & Biases for Experiment Tracking
For enhanced monitoring and experiment tracking, you can integrate with Weights & Biases:
- Install wandb (if not already installed):
  python -m pip install wandb
- Log in to wandb:
  wandb login
Then you can use the script:
from paramorph.build import build_for_huggingface
from transformers import GPT2LMHeadModel, GPT2Tokenizer, GPT2Config, TrainingArguments, DataCollatorForLanguageModeling
from datasets import load_dataset
import transformers
import torch.optim as optim

try:
    import wandb  # Optional!
except ImportError:
    wandb = None

if wandb is not None:
    # Optional, if you installed and configured Weights & Biases: Initialize wandb run
    wandb.init(
        project="paramorph-experiment",
        name="gpt2-paramorph-tuning",
        config={
            "model": "gpt2",
            "initial_lr": 0.0003,
            "initial_weight_decay": 0.01,
        }
    )

# Load and prepare dataset
dataset = load_dataset("wikimedia/wikipedia", "20231101.simple", split="train", streaming=True)
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

def tokenize_function(examples):
    return tokenizer(
        examples["text"],
        truncation=True,
        padding="max_length",
        max_length=128,
    )

tokenized_dataset = dataset.map(tokenize_function, batched=True, remove_columns=["text"])

# Create model
config = GPT2Config(
    vocab_size=50257,
    n_positions=1024,
    n_ctx=512,
    n_embd=768,
    n_layer=3,
    n_head=4,
)
model = GPT2LMHeadModel(config)

# Build Paramorph components
callbacks, optimizer, lr_scheduler, trainer_cls = build_for_huggingface(
    model=model,
    optimizer_type=optim.AdamW,
    paramorph_config_path="./config.yaml",
    initial_learning_rate=0.0003,
    initial_weight_decay=0.01,
)

# Configure training arguments
args = TrainingArguments(
    output_dir="./hf_test_models",
    max_steps=10000,
    num_train_epochs=1,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=0.0003,
    weight_decay=0.01,
    lr_scheduler_type="constant",  # Required for Paramorph when using learning rate agents
    max_grad_norm=-1,  # Required for Paramorph when using gradient clipping agents
    disable_tqdm=False,
    dataloader_num_workers=2,
    dataloader_pin_memory=True,
    dataloader_prefetch_factor=2,
    fp16=True,
    gradient_checkpointing=False,
)

# Create and run trainer
trainer = trainer_cls(
    model=model,
    args=args,
    train_dataset=tokenized_dataset,
    eval_dataset=None,
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
    processing_class=tokenizer,
    optimizers=(optimizer, lr_scheduler),
    callbacks=[callbacks],
)
trainer.train()

# Optional: Finish wandb run
if wandb is not None and wandb.run is not None:
    wandb.finish()
What wandb provides with Paramorph:
- Real-time hyperparameter tracking for each parameter group
- Training metrics and loss curves
- Model performance comparisons across different hyperparameter settings
- Experiment organization and collaboration features
- Automatic logging of Paramorph's internal statistics and agent decisions
Configuration
Create a config.yaml file to configure Paramorph:
# Model identifier for the Inephany backend
inephany_model_id: alpha-v1
# Map model layers to agent types. There are four types currently: embedding, linear, convolution, attention
agent_modules:
  transformer.wte: embedding
  transformer.wpe: embedding
  transformer.h.0: attention
  transformer.h.1: attention
  transformer.h.2: attention
  transformer.ln_f: linear
# SDK configuration for backend communication
sdk_config:
  max_retries: 10
  backoff_factor: 0.5
  max_backoff: 15.0
  url_override: null
# Scheduling and tuning configuration
scheduling_config:
  nn_family: gpt
  tuning_frequency: 100  # How often to update hyperparameters (steps)
  # Statistics collection settings
  can_nullify_gradients: true
  max_statistic_cache_size: 3
  tensor_stats_downsample_percentage: 0.01
  statistic_sample_frequency: 10
  # Logging settings
  log_to_wandb: true  # Set to false if not using wandb
  force_wandb_log_on_all_ranks: false
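To get a feel for what the `sdk_config` retry values imply, the sketch below computes the wait before each retry. It assumes a capped exponential backoff of the form `backoff_factor * 2 ** (attempt - 1)`, which is a common convention; the README does not state the exact formula Paramorph uses, so treat the numbers as illustrative. The function name `retry_delays` is hypothetical.

```python
def retry_delays(max_retries=10, backoff_factor=0.5, max_backoff=15.0):
    """Seconds to wait before each retry under capped exponential backoff.

    Assumption: delay = backoff_factor * 2 ** (attempt - 1), capped at max_backoff.
    Paramorph's actual schedule may differ.
    """
    return [
        min(max_backoff, backoff_factor * 2 ** (attempt - 1))
        for attempt in range(1, max_retries + 1)
    ]


delays = retry_delays()
print(delays)       # [0.5, 1.0, 2.0, 4.0, 8.0, 15.0, 15.0, 15.0, 15.0, 15.0]
print(sum(delays))  # 90.5
```

Under that assumption, the example values would tolerate roughly a minute and a half of backend unavailability before giving up.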
Generating the Agent Modules List
The agent_modules section in your config.yaml maps your model's named modules to agent types. This mapping tells Paramorph which parts of your model should be tuned by our agents.
Understanding Module Types
Paramorph supports four module types:
- embedding: For embedding layers (e.g., nn.Embedding)
- attention: For attention/transformer layers (e.g., nn.MultiheadAttention, transformer blocks)
- linear: For linear/feedforward layers (e.g., nn.Linear, nn.LayerNorm)
- convolutional: For convolutional layers (e.g., nn.Conv2d, nn.Conv1d)
How to Generate the Modules List
- Print your model's named modules to see the structure:
import torch.nn as nn
from transformers import GPT2LMHeadModel

# Create your model
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Print only modules with parameters
print("\nModules with parameters:")
for name, module in model.named_modules():
    if list(module.parameters()):
        print(f"{name}: {type(module).__name__}")

# To explore granularity options, you can also print the hierarchy:
print("\nModule hierarchy (for granularity decisions):")
for name, module in model.named_modules():
    if list(module.parameters()):
        depth = name.count('.')
        indent = "  " * depth
        print(f"{indent}{name}: {type(module).__name__}")
- Categorize each module based on its type and create the mapping:
def categorize_module(module_name: str, module: nn.Module) -> str:
    """Categorize a module based on its type."""
    module_type = type(module).__name__
    if module_type == "Embedding":
        return "embedding"
    elif module_type in ["Linear", "LayerNorm"]:
        return "linear"
    elif module_type in ["Conv1d", "Conv2d", "Conv3d"]:
        return "convolutional"
    elif "attention" in module_name.lower() or "attn" in module_name.lower():
        return "attention"
    else:
        # Default to linear for other types
        return "linear"

# Generate the agent_modules dictionary
agent_modules = {}
for name, module in model.named_modules():
    if list(module.parameters()):  # Only include modules with parameters
        agent_modules[name] = categorize_module(name, module)

print("Generated agent_modules:")
for name, module_type in agent_modules.items():
    print(f"  {name}: {module_type}")

# Example: Filter for different granularity levels
print("\nLayer-level modules (recommended):")
layer_level = {name: module_type for name, module_type in agent_modules.items()
               if name.count('.') <= 2}  # transformer.h.0, transformer.ln_f, etc.
for name, module_type in layer_level.items():
    print(f"  {name}: {module_type}")

print("\nComponent-level modules:")
component_level = {name: module_type for name, module_type in agent_modules.items()
                   if name.count('.') <= 3}  # transformer.h.0.attn, transformer.h.0.mlp, etc.
for name, module_type in component_level.items():
    print(f"  {name}: {module_type}")
Choosing the Right Granularity
The granularity of your agent_modules mapping determines how fine-grained your hyperparameter tuning will be. You have several options:
Option 1: Layer-Level Granularity (Recommended)
Map entire transformer layers or major components:
agent_modules:
  transformer.wte: embedding
  transformer.wpe: embedding
  transformer.h.0: attention  # Entire transformer block 0
  transformer.h.1: attention  # Entire transformer block 1
  transformer.h.2: attention  # Entire transformer block 2
  transformer.ln_f: linear
Pros: Simpler configuration and fewer agents to manage. This is the expected form: Paramorph was trained at this level of granularity.
Cons: Less fine-grained control.
Option 2: Component-Level Granularity
Map individual components within layers:
agent_modules:
  transformer.wte: embedding
  transformer.wpe: embedding
  transformer.h.0.attn: attention  # Just the attention component
  transformer.h.0.mlp: linear  # Just the MLP component
  transformer.h.0.ln_1: linear  # Just the layer norm
  transformer.h.1.attn: attention
  transformer.h.1.mlp: linear
  transformer.h.1.ln_1: linear
  transformer.ln_f: linear
Pros: More precise control over different components.
Cons: More complex configuration, more agents. Paramorph was NOT trained at this level of granularity.
Option 3: Parameter-Level Granularity (Not Recommended)
Map individual parameters:
agent_modules:
  transformer.wte.weight: embedding
  transformer.h.0.attn.c_attn.weight: attention
  transformer.h.0.attn.c_attn.bias: attention
  transformer.h.0.attn.c_proj.weight: attention
  transformer.h.0.attn.c_proj.bias: attention
  # ... many more entries
Pros: Maximum control.
Cons: Extremely complex, usually unnecessary, may hurt performance. Paramorph was NOT trained at this level of granularity.
How to Decide
- Start with layer-level granularity: This works well for most models and is easier to manage.
- Consider your model size:
  - Small models (< 100M parameters): Layer-level is usually sufficient
  - Medium models (100M - 1B parameters): Layer-level or component-level
  - Large models (> 1B parameters): Component-level may be beneficial
- Consider your tuning goals:
  - General optimization: Layer-level is fine
  - Fine-grained control: Component-level
  - Research/experimentation: Component-level for insights
- Consider computational overhead: More granular = more agents = more computational cost
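The model-size guidance above can be written down as a small lookup. This is only a sketch of the thresholds listed in this section; `suggest_granularity` is a hypothetical helper, not part of Paramorph, and the returned strings are informal labels rather than configuration values.

```python
def suggest_granularity(num_parameters: int) -> str:
    """Map a parameter count to the granularity suggested by the size guidance.

    Thresholds follow the list above: < 100M, 100M - 1B, > 1B parameters.
    """
    if num_parameters < 100_000_000:
        return "layer"
    if num_parameters <= 1_000_000_000:
        return "layer or component"
    return "component"


print(suggest_granularity(50_000_000))     # layer
print(suggest_granularity(350_000_000))    # layer or component
print(suggest_granularity(7_000_000_000))  # component
```

For a PyTorch model, the parameter count can be obtained with `sum(p.numel() for p in model.parameters())`. Size is only one input: layer-level remains the recommended starting point regardless of the result.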
Best Practices
- Include all parameter-containing modules: Only modules with trainable parameters should be included in the mapping.
- Use appropriate module types:
  - Use embedding for token/position embeddings
  - Use attention for attention/transformer layers (e.g., nn.MultiheadAttention, transformer blocks)
  - Use linear for linear/feedforward layers (e.g., nn.Linear, nn.LayerNorm)
  - Use convolutional for convolutional layers (e.g., nn.Conv2d, nn.Conv1d)
  - Use null for any other layers, or omit them entirely.
- Be consistent with naming: The module names must exactly match those returned by model.named_modules().
- Test your configuration: After generating the config, verify it works by running a short training session.
- Customize for your model: The automated script provides a good starting point, but you may need to adjust the categorization logic for custom model architectures.
- Start simple, then refine: Begin with layer-level granularity and only increase granularity if you need more control.
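Because module names must match `model.named_modules()` exactly, a quick pre-flight check can catch typos before a training run. The sketch below is a hypothetical helper (`check_agent_modules` is not a Paramorph function). It takes the allowed type names as an argument rather than hard-coding them, so you can pass whichever set your Paramorph version accepts.

```python
def check_agent_modules(agent_modules, module_names, allowed_types):
    """Return (unknown_names, bad_types) for an agent_modules mapping.

    unknown_names: keys that do not match any known module name.
    bad_types: entries whose type is neither None (YAML null) nor in allowed_types.
    """
    known = set(module_names)
    unknown_names = sorted(name for name in agent_modules if name not in known)
    bad_types = {
        name: module_type
        for name, module_type in agent_modules.items()
        if module_type is not None and module_type not in allowed_types
    }
    return unknown_names, bad_types


# Example with hand-written names; in practice use
# [name for name, _ in model.named_modules()].
names = ["", "transformer", "transformer.wte", "transformer.h.0", "transformer.ln_f"]
mapping = {
    "transformer.wte": "embedding",
    "transformer.h.0": "attention",
    "transformer.h0": "attention",  # typo: missing dot
    "transformer.ln_f": "dense",    # not one of the allowed types
}
unknown, bad = check_agent_modules(mapping, names, {"embedding", "attention", "linear"})
print(unknown)  # ['transformer.h0']
print(bad)      # {'transformer.ln_f': 'dense'}
```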
Troubleshooting
- Missing modules: If you see warnings about parameters not being assigned to any group, this means no Paramorph agents will act on them and their hyperparameters will remain constant throughout training.
- Incorrect module types: While incorrect module types won't cause errors, they may affect the effectiveness of the tuning agents. Use the most appropriate type for each module.
- Model architecture changes: If you modify your model architecture, regenerate the agent_modules mapping to ensure it matches the new structure.
Configuration Options
Agent Modules
Map your model's parameter groups to agent types:
- embedding: For embedding layers
- attention: For attention/transformer layers
- linear: For linear/feedforward layers
- convolution: For convolutional layers
- null: For any other layers
Scheduling Config
- tuning_frequency: Steps between hyperparameter updates
- nn_family: Model family (gpt, bert, olmo, etc.)
- log_to_wandb: Enable W&B logging
- can_nullify_gradients: Allow gradient nullification for statistics collection
- max_statistic_cache_size: Maximum number of cached statistics
- tensor_stats_downsample_percentage: Percentage of tensor statistics to collect
- statistic_sample_frequency: How often to sample statistics
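As a rough way to reason about these frequencies, the sketch below counts how many hyperparameter updates and statistic samples a run would produce. It assumes both events fire on every step that is an exact multiple of the corresponding frequency; the README does not spell out the exact trigger rule, so this is an approximation. `schedule_summary` is a hypothetical helper.

```python
def schedule_summary(max_steps, tuning_frequency, statistic_sample_frequency):
    """Estimate event counts for a run.

    Assumption: an event fires on each step that is an exact multiple
    of its frequency. Paramorph's real trigger rule may differ.
    """
    return {
        "hyperparameter_updates": max_steps // tuning_frequency,
        "statistic_samples": max_steps // statistic_sample_frequency,
        "samples_per_update": tuning_frequency // statistic_sample_frequency,
    }


summary = schedule_summary(max_steps=10_000, tuning_frequency=100, statistic_sample_frequency=10)
print(summary)
# {'hyperparameter_updates': 100, 'statistic_samples': 1000, 'samples_per_update': 10}
```

With the Quick Start values, each hyperparameter update would be informed by about ten statistic samples under this assumption.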
Agent Config
Enable/disable specific hyperparameter agents:
- use_learning_rate_agents: Tune learning rates
- use_weight_decay_agents: Tune weight decay
- use_dropout_agents: Tune dropout rates
- use_grad_clip_agents: Tune gradient clipping
- use_adam_beta_one_agents: Tune Adam β₁
- use_adam_beta_two_agents: Tune Adam β₂
- use_adam_eps_agents: Tune Adam ε
Advanced Usage
Custom Training Loop
For non-Hugging Face training, use the core Paramorph class:
import torch.optim as optim
from paramorph.build import build
from paramorph.core import Paramorph

# `model`, `dataloader`, and `num_epochs` are assumed to be defined by your training script.
# Build optimizer and Paramorph instance
optimizer, paramorph = build(
    model=model,
    optimizer_type=optim.AdamW,
    paramorph_config_path="./config.yaml",
    initial_learning_rate=0.0003,
    initial_weight_decay=0.01,
)

# Custom training loop
for epoch in range(num_epochs):
    for batch in dataloader:
        optimizer.zero_grad()
        loss = model(batch)  # Assumes the forward pass returns a scalar loss
        loss.backward()
        optimizer.step()
        # Update hyperparameters
        paramorph.step()
Custom Callbacks
Subclass ParamorphCallbacks to customize behavior:
import torch.optim as optim
from paramorph.build import build_for_huggingface
from paramorph.paramorph_callbacks import ParamorphCallbacks

class CustomParamorphCallbacks(ParamorphCallbacks):
    def set_learning_rate(self, parameter_group_name: str, value: float) -> None:
        """
        :param parameter_group_name: Name of the parameter group whose hyperparameter is being changed.
        :param value: New value to set the hyperparameter to.
        """
        print(f"Parameter group {parameter_group_name} updated learning rate to {value}")
        super().set_learning_rate(parameter_group_name, value)

# Use in build function (`model` is assumed to be defined as in the Quick Start)
callbacks, optimizer, lr_scheduler, trainer_cls = build_for_huggingface(
    model=model,
    optimizer_type=optim.AdamW,
    paramorph_config_path="./config.yaml",
    initial_learning_rate=0.0003,
    initial_weight_decay=0.01,
    paramorph_callback_override=CustomParamorphCallbacks,
)
Troubleshooting
Common Issues
- Import Errors: Ensure you're in the virtual environment and have installed the package correctly.
- libinephany Import Errors: If you see import errors related to libinephany, make sure you've installed it correctly:
  - For developers: Ensure you're in the monorepo and the package is available
  - For clients: Make sure you've cloned and installed libinephany from its mirror repository before installing paramorph
- W&B Login Issues: Make sure you're logged in with wandb login and have a valid API key.
- Configuration Errors: Check that your config.yaml follows the correct format and all required fields are present.
- Training Arguments: Ensure you're using the required Hugging Face training arguments:
  - lr_scheduler_type="constant": Required when using learning rate agents
  - max_grad_norm=-1: Required when using gradient clipping agents
- Dependency Conflicts: If you encounter dependency conflicts, try installing in a fresh virtual environment:
  python -m venv fresh_env
  source fresh_env/bin/activate
  # Install libinephany first, then paramorph
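The two required training arguments listed above are easy to forget when adapting an existing script. The sketch below checks them from plain values; `check_training_arguments` is a hypothetical helper, not part of Paramorph or Transformers, and the two boolean flags simply mirror the agent toggles described under Agent Config.

```python
def check_training_arguments(
    lr_scheduler_type,
    max_grad_norm,
    use_learning_rate_agents=True,
    use_grad_clip_agents=True,
):
    """Return a list of human-readable problems; an empty list means the values look fine."""
    problems = []
    if use_learning_rate_agents and lr_scheduler_type != "constant":
        problems.append('lr_scheduler_type must be "constant" when learning rate agents are enabled')
    if use_grad_clip_agents and max_grad_norm != -1:
        problems.append("max_grad_norm must be -1 when gradient clipping agents are enabled")
    return problems


print(check_training_arguments("constant", -1))  # []
print(len(check_training_arguments("linear", 1.0)))  # 2
```

Running a check like this right after constructing your training arguments surfaces a misconfiguration before any backend calls are made.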
Getting Help
- Check the example scripts in the repository
- Review the configuration file format
- Ensure all dependencies are installed correctly
- Verify your model architecture matches the agent module mapping
Architecture
Paramorph uses a multi-agent architecture where different agents control hyperparameters for different parts of your model. Agents can be applied to any layer and at any level of granularity as defined in the config.