FMS Acceleration Plugin Framework
FMS Acceleration Framework Library
This contains the library code that implements the acceleration plugin framework, in particular the classes:
- `AccelerationFramework`
- `AccelerationPlugin`
The library is envisioned to:
- Provide a single integration point into Huggingface.
- Manage `AccelerationPlugin`s in a flexible manner.
- Load plugins from a single configuration YAML, while enforcing compatibility rules on how plugins can be combined.
See the following resources:
- Instructions for running the acceleration framework with `fms-hf-tuning`.
- Sample plugin YAML configurations for important accelerations.
Using AccelerationFramework with HF Trainer
Begin by instantiating an `AccelerationFramework` object, passing a YAML configuration (say, via a `path_to_config`):
```python
from fms_acceleration import AccelerationFramework
framework = AccelerationFramework(path_to_config)
```
Plugins are loaded automatically based on this configuration; for more details on how plugins are configured, see below.
Some plugins may require a custom model loader (in place of the typical `AutoModel.from_pretrained`). In this case, call `framework.model_loader`:

```python
model = framework.model_loader(model_name_or_path, ...)
```
For example, in the GPTQ case (see the sample GPTQ QLoRA configuration), `model_name_or_path` must be custom loaded from a quantized checkpoint.
We provide the flag `framework.requires_custom_loading` to check whether the installed plugins require custom loading, as shown in the sketch below.
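For instance, a minimal sketch of gating the loader on that flag; the fallback to `AutoModelForCausalLM.from_pretrained` is illustrative and not prescribed by the framework:

```python
from transformers import AutoModelForCausalLM

# Sketch: use the plugin-provided loader only when a plugin needs it;
# otherwise fall back to the stock Huggingface loader.
if framework.requires_custom_loading:
    model = framework.model_loader(model_name_or_path)
else:
    model = AutoModelForCausalLM.from_pretrained(model_name_or_path)
```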
Also, some plugins require the model to be augmented, e.g., by replacing layers with plugin-compliant PEFT adapters. In this case:

```python
# will also take in some other configs that may affect augmentation
# some of these args may be modified due to the augmentation
# e.g., peft_config will be consumed in augmentation, and returned as None
# to prevent SFTTrainer from doing extraneous PEFT logic
model, (peft_config,) = framework.augmentation(
    model,
    train_args, modifiable_args=(peft_config,),
)
```
We also provide `framework.requires_agumentation` to check whether augmentation is required by the plugins, as in the sketch below.
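A minimal sketch of gating the augmentation call on that flag, using only the calls shown above:

```python
# Sketch: augment only when the installed plugins require it; otherwise the
# model and peft_config are left untouched.
if framework.requires_agumentation:
    model, (peft_config,) = framework.augmentation(
        model, train_args, modifiable_args=(peft_config,),
    )
```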
Finally, pass the model to train:

```python
# e.g. using transformers.Trainer. Pass in model (with training enhancements)
trainer = Trainer(model, ...)

# call train
trainer.train()
```
That's all! The model will now reap all acceleration speedups from the plugins that were installed!
Configuration of Plugins
Each package in this monorepo:
- can be independently installed. Install only the libraries you need:

  ```shell
  pip install fms-acceleration/plugins/accelerated-peft
  pip install fms-acceleration/plugins/fused-ops-and-kernels
  ```

- can be independently configured. Each plugin is registered under a particular configuration path. E.g., the autogptq plugin is registered under the config path `peft.quantization.auto_gptq`:

  ```python
  AccelerationPlugin.register_plugin(
      AutoGPTQAccelerationPlugin,
      configuration_and_paths=["peft.quantization.auto_gptq"],
  )
  ```

  This means that it will be configured under that exact stanza:

  ```yaml
  plugins:
    peft:
      quantization:
        auto_gptq:
          # everything under here will be passed to plugin
          # when instantiating
          ...
  ```

  A fuller illustrative configuration follows this list.
- When instantiating `fms_acceleration.AccelerationFramework`, it internally parses through the configuration stanzas. For plugins that are installed, it will instantiate them; for those that are not, it will simply pass through.

- `AccelerationFramework` will manage the plugins transparently for the user. The user only needs to call `AccelerationFramework.model_loader` and `AccelerationFramework.augmentation`.
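To make the stanza layout concrete, below is an illustrative configuration that places two plugin stanzas in one YAML. The leaf fields and the second plugin's config path are placeholders, not actual plugin schemas; consult each plugin's sample config for the real fields.

```yaml
# Illustrative only: two plugin stanzas in a single configuration YAML.
# Leaf fields and the second config path are placeholders.
plugins:
  peft:
    quantization:
      auto_gptq:
        kernel: triton_v2            # placeholder field
  training:
    my_other_acceleration:           # hypothetical config path
      some_option: true              # placeholder field
```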
Adding New Plugins
To add new plugins:
- Create an appropriately `pip`-packaged plugin in `plugins`; the package needs to be named like `fms-acceleration-<postfix>` (see the registration sketch after this list).

- For `framework` to properly load and manage the plugin, add the package `<postfix>` to constants.py:

  ```python
  PLUGINS = [
      "peft",
      "foak",
      "<postfix>",
  ]
  ```
- Create a sample template YAML file inside `<PLUGIN_DIR>/configs` to demonstrate how to configure the plugin. As an example, reference the sample config for accelerated peft.

- Update generate_sample_configurations.py and run `tox -e gen-configs` on the top-level directory to generate the sample configurations:

  ```python
  KEY_AUTO_GPTQ = "auto_gptq"
  KEY_BNB_NF4 = "bnb-nf4"
  PLUGIN_A = "<NEW PLUGIN NAME>"

  CONFIGURATIONS = {
      KEY_AUTO_GPTQ: "plugins/accelerated-peft/configs/autogptq.yaml",
      KEY_BNB_NF4: (
          "plugins/accelerated-peft/configs/bnb.yaml",
          [("peft.quantization.bitsandbytes.quant_type", "nf4")],
      ),
      PLUGIN_A: (
          "plugins/<plugin>/configs/plugin_config.yaml",
          [
              (<1st field in plugin_config.yaml>, <value>),
              (<2nd field in plugin_config.yaml>, <value>),
          ],
      ),
  }

  # Passing a tuple of configuration keys will combine the templates together
  COMBINATIONS = [
      ("accelerated-peft-autogptq", (KEY_AUTO_GPTQ,)),
      ("accelerated-peft-bnb-nf4", (KEY_BNB_NF4,)),
      (<"combined name with your plugin">, (KEY_AUTO_GPTQ, PLUGIN_A)),
      (<"combined name with your plugin">, (KEY_BNB_NF4, PLUGIN_A)),
  ]
  ```
- After the sample configuration is generated by `tox -e gen-configs`, update CONTENTS.yaml with the shortname and the configuration full path.

- Update the scenarios YAML to configure the benchmark test scenarios that will be triggered when running `tox -e run-benches` on the top-level directory.

- Update the top-level tox.ini to install the plugin for `run-benches`.
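As a reference for the first step, here is a hedged sketch of how a new plugin package might register itself, mirroring the `register_plugin` call shown for autogptq above; the class name and configuration path are hypothetical placeholders.

```python
from fms_acceleration import AccelerationPlugin

# Hypothetical plugin class and config path; only the register_plugin call
# mirrors the real autogptq registration shown earlier.
class MyNewAccelerationPlugin(AccelerationPlugin):
    ...

AccelerationPlugin.register_plugin(
    MyNewAccelerationPlugin,
    configuration_and_paths=["training.my_new_acceleration"],
)
```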