Very Simple Transformers provides a simplified interface for packaging, deploying, and serving Transformer models. It uses a custom .vst file format, a CLI, and a Python library to make sharing and using pretrained models easy.
Very Simple Transformers (VST)
Installation
pip install verysimpletransformers
Usage
Bundling
Bundling your machine learning model with VerySimpleTransformers is straightforward. Follow these steps:
Create or load your machine learning model (e.g., a ClassificationModel) using Simple Transformers.
Example:
from simpletransformers.classification import ClassificationArgs, ClassificationModel

# Create a ClassificationModel
model_args = ClassificationArgs(
    num_train_epochs=1,
    overwrite_output_dir=True,
    save_model_every_epoch=False,
)
model = ClassificationModel(
    "bert",
    "bert-base-cased",  # or something like ./outputs
    args=model_args,
    use_cuda=True,  # set to False if no CUDA-capable GPU is available
)
# optionally train on your data (a pandas DataFrame):
# model.train_model(...)
Import the verysimpletransformers package and use the to_vst function to package the model into a .vst file with optional compression.
Provide the model, the output file path, and an optional compression level (0-9). By default, compression is disabled.
The output file path can be specified as:
- A string path (e.g., "model.vst") for a standard file path.
- A Path object (e.g., Path("model.vst")) using pathlib.
- A binary I/O object (e.g., BytesIO()) for in-memory operations, if you don't want to save to a file.
Example:
import verysimpletransformers  # or: from verysimpletransformers import to_vst
from pathlib import Path  # a Path object can be used instead of a string

verysimpletransformers.to_vst(
    model,
    "my_model.vst",  # or Path("my_model.vst")
    compression=5,  # optional, 0-9
)
Note: As an alias, bundle_model can be used instead of to_vst.
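To make the three accepted target types and the 0-9 compression level more concrete, here is a minimal standard-library sketch of writing and reading a serialized payload. It is not the library's implementation: write_payload and read_payload are hypothetical helpers, pickle stands in for dill (which the .vst format uses for the model contents), and zlib is an assumed compression algorithm, since the README does not say which one to_vst uses.

```python
import io
import pickle
import zlib
from pathlib import Path
from typing import Any, BinaryIO, Union

# A target may be a string path, a pathlib.Path, or a binary file-like object.
Target = Union[str, Path, BinaryIO]


def write_payload(obj: Any, target: Target, compression: int = 0) -> None:
    """Hypothetical helper: serialize `obj` and write it to `target`."""
    if not 0 <= compression <= 9:
        raise ValueError("compression must be between 0 and 9")
    data = pickle.dumps(obj)  # pickle stands in for dill in this sketch
    if compression > 0:  # level 0 means compression is disabled
        data = zlib.compress(data, compression)  # zlib is an assumed algorithm
    if isinstance(target, (str, Path)):
        Path(target).write_bytes(data)  # str or Path: write a file on disk
    else:
        target.write(data)  # binary IO, e.g. io.BytesIO(), stays in memory


def read_payload(source: Target, compressed: bool) -> Any:
    """Hypothetical inverse of write_payload; `compressed` must match the writer."""
    if isinstance(source, (str, Path)):
        data = Path(source).read_bytes()
    else:
        data = source.read()
    if compressed:
        data = zlib.decompress(data)
    return pickle.loads(data)


# In-memory round trip: nothing is written to disk.
buffer = io.BytesIO()
write_payload({"weights": [0.1, 0.2]}, buffer, compression=5)
buffer.seek(0)  # rewind before reading back
print(read_payload(buffer, compressed=True))  # {'weights': [0.1, 0.2]}
```

Note the buffer.seek(0): after writing, a BytesIO object's position is at the end, so it has to be rewound before the same object is read again.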
Loading
Loading a machine learning model from a .vst file is just as simple. Follow these steps:
Import from_vst from verysimpletransformers.
Specify the path to the .vst file you want to load. Just like with to_vst, this path can be a str, a Path, or a binary IO object.
Use the from_vst function to load the model. Optionally, you can pass the device parameter to set the device (e.g., 'cpu' or 'cuda'). If not specified, the function will select a device based on availability.
Example:
from verysimpletransformers import from_vst
from pathlib import Path

fp = Path("model.vst")  # can also be simply a string.
new_model = from_vst(
    fp,
    device='cuda',  # optional
)
You can now use new_model just as you would use the original model.
Note: As an alias, load_model can be used instead of from_vst.
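The exact rule from_vst uses to pick a device when device is omitted is not documented here. A plausible sketch, assuming a simple "use CUDA if available, otherwise CPU" fallback, is shown below. resolve_device is a hypothetical helper, and CUDA availability is passed in as a boolean so the example runs without PyTorch; with PyTorch installed you would pass torch.cuda.is_available().

```python
from typing import Optional


def resolve_device(requested: Optional[str], cuda_available: bool) -> str:
    """Hypothetical device fallback: honor an explicit choice, else prefer CUDA."""
    if requested is not None:
        return requested  # e.g. 'cpu' or 'cuda', as passed via device=...
    return "cuda" if cuda_available else "cpu"


print(resolve_device(None, cuda_available=False))  # cpu
print(resolve_device("cpu", cuda_available=True))  # cpu
```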
Additional Information
- The .vst files can be transferred and used across different devices, regardless of where the model was originally trained.
- Ensure that you have the required dependencies, such as SimpleTransformers and PyTorch, installed to use these functions effectively.
Full Example
To see a full example of saving and loading a .vst file, see examples/basic.py.
CLI
The VerySimpleTransformers CLI provides a convenient way to interact with and manage machine learning models bundled with VerySimpleTransformers. It lets you run, serve, and upgrade your models from the command line.
Usage
You can use the CLI with either verysimpletransformers or vst interchangeably. Here are some basic usage examples:
- Running a model interactively:
verysimpletransformers model.vst
vst model.vst
./model.vst
- Specifying an action when running:
verysimpletransformers <action> model.vst
verysimpletransformers model.vst <action>
vst <action> model.vst
vst model.vst <action>
./model.vst <action>  # if the file has execution rights
Available Actions
The following actions are available:
- 'run': Run the model interactively by typing prompts.
- 'serve': Start a simple HTTP server to serve model outputs. You can specify the following options:
  - --port <PORT>: Specify the port number (default: 8000).
  - --host <HOST>: Specify the host (default: 'localhost').
- 'upgrade': Upgrade the metadata of a model to the latest version.
Example
Here's an example of starting a server for a classification model:
vst serve ./classification.vst
./classification.vst serve
For more examples, see examples/basic.sh
Notes
- A .vst file (e.g., 'model.vst') is required for most commands.
- You can specify <action>, which can be one of the available options mentioned above.
- If you leave <action> empty, a dropdown menu will appear for you to select from.
- You can use 'vst' or 'verysimpletransformers' interchangeably.
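Because the action may come before or after the file, and may be omitted entirely, the CLI has to parse its arguments without relying on position. One way such order-independent parsing could work is sketched below. parse_cli_args is hypothetical and is not the actual verysimpletransformers entry point; it assumes the model file is the argument ending in .vst, that the action is one of run, serve, or upgrade, and that every --option (such as --port or --host) takes exactly one value.

```python
from typing import List, Optional, Tuple

ACTIONS = {"run", "serve", "upgrade"}


def parse_cli_args(argv: List[str]) -> Tuple[str, Optional[str]]:
    """Hypothetical parser: locate the .vst file and the action in either order.

    Returns (file, action); action is None when the user should be prompted.
    """
    positional: List[str] = []
    args = iter(argv)
    for arg in args:
        if arg.startswith("--"):
            next(args, None)  # skip the option's value, e.g. "--port 8000"
        else:
            positional.append(arg)

    files = [arg for arg in positional if arg.endswith(".vst")]
    actions = [arg for arg in positional if arg in ACTIONS]
    unknown = len(positional) - len(files) - len(actions)
    if len(files) != 1 or len(actions) > 1 or unknown:
        raise SystemExit(f"usage: vst [action] model.vst, got {argv!r}")
    return files[0], (actions[0] if actions else None)


print(parse_cli_args(["serve", "model.vst"]))  # ('model.vst', 'serve')
print(parse_cli_args(["./model.vst", "serve", "--port", "8001"]))  # ('./model.vst', 'serve')
print(parse_cli_args(["model.vst"]))  # ('model.vst', None)
```

Returning None for a missing action leaves it to the caller to show a selection prompt, matching the dropdown behavior described in the notes.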
About the .vst file format
The .vst file format is used to bundle machine learning models created with SimpleTransformers. Understanding its structure is useful for working with these files effectively. The format consists of the following components:
- Shebang Line: The first line of the file is a shebang (#!/usr/bin/env verysimpletransformers) that allows the file to be executed directly as a script with ./ (e.g., ./model.vst), as long as the file has execution rights.
- Fixed Information (16 bytes):
  - VST Protocol Version (short, 2 bytes): Specifies the version of the VerySimpleTransformers protocol used in the file. This allows for backwards compatibility with older versions of the file, which may have less metadata.
  - Metadata Length (short, 2 bytes): Indicates the length of the metadata section in bytes.
  - Content Length (long long, 8 bytes): Specifies the length of the content (the actual model data) in bytes.
  Explanation: The data types themselves add up to 12 bytes (2 bytes for each short + 8 bytes for the long long). The fixed information nevertheless takes up 16 bytes because of struct padding: 4 padding bytes follow the two shorts so that the 8-byte long long starts at an 8-byte-aligned offset. This meets memory alignment requirements and ensures efficient memory access.
- Metadata (Variable Length):
  - The next Metadata Length bytes contain the metadata. The metadata stores information about the model and its compatibility with different versions of VerySimpleTransformers, and its structure can change between protocol versions.
  - Checks are performed on this metadata to ensure compatibility and integrity. If the metadata is invalid or belongs to a newer protocol version, a warning is issued and loading continues.
- Model Content (Variable Length):
  - The remaining bytes, equal to Content Length, contain the serialized model data. This is the actual (possibly compressed) machine learning model that was bundled into the .vst file.
  - The model contents are stored (and loaded) using dill, which is an extension of pickle.
                          ┌───────────────────────────────────────┐
                          │#!/usr/bin/env verysimpletransformers\n├──► Shebang
                          ├───┬───┬───────────┬───────────────────┤
                          │   │   │           │                   │
                          │2b │2b │  4 bytes  │  content length   │
   .vst version ◄─────────┼── │   │           │       (n2)        ├──► 16 bytes
                          │   │   │ (padding) │                   │
meta header length (n1) ◄─┼───┼── │           │      8 bytes      │
                          │   │   │           │                   │
                          ├───┴───┴───────────┴───────────────────┤
                          │                                       │
                          │              Meta Header              │
                          │                                       │
                          │               n1 bytes                │
                          │                                       │
                          ├───────────────────────────────────────┤
                          │                                       │
                          │         (possibly compressed)         │
                          │      Machine Learning Model Dill      │
                          │                                       │
                          │               n2 bytes                │
                          │                                       │
                          │                  ...                  │
                          └───────────────────────────────────────┘
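The layout above can be reproduced with Python's struct module. The sketch below assumes native byte order and alignment (struct's default "@" mode), which is what turns short, short, long long into 16 bytes on common 64-bit platforms, compared with 12 bytes for the packed "<hhq" layout. It also assumes JSON-encoded metadata and a hypothetical SUPPORTED_VERSION constant; neither is specified by the format description, so treat write_vst and read_vst as hypothetical illustrations of the layout, not as the real verysimpletransformers reader.

```python
import io
import json
import struct
import warnings
from typing import Any, BinaryIO, Dict, Tuple

SHEBANG = b"#!/usr/bin/env verysimpletransformers\n"
# version (short), metadata length (short), content length (long long);
# the default "@" mode uses native alignment, so padding precedes the long long
HEADER = struct.Struct("hhq")
SUPPORTED_VERSION = 1  # hypothetical: newest protocol version this reader knows


def write_vst(fp: BinaryIO, metadata: Dict[str, Any], content: bytes, version: int = 1) -> None:
    meta = json.dumps(metadata).encode()  # JSON metadata is an assumption
    fp.write(SHEBANG)
    fp.write(HEADER.pack(version, len(meta), len(content)))
    fp.write(meta)
    fp.write(content)


def read_vst(fp: BinaryIO) -> Tuple[int, Dict[str, Any], bytes]:
    if fp.readline() != SHEBANG:
        raise ValueError("not a .vst file: missing shebang line")
    version, meta_len, content_len = HEADER.unpack(fp.read(HEADER.size))
    if version > SUPPORTED_VERSION:
        warnings.warn(f"newer protocol version {version}; loading anyway")
    raw_meta = fp.read(meta_len)
    try:
        metadata = json.loads(raw_meta)
    except ValueError:  # invalid metadata: warn and keep loading
        warnings.warn("invalid metadata; continuing without it")
        metadata = {}
    return version, metadata, fp.read(content_len)


print(struct.calcsize("hhq"), struct.calcsize("<hhq"))  # typically 16 12 on 64-bit platforms
buffer = io.BytesIO()
write_vst(buffer, {"model": "bert"}, b"model bytes")
buffer.seek(0)
print(read_vst(buffer))  # (1, {'model': 'bert'}, b'model bytes')
```

Reading the shebang with readline() works because the shebang is the only part of the file guaranteed to end in a newline; every later section is located purely by the lengths stored in the fixed header.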
Converting a vst file back into the original model files
If you want to retrieve the original model files from a saved vst file, you can use the dump_to_disk() function:
from verysimpletransformers import dump_to_disk

vst_path = 'my_model.vst'
# Convert vst file to original model files
dump_to_disk(
    vst_path,  # a running model or path to a .vst file
    'model_files/',  # optional, default is 'outputs/'
)
This will extract the model contents from the vst file and save the individual components in the model_files/ directory:
- pytorch_model.bin - Model weights
- tokenizer.json - Tokenizer configuration
- vocab.txt - Vocabulary
- etc.
This makes it easy to retrieve the original model files from a vst file in order to upload or share the model, e.g., on Hugging Face.
Extras
Drive
The drive extra integrates with Google Drive: it uses the drive-in library to upload models to Drive and download them again.
pip install verysimpletransformers[drive]
from simpletransformers.classification import ClassificationModel
from verysimpletransformers.drive import to_drive, from_drive

model: ClassificationModel  # your existing model (or some other model type)
drive_url = to_drive(model)
# ...
new_model: ClassificationModel = from_drive(drive_url)
Limitations:
- Due to limitations with OAuth2 (without using a private key, which is not feasible for an open-source project), you can only access files created by this app (i.e., uploaded with to_drive).
- By default, you're redirected to https://oauth.trialandsuccess.nl/callback after authenticating your Google account. This page runs oauth-echoer and simply returns the access token from the query arguments. If you want to change this endpoint, you can pass a client_id and redirect_uri to the Drive functions. Doing so requires an OAuth app with drive.file permissions.
License
verysimpletransformers is distributed under the terms of the MIT license.
Changelog
See CHANGELOG.md on GitHub.
Download files
Source Distribution
Built Distribution
File details
Details for the file verysimpletransformers-0.4.2.tar.gz.
File metadata
- Download URL: verysimpletransformers-0.4.2.tar.gz
- Upload date:
- Size: 29.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: python-httpx/0.25.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 2d08d5f1d9d1c8702bccff955c66d931a438768bff53f3c465ad70f2ad98c1ab
MD5 | efd23570940d5562457079e29dd5dbfa
BLAKE2b-256 | 6803df9faa0f08f2687bf95e1b483d69c19b5cb45f787a37896c018ba5cafe57
File details
Details for the file verysimpletransformers-0.4.2-py3-none-any.whl.
File metadata
- Download URL: verysimpletransformers-0.4.2-py3-none-any.whl
- Upload date:
- Size: 26.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: python-httpx/0.25.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 2fac4df5e663c017937ca45d2c12691cc0ba120aca30e4b7ad5be3a4f78e1728
MD5 | 92d125e43407279daed711d1e06953c4
BLAKE2b-256 | 24b0d053b1a06fb1846d79c93696e217c157a07e5af66d69a7fd46fa1f057579