Visual Prompting for Large Multimodal Models (LMMs)
maestro
coming: when it's ready...
👋 hello
maestro is a tool designed to streamline and accelerate the fine-tuning process for multimodal models. It provides ready-to-use recipes for fine-tuning popular vision-language models (VLMs) such as Florence-2, PaliGemma, and Phi-3.5 Vision on downstream vision-language tasks.
💻 install
Install the maestro package with pip in a Python>=3.8 environment.
pip install maestro
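To confirm the install from Python, the following is a minimal sketch that uses only the standard library. The helper name `installed_version` is hypothetical and not part of maestro; the 3.8 floor is taken from the requirement stated above.

```python
import sys
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# The install note above asks for Python 3.8 or newer.
python_ok = sys.version_info >= (3, 8)
print(f"Python >= 3.8: {python_ok}")
print(f"maestro version: {installed_version('maestro')}")
```

If the second line prints `None`, the package is not visible to the interpreter that ran the script, which usually means pip installed it into a different environment.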
🔥 quickstart
CLI
VLMs can be fine-tuned on downstream tasks directly from the command line with the maestro command:
maestro florence2 train --dataset='<DATASET_PATH>' --epochs=10 --batch-size=8
SDK
Alternatively, you can fine-tune VLMs using the Python SDK, which accepts the same arguments as the CLI example above:
from maestro.trainer.common import MeanAveragePrecisionMetric
from maestro.trainer.models.florence_2 import train, TrainingConfiguration

# Same settings as the CLI example, plus a metric to report during training.
config = TrainingConfiguration(
    dataset='<DATASET_PATH>',
    epochs=10,
    batch_size=8,
    metrics=[MeanAveragePrecisionMetric()]
)

train(config)
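Since the SDK and the CLI take the same arguments, one set of settings can drive both. The sketch below shows one way to turn SDK-style keyword names into the CLI form used in the quickstart. `build_train_command` is a hypothetical helper, not part of maestro, and it covers only the three options that appear in the examples above; the `metrics` argument is left out because the source shows no CLI flag for it.

```python
import shlex
from typing import Dict, List, Union

Value = Union[str, int]


def build_train_command(model: str, options: Dict[str, Value]) -> List[str]:
    """Build an argv list in the shape of the CLI example above."""
    argv = ["maestro", model, "train"]
    for name, value in options.items():
        # Python keyword names use underscores; the CLI flags use hyphens.
        flag = name.replace("_", "-")
        argv.append(f"--{flag}={value}")
    return argv


# The same values that are passed to TrainingConfiguration in the SDK example.
options = {"dataset": "<DATASET_PATH>", "epochs": 10, "batch_size": 8}
argv = build_train_command("florence2", options)

# shlex.join quotes arguments where needed so the line is safe to paste into a shell.
print(shlex.join(argv))
```

Keeping the command as a list rather than a single string means it can be handed to `subprocess.run` directly, without an extra round of shell quoting.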
🦸 contribution
We would love your help in making this repository even better! We are especially looking for contributors with experience in fine-tuning vision-language models (VLMs). If you notice any bugs or have suggestions for improvement, feel free to open an issue or submit a pull request.
Hashes for maestro-0.2.0rc3-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 27a70b40b6d3d88058c6615d41d4c918881beb5076b90fdb7e796b5eaa6b7643
MD5 | cb019fb69e96f8322012335581722ccd
BLAKE2b-256 | 2a6fcc4e3d9d1bcc7c66a3f34f1921f54eea1c66030cc7bc99fda731d0ca36f5
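A downloaded wheel can be checked against the SHA256 digest listed above. The following is a minimal sketch using the standard library's `hashlib`; the helper names `sha256_of` and `matches` are hypothetical, and the local file location (the current directory) is an assumption.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with a published one, ignoring case and padding."""
    return sha256_of(path) == expected_hex.strip().lower()


# Digest copied from the table above; the path is assumed to be the download location.
EXPECTED = "27a70b40b6d3d88058c6615d41d4c918881beb5076b90fdb7e796b5eaa6b7643"
wheel = Path("maestro-0.2.0rc3-py3-none-any.whl")

if wheel.exists():
    print("digest matches" if matches(wheel, EXPECTED) else "digest MISMATCH")
else:
    print(f"{wheel} not found; download it first")
```

Reading in chunks keeps memory use flat regardless of file size. pip can enforce the same check at install time when a requirements file pins hashes and `--require-hashes` is passed.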