An open-source framework for multi-modal instruction fine-tuning
Project description
🤖 Multi-modal GPT
Train a multi-modal chatbot with visual and language instructions!
Built on the open-source multi-modal model OpenFlamingo, we create a variety of visual instruction data from open datasets, including VQA, Image Captioning, Visual Reasoning, Text OCR, and Visual Dialogue. In addition, we train the language-model component of OpenFlamingo with language-only instruction data.
Joint training on visual and language instructions effectively improves the performance of the model!
Features
- Supports a variety of vision and language instruction datasets
- Parameter-efficient fine-tuning with LoRA
- Tunes the vision and language components jointly, so the two complement each other
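For readers unfamiliar with LoRA, the sketch below illustrates the general idea of parameter-efficient fine-tuning using the HuggingFace peft library. It is only an illustration, not the tuning code used by this project (Multimodal-GPT configures LoRA through configs/lora_config.py), and the target module names are placeholders.

```python
# Illustration only: generic LoRA wrapping with the HuggingFace `peft` library.
# Multimodal-GPT configures LoRA via configs/lora_config.py; module names here are placeholders.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Loads the full base model; with LoRA only a small set of adapter weights is trained on top of it.
base_model = AutoModelForCausalLM.from_pretrained("checkpoints/llama-7b_hf")

lora_config = LoraConfig(
    r=16,                                  # rank of the low-rank update matrices
    lora_alpha=32,                         # scaling factor applied to the update
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # placeholder: attention projections to adapt
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the LoRA parameters are marked trainable
```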
Installation
To install the package in an existing environment, run

```bash
git clone https://github.com/open-mmlab/Multimodal-GPT.git
cd Multimodal-GPT
pip install -r requirements.txt
pip install -e . -v
```
or create a new conda environment:

```bash
conda env create -f environment.yml
```
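If you go the conda route, remember to activate the new environment before installing or running anything. The environment name comes from the `name:` field in environment.yml; "mmgpt" below is an assumption, so check the actual name first:

```bash
conda env list        # confirm the name of the newly created environment
conda activate mmgpt  # replace "mmgpt" with the actual name if it differs
pip install -e . -v   # editable install into the new environment, as above
```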
Demo
- Download the pre-trained weights:
  - Use this script to convert LLaMA weights to the HuggingFace format (an example conversion command is shown after the directory layout below).
  - Download the OpenFlamingo pre-trained model from openflamingo/OpenFlamingo-9B.
  - Download our LoRA weight from here.

Then place these models in the checkpoints folder like this:

```
checkpoints
├── llama-7b_hf
│   ├── config.json
│   ├── pytorch_model-00001-of-00002.bin
│   ├── ......
│   └── tokenizer.model
├── OpenFlamingo-9B
│   └── checkpoint.pt
└── mmgpt-lora-v0-release.pt
```
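If you are converting the original LLaMA-7B weights yourself, the conversion script that ships with HuggingFace Transformers can be invoked roughly as follows. The input path is a placeholder, and depending on your transformers version you may need to run the script from a source checkout instead of as a module:

```bash
# Convert original LLaMA-7B weights into the HuggingFace format expected at checkpoints/llama-7b_hf.
# /path/to/llama_weights is a placeholder for wherever your original LLaMA weights live.
python -m transformers.models.llama.convert_llama_weights_to_hf \
    --input_dir /path/to/llama_weights \
    --model_size 7B \
    --output_dir checkpoints/llama-7b_hf
```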
- Launch the Gradio demo:

```bash
python chat_gradio_demo.py
```
Examples
- Recipe
- Travel plan
- Movie
- Famous person
Fine-tuning
Prepare datasets
- A-OKVQA: Download the annotations from this link and unzip them to data/aokvqa/annotations. It also requires images from the COCO dataset, which can be downloaded from here.
- COCO Caption: Download from this link and unzip to data/coco. It also requires images from the COCO dataset, which can be downloaded from here.
- OCR VQA: Download from this link and place it in data/OCR_VQA/.
- LLaVA: Download from liuhaotian/LLaVA-Instruct-150K and place it in data/llava/. It also requires images from the COCO dataset, which can be downloaded from here.
- MiniGPT-4 (cc_sbu_align): Download from Vision-CAIR/cc_sbu_align and place it in data/cc_sbu_align/.
- Dolly: Download from databricks/databricks-dolly-15k and place it in data/dolly/databricks-dolly-15k.jsonl.
- Alpaca GPT4: Download from this link and place it in data/alpaca_gpt4/alpaca_gpt4_data.json.
You can also customize the data paths in configs/dataset_config.py.
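As a quick sanity check before training, a small script like the one below can verify that the files and folders from the list above are in place. It is only a convenience sketch, not part of mmgpt; adjust the paths if you customized configs/dataset_config.py:

```python
from pathlib import Path

# Expected locations taken from the dataset list above; adjust if you
# customized configs/dataset_config.py. This check is not part of mmgpt.
EXPECTED = [
    "data/aokvqa/annotations",
    "data/coco",
    "data/OCR_VQA",
    "data/llava",
    "data/cc_sbu_align",
    "data/dolly/databricks-dolly-15k.jsonl",
    "data/alpaca_gpt4/alpaca_gpt4_data.json",
]

missing = [p for p in EXPECTED if not Path(p).exists()]
if missing:
    print("Missing dataset paths:")
    for p in missing:
        print(f"  - {p}")
else:
    print("All expected dataset paths found.")
```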
Start training
```bash
torchrun --nproc_per_node=8 mmgpt/train/instruction_finetune.py \
  --lm_path checkpoints/llama-7b_hf \
  --tokenizer_path checkpoints/llama-7b_hf \
  --pretrained_path checkpoints/OpenFlamingo-9B/checkpoint.pt \
  --run_name train-my-gpt4 \
  --learning_rate 1e-5 \
  --lr_scheduler cosine \
  --batch_size 1 \
  --tuning_config configs/lora_config.py \
  --dataset_config configs/dataset_config.py \
  --report_to_wandb
```
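The --report_to_wandb flag logs training metrics to Weights & Biases. If you keep it, make sure the wandb package is available and you are logged in before launching; otherwise simply drop the flag:

```bash
pip install wandb   # if not already pulled in by requirements.txt
wandb login         # paste your API key when prompted
```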
Acknowledgements
Project details
Release history
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file mmgpt-0.0.1.tar.gz.
File metadata
- Download URL: mmgpt-0.0.1.tar.gz
- Upload date:
- Size: 35.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.16
File hashes
Algorithm | Hash digest
---|---
SHA256 | 83350144458406b550bfbaee76d221514d7fde106d39c4e62cd354e0ff3a6fa7
MD5 | 47fb8a0658f8827b1b55b9d6e03e0654
BLAKE2b-256 | 450270febd09c09cd1819b4962b1f666a3177651bc34c673f616b791adc496ca
File details
Details for the file mmgpt-0.0.1-py3-none-any.whl.
File metadata
- Download URL: mmgpt-0.0.1-py3-none-any.whl
- Upload date:
- Size: 49.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.16
File hashes
Algorithm | Hash digest
---|---
SHA256 | f3d09a490b85ac5d61372a1350706cf9e525b61655118f1d775f4b8039050662
MD5 | 97d27045ce6bf14bb55df04318a1c7bb
BLAKE2b-256 | 9bdb928a76666ee9e8c2c0894af4212160ffeaf0a3a7d4acbf540ff3cc1b334f