An open-source framework for multi-modality instruction fine-tuning

Project description

🤖 Multi-modal GPT

Train a multi-modal chatbot with visual and language instructions!

Based on the open-source multi-modal model OpenFlamingo, we create various visual instruction data from open datasets, including VQA, Image Captioning, Visual Reasoning, Text OCR, and Visual Dialogue. We also train the language model component of OpenFlamingo with language-only instruction data.

Joint training on visual and language instructions effectively improves the model's performance!

Features

  • Supports various vision and language instruction datasets
  • Parameter-efficient fine-tuning with LoRA
  • Tunes the vision and language components at the same time, so they complement each other

Installation

To install the package in an existing environment, run

git clone https://github.com/open-mmlab/Multimodal-GPT.git
cd Multimodal-GPT
pip install -r requirements.txt
pip install -e . -v

or create a new conda environment

conda env create -f environment.yml
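
After either install method, a quick way to check that the package is importable (the top-level module is mmgpt, matching the file listing below):

python -c "import mmgpt"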

Demo

  1. Download the pre-trained weights.

    Use this script to convert the LLaMA weights to HuggingFace format (a command sketch is given after this list).

    Download the OpenFlamingo pre-trained model from openflamingo/OpenFlamingo-9B

    Download our LoRA weights from here.

    Then place these models in the checkpoints folder like this:

    checkpoints
    ├── llama-7b_hf
    │   ├── config.json
    │   ├── pytorch_model-00001-of-00002.bin
    │   ├── ......
    │   └── tokenizer.model
    ├── OpenFlamingo-9B
    │   └── checkpoint.pt
    └── mmgpt-lora-v0-release.pt
    
    
  2. Launch the Gradio demo:

    python chat_gradio_demo.py
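
For step 1, the LLaMA-to-HuggingFace conversion is usually done with the converter script that ships with the transformers library. The command below is a sketch run from a checkout of transformers; the script path and flags match recent transformers releases and may differ in yours, and the input path is a placeholder for wherever your original LLaMA weights live:

    python src/transformers/models/llama/convert_llama_weights_to_hf.py \
        --input_dir /path/to/original/llama/weights \
        --model_size 7B \
        --output_dir checkpoints/llama-7b_hf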
    

Examples

Example conversations (screenshots omitted):

  • Recipe
  • Travel plan
  • Movie
  • Famous person

Fine-tuning

Prepare datasets

  1. A-OKVQA

    Download the annotations from this link and unzip them to data/aokvqa/annotations.

    It also requires images from the COCO dataset, which can be downloaded from here.

  2. COCO Caption

    Download from this link and unzip to data/coco

    It also requires images from the COCO dataset, which can be downloaded from here.

  3. OCR VQA

    Download from this link and place in data/OCR_VQA/

  4. LLaVA

    Download from liuhaotian/LLaVA-Instruct-150K and place in data/llava/

    It also requires images from the COCO dataset, which can be downloaded from here.

  5. Mini-GPT4

    Download from Vision-CAIR/cc_sbu_align and place in data/cc_sbu_align/

  6. Dolly 15k

    Download from databricks/databricks-dolly-15k and place it in data/dolly/databricks-dolly-15k.jsonl

  7. Alpaca GPT4

    Download it from this link and place it in data/alpaca_gpt4/alpaca_gpt4_data.json
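
Once everything is downloaded, the data directory should contain roughly the following layout (paths taken from the steps above; the COCO image folders are omitted):

    data
    ├── aokvqa
    │   └── annotations
    ├── coco
    ├── OCR_VQA
    ├── llava
    ├── cc_sbu_align
    ├── dolly
    │   └── databricks-dolly-15k.jsonl
    └── alpaca_gpt4
        └── alpaca_gpt4_data.json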

You can also customize the data paths in configs/dataset_config.py.
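
For illustration, overriding a dataset path might look like the snippet below. The dict keys and type names here are assumptions made for the example, not the repository's actual schema, so check configs/dataset_config.py for the real field names:

    # Hypothetical excerpt of configs/dataset_config.py; field names are
    # illustrative assumptions, not the repository's actual schema.
    visual_datasets = [
        dict(
            type="aokvqa",                          # which dataset loader to use (assumed name)
            vis_root="data/coco",                   # root folder holding the COCO images
            ann_paths=["data/aokvqa/annotations"],  # annotation location from step 1 above
        ),
    ]

    language_datasets = [
        dict(
            type="dolly",                           # language-only instruction data (assumed name)
            ann_path="data/dolly/databricks-dolly-15k.jsonl",
        ),
    ]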

Start training

torchrun --nproc_per_node=8 mmgpt/train/instruction_finetune.py \
--lm_path checkpoints/llama-7b_hf \
--tokenizer_path checkpoints/llama-7b_hf \
--pretrained_path checkpoints/OpenFlamingo-9B/checkpoint.pt \
--run_name train-my-gpt4 \
--learning_rate 1e-5 \
--lr_scheduler cosine \
--batch_size 1 \
--tuning_config configs/lora_config.py \
--dataset_config configs/dataset_config.py \
--report_to_wandb
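
The launcher above starts one training process per GPU on an 8-GPU node. With a different GPU count, only --nproc_per_node needs to change; the remaining arguments stay as listed above, for example:

torchrun --nproc_per_node=4 mmgpt/train/instruction_finetune.py <remaining arguments as above>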

Acknowledgements

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mmgpt-0.0.1.tar.gz (35.1 kB)

Uploaded Source

Built Distribution

mmgpt-0.0.1-py3-none-any.whl (49.4 kB)

Uploaded Python 3

File details

Details for the file mmgpt-0.0.1.tar.gz.

File metadata

  • Download URL: mmgpt-0.0.1.tar.gz
  • Upload date:
  • Size: 35.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.16

File hashes

Hashes for mmgpt-0.0.1.tar.gz
  • SHA256: 83350144458406b550bfbaee76d221514d7fde106d39c4e62cd354e0ff3a6fa7
  • MD5: 47fb8a0658f8827b1b55b9d6e03e0654
  • BLAKE2b-256: 450270febd09c09cd1819b4962b1f666a3177651bc34c673f616b791adc496ca

See more details on using hashes here.

File details

Details for the file mmgpt-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: mmgpt-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 49.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.16

File hashes

Hashes for mmgpt-0.0.1-py3-none-any.whl
  • SHA256: f3d09a490b85ac5d61372a1350706cf9e525b61655118f1d775f4b8039050662
  • MD5: 97d27045ce6bf14bb55df04318a1c7bb
  • BLAKE2b-256: 9bdb928a76666ee9e8c2c0894af4212160ffeaf0a3a7d4acbf540ff3cc1b334f

See more details on using hashes here.
