
An open-source framework for multi-modality instruction fine-tuning


🤖 Multi-modal GPT

Train a multi-modal chatbot with visual and language instructions!

Building on the open-source multi-modal model OpenFlamingo, we create visual instruction data from open datasets, including VQA, image captioning, visual reasoning, text OCR, and visual dialogue. In addition, we train the language model component of OpenFlamingo with language-only instruction data.

Joint training on visual and language instructions effectively improves the model's performance!

Features

  • Supports a variety of vision and language instruction datasets
  • Parameter-efficient fine-tuning with LoRA (see the sketch after this list)
  • Tunes the vision and language components jointly, so they complement each other
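
LoRA keeps the pretrained weights frozen and only trains small low-rank update matrices, which is what makes the fine-tuning parameter efficient. The snippet below is a standalone illustration of the idea in plain PyTorch, not the project's own implementation (the actual adapter wiring is driven by configs/lora_config.py and the training code):

# Standalone illustration of the LoRA idea (not this project's implementation):
# the pretrained weight W stays frozen and only the low-rank factors A and B are trained.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)   # freeze the pretrained weight
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x):
        # frozen path + trainable low-rank update
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scaling

layer = LoRALinear(nn.Linear(4096, 4096), rank=8)
# Only A and B are trainable, a tiny fraction of the 4096x4096 base weight.
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))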

Installation

To install the package in an existing environment, run

git clone https://github.com/open-mmlab/Multimodal-GPT.git
cd Multimodal-GPT
pip install -r requirements.txt
pip install -e . -v

or create a new conda environment

conda env create -f environment.yml
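
Either way, the install can be verified with a quick import check (mmgpt is the package name used throughout the source tree):

python -c "import mmgpt"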

Demo

  1. Download the pre-trained weights.

    Use this script to convert the LLaMA weights to Hugging Face format.

    Download the OpenFlamingo pre-trained model from openflamingo/OpenFlamingo-9B.

    Download our LoRA weights from here.

    Then place these models in the checkpoints folder like this:

    checkpoints
    ├── llama-7b_hf
    │   ├── config.json
    │   ├── pytorch_model-00001-of-00002.bin
    │   ├── ......
    │   └── tokenizer.model
    ├── OpenFlamingo-9B
    │   └── checkpoint.pt
    └── mmgpt-lora-v0-release.pt
    
    
  2. Launch the Gradio demo:

    python chat_gradio_demo.py
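
    Before launching, you can sanity-check that the weights from step 1 are in place (paths copied from the tree above):

    # Verify the expected checkpoint files are in place (layout from step 1).
    from pathlib import Path

    expected = [
        Path("checkpoints/llama-7b_hf/config.json"),
        Path("checkpoints/llama-7b_hf/tokenizer.model"),
        Path("checkpoints/OpenFlamingo-9B/checkpoint.pt"),
        Path("checkpoints/mmgpt-lora-v0-release.pt"),
    ]
    for p in expected:
        print(f"{p}: {'found' if p.exists() else 'MISSING'}")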
    

Examples

Recipe:

[example screenshot]

Travel plan:

[example screenshot]

Movie:

[example screenshot]

Famous person:

[example screenshot]

Fine-tuning

Prepare datasets

  1. A-OKVQA

    Download the annotations from this link and unzip them to data/aokvqa/annotations

    It also requires images from the COCO dataset, which can be downloaded from here.

  2. COCO Caption

    Download from this link and unzip to data/coco

    It also requires images from the COCO dataset, which can be downloaded from here.

  3. OCR VQA

    Download from this link and place in data/OCR_VQA/

  4. LLaVA

    Download from liuhaotian/LLaVA-Instruct-150K and place in data/llava/

    It also requires images from the COCO dataset, which can be downloaded from here.

  5. Mini-GPT4

    Download from Vision-CAIR/cc_sbu_align and place in data/cc_sbu_align/

  6. Dolly 15k

    Download from databricks/databricks-dolly-15k and place it in data/dolly/databricks-dolly-15k.jsonl

  7. Alpaca GPT4

    Download it from this link and place it in data/alpaca_gpt4/alpaca_gpt4_data.json
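
After downloading everything, the data directory should look like this (paths collected from the steps above):

data
├── aokvqa
│   └── annotations
├── coco
├── OCR_VQA
├── llava
├── cc_sbu_align
├── dolly
│   └── databricks-dolly-15k.jsonl
└── alpaca_gpt4
    └── alpaca_gpt4_data.json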

You can also customize the data paths in configs/dataset_config.py.
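
The exact fields are defined in configs/dataset_config.py itself; purely as an illustration (the key names below are hypothetical), an entry pointing a dataset at a custom location might look like:

# Hypothetical sketch only -- the real key names are defined in configs/dataset_config.py.
aokvqa_dataset = dict(
    type="aokvqa",                       # which dataset loader to use (hypothetical key)
    ann_path="data/aokvqa/annotations",  # annotation directory from step 1
    image_root="data/coco",              # COCO images shared by several datasets
)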

Start training

torchrun --nproc_per_node=8 mmgpt/train/instruction_finetune.py \
  --lm_path checkpoints/llama-7b_hf \
  --tokenizer_path checkpoints/llama-7b_hf \
  --pretrained_path checkpoints/OpenFlamingo-9B/checkpoint.pt \
  --run_name train-my-gpt4 \
  --learning_rate 1e-5 \
  --lr_scheduler cosine \
  --batch_size 1 \
  --tuning_config configs/lora_config.py \
  --dataset_config configs/dataset_config.py \
  --report_to_wandb
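
torchrun launches one training process per GPU, so the command above assumes an 8-GPU machine. With fewer GPUs, set --nproc_per_node to the number available (for example, --nproc_per_node=1 for a single GPU) and keep the remaining flags unchanged.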

Acknowledgements
