Visual Prompting for Large Multimodal Models (LMMs)

multimodal-maestro


👋 hello

Multimodal-Maestro gives you more control over large multimodal models to get the outputs you want. With more effective prompting tactics, you can get multimodal models to do tasks you didn't know (or think!) were possible. Curious how it works? Try our HF space!

🚧 The project is still under construction and the API is prone to change.

💻 install

Pip install the multimodal-maestro package in a 3.11>=Python>=3.8 environment.

pip install multimodal-maestro
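
Before installing, it can help to confirm that the interpreter falls inside the supported range. The sketch below is not part of multimodal-maestro; `is_supported_python` is a hypothetical helper, and it assumes the stated range means minor versions 3.8 through 3.11 inclusive.

```python
import sys

# Assumed bounds, read from "3.11>=Python>=3.8" as inclusive minor versions.
MIN_VERSION = (3, 8)
MAX_VERSION = (3, 11)


def is_supported_python(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) pair lies within the assumed range."""
    major_minor = tuple(version_info[:2])
    return MIN_VERSION <= major_minor <= MAX_VERSION


if __name__ == "__main__":
    # Prints True or False for the interpreter running this script.
    print(is_supported_python())
```

Run it with the same interpreter you plan to install into, since a virtual environment may use a different Python than the system default.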

🚀 examples

GPT-4 Vision

Find dog.

>>> The dog is prominently featured in the center of the image with the label [9].
👉 read more
  • load image

    import cv2
    
    image = cv2.imread("...")
    
  • create and refine marks

    import multimodalmaestro as mm
    
    generator = mm.SegmentAnythingMarkGenerator(device='cuda')
    marks = generator.generate(image=image)
    marks = mm.refine_marks(marks=marks)
    
  • visualize marks

    mark_visualizer = mm.MarkVisualizer()
    marked_image = mark_visualizer.visualize(image=image, marks=marks)
    

    [Figure: original image vs. marked image]

  • prompt

    prompt = "Find dog."
    
    # api_key must hold your OpenAI API key; it is not defined in this snippet
    response = mm.prompt_image(api_key=api_key, image=marked_image, prompt=prompt)
    
    >>> "The dog is prominently featured in the center of the image with the label [9]."
    
  • extract related marks

    masks = mm.extract_relevant_masks(text=response, detections=marks)
    
    >>> {'9': array([
    ...     [False, False, False, ..., False, False, False],
    ...     [False, False, False, ..., False, False, False],
    ...     [False, False, False, ..., False, False, False],
    ...     ...,
    ...     [ True,  True,  True, ..., False, False, False],
    ...     [ True,  True,  True, ..., False, False, False],
    ...     [ True,  True,  True, ..., False, False, False]])
    ... }
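
The idea behind the last step can be sketched without the library: find the bracketed labels the model mentions in its answer, then keep only the masks with those labels. This is a minimal illustration, not the implementation of `mm.extract_relevant_masks`. The helpers `find_mark_labels` and `select_masks` are hypothetical names, the masks are toy arrays, and the sketch assumes marks are always written as bracketed integers such as `[9]`.

```python
import re
from typing import Dict, List

import numpy as np


def find_mark_labels(text: str) -> List[str]:
    """Return unique bracketed integer labels in order of first appearance."""
    labels: List[str] = []
    for label in re.findall(r"\[(\d+)\]", text):
        if label not in labels:
            labels.append(label)
    return labels


def select_masks(
    text: str, masks_by_label: Dict[str, np.ndarray]
) -> Dict[str, np.ndarray]:
    """Keep only the masks whose label is mentioned in the text."""
    return {
        label: masks_by_label[label]
        for label in find_mark_labels(text)
        if label in masks_by_label
    }


# Toy boolean masks standing in for real segmentation output (hypothetical data).
all_masks = {str(i): np.zeros((4, 4), dtype=bool) for i in range(12)}
all_masks["9"][2:, :3] = True

response = (
    "The dog is prominently featured in the center of the image "
    "with the label [9]."
)
selected = select_masks(response, all_masks)
print(list(selected))  # ['9']
```

Labels the model mentions but that have no matching mask are skipped silently here; a stricter version could raise an error instead.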
    

🚧 roadmap

  • Documentation page.
  • Segment Anything guided marks generation.
  • Non-Max Suppression marks refinement.
  • LLaVA demo.

💜 acknowledgement

🦸 contribution

We would love your help in making this repository even better! If you notice a bug or have a suggestion for improvement, feel free to open an issue or submit a pull request.


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

multimodal_maestro-0.1.0.tar.gz (10.7 kB)

Uploaded Source

Built Distribution

multimodal_maestro-0.1.0-py3-none-any.whl (11.9 kB)

Uploaded Python 3

File details

Details for the file multimodal_maestro-0.1.0.tar.gz.

File metadata

  • Download URL: multimodal_maestro-0.1.0.tar.gz
  • Upload date:
  • Size: 10.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.5.1 CPython/3.10.13 Darwin/23.0.0

File hashes

Hashes for multimodal_maestro-0.1.0.tar.gz
Algorithm Hash digest
SHA256 799bea920f212f8ba1aeb42ca7473efae1ebff4c6185b1d7593c0887c2a61252
MD5 71f4bf2b90261129a5592260b6184b93
BLAKE2b-256 06096ef2e07b42b9659cd62e82b07e22e8ed3a26c5884cd843b581715a9cdb78

File details

Details for the file multimodal_maestro-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for multimodal_maestro-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 10836ab4608a4e32108b8cc174352b6ab66b816711dd55fd02a56ad68ecbf40b
MD5 5f86e18a8040043510dfab04ba3855fc
BLAKE2b-256 c0e319463106d1c459fb981418dc71b7f2c19ecbd799915bb1047db14425457e
