
Toolset for Vision Agent


🔍 Vision Agent


Vision Agent is a library that helps you use multimodal models to organize and structure your image data. Check out our Discord for roadmaps and updates!

One problem with image data is that it can be difficult to organize and search. For example, you might have a collection of pictures of houses and want to count how many are yellow, or how many have adobe roofs. Vision Agent uses large multimodal models (LMMs) to create tags or descriptions of images so that you can search over them, or use them in a database to carry out other operations.

Getting Started

LMMs

To get started, use an LMM to generate text from images. The following code uses the LLaVA-1.6 34B model to generate a description of the image you pass it.

import vision_agent as va

model = va.lmm.get_lmm("llava")
model.generate("Describe this image", "image.png")
>>> "A yellow house with a green lawn."

WARNING: We host the LLaVA-1.6 34B model ourselves. The server shuts down when usage is low, so if a request times out, please wait ~3-5 min for it to warm up.
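Because a cold server can time out, it may help to retry the call rather than fail on the first attempt. The following is a minimal sketch and not part of vision_agent: `generate_with_retry` is a hypothetical helper, the 60-second wait and five attempts are assumed defaults chosen to roughly cover the warm-up window, and since the README does not say which exception a timeout raises, the sketch catches any `Exception`.

```python
import time
from typing import Callable


def generate_with_retry(
    generate: Callable[[str, str], str],
    prompt: str,
    image: str,
    attempts: int = 5,
    wait_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call generate(prompt, image), retrying while the server warms up.

    Hypothetical helper: the exception type raised on a timeout is not
    documented, so every Exception is treated as retryable here.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return generate(prompt, image)
        except Exception:
            # Give up and re-raise once the last attempt has failed.
            if attempt == attempts:
                raise
            sleep(wait_seconds)
    raise AssertionError("unreachable")
```

With the model from the example above, this could be called as `generate_with_retry(model.generate, "Describe this image", "image.png")`. The `sleep` parameter exists only so the wait can be replaced in tests.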

DataStore

You can use the DataStore class to store your images, add new metadata to them such as descriptions, and search over different columns.

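import vision_agent as va
import pandas as pd

# Wrap a DataFrame of image paths in a DataStore.
df = pd.DataFrame({"image_paths": ["image1.png", "image2.png", "image3.png"]})
ds = va.data.DataStore(df)
# Attach the LMM that add_column uses, and an embedder.
ds = ds.add_lmm(va.lmm.get_lmm("llava"))
ds = ds.add_embedder(va.emb.get_embedder("sentence-transformer"))

# Run the prompt on every image and store the answers in a new column.
ds = ds.add_column("descriptions", "Describe this image.")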

This uses the prompt you passed, "Describe this image.", and the LMM to create a new column of descriptions for your images. Your data will now contain a new column with the description of each image:

image_paths  image_id  descriptions
image1.png   1         "A yellow house with a green lawn."
image2.png   2         "A white house with a two door garage."
image3.png   3         "A wooden house in the middle of the forest."
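Conceptually, adding a column amounts to running one prompt over every image and storing the answers next to the image paths. The sketch below illustrates that idea with plain dictionaries. It is not the library's implementation: `add_prompt_column` and `fake_lmm` are hypothetical names, and `fake_lmm` returns canned answers instead of calling a model.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def add_prompt_column(
    rows: List[Row],
    column: str,
    prompt: str,
    generate: Callable[[str, str], str],
) -> List[Row]:
    """Return new rows with `column` set to generate(prompt, image_path)."""
    return [
        {**row, column: generate(prompt, str(row["image_paths"]))}
        for row in rows
    ]


def fake_lmm(prompt: str, image_path: str) -> str:
    # Hypothetical stand-in for an LMM: canned answers keyed by image path.
    canned = {
        "image1.png": "A yellow house with a green lawn.",
        "image2.png": "A white house with a two door garage.",
        "image3.png": "A wooden house in the middle of the forest.",
    }
    return canned[image_path]


images = [
    {"image_paths": "image1.png", "image_id": 1},
    {"image_paths": "image2.png", "image_id": 2},
    {"image_paths": "image3.png", "image_id": 3},
]
described = add_prompt_column(images, "descriptions", "Describe this image.", fake_lmm)
```

The input rows are left unchanged and a new list is returned, which mirrors the `ds = ds.add_column(...)` reassignment style used above.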

You can now create an index on the descriptions column and search over it to find images that match your query.

ds = ds.build_index("descriptions")
ds.search("A yellow house.", top_k=1)
>>> [{'image_paths': 'image1.png', 'image_id': 1, 'descriptions': 'A yellow house with a green lawn.'}]
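To give a feel for what searching over a text column involves, here is a toy sketch that ranks descriptions by cosine similarity to the query. It is an illustration under stated assumptions, not how `build_index` or `search` are implemented: the word-count `embed` function is a deliberately simple stand-in for a real sentence embedder, and all function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(rows: List[Dict], column: str, query: str, top_k: int = 1) -> List[Dict]:
    """Return the top_k rows whose `column` text is most similar to the query."""
    q = embed(query)
    ranked = sorted(rows, key=lambda r: cosine(q, embed(r[column])), reverse=True)
    return ranked[:top_k]


rows = [
    {"image_paths": "image1.png", "image_id": 1, "descriptions": "A yellow house with a green lawn."},
    {"image_paths": "image2.png", "image_id": 2, "descriptions": "A white house with a two door garage."},
    {"image_paths": "image3.png", "image_id": 3, "descriptions": "A wooden house in the middle of the forest."},
]
best = search(rows, "descriptions", "A yellow house.", top_k=1)
```

Here `best` holds the row for image1.png. A real embedder would also match paraphrases that share no words with the description, which word counts cannot do.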

You can also create other columns for your data, such as is_yellow:

ds = ds.add_column("is_yellow", "Is the house in this image yellow? Please answer yes or no.")

which would give you a dataset similar to this:

image_paths  image_id  descriptions                                   is_yellow
image1.png   1         "A yellow house with a green lawn."            "yes"
image2.png   2         "A white house with a two door garage."        "no"
image3.png   3         "A wooden house in the middle of the forest."  "no"
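A yes/no column like this makes the counting question from the introduction straightforward. The sketch below works on plain dictionaries that mirror the table above rather than on a DataStore, since the README does not show how to read rows back out. `is_yes` is a hypothetical helper that tolerates small variations in the model's answer, such as capitalization or a trailing period.

```python
from typing import Dict, List


def is_yes(answer: str) -> bool:
    """Treat answers such as "yes", "Yes." or " YES " as affirmative."""
    return answer.strip().strip(".!").lower() == "yes"


# Rows mirroring the table above (assumed layout, for illustration only).
houses: List[Dict[str, str]] = [
    {"image_paths": "image1.png", "is_yellow": "yes"},
    {"image_paths": "image2.png", "is_yellow": "no"},
    {"image_paths": "image3.png", "is_yellow": "no"},
]

yellow_houses = [row["image_paths"] for row in houses if is_yes(row["is_yellow"])]
print(len(yellow_houses), yellow_houses)  # prints: 1 ['image1.png']
```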


