
🎉 Announcing Overeasy v0.0.3

This repository contains Overeasy, a framework for constructing multi-step visual workflows. The library simplifies complex computer vision tasks by breaking them into manageable, sequential operations, which improves both model performance and interpretability.

Overeasy v0 focuses on object detection and classification.

Key Components

🧩 Workflows

A Workflow defines a sequence of Agents that execute in order, processing an image from input to final output. The output of each Agent is passed as the input to the next Agent in the Workflow. Workflows allow various Agents to be composed dynamically based on the task requirements.
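Conceptually, a Workflow is a pipeline: each step's output becomes the next step's input. The sketch below illustrates that execution model in plain Python; it is a toy stand-in, not the Overeasy implementation (the `ToyWorkflow` class and its string-processing steps are invented for illustration).

```python
from typing import Any, Callable, List

class ToyWorkflow:
    """Illustrative only: runs steps in order, piping each output to the next."""
    def __init__(self) -> None:
        self.steps: List[Callable[[Any], Any]] = []

    def add_step(self, step: Callable[[Any], Any]) -> "ToyWorkflow":
        self.steps.append(step)
        return self

    def execute(self, data: Any) -> Any:
        # Each step receives the previous step's output.
        for step in self.steps:
            data = step(data)
        return data

wf = ToyWorkflow()
wf.add_step(str.lower).add_step(str.split)
print(wf.execute("Detect People Then Classify"))
# ['detect', 'people', 'then', 'classify']
```

In Overeasy the steps are Agents operating on images rather than strings, but the data flow is the same: sequential, with each Agent consuming its predecessor's output.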

Overeasy currently supports the following models across classification, detection, LLM, and recognition tasks:

  • Classification: CLIP, OpenCLIPBase, LaionCLIP, BiomedCLIP
  • Detection: YOLOworld, DETIC, GroundingDINO
  • LLMs: QwenVL, GPT, GPT4Vision
  • Recognition: TextractModel, RekognitionModel

🤖 Agents

Each Agent encapsulates a specific task related to processing an input image. This modular approach enables the construction of complex Workflows, where each Agent contributes its specialized expertise at a specific stage of the visual processing pipeline.

Below is an overview of the Agents currently supported in our framework:

  • BoundingBoxSelectAgent: Detects objects of the specified classes, selects their bounding boxes, and crops the image to each box. With split detections enabled, each detected box becomes a separate image for subsequent Agents.
  • VisionPromptAgent: Answers a free-form query about the content of an image.
  • DenseCaptioningAgent: Generates a detailed description of an image.
  • BinaryChoiceAgent: Makes a binary (yes/no) decision about the content of an image.
  • ClassificationAgent: Classifies the content of an image into one of the specified classes.
  • OCRAgent: Extracts text from an image via optical character recognition.
  • FacialRecognitionAgent: Recognizes faces in an image.
  • JSONAgent: Produces a structured JSON response based on the content of an image.
  • JoinAgent: Combines the results of multiple Agents, reattaching per-crop predictions to the original image.

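The split-then-join pattern used by BoundingBoxSelectAgent (with split detections) and JoinAgent can be illustrated with plain Python: one input fans out into several items, a per-item agent labels each one, and a join step gathers the labels back onto the originals. This is a toy sketch of the data flow, not the Overeasy API; the `detections` list, `binary_choice` function, and its score-based rule are invented for illustration.

```python
# Pretend these detections came from a person detector (split step:
# each box is processed independently downstream).
detections = [
    {"box": (10, 20, 50, 80), "score": 0.91},
    {"box": (60, 15, 95, 85), "score": 0.48},
]

def binary_choice(det):
    # Stand-in for a per-crop yes/no agent; a real BinaryChoiceAgent
    # would inspect the cropped image, not the detection score.
    return "yes" if det["score"] > 0.5 else "no"

# Per-item step: label each crop independently.
labels = [binary_choice(d) for d in detections]

# Join step: attach each label back to its original detection.
joined = [dict(d, label=lbl) for d, lbl in zip(detections, labels)]
print(joined)
```

The join step is what lets per-crop answers be rendered back onto the original image, as in the hardhat example below.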
Example Output

Let’s walk through an example using Overeasy.

Say we want to identify which workers in an input image are wearing hardhats.

Code

from overeasy import Workflow, BoundingBoxSelectAgent, BinaryChoiceAgent, JoinAgent, visualize_graph
from PIL import Image
import overeasy as ov

# Load the input image
image_path = "./construction.jpg"
image = Image.open(image_path)

# Create a new Workflow
workflow = Workflow()

# Detect each person and split the image into one crop per detection
workflow.add_step(BoundingBoxSelectAgent(classes=["person"], split=True))
# Ask a yes/no question about each cropped person
workflow.add_step(BinaryChoiceAgent("Is this person wearing a hardhat?"))
# Merge the per-crop answers back onto the original image
workflow.add_step(JoinAgent())
result, graph = workflow.execute(image)

# Print a summary of the run
ov.logging.print_summary()

# Save a visualization of the execution graph
fig = visualize_graph(graph)
fig.savefig("workflow_result.png")

Output

Input Image

Analysis

Layer 0: Original Image

This is the original input image for our Workflow.

Layer 1: Person Detection

Layer 1 uses a BoundingBoxSelectAgent to detect people in the original image, cropping out an image of each detected individual. Each crop carries a confidence score (between 0 and 1) indicating how certain the model is that the crop contains a person. Here, a score of 0.48 means the model is 48% certain the cropped image contains a person.

Layer 2: Hardhat Detection

Layer 2 takes the output from Layer 1 and uses a BinaryChoiceAgent to determine whether each detected individual is wearing a hardhat. Each cropped image now has a label of "yes" or "no" with an associated confidence score. In this case, the confidence scores are all 1.00, indicating the model is fully certain of its decision for each person.

Layer 3: Final Output

The Workflow uses a JoinAgent to combine the results of the BoundingBoxSelectAgent and the BinaryChoiceAgent. The JoinAgent joins the predictions from the cropped images back into the original image.

License

MIT License
