🎉 Announcing Overeasy v0.0.3
This repository contains Overeasy, a framework for building multi-step visual workflows. The library simplifies complex computer vision tasks by breaking them into manageable, sequential operations, an approach that improves both model performance and interpretability.
Overeasy v0 focuses on object detection and classification.
Key Components
🧩 Workflows
A Workflow defines a sequence of Agents that are executed in order, processing an image from input to final output. The output of each Agent is passed as input to the next Agent in the Workflow. Workflows allow for the dynamic integration of various Agents based on the task requirements.
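The chaining described above can be sketched in a few lines of plain Python. This is a conceptual illustration of the pattern, not the actual Overeasy implementation; the stand-in "agents" here are ordinary callables rather than real vision Agents:

```python
# Conceptual sketch of a Workflow: each step's output feeds the next step.
# (Illustration only; the real Overeasy Workflow API may differ internally.)
class Workflow:
    def __init__(self):
        self.steps = []

    def add_step(self, agent):
        self.steps.append(agent)

    def execute(self, data):
        # Pass each Agent's output as the next Agent's input.
        for agent in self.steps:
            data = agent(data)
        return data

# Hypothetical "agents": plain callables standing in for real Agents.
wf = Workflow()
wf.add_step(str.upper)
wf.add_step(lambda s: s + "!")
print(wf.execute("hardhat"))  # HARDHAT!
```

Because each step only needs to accept the previous step's output, Agents can be reordered or swapped without changing the Workflow machinery itself.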
Overeasy currently supports these models for use in Classification, Detection, LLM, and Recognition tasks:
- Classification: CLIP, OpenCLIPBase, LaionCLIP, BiomedCLIP
- Detection: YOLOworld, DETIC, GroundingDINO
- LLMs: QwenVL, GPT, GPT4Vision
- Recognition: TextractModel, RekognitionModel
🤖 Agents
Each Agent encapsulates a specific task related to processing an input image. This modular approach enables the construction of complex Workflows, where each Agent contributes its specialized expertise at a specific stage of the visual processing pipeline.
Below is an overview of the Agents currently supported in our framework:
- BoundingBoxSelectAgent: Detects an object and selects its bounding box from an image, optionally cropping the image to the detected box. It can also split detections, producing one crop per bounding box when multiple boxes are found.
- VisionPromptAgent: Generates a response to a given query based on the content of an image.
- DenseCaptioningAgent: Generates a detailed description of an image.
- BinaryChoiceAgent: Makes a binary choice (yes or no) based on the content of an image.
- ClassificationAgent: Classifies the content of an image into one of the specified classes.
- OCRAgent: Extracts text from an image.
- FacialRecognitionAgent: Recognizes a face in an image.
- JSONAgent: Generates a structured JSON response based on the content of an image.
- JoinAgent: Combines the results of multiple Agents back into a single output.
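The modular design described above can be sketched with a minimal Agent interface. This is a hypothetical illustration of the pattern, not the real Overeasy Agent API; the two toy agents below stand in for actual vision models:

```python
from abc import ABC, abstractmethod

# Hypothetical minimal Agent interface illustrating the modular design;
# the real Overeasy Agent classes may look different.
class Agent(ABC):
    @abstractmethod
    def process(self, image):
        ...

class GrayscaleAgent(Agent):
    """Toy agent: average RGB channels (stand-in for a vision model)."""
    def process(self, image):
        return [[sum(px) // 3 for px in row] for row in image]

class ThresholdAgent(Agent):
    """Toy agent: binarize pixel values at a fixed threshold."""
    def __init__(self, t=128):
        self.t = t
    def process(self, image):
        return [[1 if v >= self.t else 0 for v in row] for row in image]

# A 1x2 "image" of RGB tuples, passed through two agents in sequence.
img = [[(10, 20, 30), (200, 210, 220)]]
out = ThresholdAgent().process(GrayscaleAgent().process(img))
print(out)  # [[0, 1]]
```

Because every agent exposes the same `process` interface, each one can be developed and tested in isolation and then composed into a pipeline.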
Example Output
Let’s walk through an example using Overeasy.
Say we’re interested in identifying the workers wearing hardhats in an input image.
Code
```python
from overeasy import Workflow, BoundingBoxSelectAgent, BinaryChoiceAgent, JoinAgent, visualize_graph
from PIL import Image
import overeasy as ov

# Load input image
image_path = "./construction.jpg"
image = Image.open(image_path)

# Create a new Workflow
workflow = Workflow()
workflow.add_step(BoundingBoxSelectAgent(classes=["person"], split=True))
workflow.add_step(BinaryChoiceAgent("Is this person wearing a hardhat?"))
workflow.add_step(JoinAgent())

# Execute the Workflow and save a visualization of the execution graph
result, graph = workflow.execute(image)
ov.logging.print_summary()
fig = visualize_graph(graph)
fig.savefig("workflow_result.png")
```
Output
Analysis
Layer 0: Original Image
This is the original input image for our Workflow.
Layer 1: Person Detection
Layer 1 uses a BoundingBoxSelectAgent to detect people in the original image, cropping out an image of each detected individual. Each crop carries a confidence score (between 0 and 1), which indicates how certain the model is that the cropped image contains a person. Here, a score of 0.48 indicates that the model is 48% certain the cropped image is a person.
Layer 2: Hardhat Detection
Layer 2 takes the output from Layer 1 and uses a BinaryChoiceAgent to determine whether each detected individual is wearing a hardhat. Each cropped image now has a label of "yes" or "no" with an associated confidence score. In this case, the confidence scores are all 1.00, indicating that the model is 100% certain of its decision for each person.
Layer 3: Final Output
The Workflow uses a JoinAgent to combine the results of the BoundingBoxSelectAgent and the BinaryChoiceAgent, mapping the predictions from the cropped images back onto the original image.
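The split/join pattern used in this example can be sketched conceptually: split an image into per-detection crops, label each crop, then join the labels back onto the original image's bounding boxes. This is an illustration of the idea only; `split`, `classify`, and `join` below are hypothetical stand-ins, not the real JoinAgent internals:

```python
# Conceptual sketch of split/join (not the actual JoinAgent implementation).
def split(image, boxes):
    # One "crop" record per detected bounding box.
    return [{"box": box, "crop": image} for box in boxes]

def classify(crop):
    # Hypothetical binary classifier stand-in: wide boxes get "yes".
    x1, y1, x2, y2 = crop["box"]
    return "yes" if (x2 - x1) > 50 else "no"

def join(crops, labels):
    # Reattach each label to its originating box on the original image.
    return [(c["box"], lab) for c, lab in zip(crops, labels)]

boxes = [(0, 0, 100, 100), (10, 10, 40, 90)]
crops = split("construction.jpg", boxes)
labels = [classify(c) for c in crops]
print(join(crops, labels))
# [((0, 0, 100, 100), 'yes'), ((10, 10, 40, 90), 'no')]
```

The key point is that labels computed on crops remain associated with their source bounding boxes, so per-person predictions can be drawn back onto the full scene.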
License
MIT License
Source Distribution
File details
Details for the file overeasy-0.0.4.tar.gz.
File metadata
- Download URL: overeasy-0.0.4.tar.gz
- Upload date:
- Size: 52.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.0.0 CPython/3.12.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | dc036fb94218d7df2e1a1baa4fecec1f19654e6dd9ff6b447a7add10fa06bcdf
MD5 | aaf1c0b76d2d206a7a13154f974f23c7
BLAKE2b-256 | 2466cc71c64e84cdabe14f9c3947ff31ac5967fce54107ca4da234659cdc2c4b