🎉 Announcing Overeasy v0.0.3
This repository contains Overeasy, a framework for building multi-step visual workflows. The library simplifies complex computer vision tasks by breaking them into manageable, sequential operations, an approach that improves both model performance and interpretability.
Overeasy v0 focuses on object detection and classification.
Key Components
🧩 Workflows
A Workflow defines a sequence of Agents that are executed in order, processing an image from input to final output. The output of each Agent is passed as input to the next Agent in the Workflow. Workflows allow for the dynamic integration of various Agents based on the task requirements.
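The chaining described above can be sketched in a few lines of plain Python. This is a conceptual illustration of the pattern, not the actual Overeasy implementation; the stand-in "agents" here are ordinary callables rather than real vision Agents:

```python
# Conceptual sketch of a Workflow: each step's output feeds the next step.
# (Illustration only; the real Overeasy Workflow API may differ internally.)
class Workflow:
    def __init__(self):
        self.steps = []

    def add_step(self, agent):
        self.steps.append(agent)

    def execute(self, data):
        # Pass each Agent's output as the next Agent's input.
        for agent in self.steps:
            data = agent(data)
        return data

# Hypothetical "agents": plain callables standing in for real Agents.
wf = Workflow()
wf.add_step(str.upper)
wf.add_step(lambda s: s + "!")
print(wf.execute("hardhat"))  # HARDHAT!
```

Because each step only needs to accept the previous step's output, Agents can be reordered or swapped without changing the Workflow machinery itself.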
Overeasy currently supports these models for use in Classification, Detection, LLM, and Recognition tasks:
- Classification: CLIP, OpenCLIPBase, LaionCLIP, BiomedCLIP
- Detection: YOLOworld, DETIC, GroundingDINO
- LLMs: QwenVL, GPT, GPT4Vision
- Recognition: TextractModel, RekognitionModel
🤖 Agents
Each Agent encapsulates a specific task related to processing an input image. This modular approach enables the construction of complex Workflows, where each Agent contributes its specialized expertise at a specific stage of the visual processing pipeline.
Below is an overview of the Agents currently supported in our framework:
- BoundingBoxSelectAgent: Detects an object and selects its bounding box from an image, optionally cropping the image to the detected box. It can also split detections, producing one crop per bounding box when multiple boxes are found.
- VisionPromptAgent: Generates a response to a given query based on the content of an image.
- DenseCaptioningAgent: Generates a detailed description of an image.
- BinaryChoiceAgent: Makes a binary choice (yes or no) based on the content of an image.
- ClassificationAgent: Classifies the content of an image into one of the specified classes.
- OCRAgent: Extracts text from an image.
- FacialRecognitionAgent: Recognizes a face in an image.
- JSONAgent: Generates a structured JSON response based on the content of an image.
- JoinAgent: Combines the results of multiple Agents back into a single output.
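The modular design described above can be sketched with a minimal Agent interface. This is a hypothetical illustration of the pattern, not the real Overeasy Agent API; the two toy agents below stand in for actual vision models:

```python
from abc import ABC, abstractmethod

# Hypothetical minimal Agent interface illustrating the modular design;
# the real Overeasy Agent classes may look different.
class Agent(ABC):
    @abstractmethod
    def process(self, image):
        ...

class GrayscaleAgent(Agent):
    """Toy agent: average RGB channels (stand-in for a vision model)."""
    def process(self, image):
        return [[sum(px) // 3 for px in row] for row in image]

class ThresholdAgent(Agent):
    """Toy agent: binarize pixel values at a fixed threshold."""
    def __init__(self, t=128):
        self.t = t
    def process(self, image):
        return [[1 if v >= self.t else 0 for v in row] for row in image]

# A 1x2 "image" of RGB tuples, passed through two agents in sequence.
img = [[(10, 20, 30), (200, 210, 220)]]
out = ThresholdAgent().process(GrayscaleAgent().process(img))
print(out)  # [[0, 1]]
```

Because every agent exposes the same `process` interface, each one can be developed and tested in isolation and then composed into a pipeline.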
Example Output
Let’s walk through an example using Overeasy.
Say we’re interested in identifying the workers wearing hardhats in an input image.
Code
```python
from overeasy import Workflow, BoundingBoxSelectAgent, BinaryChoiceAgent, JoinAgent, visualize_graph
from PIL import Image
import overeasy as ov

# Load input image
image_path = "./construction.jpg"
image = Image.open(image_path)

# Create a new Workflow
workflow = Workflow()
workflow.add_step(BoundingBoxSelectAgent(classes=["person"], split=True))
workflow.add_step(BinaryChoiceAgent("Is this person wearing a hardhat?"))
workflow.add_step(JoinAgent())

# Execute the Workflow and save a visualization of the execution graph
result, graph = workflow.execute(image)
ov.logging.print_summary()
fig = visualize_graph(graph)
fig.savefig("workflow_result.png")
```
Output
Analysis
Layer 0: Original Image
This is the original input image for our Workflow.
Layer 1: Person Detection
Layer 1 uses a BoundingBoxSelectAgent to detect people in the original image, cropping out an image of each detected individual. Each crop carries a confidence score (between 0 and 1), which indicates how certain the model is that the cropped image contains a person. Here, a score of 0.48 indicates that the model is 48% certain the cropped image is a person.
Layer 2: Hardhat Detection
Layer 2 takes the output from Layer 1 and uses a BinaryChoiceAgent to determine whether each detected individual is wearing a hardhat. Each cropped image now has a label of "yes" or "no" with an associated confidence score. In this case, the confidence scores are all 1.00, indicating that the model is 100% certain of its decision for each person.
Layer 3: Final Output
The Workflow uses a JoinAgent to combine the results of the BoundingBoxSelectAgent and the BinaryChoiceAgent, mapping the predictions from the cropped images back onto the original image.
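The split/join pattern used in this example can be sketched conceptually: split an image into per-detection crops, label each crop, then join the labels back onto the original image's bounding boxes. This is an illustration of the idea only; `split`, `classify`, and `join` below are hypothetical stand-ins, not the real JoinAgent internals:

```python
# Conceptual sketch of split/join (not the actual JoinAgent implementation).
def split(image, boxes):
    # One "crop" record per detected bounding box.
    return [{"box": box, "crop": image} for box in boxes]

def classify(crop):
    # Hypothetical binary classifier stand-in: wide boxes get "yes".
    x1, y1, x2, y2 = crop["box"]
    return "yes" if (x2 - x1) > 50 else "no"

def join(crops, labels):
    # Reattach each label to its originating box on the original image.
    return [(c["box"], lab) for c, lab in zip(crops, labels)]

boxes = [(0, 0, 100, 100), (10, 10, 40, 90)]
crops = split("construction.jpg", boxes)
labels = [classify(c) for c in crops]
print(join(crops, labels))
# [((0, 0, 100, 100), 'yes'), ((10, 10, 40, 90), 'no')]
```

The key point is that labels computed on crops remain associated with their source bounding boxes, so per-person predictions can be drawn back onto the full scene.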
License
MIT License
Source Distribution
File details
Details for the file overeasy-0.0.4.tar.gz.
File metadata
- Download URL: overeasy-0.0.4.tar.gz
- Upload date:
- Size: 52.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.0.0 CPython/3.12.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | dc036fb94218d7df2e1a1baa4fecec1f19654e6dd9ff6b447a7add10fa06bcdf
MD5 | aaf1c0b76d2d206a7a13154f974f23c7
BLAKE2b-256 | 2466cc71c64e84cdabe14f9c3947ff31ac5967fce54107ca4da234659cdc2c4b