Skip to main content

YO-FLO: A proof-of-concept in using advanced vision models as a YOLO alternative.

Project description

YOFLO-CLI

YOFLO-CLI is a command-line interface for the YO-FLO package, providing advanced object detection and binary inference capabilities using the Florence-2 vision-language model in real-time. This tool leverages state-of-the-art vision models to perform tasks such as object detection and binary inference based on referring expression comprehension.

Features

  • Object Detection: Identify and classify objects within a video feed.
  • Binary Inference: Answer yes/no questions based on the visual input.
  • Inference Rate Calculation: Measure the rate of inferences per second.
  • Real-time Processing: Process video feeds from a webcam in real-time.
  • Screenshot on Detection: Automatically capture and save a screenshot when a target object is detected.
  • Logging Alerts: Log detection alerts to a file.
  • Headless Mode: Run the tool without displaying the video feed, suitable for server environments.
  • Pretty Print: Enable formatted output of detections for better readability.

Model Information

This tool uses Microsoft's Florence-2, a powerful vision-language model designed to understand and generate detailed descriptions of visual inputs. Florence-2 combines advanced image processing with natural language understanding, making it ideal for complex tasks that require both visual and textual analysis.

Installation

  1. Clone the repository: git clone https://github.com/yourusername/yoflo-cli.git cd yoflo-cli

  2. Install the package: pip install .

Usage

python yoflo.py --model_path /path/to/model --class_name "person" --phrase "Is the person smiling?" --debug --screenshot --log_to_file --headless --pretty_print

Command-line Arguments

  • --model_path: Path to the pre-trained model directory.
  • --class_name: Class name to detect (e.g., 'cat', 'dog').
  • --phrase: Yes/No question for expression comprehension (e.g., 'Is the person smiling?').
  • --debug: Enable debug mode.
  • --headless: Run in headless mode without displaying the video feed.
  • --screenshot: Enable screenshot on detection.
  • --log_to_file: Enable logging alerts to a file.
  • --inference_speed: Display inference speed.
  • --object_detection: Enable object detection.
  • --download_model: Download model from Hugging Face.
  • --pretty_print: Enable pretty print for detections.
  • --alert_on: Trigger alert on 'yes' or 'no' result (default: 'yes').

Core Functionality

The core functionality of YOFLO-CLI is encapsulated in the yoflo.py script. Here’s a high-level overview of its capabilities:

  • Model Initialization: Loads the Florence-2 model and processor.
  • Object Detection: Uses the model to detect objects in images and annotate them with bounding boxes and labels.
  • Expression Comprehension: Answers yes/no questions based on the visual content of the image.
  • Real-time Processing: Captures video frames from a webcam, processes them for object detection or binary inference, and displays the results.
  • Screenshot and Logging Alerts: Captures screenshots and logs alerts to a file upon detection of target objects.

Main Functions

  • init_model(): Initializes the model and processor from the specified path.
  • run_object_detection(): Performs object detection on a given image.
  • run_expression_comprehension(): Performs binary inference based on a given phrase.
  • start_webcam_detection(): Starts real-time processing of webcam feed.
  • stop_webcam_detection(): Stops the real-time processing of the webcam feed.
  • save_screenshot(): Saves a screenshot when a target object is detected.
  • log_alert(): Logs detection alerts to a file.
  • toggle_screenshot(): Toggles the screenshot feature on or off.
  • toggle_log_to_file(): Toggles the log to file feature on or off.
  • pretty_print_detections(): Formats and prints detection results with datetime for readability.

Development Status

YOFLO-CLI is currently in the process of being converted into a full Python package. We are starting small with just object detection and binary inference based on referring expression comprehension. Future updates will include optimizations and additional features as the project evolves.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgments

  • The Florence-2 model is developed by Microsoft and is available on the HuggingFace Model Hub.
  • This project uses several open-source libraries, including PyTorch, Transformers, OpenCV, Pillow, and NumPy.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

yoflo-0.1.0.tar.gz (9.3 kB view hashes)

Uploaded Source

Built Distribution

yoflo-0.1.0-py3-none-any.whl (8.3 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page