
A PyPI package for object detection using advanced vision models


Spatial Reasoning: www.spatial-reasoning.com

A powerful Python package for object detection using advanced vision and reasoning models, including OpenAI's models and Google's Gemini.

Example results: a comparison of detection results across different models, showing the superior performance of the advanced reasoning model.

Features

  • Multiple Detection Models:

    • Advanced Reasoning Model (OpenAI) - Reasoning model that leverages tools and other foundation models to perform object detection
    • Vanilla Reasoning Model - Directly using a reasoning model to perform object detection
    • Vision Model - GroundingDino + SAM
    • Gemini Model (Google) - Fine-tuned LMM for object detection
  • Tool-Use Reasoning: Our advanced model uses innovative grid-based reasoning for precise object detection

    Internal workings: how the advanced reasoning model works under the hood, using grid cells for precise localization.

  • Simple API: One function for all your detection needs

  • CLI Support: Command-line interface for quick testing
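The grid-based reasoning listed above can be illustrated with a small sketch. This is not the package's implementation; it only shows one plausible way that grid cell labels picked by a model could be mapped back to a pixel-space bounding box. The function names (`cell_to_bbox`, `cells_to_bbox`), the 1-based row-major cell numbering, and the grid size are all assumptions made for illustration.

```python
from typing import Iterable, Tuple

BBox = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


def cell_to_bbox(cell: int, rows: int, cols: int, width: int, height: int) -> BBox:
    """Map a 1-based, row-major grid cell number to its pixel rectangle."""
    if not 1 <= cell <= rows * cols:
        raise ValueError(f"cell {cell} is outside a {rows}x{cols} grid")
    row, col = divmod(cell - 1, cols)
    # Integer arithmetic keeps adjacent cells flush, with no gaps or overlaps.
    x_min = col * width // cols
    x_max = (col + 1) * width // cols
    y_min = row * height // rows
    y_max = (row + 1) * height // rows
    return (x_min, y_min, x_max, y_max)


def cells_to_bbox(cells: Iterable[int], rows: int, cols: int, width: int, height: int) -> BBox:
    """Return the smallest box that encloses every selected cell."""
    boxes = [cell_to_bbox(c, rows, cols, width, height) for c in cells]
    if not boxes:
        raise ValueError("at least one cell is required")
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )


# A 4x4 grid over a 400x200 image: cells 6 and 7 sit side by side in row 2.
print(cells_to_bbox([6, 7], rows=4, cols=4, width=400, height=200))
# → (100, 50, 300, 100)
```

A coarse grid like this trades precision for simplicity: the model only has to name cells, and the final box can be refined by repeating the step on a crop of the selected region.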

Installation

pip install spatial-reasoning

Or install from source:

git clone https://github.com/QasimWani/spatial-reasoning.git
cd spatial-reasoning
pip install -e .

Optional: Flash Attention (for better performance)

For improved performance with transformer models, you can optionally install Flash Attention:

pip install flash-attn --no-build-isolation

Note: Flash Attention requires CUDA development tools and must be compiled for your specific PyTorch/CUDA version. The package will work without it, just with slightly reduced performance.
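To confirm whether the optional dependency is present in your environment, a quick check like the following may help. It is a sketch using only the standard library; the helper name `is_installed` is made up for illustration, and how the package itself detects Flash Attention is not stated here.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# The flash-attn distribution is imported under the name `flash_attn`.
if is_installed("flash_attn"):
    print("Flash Attention is available")
else:
    print("Flash Attention not found; continuing without it")
```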

Setup

Create a .env file in your project root:

# .env
OPENAI_API_KEY=your-openai-api-key-here
GEMINI_API_KEY=your-google-gemini-api-key-here
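Before running a detection, it can be useful to verify that the keys are actually set. The sketch below is a standalone pre-flight check, not part of the package: `parse_env_text` and `missing_keys` are hypothetical helpers, the parser handles only simple KEY=VALUE lines, and treating both keys as required is an assumption (you may only need the key for the model you use).

```python
import os
from pathlib import Path
from typing import Dict, List

REQUIRED_KEYS = ("OPENAI_API_KEY", "GEMINI_API_KEY")  # assumed: both wanted


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: Dict[str, str]) -> List[str]:
    """List the required keys that are absent or empty."""
    return [k for k in REQUIRED_KEYS if not values.get(k)]


env_file = Path(".env")
file_values = parse_env_text(env_file.read_text()) if env_file.exists() else {}
# Values already exported in the shell take precedence over the file.
shell_values = {k: os.environ[k] for k in REQUIRED_KEYS if k in os.environ}
merged = {**file_values, **shell_values}
print("Missing keys:", missing_keys(merged) or "none")
```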

Get your API keys from OpenAI (for OPENAI_API_KEY) and Google (for GEMINI_API_KEY).

Quick Start

Python API

from spatial_reasoning import detect

# Detect objects in an image
result = detect(
    image_path="https://ix-cdn.b2e5.com/images/27094/27094_3063d356a3a54cc3859537fd23c5ba9d_1539205710.jpeg",  # URL or local file path
    object_of_interest="farthest scooter in the image",
    task_type="advanced_reasoning_model"
)

# Access results
bboxes = result['bboxs']
visualized_image = result['visualized_image']
print(f"Found {len(bboxes)} objects")

# Save the result
visualized_image.save("output.jpg")

Command Line

# Basic usage
spatial-reasoning --image-path "image.jpg" --object-of-interest "person"  # "advanced_reasoning_model" used by default

# With specific model
spatial-reasoning --image-path "image.jpg" --object-of-interest "cat" --task-type "gemini"

# From URL with custom parameters
spatial-reasoning \
  --image-path "https://example.com/image.jpg" \
  --object-of-interest "text in image" \
  --task-type "advanced_reasoning_model" \
  --task-kwargs '{"nms_threshold": 0.7}'
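When calling the CLI from a script, the JSON passed to --task-kwargs is easy to get wrong by hand. The following sketch builds the argument list in Python using only the flags shown above; `build_command` is a hypothetical helper, not something the package provides.

```python
import json
import shlex
from typing import Any, Dict, List, Optional


def build_command(
    image_path: str,
    object_of_interest: str,
    task_type: Optional[str] = None,
    task_kwargs: Optional[Dict[str, Any]] = None,
) -> List[str]:
    """Assemble an argument list using only the flags shown above."""
    cmd = [
        "spatial-reasoning",
        "--image-path", image_path,
        "--object-of-interest", object_of_interest,
    ]
    if task_type is not None:
        cmd += ["--task-type", task_type]
    if task_kwargs:
        # json.dumps guarantees valid JSON (double quotes, lowercase true/false).
        cmd += ["--task-kwargs", json.dumps(task_kwargs)]
    return cmd


cmd = build_command(
    "image.jpg",
    "text in image",
    task_type="advanced_reasoning_model",
    task_kwargs={"nms_threshold": 0.7},
)
# shlex.join quotes each argument so the line can be pasted into a POSIX shell.
print(shlex.join(cmd))
# To execute it: subprocess.run(cmd, check=True)
```

Passing the list straight to `subprocess.run` avoids shell quoting issues entirely, since each argument is delivered to the program as-is.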

Available Models

  • advanced_reasoning_model (default) - Best accuracy, uses tool-use reasoning
  • vanilla_reasoning_model - Faster, standard detection
  • vision_model - Uses GroundingDino + (optional) SAM2 for segmentation
  • gemini - Google's Gemini model
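To see how the models listed above differ on your own images, you could run the same query through each one. The sketch below is one way to do that under a few assumptions: `count_detections` and `fake_detect` are hypothetical names, the detection function is passed in as a parameter so the example runs without API keys, and the only result key relied on is 'bboxs' from the Quick Start. The box values in the stand-in are placeholders, not the package's actual output format.

```python
from typing import Any, Callable, Dict, Iterable

TASK_TYPES = (
    "advanced_reasoning_model",
    "vanilla_reasoning_model",
    "vision_model",
    "gemini",
)


def count_detections(
    detect_fn: Callable[..., Dict[str, Any]],
    image_path: str,
    object_of_interest: str,
    task_types: Iterable[str] = TASK_TYPES,
) -> Dict[str, int]:
    """Run detect_fn once per task type and count the returned boxes."""
    counts: Dict[str, int] = {}
    for task_type in task_types:
        result = detect_fn(
            image_path=image_path,
            object_of_interest=object_of_interest,
            task_type=task_type,
        )
        counts[task_type] = len(result["bboxs"])
    return counts


# Stand-in so the sketch runs offline. For real use, replace it with:
#   from spatial_reasoning import detect
# and pass `detect` as the first argument.
def fake_detect(image_path: str, object_of_interest: str, task_type: str) -> Dict[str, Any]:
    return {"bboxs": [[0, 0, 10, 10]] if task_type == "gemini" else []}


print(count_detections(fake_detect, "image.jpg", "person"))
```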

License

MIT License

Download files


Source Distribution

spatial_reasoning-0.2.1.tar.gz (38.6 kB view details)

Uploaded Source

Built Distribution


spatial_reasoning-0.2.1-py3-none-any.whl (47.7 kB view details)

Uploaded Python 3

File details

Details for the file spatial_reasoning-0.2.1.tar.gz.

File metadata

  • Download URL: spatial_reasoning-0.2.1.tar.gz
  • Upload date:
  • Size: 38.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.16

File hashes

Hashes for spatial_reasoning-0.2.1.tar.gz
Algorithm Hash digest
SHA256 2052df707e9164cf7f63f9ff57a42546f36a9e98b67050fd2b05b698ce2b4b8d
MD5 88489376d27a8960ff43468cc83b7149
BLAKE2b-256 a510347f9e6950b29bdd25ec968211bad5b2b10069745275cbb7d4b3693ed316


File details

Details for the file spatial_reasoning-0.2.1-py3-none-any.whl.

File hashes

Hashes for spatial_reasoning-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 5c016504c98ca2e796d7c56a83b90ada9518d981e6a9d5c1d7d9f7e9d108036d
MD5 0f68c2f49ab17cd46e7e51570744624f
BLAKE2b-256 534b32f7521b8eee50e1bb2951c4e06c1cc2620adf7a6c2e0af955b05e49b43b

