A PyPI package for object detection using advanced vision models
Spatial Reasoning
A powerful Python package for object detection using advanced vision and reasoning models, including OpenAI's models and Google's Gemini.
Figure: Comparison of detection results across different models, showing the superior performance of the advanced reasoning model
Features
- Multiple Detection Models:
  - Advanced Reasoning Model (OpenAI) - Reasoning model that leverages tools and other foundation models to perform object detection
  - Vanilla Reasoning Model - Directly using a reasoning model to perform object detection
  - Vision Model - GroundingDino + SAM
  - Gemini Model (Google) - Fine-tuned LMM for object detection
- Tool-Use Reasoning: Our advanced model uses innovative grid-based reasoning for precise object detection
  Figure: How the advanced reasoning model works under the hood, using grid cells for precise localization
- Simple API: One function for all your detection needs
- CLI Support: Command-line interface for quick testing
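To make the grid idea concrete, here is a minimal sketch of how a set of selected grid cells can be mapped back to a pixel bounding box. It is an illustration only, not the package's implementation: the function names `cell_to_box` and `cells_to_bbox`, the 1-based row-major cell numbering, and the `(x1, y1, x2, y2)` box format are all assumptions.

```python
from typing import Iterable, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels; assumed format


def cell_to_box(cell: int, width: int, height: int, rows: int, cols: int) -> Box:
    """Return the pixel box of a 1-based, row-major grid cell (hypothetical numbering)."""
    if not 1 <= cell <= rows * cols:
        raise ValueError(f"cell {cell} is outside a {rows}x{cols} grid")
    row, col = divmod(cell - 1, cols)
    # Integer arithmetic keeps adjacent cells flush, with no gaps or overlaps.
    x1 = col * width // cols
    y1 = row * height // rows
    x2 = (col + 1) * width // cols
    y2 = (row + 1) * height // rows
    return x1, y1, x2, y2


def cells_to_bbox(cells: Iterable[int], width: int, height: int, rows: int, cols: int) -> Box:
    """Merge the cells a model selected into one enclosing bounding box."""
    boxes = [cell_to_box(c, width, height, rows, cols) for c in cells]
    if not boxes:
        raise ValueError("at least one cell is required")
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )


if __name__ == "__main__":
    # A 640x480 image split into a 3x4 grid; suppose the model picked cells 6 and 7.
    print(cells_to_bbox([6, 7], width=640, height=480, rows=3, cols=4))
```

Numbering the cells gives a language model a small, discrete vocabulary for describing locations, which is one plausible reason a grid helps with localization; the actual prompting and refinement steps used by the package are not shown here.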
Installation
pip install spatial-reasoning
Or install from source:
git clone https://github.com/QasimWani/spatial-reasoning.git
cd spatial-reasoning
pip install -e .
Optional: Flash Attention (for better performance)
For improved performance with transformer models, you can optionally install Flash Attention:
pip install flash-attn --no-build-isolation
Note: Flash Attention requires CUDA development tools and must be compiled for your specific PyTorch/CUDA version. The package will work without it, just with slightly reduced performance.
Setup
Create a .env file in your project root:
# .env
OPENAI_API_KEY=your-openai-api-key-here
GEMINI_API_KEY=your-google-gemini-api-key-here
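Before running a detection, it can help to confirm that both keys are present. The following is a small standard-library sketch for doing so; `read_env_file` and `missing_api_keys` are hypothetical helpers, not part of spatial-reasoning, and the parser only handles plain `KEY=value` lines like the ones above.

```python
import os
from pathlib import Path
from typing import Dict, List, Mapping

REQUIRED_KEYS = ("OPENAI_API_KEY", "GEMINI_API_KEY")  # names from the .env example


def read_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_api_keys(values: Mapping[str, str]) -> List[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


if __name__ == "__main__":
    # Variables already set in the process take precedence over the file.
    merged = {**read_env_file(".env"), **os.environ}
    missing = missing_api_keys(merged)
    print("Missing keys:", ", ".join(missing) if missing else "none")
```

Note that this check only verifies that a value exists; it cannot tell a real key from a placeholder such as `your-openai-api-key-here`.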
Get your API keys:
Quick Start
Python API
from spatial_reasoning import detect
# Detect objects in an image
result = detect(
    image_path="https://ix-cdn.b2e5.com/images/27094/27094_3063d356a3a54cc3859537fd23c5ba9d_1539205710.jpeg",  # a URL or a local image path
    object_of_interest="farthest scooter in the image",
    task_type="advanced_reasoning_model"
)
# Access results
bboxes = result['bboxs']
visualized_image = result['visualized_image']
print(f"Found {len(bboxes)} objects")
# Save the result
visualized_image.save("output.jpg")
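Because every model is reached through the same `detect` call, comparing them on one image is a short loop. The sketch below takes the detection function as an argument so it can be run and tested without API keys. `count_detections` is a hypothetical helper, and it assumes only what the example above shows: the result is a dict with a `'bboxs'` list.

```python
from typing import Any, Callable, Dict, Iterable

# Task types as listed under "Available Models".
TASK_TYPES = (
    "advanced_reasoning_model",
    "vanilla_reasoning_model",
    "vision_model",
    "gemini",
)


def count_detections(
    detect_fn: Callable[..., Dict[str, Any]],
    image_path: str,
    object_of_interest: str,
    task_types: Iterable[str] = TASK_TYPES,
) -> Dict[str, int]:
    """Run each task type on the same image and count the returned boxes."""
    counts: Dict[str, int] = {}
    for task_type in task_types:
        result = detect_fn(
            image_path=image_path,
            object_of_interest=object_of_interest,
            task_type=task_type,
        )
        counts[task_type] = len(result["bboxs"])
    return counts


if __name__ == "__main__":
    # Stand-in for spatial_reasoning.detect so the sketch runs offline.
    def fake_detect(image_path: str, object_of_interest: str, task_type: str) -> Dict[str, Any]:
        return {"bboxs": [(0, 0, 10, 10)] if task_type == "gemini" else []}

    print(count_detections(fake_detect, "image.jpg", "person"))
```

With the real package, pass `detect` imported from `spatial_reasoning` in place of `fake_detect`.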
Command Line
# Basic usage
spatial-reasoning --image-path "image.jpg" --object-of-interest "person" # "advanced_reasoning_model" used by default
# With specific model
spatial-reasoning --image-path "image.jpg" --object-of-interest "cat" --task-type "gemini"
# From URL with custom parameters
spatial-reasoning \
  --image-path "https://example.com/image.jpg" \
  --object-of-interest "text in image" \
  --task-type "advanced_reasoning_model" \
  --task-kwargs '{"nms_threshold": 0.7}'
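The `nms_threshold` option presumably controls non-maximum suppression (NMS), the usual step for discarding near-duplicate boxes. As a rough illustration of what such a threshold means, here is a generic greedy NMS in plain Python. It is not taken from the package: the `(x1, y1, x2, y2)` box format, the per-box scores, and the convention that a box is dropped when its IoU with an already kept box exceeds the threshold are all assumptions.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2); assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], threshold: float = 0.7) -> List[int]:
    """Return the indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= threshold for k in kept):
            kept.append(i)
    return kept


if __name__ == "__main__":
    boxes = [(0, 0, 100, 100), (5, 5, 105, 105), (200, 200, 300, 300)]
    scores = [0.9, 0.8, 0.7]
    # The first two boxes overlap heavily, so the lower-scoring one is dropped.
    print(nms(boxes, scores, threshold=0.7))
```

Under these assumptions, a higher threshold keeps more overlapping boxes, while a lower one suppresses near-duplicates more aggressively.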
Available Models
- advanced_reasoning_model (default) - Best accuracy, uses tool-use reasoning
- vanilla_reasoning_model - Faster, standard detection
- vision_model - Uses GroundingDino + (optional) SAM2 for segmentation
- gemini - Google's Gemini model
License
MIT License