Oculus Vision-Language Model - Inference SDK for multimodal AI research
Oceanir
Oculus Vision-Language Model SDK for multimodal AI research.
Installation
pip install oceanir
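To confirm that the package landed in the active environment, the standard library can report the installed version. This is a minimal sketch; `installed_version` is a hypothetical helper name and not part of the SDK.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str = "oceanir"):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None


print(installed_version() or "oceanir is not installed")
```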
Quick Start
from oceanir import Oculus
# Load the model
model = Oculus.from_pretrained("OceanirAI/Oculus-0.1-Instruct")
# Visual Question Answering
answer = model.ask("photo.jpg", "What is the person doing?")
print(answer) # "The person is riding a bicycle."
# Image Captioning
caption = model.caption("photo.jpg")
print(caption) # "A dog playing in the park with a frisbee."
# Object Detection
results = model.detect("photo.jpg")
for box, label, conf in zip(results['boxes'], results['labels'], results['confidences']):
    print(f"{label}: {conf:.2f}")
# Counting Objects
count = model.count("crowd.jpg", "people")
print(f"Found {count} people")
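The detection result above is indexed by `'boxes'`, `'labels'`, and `'confidences'`, so low-confidence hits can be dropped before further processing. The sketch below assumes those three entries are parallel sequences and that confidences are floats; `filter_detections` and `label_counts` are hypothetical helpers, and the sample data (including the box coordinate layout) is made up for illustration.

```python
from collections import Counter


def filter_detections(results, min_confidence=0.5):
    """Keep only detections whose confidence is at least `min_confidence`."""
    kept = [
        (box, label, conf)
        for box, label, conf in zip(
            results["boxes"], results["labels"], results["confidences"]
        )
        if conf >= min_confidence
    ]
    return {
        "boxes": [box for box, _, _ in kept],
        "labels": [label for _, label, _ in kept],
        "confidences": [conf for _, _, conf in kept],
    }


def label_counts(results):
    """Tally how many detections carry each label."""
    return Counter(results["labels"])


# Made-up detections; real values would come from model.detect().
# The four-number box layout here is an assumption, not documented behavior.
example = {
    "boxes": [[10, 20, 110, 220], [30, 40, 90, 120], [5, 5, 50, 50]],
    "labels": ["person", "bicycle", "person"],
    "confidences": [0.92, 0.41, 0.77],
}
confident = filter_detections(example, min_confidence=0.5)
print(label_counts(confident))
```

Keeping the filtered result in the same dictionary shape means it can be passed to any code that already consumes the output of `model.detect`.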
Models
| Model | Description |
|---|---|
| OceanirAI/Oculus-0.1-Instruct | Instruction-tuned for general VQA and captioning |
| OceanirAI/Oculus-0.1-Reasoning | Enhanced with chain-of-thought reasoning |
Reasoning Mode
Enable thinking traces for complex questions:
# With reasoning
answer = model.ask(
    "complex_scene.jpg",
    "How many red cars are parked on the left side?",
    think=True
)
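When the same question applies to several images, the `ask` call can be wrapped in a small loop. The sketch below relies only on the `ask(image, question, think=...)` call shown above; `ask_many` is a hypothetical helper, and `EchoModel` is a stand-in so the example runs without downloading weights. What `ask` returns when `think=True` is not documented here, so the helper treats each answer as an opaque value.

```python
def ask_many(model, image_paths, question, think=False):
    """Ask the same question about several images and map each path to its answer."""
    answers = {}
    for path in image_paths:
        answers[path] = model.ask(path, question, think=think)
    return answers


class EchoModel:
    """Stand-in for a loaded Oculus model; it only echoes its inputs."""

    def ask(self, image, question, think=False):
        mode = "reasoning" if think else "direct"
        return f"[{mode}] {question} ({image})"


answers = ask_many(EchoModel(), ["a.jpg", "b.jpg"], "How many red cars?", think=True)
for path, answer in answers.items():
    print(path, "->", answer)
```

In real use, the object returned by `Oculus.from_pretrained(...)` would take the place of `EchoModel()`.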
Features
- Visual Question Answering (VQA) - Answer questions about images
- Image Captioning - Generate natural language descriptions
- Object Detection - Detect and localize objects with bounding boxes
- Object Counting - Count specific objects in images
- Semantic Segmentation - Pixel-level scene understanding
- Chain-of-Thought Reasoning - Step-by-step reasoning for complex tasks
Architecture
Oculus combines:
- DINOv2 - Self-supervised vision transformer for semantic understanding
- SigLIP - Vision-language alignment for text understanding
- Trained Projector - Maps vision features to language space
- BLIP - Language model for text generation
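To give a feel for what "maps vision features to language space" can mean, here is a toy linear projection in plain Python. It is an illustration only: this README does not say how the projector is built or how the DINOv2 and SigLIP features are combined, so the concatenation step, the dimensions, and the random weights below are all assumptions. The real projector is trained, not randomly initialized.

```python
import random


def make_projector(in_dim, out_dim, seed=0):
    """Create a random weight matrix (out_dim x in_dim) and a zero bias vector."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 0.02) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias


def project(features, weights, bias):
    """Apply y = W x + b to a single feature vector."""
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]


# Toy dimensions chosen for readability; real encoders emit far larger vectors.
dino_features = [0.1, 0.2, 0.3, 0.4]     # stand-in for a DINOv2 feature vector
siglip_features = [0.5, 0.6, 0.7]        # stand-in for a SigLIP feature vector
fused = dino_features + siglip_features  # assumption: features are concatenated

weights, bias = make_projector(in_dim=len(fused), out_dim=5)
language_token = project(fused, weights, bias)
print(len(language_token))
```

The output vector has the width expected by the text-generation side, which is the role the projector plays between the vision encoders and the language model.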
License
This software is released under the Oceanir Research License.
Permitted Uses:
- Academic research
- Educational purposes
- Publishing papers with results
- Personal experimentation
Prohibited Uses:
- Commercial applications
- Training commercial models
- Integration into commercial products
For commercial licensing, contact: licensing@oceanir.ai
Citation
If you use Oceanir in your research, please cite:
@software{oculus2026,
  title={Oculus Vision-Language Model},
  author={OceanirAI},
  year={2026},
  url={https://github.com/OceanirAI/oceanir}
}