Foundation Model Active Learning for autonomous robot object discovery
cane-robotics
Foundation Model Active Learning (FMAL) for autonomous robot object discovery.
Fuses three foundation models (the vision-language models GroundingDINO and CLIP, and the self-supervised vision model DINO) into a unified acquisition function for active learning. The system enables robots to efficiently discover and learn novel objects in unstructured environments with minimal human annotation.
Install
pip install cane-robotics
Quick Start
# Run a single active learning experiment
cane-robotics run --images-dir data/images --labels-dir data/labels --classes box laptop chair
# Run all ablation variants across multiple seeds
cane-robotics ablations --images-dir data/images --labels-dir data/labels
# Evaluate sim-to-real transfer
cane-robotics sim2real --synthetic-dir data/synthetic --real-dir data/real
# Launch annotation GUI
cane-robotics annotate novel_detections/
# Plot experiment results
cane-robotics plot results/
# Generate synthetic training data (Isaac Sim)
cane-robotics generate --output-dir data/synthetic --num-scenes 50
How It Works
The active learning pipeline scores candidate object detections using three complementary signals:
- GroundingDINO -- open-vocabulary detection confidence
- DINO ViT -- class-agnostic attention saliency (filters background clutter)
- CLIP -- semantic novelty relative to known object classes
These are combined into a unified acquisition score:
score(x) = 0.5 * conf_gdino + 0.3 * attn_dino + 0.2 * sim_fg - 0.2 * sim_bg
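As a minimal sketch, the weighted sum can be written directly in Python. The per-proposal field names and the `rank_proposals` helper are hypothetical, not the package's API. Reading `sim_fg` and `sim_bg` as CLIP similarities to foreground and background prompts is an assumption, and the sketch also assumes the four signals are already on comparable scales.

```python
from dataclasses import dataclass

# Weights copied from the acquisition formula above.
W_GDINO, W_DINO, W_FG, W_BG = 0.5, 0.3, 0.2, 0.2


@dataclass
class ProposalSignals:
    """Per-proposal signals; field names are hypothetical, not the package's API."""
    conf_gdino: float  # GroundingDINO detection confidence
    attn_dino: float   # DINO attention saliency
    sim_fg: float      # assumed: CLIP similarity to foreground prompts
    sim_bg: float      # assumed: CLIP similarity to background prompts


def acquisition_score(s: ProposalSignals) -> float:
    """score(x) = 0.5*conf_gdino + 0.3*attn_dino + 0.2*sim_fg - 0.2*sim_bg"""
    return (W_GDINO * s.conf_gdino
            + W_DINO * s.attn_dino
            + W_FG * s.sim_fg
            - W_BG * s.sim_bg)


def rank_proposals(proposals: list[ProposalSignals], top_k: int) -> list[int]:
    """Return indices of the top_k highest-scoring proposals, best first."""
    order = sorted(range(len(proposals)),
                   key=lambda i: acquisition_score(proposals[i]),
                   reverse=True)
    return order[:top_k]
```

For example, a proposal with signals (0.9, 0.8, 0.6, 0.1) scores 0.45 + 0.24 + 0.12 - 0.02 = 0.79. Because the `sim_bg` term is subtracted, proposals with high background similarity are pushed down the ranking.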
A temporal deduplication module tracks previously queried objects via embedding similarity, reducing redundant annotation queries by ~69%.
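The deduplication idea can be sketched as a memory bank of unit-normalized embeddings checked by cosine similarity. This is a hypothetical illustration, not the package's `TemporalDeduplicator`; the class name, the 0.9 threshold, and the use of cosine similarity against every stored embedding are assumptions.

```python
import numpy as np


class EmbeddingDedupSketch:
    """Hypothetical stand-in for a temporal deduplicator (not the package's class)."""

    def __init__(self, threshold: float = 0.9):
        # Assumed cosine-similarity threshold at or above which an object
        # counts as already queried.
        self.threshold = threshold
        self.memory: list[np.ndarray] = []

    @staticmethod
    def _normalize(v) -> np.ndarray:
        v = np.asarray(v, dtype=np.float64)
        return v / (np.linalg.norm(v) + 1e-12)

    def is_duplicate(self, embedding) -> bool:
        """True if the embedding is close to any previously queried object."""
        if not self.memory:
            return False
        bank = np.stack(self.memory)                # (N, D) unit vectors
        sims = bank @ self._normalize(embedding)    # (N,) cosine similarities
        return bool(sims.max() >= self.threshold)

    def should_query(self, embedding) -> bool:
        """Query (and remember) the object only if it is not a near-duplicate."""
        if self.is_duplicate(embedding):
            return False
        self.memory.append(self._normalize(embedding))
        return True
```

A lower threshold suppresses more queries at the risk of merging distinct objects. The sketch never forgets an embedding, whereas a module that operates over time might expire old entries.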
Each round, the top-scoring proposals are labeled (by a human annotator or an oracle) and added to the training set, and a YOLOv8 detector is retrained on the enlarged set. The loop repeats until convergence.
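The loop can be sketched generically, with the scorer, labeler, and retraining step passed in as callables. All names here are hypothetical, and the fixed `max_rounds` budget is an assumed stand-in for the convergence check, which is not specified above.

```python
from typing import Callable, Hashable, Sequence


def active_learning_loop(
    pool: Sequence[Hashable],
    score_fn: Callable[[Hashable], float],
    label_fn: Callable[[Hashable], str],
    retrain_fn: Callable[[dict], None],
    per_round: int,
    max_rounds: int,
) -> dict:
    """Generic pool-based loop: score, query the top-k, label, retrain, repeat."""
    labeled: dict = {}
    unlabeled = list(pool)
    for _ in range(max_rounds):
        if not unlabeled:
            break  # pool exhausted
        # Re-score the remaining pool every round (the model may have changed).
        unlabeled.sort(key=score_fn, reverse=True)
        batch, unlabeled = unlabeled[:per_round], unlabeled[per_round:]
        for item in batch:
            labeled[item] = label_fn(item)  # human annotator or oracle
        retrain_fn(labeled)  # e.g. retrain the detector on all labels so far
    return labeled
```

Because `score_fn` is re-applied every round, a scorer that reads the latest retrained model re-ranks the remaining pool after each retraining step.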
Package Structure
cane_robotics/
- pipeline/ -- Core active learning pipeline, offline replay, ROS node
- models/ -- Foundation model wrappers (GDINO, CLIP, DINO, dedup)
- dataset/ -- Dataset management and augmentation
- config/ -- Experiment configuration (dataclasses + YAML)
- experiments/ -- Experiment runners, ablations, sim2real evaluation
- training/ -- YOLO training and dataset preparation
- sim/ -- Isaac Sim synthetic data generation
- tools/ -- Annotation GUI, result plotting
Python API
from cane_robotics import (
ActiveLearningPipeline,
create_gdino_pipeline,
ExperimentConfig,
DatasetManager,
TemporalDeduplicator,
)
# Create pipeline with full multi-VLM acquisition
pipeline = create_gdino_pipeline(
known_classes=["mug", "bowl", "can"],
acquisition_type="full",
enable_dedup=True,
)
# Process a single image
result = pipeline.process_image("frame_001.jpg")
for obj in result["novel_objects"]:
print(f"{obj['label']} (score={obj['score']:.3f})")
Ablation Variants
The experiment framework supports 8 acquisition function variants for systematic comparison:
| Variant | Description |
|---|---|
| full | All three signals (GroundingDINO, DINO, CLIP) combined (default) |
| random | Random scoring baseline |
| gdino_only | GroundingDINO confidence only |
| clip_only | CLIP novelty signal only |
| dino_only | DINO attention only |
| no_fg_bg_gate | Full formula without foreground/background gating |
| no_dedup | Full scoring with deduplication disabled |
| no_sam | Full scoring with SAM splitting disabled |
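When comparing the variants across seeds, a small helper can reduce per-seed runs to a mean and sample standard deviation per variant. This is a sketch under an assumed input layout (variant name mapped to one scalar metric per seed); it is not the package's results format or plotting code.

```python
import statistics
from typing import Mapping, Sequence


def summarize_ablations(
    results: Mapping[str, Sequence[float]],
) -> list[tuple[str, float, float]]:
    """Return (variant, mean, sample std) rows sorted by mean, best first.

    `results` maps a variant name (e.g. "full", "random") to one metric
    value per seed; this layout is an assumption, not the package's format.
    """
    rows = []
    for variant, values in results.items():
        mean = statistics.fmean(values)
        # Sample standard deviation needs at least two seeds.
        std = statistics.stdev(values) if len(values) > 1 else 0.0
        rows.append((variant, mean, std))
    rows.sort(key=lambda r: r[1], reverse=True)
    return rows
```

Reporting the spread alongside the mean makes it easier to tell whether a gap between two variants exceeds seed-to-seed noise.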
Dependencies
Core: numpy, pyyaml, torch, torchvision, ultralytics, opencv-python, Pillow, transformers
Optional:
- [sim] -- Isaac Sim for synthetic data generation
- [dev] -- pytest, ruff for development
License
MIT
File details
Details for the file cane_robotics-0.1.0.tar.gz.
File metadata
- Download URL: cane_robotics-0.1.0.tar.gz
- Upload date:
- Size: 59.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9701de8dcb3a1222208d38eb8f732fc4e1155a7e1d842ef960801a2624ba838c |
| MD5 | 239158e594a6d12602e3008d7bf4c778 |
| BLAKE2b-256 | c0a18a34591e14c9608c221d79960ce791c3dc36421e4c8c1aa03843eefeb320 |
File details
Details for the file cane_robotics-0.1.0-py3-none-any.whl.
File metadata
- Download URL: cane_robotics-0.1.0-py3-none-any.whl
- Upload date:
- Size: 73.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3cb70e3b0158ed2a2751b617f95dd08714f752acd3e513078d43795fc1190ea7 |
| MD5 | be52cfad01b02c298542a7cda1a543a4 |
| BLAKE2b-256 | 4682501f814a737b01a226684506ca7fb000e68f0f3dd7ac84ff5ece65bb3a4e |