Foundation Model Active Learning for autonomous robot object discovery

cane-robotics

Foundation Model Active Learning (FMAL) for autonomous robot object discovery.

Fuses three vision foundation models -- GroundingDINO, DINO, and CLIP -- into a unified acquisition function for active learning, so that robots can discover and learn novel objects in unstructured environments with minimal human annotation.

Install

pip install cane-robotics

Quick Start

# Run a single active learning experiment
cane-robotics run --images-dir data/images --labels-dir data/labels --classes box laptop chair

# Run all ablation variants across multiple seeds
cane-robotics ablations --images-dir data/images --labels-dir data/labels

# Evaluate sim-to-real transfer
cane-robotics sim2real --synthetic-dir data/synthetic --real-dir data/real

# Launch annotation GUI
cane-robotics annotate novel_detections/

# Plot experiment results
cane-robotics plot results/

# Generate synthetic training data (Isaac Sim)
cane-robotics generate --output-dir data/synthetic --num-scenes 50

How It Works

The active learning pipeline scores candidate object detections using three complementary signals:

  1. GroundingDINO -- open-vocabulary detection confidence
  2. DINO ViT -- class-agnostic attention saliency (filters background clutter)
  3. CLIP -- semantic novelty relative to known object classes

These are combined into a unified acquisition score:

score(x) = 0.5 * conf_gdino + 0.3 * attn_dino + 0.2 * sim_fg - 0.2 * sim_bg
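
As a minimal sketch, the formula above can be written as a plain Python function and used to rank proposals. The function name, the sample proposals, and the assumption that each signal lies in [0, 1] are illustrative and not taken from the package.

```python
def acquisition_score(conf_gdino, attn_dino, sim_fg, sim_bg):
    """Weighted sum from the formula above; inputs are assumed to lie in [0, 1]."""
    return 0.5 * conf_gdino + 0.3 * attn_dino + 0.2 * sim_fg - 0.2 * sim_bg


# Hypothetical proposals: (name, conf_gdino, attn_dino, sim_fg, sim_bg)
proposals = [
    ("a", 0.9, 0.8, 0.7, 0.1),
    ("b", 0.4, 0.2, 0.3, 0.6),
]

# Highest acquisition score first.
ranked = sorted(proposals, key=lambda p: acquisition_score(*p[1:]), reverse=True)
```

With these weights, a confident, salient detection that resembles foreground more than background ranks first, while the background term pulls clutter down.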

A temporal deduplication module tracks previously queried objects via embedding similarity, reducing redundant annotation queries by ~69%.
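
The idea of deduplicating by embedding similarity can be sketched as below. This is not the package's TemporalDeduplicator: the class name, the method name, and the 0.9 cosine-similarity threshold are assumptions chosen for illustration.

```python
import numpy as np


class EmbeddingDeduplicator:
    """Hypothetical sketch of similarity-based deduplication."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold  # assumed cosine-similarity cutoff
        self.seen = []              # unit-normalized embeddings already queried

    def check_and_add(self, embedding):
        """Return True if the embedding matches a stored one; otherwise store it."""
        v = np.asarray(embedding, dtype=float)
        v = v / np.linalg.norm(v)  # assumes a non-zero embedding
        # For unit vectors, the dot product equals the cosine similarity.
        if any(float(v @ s) >= self.threshold for s in self.seen):
            return True
        self.seen.append(v)
        return False
```

A proposal flagged as a duplicate would be skipped rather than sent for annotation, which is where the reduction in redundant queries comes from.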

Each round, the top-scoring proposals are labeled (by human or oracle), added to the training set, and a YOLOv8 detector is retrained. The loop repeats until convergence.
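
The loop described above can be sketched generically as follows. All names here (run_active_learning, score_fn, label_fn, retrain_fn, k, max_rounds) are hypothetical, and the stopping rule (pool exhausted or a round limit reached) is a stand-in for the convergence check, which is not specified here.

```python
def run_active_learning(pool, score_fn, label_fn, retrain_fn, k=2, max_rounds=10):
    """Generic top-k active learning loop (illustrative names throughout)."""
    labeled = []
    unlabeled = list(pool)
    for _ in range(max_rounds):
        if not unlabeled:
            break
        # Re-rank every round, since scores may change after retraining.
        unlabeled.sort(key=score_fn, reverse=True)
        batch, unlabeled = unlabeled[:k], unlabeled[k:]
        # Label the top-k proposals (human or oracle) and grow the training set.
        labeled.extend((x, label_fn(x)) for x in batch)
        # Retrain the detector on everything labeled so far.
        retrain_fn(labeled)
    return labeled
```

In the real pipeline, retrain_fn would correspond to retraining the YOLOv8 detector and label_fn to the human annotator or oracle.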

Package Structure

cane_robotics/
  pipeline/        Core active learning pipeline, offline replay, ROS node
  models/          Foundation model wrappers (GDINO, CLIP, DINO, dedup)
  dataset/         Dataset management and augmentation
  config/          Experiment configuration (dataclasses + YAML)
  experiments/     Experiment runners, ablations, sim2real evaluation
  training/        YOLO training and dataset preparation
  sim/             Isaac Sim synthetic data generation
  tools/           Annotation GUI, result plotting

Python API

from cane_robotics import (
    ActiveLearningPipeline,
    create_gdino_pipeline,
    ExperimentConfig,
    DatasetManager,
    TemporalDeduplicator,
)

# Create pipeline with full multi-VLM acquisition
pipeline = create_gdino_pipeline(
    known_classes=["mug", "bowl", "can"],
    acquisition_type="full",
    enable_dedup=True,
)

# Process a single image
result = pipeline.process_image("frame_001.jpg")
for obj in result["novel_objects"]:
    print(f"{obj['label']} (score={obj['score']:.3f})")

Ablation Variants

The experiment framework supports 8 acquisition function variants for systematic comparison:

Variant          Description
full             All three VLM signals combined (default)
random           Random scoring baseline
gdino_only       GroundingDINO confidence only
clip_only        CLIP novelty signal only
dino_only        DINO attention only
no_fg_bg_gate    Full formula without foreground/background gating
no_dedup         Full scoring with deduplication disabled
no_sam           Full scoring with SAM splitting disabled
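
One way to picture the single-signal variants is as the full formula with the other weights set to zero. The sketch below rests on that assumption; the package's actual variant handling may differ, and the remaining variants (random, no_fg_bg_gate, no_dedup, no_sam) change behavior beyond the weights, so they are left out. All names are hypothetical.

```python
# Hypothetical weight presets based on the acquisition formula's coefficients.
FULL = {"conf_gdino": 0.5, "attn_dino": 0.3, "sim_fg": 0.2, "sim_bg": -0.2}


def single_signal(keep):
    """Copy the full weights, zeroing every term not named in `keep`."""
    return {name: (w if name in keep else 0.0) for name, w in FULL.items()}


VARIANT_WEIGHTS = {
    "full": FULL,
    "gdino_only": single_signal({"conf_gdino"}),
    "dino_only": single_signal({"attn_dino"}),
    "clip_only": single_signal({"sim_fg", "sim_bg"}),  # assumed: both CLIP terms
}


def score(signals, variant="full"):
    """Weighted sum of the signals under the chosen variant's weights."""
    weights = VARIANT_WEIGHTS[variant]
    return sum(weights[name] * signals[name] for name in weights)
```

Scoring the same proposals under each preset shows how much each signal contributes to the final ranking.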

Dependencies

Core: numpy, pyyaml, torch, torchvision, ultralytics, opencv-python, Pillow, transformers

Optional:

  • [sim] -- Isaac Sim for synthetic data generation
  • [dev] -- pytest, ruff for development

License

MIT
