Foundation Model Active Learning for autonomous robot object discovery

cane-robotics

Foundation Model Active Learning (FMAL) for autonomous robot object discovery.

FMAL fuses three foundation models -- GroundingDINO, DINO, and CLIP -- into a unified acquisition function for active learning. The system lets robots discover and learn novel objects in unstructured environments with minimal human annotation.

Install

pip install cane-robotics

Quick Start

# Run a single active learning experiment
cane-robotics run --images-dir data/images --labels-dir data/labels --classes box laptop chair

# Run all ablation variants across multiple seeds
cane-robotics ablations --images-dir data/images --labels-dir data/labels

# Evaluate sim-to-real transfer
cane-robotics sim2real --synthetic-dir data/synthetic --real-dir data/real

# Launch annotation GUI
cane-robotics annotate novel_detections/

# Plot experiment results
cane-robotics plot results/

# Generate synthetic training data (Isaac Sim)
cane-robotics generate --output-dir data/synthetic --num-scenes 50

How It Works

The active learning pipeline scores candidate object detections using three complementary signals:

GroundingDINO -- open-vocabulary detection confidence
DINO ViT -- class-agnostic attention saliency (filters background clutter)
CLIP -- semantic novelty relative to known object classes

These are combined into a unified acquisition score:

score(x) = 0.5 * conf_gdino + 0.3 * attn_dino + 0.2 * sim_fg - 0.2 * sim_bg
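
As a minimal sketch, the weighted sum above can be written as a plain Python function. The function name, the keyword arguments, and the assumption that every signal is already normalized to [0, 1] are illustrative and not taken from the package. `sim_fg` and `sim_bg` are assumed here to be CLIP similarities to foreground and background references, which the formula itself does not define.

```python
# Illustrative only: not the package's implementation.
# The weights are copied from the acquisition formula above.
W_GDINO, W_DINO, W_FG, W_BG = 0.5, 0.3, 0.2, 0.2


def acquisition_score(conf_gdino, attn_dino, sim_fg, sim_bg):
    """Combine the four signals into one score (inputs assumed to lie in [0, 1])."""
    return (
        W_GDINO * conf_gdino
        + W_DINO * attn_dino
        + W_FG * sim_fg
        - W_BG * sim_bg
    )


# A confident, salient detection that resembles foreground more than background.
example = acquisition_score(conf_gdino=0.8, attn_dino=0.6, sim_fg=0.7, sim_bg=0.1)
print(f"{example:.2f}")  # prints 0.70
```

Under that normalization assumption, the score ranges from -0.2 (pure background match) to 1.0.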

A temporal deduplication module tracks previously queried objects via embedding similarity, reducing redundant annotation queries by ~69%.
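
One way to picture deduplication by embedding similarity is the sketch below. It is not the package's `TemporalDeduplicator`: the class name `EmbeddingDedupSketch`, the cosine-similarity measure, and the 0.9 threshold are all assumptions chosen for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class EmbeddingDedupSketch:
    """Hypothetical helper: remembers embeddings of objects already queried."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold  # assumed value, not from the package
        self.seen = []

    def check_and_add(self, embedding):
        """Return True if the embedding matches a stored one; otherwise store it."""
        for stored in self.seen:
            if cosine_similarity(embedding, stored) >= self.threshold:
                return True
        self.seen.append(list(embedding))
        return False


dedup = EmbeddingDedupSketch(threshold=0.9)
print(dedup.check_and_add([1.0, 0.0]))    # first sighting, so it is stored
print(dedup.check_and_add([0.99, 0.01]))  # nearly the same direction as the first
```

Only embeddings that are not flagged as duplicates would be forwarded as annotation queries.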

Each round, the top-scoring proposals are labeled (by a human or an oracle) and added to the training set, and a YOLOv8 detector is retrained. The loop repeats until convergence.
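
The round structure described above can be sketched as a generic loop. Everything here is hypothetical scaffolding: `score_fn`, `label_fn`, and `retrain_fn` stand in for the acquisition function, the human or oracle labeler, and YOLOv8 training, and a fixed `max_rounds` replaces the convergence check, which the description does not specify.

```python
def run_active_learning(pool, score_fn, label_fn, retrain_fn,
                        budget_per_round=2, max_rounds=3):
    """Query the top-scoring items each round, label them, then retrain."""
    labeled = []
    remaining = list(pool)
    for _ in range(max_rounds):
        if not remaining:
            break  # nothing left to query
        # Rank the unlabeled proposals by acquisition score, best first.
        ranked = sorted(remaining, key=score_fn, reverse=True)
        queried = ranked[:budget_per_round]
        remaining = ranked[budget_per_round:]
        # Ask the labeler (human or oracle) and grow the training set.
        labeled.extend((item, label_fn(item)) for item in queried)
        # Retrain the detector on everything labeled so far.
        retrain_fn(labeled)
    return labeled


# Toy usage: proposals are just their own scores, and the oracle always says "box".
training_sizes = []
result = run_active_learning(
    pool=[0.1, 0.9, 0.5, 0.3],
    score_fn=lambda item: item,
    label_fn=lambda item: "box",
    retrain_fn=lambda data: training_sizes.append(len(data)),
)
print(result)
print(training_sizes)
```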

Package Structure

cane_robotics/
  pipeline/        Core active learning pipeline, offline replay, ROS node
  models/          Foundation model wrappers (GDINO, CLIP, DINO, dedup)
  dataset/         Dataset management and augmentation
  config/          Experiment configuration (dataclasses + YAML)
  experiments/     Experiment runners, ablations, sim2real evaluation
  training/        YOLO training and dataset preparation
  sim/             Isaac Sim synthetic data generation
  tools/           Annotation GUI, result plotting

Python API

from cane_robotics import (
    ActiveLearningPipeline,
    create_gdino_pipeline,
    ExperimentConfig,
    DatasetManager,
    TemporalDeduplicator,
)

# Create pipeline with full multi-VLM acquisition
pipeline = create_gdino_pipeline(
    known_classes=["mug", "bowl", "can"],
    acquisition_type="full",
    enable_dedup=True,
)

# Process a single image
result = pipeline.process_image("frame_001.jpg")
for obj in result["novel_objects"]:
    print(f"{obj['label']} (score={obj['score']:.3f})")
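
To apply the same call to a folder of frames, a small helper along these lines may be convenient. It relies only on what the example above shows, namely that `process_image` accepts a path and returns a dict whose `"novel_objects"` entries carry `"label"` and `"score"`. The helper name `collect_novel_objects`, the `*.jpg` pattern, and the `min_score` filter are assumptions.

```python
from pathlib import Path


def collect_novel_objects(pipeline, images_dir, min_score=0.0):
    """Run the pipeline over every .jpg in a directory and gather detections.

    `pipeline` is any object exposing process_image(path) as in the example above.
    Returns a list of (file name, label, score) tuples.
    """
    rows = []
    for path in sorted(Path(images_dir).glob("*.jpg")):
        result = pipeline.process_image(str(path))
        for obj in result["novel_objects"]:
            if obj["score"] >= min_score:
                rows.append((path.name, obj["label"], obj["score"]))
    return rows
```

Sorting the paths keeps the output order stable across runs, which helps when comparing results between experiments.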

Ablation Variants

The experiment framework supports 8 acquisition function variants for systematic comparison:

Variant	Description
`full`	All three VLM signals combined (default)
`random`	Random scoring baseline
`gdino_only`	GroundingDINO confidence only
`clip_only`	CLIP novelty signal only
`dino_only`	DINO attention only
`no_fg_bg_gate`	Full formula without foreground/background gating
`no_dedup`	Full scoring with deduplication disabled
`no_sam`	Full scoring with SAM splitting disabled
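
For planning a comparison, the variants in the table can be crossed with a list of seeds to enumerate the runs. This is a sketch only: the helper `ablation_grid` and the dictionary layout are hypothetical, and treating every variant name as an `acquisition_type` value is an assumption (only `"full"` appears in the Python API example).

```python
from itertools import product

# Variant names copied from the table above.
VARIANTS = [
    "full", "random", "gdino_only", "clip_only",
    "dino_only", "no_fg_bg_gate", "no_dedup", "no_sam",
]


def ablation_grid(seeds):
    """Return one run description per (variant, seed) pair."""
    return [
        {"acquisition_type": variant, "seed": seed}
        for variant, seed in product(VARIANTS, seeds)
    ]


runs = ablation_grid(seeds=[0, 1, 2])
print(len(runs))  # 8 variants x 3 seeds = 24
```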

Dependencies

Core: numpy, pyyaml, torch, torchvision, ultralytics, opencv-python, Pillow, transformers

Optional:

[sim] -- Isaac Sim for synthetic data generation (pip install "cane-robotics[sim]")
[dev] -- pytest and ruff for development (pip install "cane-robotics[dev]")

License

MIT

Project details

This version: 0.1.0 (Apr 12, 2026)

Download files

Source Distribution

cane_robotics-0.1.0.tar.gz (59.2 kB)

Uploaded Apr 12, 2026 Source

Built Distribution

cane_robotics-0.1.0-py3-none-any.whl (73.6 kB)

Uploaded Apr 12, 2026 Python 3

File details

Details for the file cane_robotics-0.1.0.tar.gz.

File metadata

Download URL: cane_robotics-0.1.0.tar.gz
Upload date: Apr 12, 2026
Size: 59.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.1

File hashes

Hashes for cane_robotics-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`9701de8dcb3a1222208d38eb8f732fc4e1155a7e1d842ef960801a2624ba838c`
MD5	`239158e594a6d12602e3008d7bf4c778`
BLAKE2b-256	`c0a18a34591e14c9608c221d79960ce791c3dc36421e4c8c1aa03843eefeb320`
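
A downloaded archive can be checked against the SHA256 digest listed above with the standard library. The following is a minimal sketch; the helper names and the assumption that the file sits in the current directory under its published name are illustrative.

```python
import hashlib

# SHA256 digest copied from the table above for cane_robotics-0.1.0.tar.gz.
EXPECTED_SHA256 = "9701de8dcb3a1222208d38eb8f732fc4e1155a7e1d842ef960801a2624ba838c"


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True only if the file's SHA256 equals the published digest."""
    return sha256_of_file(path) == expected.lower()


# Example (assumes the sdist was downloaded to the working directory):
# print(matches_expected("cane_robotics-0.1.0.tar.gz"))
```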

File details

Details for the file cane_robotics-0.1.0-py3-none-any.whl.

File metadata

Download URL: cane_robotics-0.1.0-py3-none-any.whl
Upload date: Apr 12, 2026
Size: 73.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.1

File hashes

Hashes for cane_robotics-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3cb70e3b0158ed2a2751b617f95dd08714f752acd3e513078d43795fc1190ea7`
MD5	`be52cfad01b02c298542a7cda1a543a4`
BLAKE2b-256	`4682501f814a737b01a226684506ca7fb000e68f0f3dd7ac84ff5ece65bb3a4e`
