Active learning for computer vision.

Project description

active-vision

Explore the docs »
Quickstart · Feature Request · Report Bug · Discussions · About

The goal of this project is to create a framework for active learning in computer vision. The diagram below shows the general workflow of the active learning loop.

[Active learning loop diagram]

Supported tasks:

  • Image classification
  • Object detection
  • Segmentation

Supported models:

  • Fastai models
  • Torchvision models
  • Timm models
  • Hugging Face models

Supported Active Learning Strategies:

Uncertainty Sampling:

  • Least confidence
  • Margin of confidence
  • Ratio of confidence
  • Entropy

Diverse Sampling:

  • Random sampling
  • Model-based outlier
  • Embeddings-based outlier
  • Cluster-based
  • Representative
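As a concrete reference, the four uncertainty measures listed above can be written as small NumPy functions. This is a minimal sketch; the library's actual implementations may differ. All four are scaled so that higher scores mean more uncertain:

```python
import numpy as np

def least_confidence(probs):
    # 1 - max probability; higher = more uncertain
    return 1.0 - probs.max(axis=1)

def margin_of_confidence(probs):
    # gap between the top-2 probabilities; return 1 - gap so that
    # higher = more uncertain
    sorted_p = np.sort(probs, axis=1)
    return 1.0 - (sorted_p[:, -1] - sorted_p[:, -2])

def ratio_of_confidence(probs):
    # ratio of second-best to best probability; closer to 1 = more uncertain
    sorted_p = np.sort(probs, axis=1)
    return sorted_p[:, -2] / sorted_p[:, -1]

def entropy(probs):
    # Shannon entropy of the predicted distribution
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)

probs = np.array([[0.9, 0.05, 0.05],   # confident prediction
                  [0.4, 0.35, 0.25]])  # uncertain prediction
print(least_confidence(probs))  # → [0.1 0.6]
```

The second row scores higher than the first on all four measures, which is exactly what a sampling strategy exploits when ranking unlabeled images.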

📦 Installation

Get a release from PyPI

pip install active-vision

Install from source

git clone https://github.com/dnth/active-vision.git
cd active-vision
pip install -e .

[!TIP] I recommend using uv to set up a virtual environment and install the package. You can also use any other virtual environment tool of your choice.

If you're using uv, create and sync a virtual environment:

uv venv
uv sync

Once the virtual environment is created, you can install the package using pip. With uv, prefix the pip command with uv to install into the virtual environment, e.g.:

uv pip install active-vision

🚀 Quickstart


import pandas as pd

from active_vision import ActiveLearner

# Create an active learner instance
al = ActiveLearner(name="cycle-1")

# Load model
al.load_model(model="resnet18", pretrained=True)

# Load dataset 
train_df = pd.read_parquet("training_samples.parquet")
al.load_dataset(train_df, filepath_col="filepath", label_col="label", batch_size=8)

# Train model
al.train(epochs=10, lr=5e-3, head_tuning_epochs=3)

# Evaluate the model on a *labeled* evaluation set (a DataFrame like train_df)
accuracy = al.evaluate(eval_df, filepath_col="filepath", label_col="label")

# Get summary of the active learning cycle
al.summary()

# Get predictions on an *unlabeled* set (a list of image filepaths)
pred_df = al.predict(filepaths)

# Sample images using a combination of active learning strategies
samples = al.sample_combination(
    pred_df,
    num_samples=50,
    combination={
        "least-confidence": 0.4,
        "ratio-of-confidence": 0.2,
        "entropy": 0.2,
        "model-based-outlier": 0.1,
        "random": 0.1,
    },
)

# Launch a Gradio UI to label the low confidence samples, save the labeled samples to a file
al.label(samples, output_filename="combination.parquet")

[Gradio labeling UI]

In the UI, you can optionally run zero-shot inference on an image. This uses a vision-language model (VLM) to predict the image's label; about a dozen VLMs are supported via the x.infer project.

[Zero-shot inference in the labeling UI]

Once complete, the labeled samples are saved into a new DataFrame. We can now add the newly labeled data to the training set.

# Add newly labeled data to the dataset
al.add_to_dataset(labeled_df, output_filename="active_labeled.parquet")

Repeat the process until the model is good enough. Use the dataset to train a larger model and deploy.

[!TIP] For the toy dataset, I got to about 93% accuracy on the evaluation set with 200+ labeled images. The best performing model on the leaderboard got 95.11% accuracy training on all 9469 labeled images.

This took me about 6 iterations of relabeling. Each iteration took about 5 minutes to complete including labeling and model training (resnet18). See the notebook for more details.

But using the dataset of 200+ images, I trained a more capable model (convnext_small_in22k) and got 99.3% accuracy on the evaluation set. See the notebook for more details.

📊 Benchmarks

This section contains the benchmarks I ran using the active learning loop on various datasets.

Column description:

  • #Labeled Images: The number of labeled images used to train the model.
  • Evaluation Accuracy: The accuracy of the model on the evaluation set.
  • Train Epochs: The number of epochs used to train the model.
  • Model: The model used to train.
  • Active Learning: Whether active learning was used to train the model.
  • Source: The source of the results.

Imagenette

  • num classes: 10
  • num images: 9469

To start the active learning loop, I labeled 100 images (10 images from each class) and iteratively relabeled the most informative images until I hit 275 labeled images.

The active learning loop is an iterative process that can continue until you hit a stopping point. You can decide your own stopping point based on your use case. It could be:

  • You ran out of data to label.
  • You hit a performance goal.
  • You hit a budget.
  • Other criteria.

For this dataset, I decided to stop the active learning loop at 275 labeled images because the performance on the evaluation set exceeds the top performing model on the leaderboard.

| #Labeled Images | Evaluation Accuracy | Train Epochs | Model | Active Learning | Source |
|---|---|---|---|---|---|
| 9469 | 94.90% | 80 | xse_resnext50 | ✗ | Link |
| 9469 | 95.11% | 200 | xse_resnext50 | ✗ | Link |
| 275 | 99.33% | 6 | convnext_small_in22k | ✓ | Link |
| 275 | 93.40% | 4 | resnet18 | ✓ | Link |

Dog Food

  • num classes: 2
  • num images: 2100

To start the active learning loop, I labeled 20 images (10 images from each class) and iteratively relabeled the most informative images until I hit 160 labeled images.

I decided to stop the active learning loop at 160 labeled images because the performance on the evaluation set is close to the top performing model on the leaderboard. You can decide your own stopping point based on your use case.

| #Labeled Images | Evaluation Accuracy | Train Epochs | Model | Active Learning | Source |
|---|---|---|---|---|---|
| 2100 | 99.70% | ? | vit-base-patch16-224 | ✗ | Link |
| 160 | 100.00% | 6 | convnext_small_in22k | ✓ | Link |
| 160 | 97.60% | 4 | resnet18 | ✓ | Link |

Oxford-IIIT Pet

  • num classes: 37
  • num images: 3680

To start the active learning loop, I labeled 370 images (10 images from each class) and iteratively relabeled the most informative images until I hit 612 labeled images.

I decided to stop the active learning loop at 612 labeled images because the performance on the evaluation set is close to the top performing model on the leaderboard. You can decide your own stopping point based on your use case.

| #Labeled Images | Evaluation Accuracy | Train Epochs | Model | Active Learning | Source |
|---|---|---|---|---|---|
| 3680 | 95.40% | 5 | vit-base-patch16-224 | ✗ | Link |
| 612 | 90.26% | 11 | convnext_small_in22k | ✓ | Link |
| 612 | 91.38% | 11 | vit-base-patch16-224 | ✓ | Link |

Eurosat RGB

  • num classes: 10
  • num images: 16100

To start the active learning loop, I labeled 100 images (10 images from each class) and iteratively labeled the most informative images until I hit 1188 labeled images.

I decided to stop the active learning loop at 1188 labeled images because the performance on the evaluation set is close to the top performing model on the leaderboard. You can decide your own stopping point based on your use case.

| #Labeled Images | Evaluation Accuracy | Train Epochs | Model | Active Learning | Source |
|---|---|---|---|---|---|
| 16100 | 98.55% | 6 | vit-base-patch16-224 | ✗ | Link |
| 1188 | 94.59% | 6 | vit-base-patch16-224 | ✓ | Link |
| 1188 | 96.57% | 13 | vit-base-patch16-224 | ✓ | Link |

➿ Workflow

This section describes a more detailed workflow for active learning. There are two workflows for active learning that we can use depending on the availability of labeled data.

With unlabeled data

If we have no labeled data, the goal of the active learning loop is to build a reasonably good labeled dataset to train a larger model.

Steps:

  1. Load a small proxy model.
  2. Label an initial dataset. If none exists, you'll have to label some images yourself.
  3. Train the proxy model on the labeled dataset.
  4. Run inference on the unlabeled dataset.
  5. Evaluate the performance of the proxy model.
  6. Is model good enough?
    • Yes: Save the proxy model and the dataset.
    • No: Select the most informative images to label using active learning.
  7. Label the most informative images and add them to the dataset.
  8. Repeat steps 3-7 until the model is good enough.
  9. Save the proxy model and the dataset.
  10. Train a larger model on the saved dataset.
graph TD
    A[Load a small proxy model] --> B[Label an initial dataset]
    B --> C[Train proxy model on labeled dataset]
    C --> D[Run inference on unlabeled dataset]
    D --> E[Evaluate proxy model performance]
    E --> F{Model good enough?}
    F -->|Yes| G[Save proxy model and dataset]
    G --> H[Train and deploy a larger model]
    F -->|No| I[Select informative images using active learning]
    I --> J[Label selected images]
    J --> C
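The control flow in the diagram can be sketched as plain, library-free Python. Here `train_model`, `predict_confidence`, the seed-set size, batch size, and labeling budget are all hypothetical stand-ins, not part of active-vision; the point is only the select-label-retrain loop structure:

```python
import random

random.seed(0)

def train_model(labeled):             # stand-in for training a proxy model
    return set(labeled)

def predict_confidence(model, item):  # stand-in for running inference
    return 0.9 if item in model else random.random() * 0.5

pool = list(range(100))                    # 100 "unlabeled images"
labeled, unlabeled = pool[:10], pool[10:]  # seed set of 10
budget = 30                                # hypothetical labeling budget

while unlabeled and len(labeled) < budget:
    model = train_model(labeled)                          # step 3: train
    scored = sorted(unlabeled,                            # step 4: inference
                    key=lambda i: predict_confidence(model, i))
    batch = scored[:5]          # steps 6-7: pick least-confident items, "label" them
    labeled += batch
    unlabeled = [i for i in unlabeled if i not in batch]

print(len(labeled))  # → 30, the loop stops once the budget is reached
```

In practice the stopping test would be a performance goal rather than a fixed budget, but the loop shape is the same.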

With labeled data

If we already have a labeled dataset, the goal of the active learning loop is to iteratively improve the dataset and the model by fixing the most important label errors.

Steps:

  1. Load a small proxy model.
  2. Train the proxy model on the labeled dataset.
  3. Run inference on the entire labeled dataset.
  4. Get the most impactful label errors with active learning.
  5. Fix the label errors.
  6. Repeat steps 2-5 until the dataset is good enough.
  7. Save the labeled dataset.
  8. Train a larger model on the saved labeled dataset.
graph TD
    A[Load a small proxy model] --> B[Train proxy model on labeled dataset]
    B --> C[Run inference on labeled dataset]
    C --> D[Get label errors using active learning]
    D --> E[Fix label errors]
    E --> F{Dataset good enough?}
    F -->|No| B
    F -->|Yes| G[Save cleaned dataset]
    G --> H[Train and deploy larger model]
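Step 4 (surfacing likely label errors) can be sketched with pandas: flag rows where a confident prediction disagrees with the assigned label. The column names `pred_label` and `pred_conf` and the 0.9 threshold are hypothetical; the package's actual output format may differ:

```python
import pandas as pd

# toy predictions over an already-labeled set
df = pd.DataFrame({
    "filepath":   ["a.jpg", "b.jpg", "c.jpg", "d.jpg"],
    "label":      ["cat", "dog", "cat", "dog"],
    "pred_label": ["cat", "cat", "cat", "dog"],
    "pred_conf":  [0.98, 0.95, 0.60, 0.99],
})

# likely label errors: the model disagrees with the given label *and* is confident
suspects = df[(df["pred_label"] != df["label"]) & (df["pred_conf"] > 0.9)]
print(suspects["filepath"].tolist())  # → ['b.jpg']
```

Note that `c.jpg` is not flagged even though the model is unsure: low-confidence disagreements are more often model error than label error.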

🧱 Sampling Approaches

Recommendation 1:

  • 10% randomly selected from unlabeled items.
  • 80% selected from the lowest confidence items.
  • 10% selected as outliers.

Recommendation 2:

  • Sample 100 predicted images at 10–20% confidence.
  • Sample 100 predicted images at 20–30% confidence.
  • Sample 100 predicted images at 30–40% confidence, and so on.
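This banded strategy can be sketched with pandas. The `pred_conf` column, filenames, and per-band quota are hypothetical; the idea is to bucket predictions into 10% confidence bands and sample up to N from each:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
pred_df = pd.DataFrame({
    "filepath": [f"img_{i}.jpg" for i in range(1000)],
    "pred_conf": rng.uniform(0.0, 1.0, size=1000),
})

# bucket into ten 10%-wide confidence bands: (0, 0.1], (0.1, 0.2], ...
bands = pd.cut(pred_df["pred_conf"], bins=np.linspace(0.0, 1.0, 11))
per_band = 20

# sample up to per_band items from each band
parts = []
for _, group in pred_df.groupby(bands, observed=True):
    parts.append(group.sample(min(len(group), per_band), random_state=0))
sampled = pd.concat(parts, ignore_index=True)

print(len(sampled))  # at most 10 bands x 20 samples
```

Sampling a fixed quota per band keeps the labeled set from being dominated by any one confidence region.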

Uncertainty and diversity sampling are most effective when combined. For instance, you could first sample the most uncertain items using an uncertainty sampling method, then apply a diversity sampling method such as clustering to select a diverse set from the uncertain items.
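A minimal sketch of that two-stage approach with NumPy and scikit-learn, using synthetic embeddings and uncertainty scores (the pool size, the top-100 cutoff, and k=10 are all hypothetical choices):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n = 500
embeddings = rng.normal(size=(n, 16))  # stand-in for image embeddings
uncertainty = rng.uniform(size=n)      # stand-in for least-confidence scores

# 1) uncertainty sampling: keep the 100 most uncertain items
top = np.argsort(uncertainty)[-100:]

# 2) diversity sampling: cluster their embeddings, then take the item
#    closest to each cluster centre
k = 10
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(embeddings[top])
picks = []
for c in range(k):
    members = top[km.labels_ == c]
    dists = np.linalg.norm(embeddings[members] - km.cluster_centers_[c], axis=1)
    picks.append(members[np.argmin(dists)])

print(len(picks))  # 10 diverse, highly uncertain items to label
```

Clustering only the uncertain subset keeps the labeling batch informative while avoiding near-duplicate selections.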

Ultimately, the right ratios can depend on the specific task and dataset.
