Command-line tool for extracting DINO features for images and videos
🦕 DINOtool
DINOtool is a simple Python package that makes it easy to extract and visualize features from images and videos using DINOv2 models. It generates frame-level and patch-level embeddings with a single command.
✨ Features
- 📷 Extract DINO features from:
  - Single images
  - Video files (`.mp4`, `.avi`, etc.)
  - Folders containing image sequences
- 🌈 Automatically generates PCA visualizations of the features
- 🧠 Visuals include a side-by-side view of the original frame and the feature map
- Saves features for downstream tasks
- ⚡ Command-line interface for easy, no-code operation
📦 Installation
Basic install (Linux/WSL2)
Install via pip:
```bash
pip install dinotool
```
You'll also need ffmpeg. On Ubuntu/Debian (including WSL2):
```bash
sudo apt install ffmpeg
```
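If you are unsure whether ffmpeg is reachable from your current environment (for example, from inside a fresh conda environment), a quick standard-library check like the one below can help. This is a minimal sketch that is not part of dinotool; the helper name `ffmpeg_available` is hypothetical.

```python
import shutil
import subprocess


def ffmpeg_available() -> bool:
    """Return True if an ffmpeg executable is on PATH and runs successfully.

    Hypothetical helper, not part of dinotool.
    """
    path = shutil.which("ffmpeg")
    if path is None:
        return False
    # `ffmpeg -version` prints build information and exits with status 0.
    result = subprocess.run([path, "-version"], capture_output=True, text=True)
    return result.returncode == 0


if __name__ == "__main__":
    print("ffmpeg found" if ffmpeg_available() else "ffmpeg NOT found on PATH")
```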
You can check that dinotool is properly installed by running it on a test image:
```bash
dinotool test.jpg -o out.jpg
```
🐍 Conda Environment (Recommended)
For an isolated setup, which is especially useful for managing ffmpeg and other dependencies, use a conda environment. Install Miniforge, then run:
```bash
conda create -n dinotool python=3.12
conda activate dinotool
conda install -c conda-forge ffmpeg
pip install dinotool
```
Windows notes:
- On Windows, only CPU inference is supported. For GPU support on Windows, we recommend WSL2 + Ubuntu.
- The conda method above is recommended for Windows CPU setups.
🚀 Quickstart
📸 Image
```bash
dinotool input.jpg -o output.jpg
```
🎞️ Video
```bash
dinotool input.mp4 -o output.mp4
```
📁 Folder of images (treated as video frames)
```bash
dinotool path/to/folder/ -o output.mp4
```
The output is a side-by-side visualization of the original frame and a PCA projection of the patch-level features.
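The PCA visualization can be understood as projecting each patch's feature vector onto the three directions of largest variance and using those three coordinates as RGB channels. The sketch below illustrates this common technique with NumPy. It assumes a `(height, width, feature)` patch-feature array and per-channel min-max scaling, and it is not necessarily how dinotool computes its maps internally; the function name `pca_rgb` is hypothetical.

```python
import numpy as np


def pca_rgb(features: np.ndarray) -> np.ndarray:
    """Map an (H, W, C) patch-feature array to an (H, W, 3) RGB image in [0, 1].

    Hypothetical helper illustrating PCA-based feature visualization;
    dinotool's own implementation may differ (e.g. in scaling or fitting).
    """
    h, w, c = features.shape
    flat = features.reshape(-1, c).astype(np.float64)  # (H*W, C), a fresh copy
    flat -= flat.mean(axis=0)                          # center each feature dimension
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    proj = flat @ vt[:3].T                             # (H*W, 3) top-3 components
    lo, hi = proj.min(axis=0), proj.max(axis=0)
    rgb = (proj - lo) / np.maximum(hi - lo, 1e-8)      # min-max scale each channel
    return rgb.reshape(h, w, 3)


# Random stand-in features: a 37x37 patch grid with 384-dim vectors
# (384 is the embedding size of DINOv2 ViT-S/14).
rng = np.random.default_rng(0)
image = pca_rgb(rng.standard_normal((37, 37, 384)))
print(image.shape)  # (37, 37, 3)
```

The resulting array can be upsampled to the frame size and placed next to the original frame to produce a side-by-side view.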
🧪 Advanced Options
| Flag | Description |
|---|---|
| `--model-name` | Use a different DINO model (default: `dinov2_vits14_reg`) |
| `--input-size W H` | Resize input before inference |
| `--batch-size` | Batch size for processing (default: 1) |
| `--only-pca` | Output only the PCA map, without the side-by-side view |
| `--save-features` | Save extracted features: `full`, `flat`, or `frame` |
| `-o, --output` | Output path (required) |
Tips:

For faster processing, increase `--batch-size` to the largest value that fits in memory:
```bash
dinotool input.mp4 -o output.mp4 --batch-size 16
```
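Batching helps because frames are processed in groups, so the number of model forward passes shrinks as the batch size grows. The arithmetic below is a minimal sketch assuming fixed-size batches with a smaller final batch; the helper names are hypothetical, and actual speedups depend on your hardware.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def num_batches(num_frames: int, batch_size: int) -> int:
    """Forward passes needed for num_frames frames (ceiling division)."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return (num_frames + batch_size - 1) // batch_size


def batched(frames: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive batches of up to batch_size frames (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    batch: List[T] = []
    for frame in frames:
        batch.append(frame)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final, possibly smaller batch
        yield batch


# A 60-second clip at 30 fps has 1800 frames.
print(num_batches(1800, 1))   # 1800
print(num_batches(1800, 16))  # 113
```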
For large videos, reduce the input size with `--input-size`:
```bash
# Processing an HD video faster:
dinotool input.mp4 -o output.mp4 --input-size 920 540 --batch-size 16
```
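Reducing `--input-size` helps because a ViT splits each frame into fixed-size patches, so the number of patch tokens (and the size of any saved patch features) scales with the pixel area. The `14` in `dinov2_vits14_reg` is the patch size in pixels. The sketch below assumes each side is truncated to a whole number of 14-pixel patches; dinotool may crop or resize differently, so treat the numbers as approximate. `patch_grid` is a hypothetical helper.

```python
PATCH_SIZE = 14  # pixels per patch side for dinov2_vits14_reg


def patch_grid(width: int, height: int, patch_size: int = PATCH_SIZE) -> tuple[int, int]:
    """Return (rows, cols) of the patch grid.

    Assumes partial patches at the right/bottom edges are dropped.
    """
    return height // patch_size, width // patch_size


for w, h in [(1920, 1080), (920, 540)]:
    rows, cols = patch_grid(w, h)
    print(f"{w}x{h}: {rows}x{cols} = {rows * cols} patches per frame")
# 1920x1080: 77x137 = 10549 patches per frame
# 920x540: 38x65 = 2470 patches per frame
```

Under this assumption, the reduced size in the example command gives roughly a quarter of the patches per frame compared with full HD.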
💾 Feature extraction options
Use `--save-features` to export DINO features for downstream tasks.

| Mode | Format | Output shape | Best for |
|---|---|---|---|
| `full` | `.nc` (image) / `.zarr` (video) | `(frames, height, width, feature)` | Keeping the spatial structure of patches |
| `flat` | partitioned `.parquet` | `(frames * height * width, feature)` | Reliable long-video processing; faster patch-level analysis |
| `frame` | `.txt` (image) / `.parquet` (video) | `(frames, feature)` | One feature vector per frame (global content representation) |
`full` - Spatial patch features
- Saves the full patch feature maps from the ViT (one vector per image patch).
- Useful for reconstructing spatial attention maps or for downstream tasks such as segmentation.
- Stored as netCDF (`.nc`) for single images and as `.zarr` for video sequences. Saving to `.zarr` can be memory-intensive and may still fail for large videos.
```bash
dinotool input.mp4 -o output.mp4 --save-features full
```
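To see why `full` exports of long videos can exhaust memory, it helps to estimate the raw size of the feature array. The sketch below assumes 384-dimensional features (the embedding size of DINOv2 ViT-S/14) stored as uncompressed 4-byte float32 values; the dtype dinotool actually writes, and any compression, are assumptions here. `feature_bytes` is a hypothetical helper.

```python
def feature_bytes(frames: int, rows: int, cols: int,
                  dim: int = 384, bytes_per_value: int = 4) -> int:
    """Uncompressed size of a (frames, rows, cols, dim) feature array in bytes.

    Hypothetical helper; assumes float32 values by default.
    """
    return frames * rows * cols * dim * bytes_per_value


# 60 s at 30 fps with a 38x65 patch grid per frame
# (920x540 input, 14-px patches, partial edge patches dropped).
size = feature_bytes(frames=1800, rows=38, cols=65)
print(f"{size / 1e9:.1f} GB")  # 6.8 GB
```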
`flat` - Flattened patch features
- Saves the same vectors as above, but discards the 2D spatial layout and writes the output in `parquet` format.
- More reliable for longer videos.
- Useful for faster computation of statistics, patch-level similarity, and clustering.
```bash
dinotool input.mp4 -o output.mp4 --save-features flat
```
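Because `flat` discards the 2D layout, it can help to know how a row relates back to a patch position. Assuming rows are written in row-major order (frame, then patch row, then patch column), the mapping is plain index arithmetic. This ordering is an assumption; if your exported table carries explicit frame or coordinate columns, rely on those instead. The helper names are hypothetical.

```python
def flat_index(frame: int, row: int, col: int, rows: int, cols: int) -> int:
    """Row of the flat table for patch (row, col) in a given frame (row-major order)."""
    return (frame * rows + row) * cols + col


def unflat_index(i: int, rows: int, cols: int) -> tuple[int, int, int]:
    """Inverse of flat_index: recover (frame, row, col) from a flat row number."""
    frame, rem = divmod(i, rows * cols)
    row, col = divmod(rem, cols)
    return frame, row, col


# With a 38x65 patch grid, the first patch of frame 1 is flat row 2470.
print(flat_index(1, 0, 0, rows=38, cols=65))  # 2470
print(unflat_index(2471, rows=38, cols=65))   # (1, 0, 1)
```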
`frame` - Frame-level features
- Saves one vector per frame using the `[CLS]` token from DINO.
- Useful for temporal tasks, video summarization, and classification.
- For image input, saves a `.txt` file with a single vector.
- For video input, saves a `.parquet` file with one row per frame.
```bash
# For a video
dinotool input.mp4 -o output.mp4 --save-features frame
# For an image
dinotool input.jpg -o output.jpg --save-features frame
```
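As an example of a temporal task, per-frame vectors can be compared with cosine similarity to flag frames where the content changes sharply. The sketch below works on plain Python lists of numbers, however you load them from the output file; the threshold of 0.8 is an arbitrary illustrative value, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def change_points(vectors: Sequence[Sequence[float]], threshold: float = 0.8) -> List[int]:
    """Indices i whose vector is dissimilar (cosine < threshold) to frame i - 1."""
    return [
        i for i in range(1, len(vectors))
        if cosine(vectors[i - 1], vectors[i]) < threshold
    ]


# Toy 2-D "frame features": the content switches at frame 2.
frames = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.0, 1.0]]
print(change_points(frames))  # [2]
```

Real DINO frame vectors are high-dimensional, and a suitable threshold has to be chosen empirically for your footage.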
🧑‍💻 Usage reference
```text
🦕 DINOtool: Extract and visualize DINO features from images and videos.

Usage:
  dinotool input_path -o output_path [options]

Arguments:
  input                 Path to image, video file, or folder of frames.
  -o, --output          Path for the output (required).

Options:
  --model-name MODEL    DINO model to use (default: dinov2_vits14_reg)
  --input-size W H      Resize input before processing
  --batch-size N        Batch size for inference
  --only-pca            Only visualize PCA features
  --save-features MODE  Save extracted features: full, flat, or frame
```
Download files
File details
Details for the file dinotool-0.1.0.tar.gz.
File metadata
- Download URL: dinotool-0.1.0.tar.gz
- Upload date:
- Size: 9.2 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b837f86c7ecbf131d10c28146f4d5d95cc8982ee38e9cd7c9fd1917015c5dd41 |
| MD5 | 84438c5f793f07d4fd7672012fad302b |
| BLAKE2b-256 | 2664c954c8ad1cab7167b446d7cccdb3df67f7429ad06dd0e1c79d7be878e21f |
File details
Details for the file dinotool-0.1.0-py3-none-any.whl.
File metadata
- Download URL: dinotool-0.1.0-py3-none-any.whl
- Upload date:
- Size: 16.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d2a5f4384908151f5d0ebe4fc13e08a0dc628ef6b4333d51d8d3d00c98e25dab |
| MD5 | 2ee19cc478f89ecce1849e42d8a575e2 |
| BLAKE2b-256 | a515f7b1381004682b0f949e0994fbf901f3aa002126367acbd0e1760275cb4f |