mbari-aidata is a command-line tool for extract, transform, load (ETL), and download operations on AI data. It supports MBARI projects that require detection, clustering, or classification workflows.
More documentation and examples are available at https://docs.mbari.org/internal/ai/data.
🚀 Features
- 🧠 Object Detection/Clustering Integration: Loads detection, classification, and clustering output from SDCAT-formatted results.
- Flexible Data Export: Downloads from Tator into machine learning formats like COCO, CIFAR, or PASCAL VOC.
- Real-Time Uploads: Pushes localizations to Tator via Redis queues for real-time workflows.
- Metadata Extraction: Parses image metadata such as GPS position, time, and date through a plugin-based system (extractors).
- Duplicate Detection: Supports duplicate media load checks with the --check-duplicates flag.
- Flexible Media References: Images or video are made accessible through a web server without needing to upload or move them from your internal NFS project mounts (e.g. Thalassa).
- Augmentation Support: Augment VOC datasets with Albumentations to boost your object detection model performance. See examples in the docs.
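The flexible media references feature can be pictured as a path-to-URL mapping. The sketch below is only an illustration, not the tool's implementation: it assumes the web server exposes each mount's path under nginx_root on host, borrows its values from the example configuration later in this README, and uses a hypothetical function name (media_url) and file name.

```python
from pathlib import PurePosixPath

# Values copied from the example configuration in this README.
MOUNT = {
    "name": "image",
    "path": "/mnt/CFElab",
    "host": "https://mantis.shore.mbari.org",
    "nginx_root": "/CFElab",
}


def media_url(local_path: str, mount: dict) -> str:
    """Map a file under mount['path'] to a URL under host + nginx_root.

    Assumption: the web server exposes the mount directory at nginx_root.
    """
    # relative_to raises ValueError if the file is outside the mount.
    relative = PurePosixPath(local_path).relative_to(mount["path"])
    return f"{mount['host']}{mount['nginx_root']}/{relative.as_posix()}"


# Hypothetical file name, used only for illustration.
print(media_url("/mnt/CFElab/cruise1/frame_0001.png", MOUNT))
```

Under that assumption the file never leaves the NFS mount; only its URL is recorded.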
Requirements
- Python 3.10 or higher
- A Tator API token and (optional) Redis password for the .env file. Contact the MBARI AI team for access.
- 🐳 Docker, which is required only for development and testing. It can also be used instead of a local Python installation.
- For local installation, you will need to install the required Python packages listed in the requirements.txt file, ffmpeg, and the mp4dump tool from https://www.bento4.com/
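A quick pre-flight check for a local installation might look like the following sketch. It is not part of mbari-aidata, and the function name missing_requirements is hypothetical. It only checks the Python version and whether ffmpeg and mp4dump are on the PATH.

```python
import shutil
import sys


def missing_requirements(tools=("ffmpeg", "mp4dump"), min_python=(3, 10)):
    """Return a list of human-readable problems; empty if all checks pass."""
    problems = []
    if sys.version_info < min_python:
        problems.append(
            f"Python {min_python[0]}.{min_python[1]} or higher is required"
        )
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_requirements():
        print(problem)
```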
📦 Installation
Install as a Python package:
pip install mbari-aidata
Create the .env file in the root directory of the project with the following contents (REDIS_PASSWORD is optional; set ENVIRONMENT to either testing or production):
TATOR_TOKEN=your_api_token
REDIS_PASSWORD=your_redis_password
ENVIRONMENT=testing or production
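To catch a malformed .env file early, something like the sketch below could be used. It is an illustration only and does not show how aidata itself reads the file. The helper names parse_env and check_env are hypothetical, and it assumes ENVIRONMENT must be exactly testing or production.

```python
from pathlib import Path

ALLOWED_ENVIRONMENTS = {"testing", "production"}


def parse_env(text: str) -> dict:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def check_env(values: dict) -> list:
    """Return a list of problems found in the parsed settings."""
    problems = []
    if not values.get("TATOR_TOKEN"):
        problems.append("TATOR_TOKEN is missing")
    # REDIS_PASSWORD is optional, so it is not checked here.
    if values.get("ENVIRONMENT") not in ALLOWED_ENVIRONMENTS:
        problems.append("ENVIRONMENT must be 'testing' or 'production'")
    return problems


if __name__ == "__main__":
    env_file = Path(".env")
    if env_file.exists():
        print(check_env(parse_env(env_file.read_text())) or "ok")
```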
Create a configuration file in the root directory of the project. This file configures the project data, such as mounts, plugins, and database connections:
touch config_cfe.yml
Alternatively, pass the URL of a project-specific configuration from our docs server at https://docs.mbari.org/internal/ai/projects/ to the --config option:
aidata download --version Baseline --labels "Diatoms, Copepods" --config https://docs.mbari.org/internal/ai/projects/uav-901902/config_uav.yml
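When scripting downloads, the same command can be assembled as an argument list so the comma-separated labels need no manual shell quoting. This sketch uses only the flags shown in this README; the helper name download_command is hypothetical.

```python
import shlex


def download_command(version: str, labels: list, config: str) -> list:
    """Build the argument list for `aidata download` from the flags shown above."""
    return [
        "aidata", "download",
        "--version", version,
        "--labels", ", ".join(labels),
        "--config", config,
    ]


cmd = download_command(
    "Baseline",
    ["Diatoms", "Copepods"],
    "https://docs.mbari.org/internal/ai/projects/uav-901902/config_uav.yml",
)
# Print a shell-safe rendering of the command for inspection.
print(shlex.join(cmd))
# To execute it: import subprocess; subprocess.run(cmd, check=True)
```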
⚙️ Example configuration file:
# config_cfe.yml
# Config file for CFE project production
mounts:
  - name: "image"
    path: "/mnt/CFElab"
    host: "https://mantis.shore.mbari.org"
    nginx_root: "/CFElab"
  - name: "video"
    path: "/mnt/CFElab"
    host: "https://mantis.shore.mbari.org"
    nginx_root: "/CFElab"
plugins:
  - name: "extractor"
    module: "mbari_aidata.plugins.extractors.tap_cfe_media"
    function: "extract_media"
redis:
  host: "doris.shore.mbari.org"
  port: 6382
vss:
  project: "902111-CFE"
  model: "google/vit-base-patch16-224"
tator:
  project: "902111-CFE"
  host: "https://mantis.shore.mbari.org"
  image:
    attributes:
      iso_datetime: #<-------Required for images
        type: datetime
      depth:
        type: float
  video:
    attributes:
      iso_start_datetime: #<-------Required for videos
        type: datetime
  box:
    attributes:
      Label:
        type: string
      score:
        type: float
      cluster:
        type: string
      saliency:
        type: float
      area:
        type: int
      exemplar:
        type: bool
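The two attributes marked as required in the example can be checked before loading data. The sketch below is not the tool's own validation. It assumes that image, video, and box are nested under the tator key, and it uses a plain dictionary in place of a parsed YAML file so it runs without extra packages. The function name missing_required is hypothetical.

```python
# Trimmed copy of the tator section from the example configuration.
# Assumption: image, video, and box are nested under `tator`.
CONFIG = {
    "tator": {
        "image": {
            "attributes": {
                "iso_datetime": {"type": "datetime"},
                "depth": {"type": "float"},
            }
        },
        "video": {
            "attributes": {"iso_start_datetime": {"type": "datetime"}}
        },
    }
}

# Required attribute per media type, taken from the comments in the example.
REQUIRED = {"image": "iso_datetime", "video": "iso_start_datetime"}


def missing_required(config: dict) -> list:
    """List required attributes that are absent from configured media types."""
    missing = []
    tator = config.get("tator", {})
    for media_type, attribute in REQUIRED.items():
        section = tator.get(media_type)
        if section is None:
            continue  # media type not configured; nothing to check
        if attribute not in section.get("attributes", {}):
            missing.append(f"{media_type}.attributes.{attribute}")
    return missing


print(missing_required(CONFIG))  # prints []
```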
🐳 Docker usage
A Docker image is also available as mbari/aidata:latest or mbari/aidata:latest:cuda-124.
For example, to download data using the docker image:
docker run -it --rm -v $(pwd):/mnt mbari/aidata:latest aidata download --version Baseline --labels "Diatoms, Copepods" --config config_cfe.yml
Commands
- aidata download --help - Download data, such as images and boxes, into machine learning formats such as COCO, CIFAR, or PASCAL VOC. Augmentation is supported for VOC-exported data using Albumentations.
- aidata load --help - Load data, such as images, boxes, or clusters, into either a Postgres or Redis database.
- aidata db --help - Commands related to database management.
- aidata transform --help - Commands related to transforming downloaded data.
- aidata -h - Print help message and exit.
Source code is available at github.com/mbari-org/aidata.
Development
See the Development Guide for how to set up the development environment, or consult the justfile.
🗓️ Last updated: 2025-06-13