mbari-aidata is a command-line tool for extract, transform, load (ETL) and download operations on AI data. It supports a number of projects at MBARI that require detection, clustering, or classification workflows.

More documentation and examples are available at https://docs.mbari.org/internal/ai/data.

🚀 Features

  • 🧠 Object Detection/Clustering Integration: Loads detection/classification/clustering output from SDCAT formatted results.
  • Flexible Data Export: Downloads from Tator into machine learning formats like COCO, CIFAR, or PASCAL VOC.
  • Real-Time Uploads: Pushes localizations to Tator via Redis queues for real-time workflows.
  • Metadata Extraction: Parses image metadata such as GPS position, time, and date through a plugin-based system (extractors).
  • Duplicate Detection & Flexible Media References: Checks for duplicate media on load with the --check-duplicates flag. References images or video accessible through a web server without needing to upload them.
  • Augmentation Support: Augment VOC datasets with Albumentations to boost your object detection model performance. See examples in the docs.
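
The media-reference feature relies on files being reachable through a web server. As a rough illustration only, the sketch below maps a local file path to a URL using the path, host, and nginx_root keys that appear in the example configuration file. The mapping rule (host plus nginx_root replaces the local mount path), the helper name media_url, and the sample file name are assumptions for illustration, not the tool's confirmed behavior.

```python
from pathlib import PurePosixPath
from urllib.parse import quote

# Hypothetical mount entry, shaped like the "mounts" section of the example config.
MOUNT = {
    "name": "image",
    "path": "/mnt/CFElab",
    "host": "https://mantis.shore.mbari.org",
    "nginx_root": "/CFElab",
}


def media_url(local_path: str, mount: dict) -> str:
    """Map a file under mount['path'] to a URL (assumed mapping rule)."""
    # relative_to raises ValueError when the file is outside the mount path.
    relative = PurePosixPath(local_path).relative_to(mount["path"])
    # Percent-encode characters such as spaces; "/" is kept as a separator.
    url_path = f"{mount['nginx_root'].rstrip('/')}/{quote(relative.as_posix())}"
    return mount["host"].rstrip("/") + url_path


# The file name below is made up for the demo.
print(media_url("/mnt/CFElab/cruise 1/frame_0001.png", MOUNT))
```

A path outside the mount raises ValueError, which is a convenient place to catch files that the web server could not serve.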

Requirements

  • Python 3.10 or higher
  • A Tator API token and (optional) Redis password for the .env file. Contact the MBARI AI team for access.
  • 🐳 Docker, required only for development and testing. It can also be used instead of a local Python installation.

📦 Installation

Install as a Python package:

pip install mbari-aidata

Create a .env file in the root directory of the project with the following contents, setting ENVIRONMENT to either testing or production:

TATOR_TOKEN=your_api_token
REDIS_PASSWORD=your_redis_password
ENVIRONMENT=testing or production
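
As a quick sanity check before running the tool, the sketch below parses .env-style text with the standard library and reports problems with the keys listed above. The helper names parse_env and check_env are hypothetical, and treating REDIS_PASSWORD as optional follows the Requirements section; the tool itself may read the file differently.

```python
REQUIRED = ("TATOR_TOKEN", "ENVIRONMENT")  # REDIS_PASSWORD is optional per Requirements
VALID_ENVIRONMENTS = ("testing", "production")


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def check_env(values: dict) -> list:
    """Return a list of problems; an empty list means the file looks usable."""
    problems = [f"missing {key}" for key in REQUIRED if not values.get(key)]
    env = values.get("ENVIRONMENT")
    if env and env not in VALID_ENVIRONMENTS:
        problems.append(f"ENVIRONMENT should be one of {VALID_ENVIRONMENTS}, got {env!r}")
    return problems


sample = "TATOR_TOKEN=your_api_token\nENVIRONMENT=testing\n"
print(check_env(parse_env(sample)))  # prints []
```

To check a real file, pass its contents instead of the sample string, for example parse_env(Path(".env").read_text()) with pathlib.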

Create a configuration file in the root directory of the project:

touch config_cfe.yml

This file configures the project data, such as mounts, plugins, and database connections.

Alternatively, skip the local file and pass --config the URL of a project-specific configuration from our docs server at https://docs.mbari.org/internal/ai/projects/, for example:

aidata download --version Baseline --labels "Diatoms, Copepods" --config https://docs.mbari.org/internal/ai/projects/uav-901902/config_uav.yml
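
When the download step is scripted, the same command can be assembled in Python. The sketch below only builds and prints the argument list, using just the flags shown above. The helper name build_download_command is hypothetical, and actually executing the command assumes aidata is installed and on the PATH.

```python
import shlex


def build_download_command(version: str, labels: list, config: str) -> list:
    """Assemble argv for `aidata download` using only the flags shown above."""
    return [
        "aidata", "download",
        "--version", version,
        # The example passes labels as a single comma-separated string.
        "--labels", ", ".join(labels),
        "--config", config,
    ]


cmd = build_download_command(
    "Baseline",
    ["Diatoms", "Copepods"],
    "https://docs.mbari.org/internal/ai/projects/uav-901902/config_uav.yml",
)
print(shlex.join(cmd))  # shell-quoted form, safe to paste into a terminal
# To execute instead of print: subprocess.run(cmd, check=True)
```

Passing a list to subprocess.run avoids shell quoting problems with the label string, which contains a space and a comma.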

⚙️ Example configuration file:

# config_cfe.yml
# Config file for CFE project production
mounts:
  - name: "image"
    path: "/mnt/CFElab"
    host: "https://mantis.shore.mbari.org"
    nginx_root: "/CFElab"

  - name: "video"
    path: "/mnt/CFElab"
    host: "https://mantis.shore.mbari.org"
    nginx_root: "/CFElab"


plugins:
  - name: "extractor"
    module: "mbari_aidata.plugins.extractors.tap_cfe_media"
    function: "extract_media"

redis:
  host: "doris.shore.mbari.org"
  port: 6382

vss:
  project: "902111-CFE"
  model: "google/vit-base-patch16-224"

tator:
  project: "902111-CFE"
  host: "https://mantis.shore.mbari.org"
  image:
    attributes:
      iso_datetime:
        type: datetime
      depth:
        type: float
  video:
    attributes:
      iso_start_datetime:
        type: datetime
  box:
    attributes:
      Label:
        type: string
      score:
        type: float
      cluster:
        type: string
      saliency:
        type: float
      area:
        type: int
      exemplar:
        type: bool
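
Each entry under plugins names a Python module and a function inside it. One plausible way such an entry could be resolved at runtime is with importlib, sketched below. This is an assumption about the mechanism, not the project's confirmed loader. The helper resolve_plugin is hypothetical, and the demo entry points at a standard-library function so the sketch runs without mbari-aidata installed.

```python
import importlib
from typing import Callable


def resolve_plugin(entry: dict) -> Callable:
    """Import entry['module'] and return the callable named entry['function']."""
    module = importlib.import_module(entry["module"])
    func = getattr(module, entry["function"], None)
    if not callable(func):
        raise AttributeError(
            f"{entry['module']} has no callable named {entry['function']!r}"
        )
    return func


# Stand-in entry with the same shape as the config's plugin entries.
demo_entry = {"name": "extractor", "module": "os.path", "function": "basename"}
extract = resolve_plugin(demo_entry)
print(extract("/mnt/CFElab/image.png"))  # prints image.png
```

With the package installed, the same call could be tried on the entry from the example configuration (module mbari_aidata.plugins.extractors.tap_cfe_media, function extract_media).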

🐳 Docker usage

A Docker image is also available as mbari/aidata:latest or mbari/aidata:latest:cuda-124. For example, to download data using the Docker image:

docker run -it --rm -v $(pwd):/mnt mbari/aidata:latest aidata download --version Baseline --labels "Diatoms, Copepods" --config config_cfe.yml

Commands

  • aidata download --help - Download data, such as images and boxes, into machine learning formats such as COCO, CIFAR, or PASCAL VOC. VOC exports can be augmented with Albumentations.
  • aidata load --help - Load data, such as images, boxes, or clusters, into either a Postgres or Redis database.
  • aidata db --help - Commands related to database management.
  • aidata transform --help - Commands related to transforming downloaded data.
  • aidata -h - Print help message and exit.

Source code is available at github.com/mbari-org/aidata.

Development

See the Development Guide for more information on how to set up the development environment, or see the justfile.

🗓️ Last updated: 2025-06-01

Download files

Source Distribution

mbari_aidata-1.55.1.tar.gz (45.4 kB)

Uploaded Source

Built Distribution

mbari_aidata-1.55.1-py3-none-any.whl (63.2 kB)

Uploaded Python 3

File details

Details for the file mbari_aidata-1.55.1.tar.gz.

File metadata

  • Download URL: mbari_aidata-1.55.1.tar.gz
  • Upload date:
  • Size: 45.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.3 CPython/3.11.12 Linux/6.11.0-1014-azure

File hashes

Hashes for mbari_aidata-1.55.1.tar.gz
Algorithm    Hash digest
SHA256       df89f32e27cb71199b0080061c836fb51b6f98bf880a6bbc85cd46f44ad90e42
MD5          50d475fdfa9e28de50d93b64e3af4298
BLAKE2b-256  8edddb2a63b900b8b985514d8b633f9a9017a9a9da8a318b8615501ba42b5893

File details

Details for the file mbari_aidata-1.55.1-py3-none-any.whl.

File metadata

  • Download URL: mbari_aidata-1.55.1-py3-none-any.whl
  • Upload date:
  • Size: 63.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.3 CPython/3.11.12 Linux/6.11.0-1014-azure

File hashes

Hashes for mbari_aidata-1.55.1-py3-none-any.whl
Algorithm    Hash digest
SHA256       2025f1d2df93999e26eb48e7e38dd263716b6cd781a20e4b47bfb9075ada315b
MD5          04e5943d49e3a8f86ce21f8ee685d430
BLAKE2b-256  5e91ad850652d59a8d84d4d041ed76bf650d12eba6bd78967120e7ead2e60be6
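
The SHA256 digests listed above can be used to check a downloaded file. A minimal sketch using hashlib follows. The helper names sha256_of and matches are hypothetical, and the demo hashes a small temporary file rather than the real distribution.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with a published one, ignoring case and padding."""
    return sha256_of(path) == expected_hex.strip().lower()


# Demo on a temporary file so the sketch runs without any download.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"hello")
    print(matches(sample, hashlib.sha256(b"hello").hexdigest()))  # prints True
```

For the real files, call matches with the path of the downloaded archive or wheel and the SHA256 value listed for that file.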
