

mbari-aidata is a command line tool to do extract, transform, load and download operations on AI data for a number of projects at MBARI that require detection, clustering or classification workflows.

More documentation and examples are available at https://docs.mbari.org/internal/ai/data.

Features:

  • Loads object detection, classification, and clustering results from SDCAT-formatted output.
  • Downloads data from Tator into common machine learning formats, e.g. COCO, CIFAR, or PASCAL VOC.
  • Supports uploads triggered from a Redis queue for workflows that need real-time loads.
  • Loads metadata from SONY cameras, extracts timestamps from images and video, and loads VOC-formatted data. The plugin architecture allows for easy extension to other data sources and formats. Media loads are generally handled in a project-specific way by the plugin/extractors module.
  • Media must be reachable at a URL accessible by the Tator server. Media can be checked for duplicates and uploaded if necessary.
  • Augments downloaded VOC data with the albumentations library to create more training data.
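
As a rough illustration of what "VOC-formatted data" looks like to a loader, the sketch below reads one PASCAL VOC annotation with the Python standard library and returns its labeled boxes. The XML sample, the file name, and the parse_voc helper are made up for this example; this is not the code path mbari-aidata itself uses.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up PASCAL VOC annotation used only for illustration.
VOC_XML = """
<annotation>
  <filename>example.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>Copepods</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax></bndbox>
  </object>
</annotation>
"""


def parse_voc(xml_text: str) -> list[dict]:
    """Return one dict per labeled box found in a PASCAL VOC annotation."""
    root = ET.fromstring(xml_text.strip())
    filename = root.findtext("filename")
    boxes = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        boxes.append(
            {
                "filename": filename,
                "label": obj.findtext("name"),
                # VOC stores pixel coordinates of the box corners.
                "xmin": int(bndbox.findtext("xmin")),
                "ymin": int(bndbox.findtext("ymin")),
                "xmax": int(bndbox.findtext("xmax")),
                "ymax": int(bndbox.findtext("ymax")),
            }
        )
    return boxes


boxes = parse_voc(VOC_XML)
print(boxes)
```

Each returned dict carries the label and the corner coordinates, which is the minimum a detection workflow needs before loading or converting to another format.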

Requirements

  • Python 3.10 or higher
  • A Tator API token and Redis password for the .env file. Contact the MBARI AI team for access.
  • Docker, required only for development and testing. It can also be used in place of a local Python installation.

Installation

Install as a Python package:

pip install mbari-aidata

Create the .env file with the following contents in the root directory of the project:

TATOR_TOKEN=your_api_token
REDIS_PASSWORD=your_redis_password
# Set ENVIRONMENT to either testing or production
ENVIRONMENT=testing
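
Before running any command, it can help to confirm that the .env file has all three keys. The following is a minimal sketch using only the standard library; parse_env and check_env are hypothetical helpers written for this example, and the assumption that ENVIRONMENT holds exactly one of testing or production is inferred from the listing above, not from the tool's source.

```python
REQUIRED_KEYS = ("TATOR_TOKEN", "REDIS_PASSWORD", "ENVIRONMENT")
# Assumption: ENVIRONMENT takes exactly one of these two values.
ALLOWED_ENVIRONMENTS = ("testing", "production")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_env(values: dict[str, str]) -> list[str]:
    """Return a list of human-readable problems; empty means the file looks fine."""
    problems = [f"missing {key}" for key in REQUIRED_KEYS if not values.get(key)]
    env = values.get("ENVIRONMENT")
    if env and env not in ALLOWED_ENVIRONMENTS:
        problems.append(f"ENVIRONMENT must be one of {ALLOWED_ENVIRONMENTS}, got {env!r}")
    return problems


sample = (
    "TATOR_TOKEN=your_api_token\n"
    "REDIS_PASSWORD=your_redis_password\n"
    "ENVIRONMENT=testing\n"
)
print(check_env(parse_env(sample)))
```

To check a real file, read it first, for example with pathlib.Path(".env").read_text(), and pass the text to parse_env.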

Create a configuration file in the root directory of the project:

touch config_cfe.yml

This file configures the project data, such as mounts, plugins, and database connections.

Alternatively, use a project-specific configuration from our docs server at https://docs.mbari.org/internal/ai/projects/ by passing its URL to --config, for example:

aidata download --version Baseline --labels "Diatoms, Copepods" --config https://docs.mbari.org/internal/ai/projects/uav-901902/config_uav.yml
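
When the download is driven from a Python script rather than a shell, the same command can be assembled as an argument list. This is a sketch only: build_download_cmd and run_download are hypothetical wrappers, and they use just the three flags shown in the command above (--version, --labels, --config).

```python
import shlex
import subprocess


def build_download_cmd(version: str, labels: list[str], config: str) -> list[str]:
    """Build the argv for `aidata download` using only the flags shown above."""
    return [
        "aidata",
        "download",
        "--version",
        version,
        # Labels are passed as one comma-separated string, as in the example.
        "--labels",
        ", ".join(labels),
        "--config",
        config,
    ]


def run_download(cmd: list[str]) -> None:
    """Run the command; raises CalledProcessError on a non-zero exit."""
    subprocess.run(cmd, check=True)


cmd = build_download_cmd(
    "Baseline",
    ["Diatoms", "Copepods"],
    "https://docs.mbari.org/internal/ai/projects/uav-901902/config_uav.yml",
)
# Print the shell-quoted form for inspection; call run_download(cmd) to execute it.
print(shlex.join(cmd))
```

Passing a list to subprocess.run avoids shell quoting problems with the space inside the labels string.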

Example configuration file:

# config_cfe.yml
# Config file for CFE project production
mounts:
  - name: "image"
    path: "/mnt/CFElab"
    host: "mantis.shore.mbari.org"
    nginx_root: "/CFElab"

  - name: "video"
    path: "/mnt/CFElab"
    host: "mantis.shore.mbari.org"
    nginx_root: "/CFElab"


plugins:
  - name: "extractor"
    module: "mbari_aidata.plugins.extractors.tap_cfe_media"
    function: "extract_media"

redis:
  host: "doris.shore.mbari.org"
  port: 6382

vss:
  project: "902111-CFE"
  model: "google/vit-base-patch16-224"

tator:
  project: "902111-CFE"
  host: "https://mantis.shore.mbari.org"
  image:
    attributes:
      iso_datetime:
        type: datetime
      depth:
        type: float
  video:
    attributes:
      iso_start_datetime:
        type: datetime
  box:
    attributes:
      Label:
        type: string
      score:
        type: float
      cluster:
        type: string
      saliency:
        type: float
      area:
        type: int
      exemplar:
        type: bool
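
The sketch below shows one way to sanity-check a configuration like the one above after it has been parsed into a dictionary, and how a plugin entry's module and function pair could be resolved with importlib. Everything here is an assumption for illustration: check_config and resolve_plugin are hypothetical helpers, the list of expected sections and attribute types is taken only from this example file, and mbari-aidata may load its plugins differently.

```python
import importlib
from typing import Any, Callable

# Assumption: these are simply the sections and types seen in the example file.
EXPECTED_SECTIONS = ("mounts", "plugins", "redis", "vss", "tator")
SEEN_ATTRIBUTE_TYPES = {"datetime", "float", "string", "int", "bool"}


def check_config(config: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a parsed configuration dict."""
    problems = [f"missing section: {s}" for s in EXPECTED_SECTIONS if s not in config]
    tator = config.get("tator", {})
    for kind in ("image", "video", "box"):
        attributes = tator.get(kind, {}).get("attributes", {})
        for name, spec in attributes.items():
            if spec.get("type") not in SEEN_ATTRIBUTE_TYPES:
                problems.append(f"tator.{kind}.{name}: unexpected type {spec.get('type')!r}")
    return problems


def resolve_plugin(spec: dict[str, str]) -> Callable[..., Any]:
    """Import spec['module'] and return the attribute named spec['function']."""
    module = importlib.import_module(spec["module"])
    return getattr(module, spec["function"])


# A trimmed copy of the example, in the shape a YAML parser would return.
config = {
    "mounts": [{"name": "image", "path": "/mnt/CFElab",
                "host": "mantis.shore.mbari.org", "nginx_root": "/CFElab"}],
    "plugins": [{"name": "extractor",
                 "module": "mbari_aidata.plugins.extractors.tap_cfe_media",
                 "function": "extract_media"}],
    "redis": {"host": "doris.shore.mbari.org", "port": 6382},
    "vss": {"project": "902111-CFE", "model": "google/vit-base-patch16-224"},
    "tator": {
        "project": "902111-CFE",
        "host": "https://mantis.shore.mbari.org",
        "box": {"attributes": {"Label": {"type": "string"}, "score": {"type": "float"}}},
    },
}
print(check_config(config))

# Demonstrated with a standard-library function so the sketch runs
# without mbari-aidata installed.
basename = resolve_plugin({"module": "os.path", "function": "basename"})
print(basename("/mnt/CFElab/image.jpg"))
```

With mbari-aidata installed, passing config["plugins"][0] to resolve_plugin would return the extract_media function named in the file, assuming that module path is importable.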

A Docker image is also available as mbari/aidata:latest or mbari/aidata:latest:cuda-124. For example, to download data using the Docker image:

docker run -it --rm -v $(pwd):/mnt mbari/aidata:latest aidata download --version Baseline --labels "Diatoms, Copepods" --config config_cfe.yml

Commands

  • aidata download --help - Download data, such as images and boxes, into various formats for machine learning, e.g. COCO, CIFAR, or PASCAL VOC
  • aidata load --help - Load data, such as images and boxes, into either a Postgres or Redis database
  • aidata db --help - Commands related to database management
  • aidata transform --help - Commands related to transforming downloaded data
  • aidata -h - Print help message and exit.

Source code is available at github.com/mbari-org/aidata.

Development

See the Development Guide for more information on how to set up the development environment.

updated: 2025-04-07
