
Marlin Learn | Framework Data Adapter


marlin_data | the data adapter

© Rahul Tandon, RS Aqua 2024

About

marlin_data provides access to RSA's acoustic signature database. The module requests the RSA acoustic dataset required to run a machine learning (ML) model. The current version of marlin_data provides a predefined dataset for proof-of-concept and tutorial purposes. Future versions will allow more specific datasets to be requested, and training and validation datasets to be defined separately.

NOTE | Numerical data in marlin_data is defined using Python's numpy and pandas libraries.

Dependencies

    import numpy as np
    import pandas as pd
    import requests
    import json
    import logging
    import dotenv
    import os
    import sys
    from dataclasses import dataclass
    import random

Installation

Install the package from PyPI with `pip install marlin_data`, then import the module:

    from marlin_data import *

Quick Start

Accessible Data

Datafeed Instance

Each iteration over the Marlin datafeed provides one datafeed instance, built from the simulation data / snapshots. For each instance, the frequency time series is available as a NumPy array and as a pandas DataFrame, along with descriptive metadata. The datafeed is an iterable class, so it can be looped over to provide an incremental datafeed.
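To illustrate what "iterable" means here, the following is a minimal sketch of a stand-in feed. It assumes each iteration yields a dictionary with the three keys documented below (`frequency_ts_np`, `frequency_ts_pd`, `meta_data`). `ToyDataFeed`, the `value` column name and the `toy-N` snapshot ids are hypothetical and not part of marlin_data; the class only shows the Python iterator protocol that a `for` loop relies on.

```python
import numpy as np
import pandas as pd


class ToyDataFeed:
    """Hypothetical stand-in for the Marlin datafeed (not the library's class).

    Yields one dictionary per snapshot, mirroring the documented keys.
    """

    def __init__(self, series_list):
        # Each item is a 1-D NumPy array holding one snapshot's time series.
        self._series_list = list(series_list)
        self._position = 0

    def __iter__(self):
        # Restart from the first snapshot each time a new loop begins.
        self._position = 0
        return self

    def __next__(self):
        # Signal the end of the feed once every snapshot has been served.
        if self._position >= len(self._series_list):
            raise StopIteration
        series = self._series_list[self._position]
        instance = {
            "frequency_ts_np": series,
            "frequency_ts_pd": pd.DataFrame({"value": series}),
            "meta_data": {"snapshot_id": f"toy-{self._position}"},
        }
        self._position += 1
        return instance


feed = ToyDataFeed([np.zeros(4), np.ones(4)])
for feed_instance in feed:
    print(feed_instance["meta_data"]["snapshot_id"], feed_instance["frequency_ts_np"].shape)
```

Because `__iter__` resets the position, the same toy feed can be looped over more than once; whether the real datafeed behaves this way is not stated in this document.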

Feed Instance -> frequency_ts_np

The frequency time series for this data snapshot (NumPy array)

    for feed_instance in data_feed:
        frequency_np_array = feed_instance['frequency_ts_np']

Feed Instance -> frequency_ts_pd

The frequency time series for this data snapshot (pandas DataFrame)

    for feed_instance in data_feed:
        frequency_data_frame = feed_instance['frequency_ts_pd']

Feed Instance -> meta_data

Metadata for the associated snapshot. Metadata is a dictionary with the following defined fields:

  • snapshot_id : unique id for the snapshot
  • data_frame_start : timestamp for the start of the snapshot data frame
  • data_frame_end : timestamp for the end of the snapshot data frame
  • listener_location : geolocation of the listening device for the snapshot
  • location_name : human-friendly name of the listening location
  • frame_delta_t : delta t for the snapshot (s)
  • sample_rate : frequency recorder sample rate
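Since metadata is a plain dictionary, a simple sanity check is to confirm that the fields listed above are present before using a snapshot. The sketch below is a hypothetical helper, not part of marlin_data; the field names are copied from the list above and the example values are illustrative only.

```python
# Field names copied from the documented metadata list.
EXPECTED_META_FIELDS = (
    "snapshot_id",
    "data_frame_start",
    "data_frame_end",
    "listener_location",
    "location_name",
    "frame_delta_t",
    "sample_rate",
)


def missing_meta_fields(meta_data: dict) -> list[str]:
    """Return the documented metadata fields that are absent from meta_data.

    Hypothetical helper for sanity checks; not part of marlin_data.
    The order of the result follows EXPECTED_META_FIELDS.
    """
    return [name for name in EXPECTED_META_FIELDS if name not in meta_data]


# Illustrative values only, not real RSA data.
example_meta = {"snapshot_id": "abc", "sample_rate": 96000}
print(missing_meta_fields(example_meta))
```

The check only tests for key presence; it does not validate value types or units, which this document does not specify.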

Signature Data

The SignatureData dataclass is accessible from MarlinData -> signature_data. Key values are stored in MarlinData -> signature_index.

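    from dataclasses import dataclass

    import numpy as np
    import pandas as pd

    @dataclass
    class SignatureData:
        frequency_ts_np: np.ndarray    # frequency time series (NumPy array)
        frequency_ts_pd: pd.DataFrame  # frequency time series (pandas DataFrame)
        meta_data: dict                # snapshot metadata dictionary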

The meta_data field holds the metadata for the associated snapshot. It is a dictionary with the same fields as the feed instance metadata: snapshot_id, data_frame_start, data_frame_end, listener_location, location_name, frame_delta_t and sample_rate.

Simulation Data

The SimulationData dataclass is accessible from MarlinData -> simulation_data. Key values are stored in MarlinData -> simulation_index.

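    from dataclasses import dataclass

    import numpy as np
    import pandas as pd

    @dataclass
    class SimulationData:
        frequency_ts_np: np.ndarray    # frequency time series (NumPy array)
        frequency_ts_pd: pd.DataFrame  # frequency time series (pandas DataFrame)
        meta_data: dict                # snapshot metadata dictionary
        snapshot: bool = True          # defaults to True when not supplied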
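A quick way to see how the SimulationData fields fit together is to build an instance from toy data. The sketch below re-declares the dataclass locally (with `np.ndarray` and `dict` as type hints) so it runs without marlin_data installed; the toy series, the `value` column name and the `toy-0` snapshot id are illustrative assumptions, not real RSA data.

```python
from dataclasses import dataclass

import numpy as np
import pandas as pd


@dataclass
class SimulationData:
    # Local copy of the documented layout, for illustration only.
    frequency_ts_np: np.ndarray
    frequency_ts_pd: pd.DataFrame
    meta_data: dict
    snapshot: bool = True


series = np.arange(5, dtype=float)
sim = SimulationData(
    frequency_ts_np=series,
    frequency_ts_pd=pd.DataFrame({"value": series}),
    meta_data={"snapshot_id": "toy-0"},
)
print(sim.snapshot)  # True: the default applies when snapshot is not passed
```

Only `snapshot` has a default, so the other three fields must always be supplied, either by keyword as above or positionally in declaration order.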

The meta_data field holds the metadata for the associated snapshot. It is a dictionary with the same fields as the feed instance metadata: snapshot_id, data_frame_start, data_frame_end, listener_location, location_name, frame_delta_t and sample_rate.

Downloading data

Instantiate a MarlinData class.

    marlin_data = MarlinData(load_args={})

Download signature data from the RSA signature database.

    marlin_data.download_signatures(load_args={})

Download the simulation / ML run data required for the datafeed.

    marlin_data.download_simulation_snapshots(load_args={})

load_args:

  • limit : maximum number of downloads (required in init())
  • signature_ids : vector of signature ids
  • ss_ids : vector of snapshot ids
  • location : vector of locations

Connecting to datafeed

Create the Marlin datafeed. The datafeed can be iterated over to simulate a data feed into a model.

    data_feed = MarlinDataStreamer()

Connect the downloaded data to the datafeed instance.

    data_feed.init_data(marlin_data.simulation_data, marlin_data.simulation_index)

Iterate over the datafeed.

    for data_inst in data_feed:
        print(data_inst)
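When trying out a model it can be handy to pull only the first few instances from the datafeed. The sketch below uses the standard library's `itertools.islice`, which works on any iterable a `for` loop accepts; `first_instances` and the generator `toy_feed` are hypothetical stand-ins used so the example runs without downloaded data.

```python
from itertools import islice


def first_instances(feed, n):
    """Return at most n instances from any iterable feed, in order.

    islice stops after n items, so a long feed is not read to the end.
    """
    return list(islice(feed, n))


# Hypothetical stand-in feed: a generator yielding numbered dictionaries.
def toy_feed():
    for i in range(100):
        yield {"meta_data": {"snapshot_id": f"toy-{i}"}}


for data_inst in first_instances(toy_feed(), 3):
    print(data_inst["meta_data"]["snapshot_id"])
```

With the real datafeed, the same call would be `first_instances(data_feed, 3)` after `init_data` has been run.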




Download files


Source Distribution

marlin_data-15.0.0.tar.gz (6.7 MB)

Uploaded Source

Built Distribution


marlin_data-15.0.0-py3-none-any.whl (40.6 kB)

Uploaded Python 3

File details

Details for the file marlin_data-15.0.0.tar.gz.

File metadata

  • Download URL: marlin_data-15.0.0.tar.gz
  • Upload date:
  • Size: 6.7 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for marlin_data-15.0.0.tar.gz
Algorithm     Hash digest
SHA256        67ba05e25b52106ac9bb3dfae6dd473e519e74de76640331b1cdc3177c938716
MD5           1994611518d7285652d6f9c783ae0301
BLAKE2b-256   14bc59b874d7ff65a3e066a7b87d2925ad24246c60388f1494d838a085862f62


File details

Details for the file marlin_data-15.0.0-py3-none-any.whl.

File metadata

  • Download URL: marlin_data-15.0.0-py3-none-any.whl
  • Upload date:
  • Size: 40.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for marlin_data-15.0.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        9be14d6810ea8f2366179cb150215e16970a23aa104e4964196834ff1a58080b
MD5           23111089954c81a739557ce8f0086197
BLAKE2b-256   d70351a3adb68ef82864e184d5a4ff7f28ec9231eea1f062dd2e80cd5f89331b

