
Marlin Learn | Framework Data Adapter


marlin_data | the data adapter

© Rahul Tandon, RS Aqua 2024

About

marlin_data provides access to RSA's acoustic signature database. The module requests the RSA acoustic dataset required to run a machine learning (ML) model. The current version of marlin_data provides a predefined dataset for proof-of-concept and tutorial purposes. Future versions will allow more specific datasets to be requested and will allow training and validation datasets to be defined separately.

NOTE | Numerical data in marlin_data is defined using Python's NumPy and pandas libraries.

Dependencies

    import numpy as np
    import pandas as pd
    import requests, json
    import logging
    import dotenv
    import os, sys
    from dataclasses import dataclass
    import random

Installation

Install the package from PyPI (for example with pip install marlin_data), then import the module:

    from marlin_data import *

Quick Start

Accessible Data

Datafeed Instance

Each iteration over the Marlin data feed provides one instance of the datafeed and its simulation data / snapshot. The frequency time series is available as both a NumPy array and a pandas DataFrame, along with descriptive metadata. The feed is an iterable class that can be looped over, allowing for an incremental datafeed.

Feed Instance -> frequency_ts_np

The frequency time series for this data snapshot (NumPy array)

    for feed_instance in data_feed:
        frequency_np_array = feed_instance['frequency_ts_np']

Feed Instance -> frequency_ts_pd

The frequency time series for this data snapshot (pandas DataFrame)

    for feed_instance in data_feed:
        frequency_data_frame = feed_instance['frequency_ts_pd']
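As a sketch of how the two representations might be consumed together, the following uses a hand-built stand-in for the feed (a list of dictionaries with the keys described above) rather than a live Marlin feed. The array values, the DataFrame column name, and the peak calculation are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

# Stand-in for a Marlin data feed. The real feed comes from the library;
# this list and its values are hypothetical.
samples = np.array([0.0, 0.5, -0.5, 1.0])
data_feed = [
    {
        "frequency_ts_np": samples,
        "frequency_ts_pd": pd.DataFrame({"amplitude": samples}),  # column name is assumed
    }
]

peaks = []
for feed_instance in data_feed:
    frequency_np_array = feed_instance["frequency_ts_np"]
    frequency_data_frame = feed_instance["frequency_ts_pd"]
    # Same snapshot, two views: NumPy for numeric work, pandas for labelled inspection
    peaks.append(float(np.max(np.abs(frequency_np_array))))
    assert len(frequency_data_frame) == frequency_np_array.shape[0]

print(peaks)
```

Collecting one summary value per feed instance keeps the loop incremental, which matches the way the feed is described as being consumed.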

Feed Instance -> meta_data

Metadata for the associated snapshot. Metadata is a dictionary with the following defined fields:

  • snapshot_id : unique id for the snapshot
  • data_frame_start : timestamp for the start of the snapshot data frame
  • data_frame_end : timestamp for the end of the snapshot data frame
  • listener_location : geo location of the listening device for snapshot
  • location_name : human friendly name of listening location
  • frame_delta_t : delta t for the snapshot (s)
  • sample_rate : frequency recorder sample rate
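The metadata fields can be combined to sanity-check a snapshot. The sketch below is not taken from the library: it assumes the timestamps are ISO 8601 strings, that listener_location is a latitude/longitude dictionary, and that sample_rate is in Hz. All values are invented, and `frame_duration_seconds` is a hypothetical helper.

```python
from datetime import datetime

# Hypothetical metadata record; field names follow the list above, values are invented
meta_data = {
    "snapshot_id": "example-snapshot",
    "data_frame_start": "2024-01-01T00:00:00",  # ISO 8601 format is an assumption
    "data_frame_end": "2024-01-01T00:00:10",
    "listener_location": {"latitude": 50.0, "longitude": -1.0},  # structure is assumed
    "location_name": "example location",
    "frame_delta_t": 10.0,
    "sample_rate": 96000,  # assumed to be in Hz
}


def frame_duration_seconds(meta: dict) -> float:
    """Duration between the start and end timestamps, assuming ISO 8601 strings."""
    start = datetime.fromisoformat(meta["data_frame_start"])
    end = datetime.fromisoformat(meta["data_frame_end"])
    return (end - start).total_seconds()


duration = frame_duration_seconds(meta_data)
# Number of samples expected in the frame, assuming a constant sample rate
expected_samples = int(duration * meta_data["sample_rate"])
print(duration, expected_samples)
```

If the real timestamps use a different format, only the parsing step in the helper would need to change.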

Signature Data

SignatureData dataclass is accessible from MarlinData -> signature_data. Key values are stored in MarlinData -> signature_index.

    @dataclass
    class SignatureData:
        frequency_ts_np : np.array
        frequency_ts_pd : pd.DataFrame
        meta_data : {}

The meta_data dictionary has the same fields as listed under Feed Instance -> meta_data.
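The relationship between the data container and its index might be used as follows. This sketch defines a local copy of the dataclass (with the pandas field left out for brevity) and assumes that signature_data behaves like a dictionary keyed by the values held in signature_index. Neither assumption is confirmed by the description above, and the ids and arrays are invented.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class SignatureData:
    # Local stand-in mirroring the documented fields; frequency_ts_pd omitted for brevity
    frequency_ts_np: np.ndarray
    meta_data: dict = field(default_factory=dict)


# Hypothetical contents: a dict keyed by id, plus a list of those ids
signature_data = {
    "sig-1": SignatureData(np.array([1.0, 2.0, 3.0]), {"snapshot_id": "sig-1"}),
    "sig-2": SignatureData(np.array([4.0, 5.0]), {"snapshot_id": "sig-2"}),
}
signature_index = list(signature_data.keys())

# Walk the index and look each signature up by key
lengths = {key: signature_data[key].frequency_ts_np.size for key in signature_index}
print(lengths)
```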

Simulation Data

SimulationData dataclass is accessible from MarlinData -> simulation_data. Key values are stored in MarlinData -> simulation_index.

    @dataclass
    class SimulationData:
        frequency_ts_np : np.array
        frequency_ts_pd : pd.DataFrame
        meta_data : {}
        snapshot : bool = True

The meta_data dictionary has the same fields as listed under Feed Instance -> meta_data.

Downloading data

Instantiate a MarlinData class.

    marlin_data = MarlinData(load_args={})

Download signature data from RSA signature database.

    marlin_data.download_signatures(load_args={})

Download simulation / ML run data required for datafeed.

    marlin_data.download_simulation_snapshots(load_args={})

load_args:

  • limit : maximum number of downloads (required when instantiating MarlinData)
  • signature_ids : vector of signature ids
  • ss_ids : vector of snapshot ids
  • location : vector of locations
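A load_args dictionary built from the keys above might look like the following. `build_load_args` is a hypothetical helper, not part of marlin_data, and the values are placeholders. Whether the library accepts a dictionary with some filters left out is an assumption here.

```python
# Keys follow the load_args list above; all values are hypothetical placeholders
ALLOWED_KEYS = {"limit", "signature_ids", "ss_ids", "location"}


def build_load_args(limit, signature_ids=None, ss_ids=None, location=None):
    """Assemble a load_args dict, leaving out filters that were not supplied."""
    candidate = {
        "limit": limit,
        "signature_ids": signature_ids,
        "ss_ids": ss_ids,
        "location": location,
    }
    return {key: value for key, value in candidate.items() if value is not None}


load_args = build_load_args(limit=10, location=["example_location"])
# Guard against typos in key names before handing the dict to the library
assert set(load_args) <= ALLOWED_KEYS
print(load_args)
```

The resulting dictionary would then be passed in the same way as the empty one in the calls above, for example as MarlinData(load_args=load_args).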

Connecting to datafeed

Create the Marlin datafeed. This data feed can be iterated over in order to simulate a data feed into a model.

    data_feed = MarlinDataStreamer()

Connect the downloaded data to the datafeed instance.

    data_feed.init_data(marlin_data.simulation_data, marlin_data.simulation_index)

Iterate over the datafeed.

    for data_inst in data_feed:
        print(data_inst)
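To illustrate the pattern of connecting data and then iterating, here is a minimal stand-in class. It is not the marlin_data implementation: `ExampleStreamer` is a hypothetical name, and the sketch assumes the data is a dictionary that the index walks in order.

```python
class ExampleStreamer:
    """Hypothetical stand-in for a data streamer; not the marlin_data implementation."""

    def __init__(self):
        self._data = {}
        self._index = []

    def init_data(self, data, index):
        # Keep the data container and the ordered keys used to walk it
        self._data = data
        self._index = list(index)

    def __iter__(self):
        # Yield one item per key, in index order
        for key in self._index:
            yield self._data[key]


# Invented placeholder data standing in for downloaded simulation snapshots
simulation_data = {"a": {"snapshot_id": "a"}, "b": {"snapshot_id": "b"}}
simulation_index = ["a", "b"]

data_feed = ExampleStreamer()
data_feed.init_data(simulation_data, simulation_index)
seen = [data_inst["snapshot_id"] for data_inst in data_feed]
print(seen)
```

Using a generator in `__iter__` means each loop over the feed starts again from the first key, so the same downloaded data can be replayed into a model more than once.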

