Marlin Learn | Framework Data Adapter
Project description
marlin_data | the data adapter
© Rahul Tandon, RS Aqua 2024
About
marlin_data allows access to RSA's acoustic signature database. The module requests the RSA acoustic dataset required to run a machine learning (ML) model. The current version of marlin_data provides a predefined dataset for proof-of-concept and tutorial purposes. Future versions will allow more specific datasets to be requested, as well as defining training and validation datasets separately.
NOTE | Numerical data in marlin_data is defined using Python's numpy and pandas libraries.
Dependencies
import numpy as np
import pandas as pd
import requests, json
import logging
import dotenv
import os, sys
from dataclasses import dataclass
import random
Installation
from marlin_data import *
Quick Start
Accessible Data
Datafeed Instance
Each iteration over the Marlin data feed provides one instance of the datafeed and simulation data / snapshots. The frequency time series is available as a NumPy array and as a pandas DataFrame, along with descriptive metadata. The feed is an iterable class that can be looped over, allowing for an incremental datafeed.
Feed Instance -> frequency_ts_np
The frequency time series for this data snapshot ( NumPy Array )
for feed_instance in data_feed:
    frequency_np_array = feed_instance['frequency_ts_np']
Feed Instance -> frequency_ts_pd
The frequency time series for this data snapshot (pandas DataFrame)
for feed_instance in data_feed:
    frequency_data_frame = feed_instance['frequency_ts_pd']
Feed Instance -> meta_data
Metadata for the associated snapshot. Metadata is a dictionary with the following defined fields:
- snapshot_id: unique id for the snapshot
- data_frame_start: timestamp for the start of the snapshot data frame
- data_frame_end: timestamp for the end of the snapshot data frame
- listener_location: geo location of the listening device for the snapshot
- location_name: human-friendly name of the listening location
- frame_delta_t: delta t for the snapshot (s)
- sample_rate: frequency recorder sample rate
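As an illustration of working with these fields, the sketch below derives the frame duration and an expected sample count from one metadata dictionary. All values are invented. Two things are assumptions rather than documented behaviour: that the timestamps are ISO 8601 strings, and that sample_rate is expressed in samples per second. The helper `frame_duration_seconds` is hypothetical and is not part of marlin_data.

```python
from datetime import datetime

# Hypothetical stand-in for one feed instance's meta_data dictionary.
# Field names come from the list above; every value is made up.
meta_data = {
    "snapshot_id": "example-snapshot-001",
    "data_frame_start": "2024-01-01T00:00:00",  # assumed ISO 8601 format
    "data_frame_end": "2024-01-01T00:00:10",    # assumed ISO 8601 format
    "listener_location": {"latitude": 0.0, "longitude": 0.0},
    "location_name": "example_location",
    "frame_delta_t": 10.0,
    "sample_rate": 96000,                        # assumed samples per second
}


def frame_duration_seconds(meta):
    """Return the length of the snapshot data frame in seconds."""
    start = datetime.fromisoformat(meta["data_frame_start"])
    end = datetime.fromisoformat(meta["data_frame_end"])
    return (end - start).total_seconds()


duration = frame_duration_seconds(meta_data)
expected_samples = int(duration * meta_data["sample_rate"])
print(meta_data["snapshot_id"], duration, expected_samples)
```

If the real timestamps use a different format, only the parsing step in `frame_duration_seconds` would need to change.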
Signature Data
SignatureData dataclass is accessible from MarlinData -> signature_data. Key values are stored in MarlinData -> signature_index.
@dataclass
class SignatureData:
    frequency_ts_np : np.array
    frequency_ts_pd : pd.DataFrame
    meta_data : {}
Metadata for the associated snapshot. Metadata is a dictionary with the following defined fields:
- snapshot_id: unique id for the snapshot
- data_frame_start: timestamp for the start of the snapshot data frame
- data_frame_end: timestamp for the end of the snapshot data frame
- listener_location: geo location of the listening device for the snapshot
- location_name: human-friendly name of the listening location
- frame_delta_t: delta t for the snapshot (s)
- sample_rate: frequency recorder sample rate
Simulation Data
SimulationData dataclass is accessible from MarlinData -> simulation_data. Key values are stored in MarlinData -> simulation_index.
@dataclass
class SimulationData:
    frequency_ts_np : np.array
    frequency_ts_pd : pd.DataFrame
    meta_data : {}
    snapshot : bool = True
Metadata for the associated snapshot. Metadata is a dictionary with the following defined fields:
- snapshot_id: unique id for the snapshot
- data_frame_start: timestamp for the start of the snapshot data frame
- data_frame_end: timestamp for the end of the snapshot data frame
- listener_location: geo location of the listening device for the snapshot
- location_name: human-friendly name of the listening location
- frame_delta_t: delta t for the snapshot (s)
- sample_rate: frequency recorder sample rate
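The following sketch shows one way records of this shape could be selected by their metadata. It assumes that simulation_data behaves like a mapping keyed by the values held in simulation_index, which the description above suggests but does not state. `ExampleSimulationData` and `filter_by_location` are hypothetical names, and plain lists stand in for the NumPy array and pandas DataFrame so that the example runs with the standard library alone.

```python
from dataclasses import dataclass, field


@dataclass
class ExampleSimulationData:
    """Illustrative mirror of the fields above; not the library's class."""
    frequency_ts_np: list   # stand-in for a NumPy array
    frequency_ts_pd: list   # stand-in for a pandas DataFrame
    meta_data: dict = field(default_factory=dict)
    snapshot: bool = True


def filter_by_location(simulation_data, simulation_index, location_name):
    """Return the index keys whose metadata matches location_name."""
    return [
        key
        for key in simulation_index
        if simulation_data[key].meta_data.get("location_name") == location_name
    ]


# Invented records keyed by invented snapshot ids.
simulation_data = {
    "snap_a": ExampleSimulationData(
        [0.1, 0.2], [0.1, 0.2],
        {"snapshot_id": "snap_a", "location_name": "site_one"},
    ),
    "snap_b": ExampleSimulationData(
        [0.3, 0.4], [0.3, 0.4],
        {"snapshot_id": "snap_b", "location_name": "site_two"},
    ),
}
simulation_index = ["snap_a", "snap_b"]

matching = filter_by_location(simulation_data, simulation_index, "site_one")
print(matching)
```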
Downloading data
Instantiate a MarlinData class.
`marlin_data = MarlinData(load_args={})`
Download signature data from RSA signature database.
`marlin_data.download_signatures(load_args={})`
Download simulation / ML run data required for datafeed.
`marlin_data.download_simulation_snapshots(load_args={})`
load_args:
- limit: maximum number of downloads (required in init())
- signature_ids: vector of signature ids
- ss_ids: vector of snapshot ids
- location: vector of locations
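A load_args dictionary using these keys might be assembled as in the sketch below. Only the key names are taken from the list above; the values are placeholders, and `validate_load_args` is a hypothetical helper (not part of marlin_data) that catches misspelled keys and a missing limit before any download is attempted.

```python
# Key names documented for load_args.
ALLOWED_KEYS = {"limit", "signature_ids", "ss_ids", "location"}


def validate_load_args(load_args):
    """Check key names and the presence of 'limit'; return the dict unchanged."""
    unknown = set(load_args) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"Unknown load_args keys: {sorted(unknown)}")
    if "limit" not in load_args:
        raise ValueError("load_args must include 'limit'")
    return load_args


# Placeholder values for illustration only.
load_args = validate_load_args({
    "limit": 10,
    "ss_ids": ["example_snapshot_id"],
    "location": ["example_location"],
})
print(load_args)

# With the package installed, the dictionary would then be passed on, e.g.:
# marlin_data = MarlinData(load_args=load_args)
# marlin_data.download_simulation_snapshots(load_args=load_args)
```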
Connecting to datafeed
Create the Marlin datafeed. This data feed can be iterated over in order to simulate a data feed into a model.
`data_feed = MarlinDataStreamer()`
Connect the downloaded data to the datafeed instance.
`data_feed.init_data(marlin_data.simulation_data, marlin_data.simulation_index)`
Iterate over the datafeed.
for data_inst in data_feed:
    print(data_inst)
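To make the iteration pattern concrete, here is a minimal stand-in for an iterable data feed. It is not the library's streamer implementation: `ExampleDataStreamer` is a hypothetical class, and it assumes the data passed to `init_data` is a mapping keyed by the entries of the index. It only illustrates how a class can hand out one snapshot per loop iteration.

```python
class ExampleDataStreamer:
    """Hypothetical stand-in that yields one stored snapshot per iteration."""

    def __init__(self):
        self._data = {}
        self._index = []
        self._position = 0

    def init_data(self, data, index):
        # Keep a reference to the data and a copy of the ordered keys.
        self._data = data
        self._index = list(index)

    def __iter__(self):
        # Restart from the first snapshot each time a loop begins.
        self._position = 0
        return self

    def __next__(self):
        if self._position >= len(self._index):
            raise StopIteration
        key = self._index[self._position]
        self._position += 1
        return self._data[key]


# Invented snapshots; lists stand in for the frequency time series.
example_data = {
    "snap_a": {"frequency_ts_np": [0.1, 0.2], "meta_data": {"snapshot_id": "snap_a"}},
    "snap_b": {"frequency_ts_np": [0.3, 0.4], "meta_data": {"snapshot_id": "snap_b"}},
}
example_index = ["snap_a", "snap_b"]

data_feed = ExampleDataStreamer()
data_feed.init_data(example_data, example_index)

seen_ids = [data_inst["meta_data"]["snapshot_id"] for data_inst in data_feed]
print(seen_ids)
```

Because `__iter__` resets the position, the same feed object can be looped over more than once, which is convenient when replaying a dataset across several training runs.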
Download files
File details
Details for the file marlin_data-8.0.0.tar.gz.
File metadata
- Download URL: marlin_data-8.0.0.tar.gz
- Upload date:
- Size: 6.6 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5e523e2b1afad420ca9e7fc9fbbc2b7a136c0d58bee315d9d4b026a91c2d1187 |
| MD5 | ad9af3b184afc9efbdc58be6f2b86af9 |
| BLAKE2b-256 | dbc77b55e6c8240373038dd4c0d8c3a4708231b287b6cf6af8e7e95cd6508096 |
File details
Details for the file marlin_data-8.0.0-py3-none-any.whl.
File metadata
- Download URL: marlin_data-8.0.0-py3-none-any.whl
- Upload date:
- Size: 32.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5af8be7294c84ca17b507bd4ad50e7afa612626af12f37931240b9a0d6c37b47 |
| MD5 | 0fa50eb32e62c4edb8d7b25df443b3c5 |
| BLAKE2b-256 | e698f8977afce81be17cf156300d63153d560ce97666df690f48aa1db02b60f1 |