
Echodataflow: Streamlined Data Pipeline Orchestration

Welcome to Echodataflow! Echodataflow is a powerful data pipeline orchestration tool designed to simplify and enhance the execution of data processing tasks. Leveraging the capabilities of Prefect 2.0 and YAML configuration files, Echodataflow caters to the needs of scientific research and data analysis. It provides an efficient way to define, configure, and execute complex data processing workflows.

Echodataflow integrates with Echopype, a renowned package for sonar data analysis, to provide a versatile solution for researchers, analysts, and engineers. With Echodataflow, users can seamlessly process and analyze sonar data using a modular and user-friendly approach.

Getting Started with Echodataflow

This guide will walk you through the initial steps to set up and run your Echodataflow pipelines.

1. Create a Virtual Environment

To keep your Echodataflow environment isolated, it's recommended to create a virtual environment using Conda or Python's built-in venv module. Here's an example using Conda:

conda create --name echodataflow-env
conda activate echodataflow-env

Or, using Python's venv:

python -m venv echodataflow-env
source echodataflow-env/bin/activate  # On Windows, use `echodataflow-env\Scripts\activate`

2. Clone the Project

Now that you have a virtual environment set up, you can clone the Echodataflow project repository to your local machine using the following command:

git clone <repository_url>

3. Install the Package

Navigate to the project directory you've just cloned and install the Echodataflow package. The -e flag installs it in editable mode, which is especially helpful during development and testing. Installation may take a few minutes while pip resolves dependencies, so feel free to grab a coffee.

cd <project_directory>
pip install -e .

4. Echodataflow and Prefect Initialization

To kickstart your journey with Echodataflow and Prefect, follow these simple initialization steps:

4.1 Initializing Echodataflow

Begin by initializing Echodataflow with the following command:

echodataflow init

This command lays the groundwork for your Echodataflow environment, preparing it for use.

4.2 Initializing Prefect

For Prefect, initialization involves a few extra steps, depending on whether you authenticate against Prefect Cloud or work locally:

  • If you have a Prefect Cloud account, link it securely by running the command below, then type your Prefect API key when prompted and press Enter.
prefect cloud login
  • If you don't have a Prefect Cloud account yet, you can use a local Prefect profile instead. This is especially useful if you're just starting out and want to explore Prefect without an account. Note that a new profile isn't activated automatically; see the command after this list.
prefect profile create echodataflow-local
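
Prefect doesn't switch to a newly created profile on its own. Assuming the standard Prefect 2 CLI, you can activate the profile created above with:

prefect profile use echodataflow-local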

Once initialized, both Echodataflow and Prefect are properly set up and ready for your workflows, whether they run in the cloud or locally.

5. Configure Blocks

Echodataflow uses the Prefect concept of blocks, which are secure containers for storing credentials and other sensitive data. If you're running the entire flow locally, feel free to skip this step. To set up your cloud credentials, configure blocks according to your cloud provider; for detailed instructions, refer to the Blocks Configuration Guide. A sketch of one way to register a block follows below.
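
As an illustration only (this is not taken from the Echodataflow docs), an AWS credentials block can be registered with Prefect via the prefect-aws collection. The block name echodataflow-aws and the installed collection are assumptions here:

# Hedged sketch: registering an AWS credentials block with Prefect.
# Assumes the prefect-aws collection is installed (pip install prefect-aws);
# the block name "echodataflow-aws" is an arbitrary example.
from prefect_aws import AwsCredentials

AwsCredentials(
    aws_access_key_id="YOUR_ACCESS_KEY_ID",
    aws_secret_access_key="YOUR_SECRET_ACCESS_KEY",
).save("echodataflow-aws", overwrite=True)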

6. Edit the Pipeline Configuration

Open the pipeline.yaml file. This YAML configuration file defines the stages you want to execute as part of your pipeline. Customize it by adding the echopype functions you wish to run as stages; an illustrative sketch follows below.
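
For orientation only, a minimal pipeline.yaml might look like the sketch below. Every key and stage name here is an assumption rather than the authoritative Echodataflow schema, so check the project's configuration docs for the real field names:

# Hypothetical pipeline.yaml sketch; key and stage names are assumptions.
active_recipe: standard
pipeline:
  - recipe_name: standard
    stages:
      - name: echodataflow_open_raw      # e.g. a stage wrapping echopype's open_raw
      - name: echodataflow_compute_Sv    # e.g. a stage wrapping echopype's compute_Sv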

7. Define Data Sources and Destinations

Customize the datastore.yaml file to define the source and destination for your pipeline's data. This is where Echodataflow fetches input from and stores output to as it executes the pipeline; an illustrative sketch follows below.
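
Again purely as an illustration, with every key an assumption about the datastore schema rather than its documented form:

# Hypothetical datastore.yaml sketch; all keys are assumptions.
name: my-survey
sonar_model: EK60                       # a sonar model supported by echopype
args:
  urlpath: s3://my-bucket/raw/*.raw     # where raw files are read from
output:
  urlpath: ./echodataflow-output       # where processed outputs are written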

8. Execute the Pipeline

You're now ready to execute your Echodataflow pipeline! Use the echodataflow_start function, the central entry point of Echodataflow, to kick off your pipeline. Import it from Echodataflow and provide the paths or URLs of the configuration files, along with any additional options or storage options you need. Here's an example:

Customize the paths, block name, storage type, and options based on your requirements.

from echodataflow import echodataflow_start, StorageType, load_block

# Paths or URLs of the configuration files
dataset_config = "<url or path of datastore.yaml>"
pipeline_config = "<url or path of pipeline.yaml>"
logfile_config = "<url or path of logging.yaml>"  # optional

# Load the credential block configured in step 5
aws = load_block(name="<block_name>", type=<StorageType>)

# Setting storage_options_override to True assigns the block for universal
# use, avoiding repetitive configuration when a single credential block is
# employed throughout the application.
options = {"storage_options_override": False}

data = echodataflow_start(
    dataset_config=dataset_config,
    pipeline_config=pipeline_config,
    logging_config=logfile_config,
    storage_options=aws,
    options=options,
)
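
If you're running the entire flow locally and skipped step 5, it should be possible to drop the storage options; treating storage_options and options as optional is an assumption about echodataflow_start's defaults:

# Assumed local variant: storage_options omitted, which presumes
# echodataflow_start treats it as optional for fully local runs.
data = echodataflow_start(
    dataset_config=dataset_config,
    pipeline_config=pipeline_config,
)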

License

Licensed under the MIT License; you may not use this project except in compliance with the License. You may obtain a copy of the License in the project repository.
