Alyeska /al-ee-EHS-kah/ n. A Data Pipeline Toolkit

Alyeska is a data engineering toolkit to simplify the nuts & bolts of data engineering tasks.

More concretely, Alyeska bridges the gap between common Python modules and common data engineering technologies, e.g. pandas, psycopg2, AWS Redshift, AWS Secrets Manager, and more. Alyeska offers simple functions and syntactic sugar for common tasks:

  • Safely executing many SQL statements against a database (sqlagent)

  • Loading a pandas dataframe into a database (redpandas)

  • Assuming an AWS IAM user with multi-factor authentication (locksmith.authmfa)

  • Creating psycopg2 connections to Redshift (locksmith)

  • Generating shell scripts that respect workflow dependencies (compose.compose_sh)

While Alyeska mimics some of their functionality, it is not a replacement for Airflow, AWS Glue, or other purpose-built data engineering technologies. Metaphorically, Alyeska is your parents’ toolbox: terrific for fixing a leaky faucet, but no replacement for a plumber.

Sample Usage

Assume an AWS IAM user with multi-factor authentication

Alyeska’s authmfa command line utility is useful for quickly assuming an AWS IAM user with MFA. It prints export statements for a set of temporary credentials:

$ authmfa MyAwsUser
export AWS_ACCESS_KEY_ID=ABCDEFGHIJKLMNOPQRSTUVWXYZ
export AWS_SECRET_ACCESS_KEY=abcdefg1234567!@#$%^&
export AWS_SESSION_TOKEN=notarealsessiontoken///////5AVHwuGc*hYLp%$vr51*XTEHJjRD2JxavaD8wlJqi!aCZVhvp7nzt!U5elvoPZ@GlG%a9sT^HBrgKzQ8xZrpAADp65RYQzqvawF
$ eval `authmfa MyAwsUser`  # export to environment

Learn more about how to configure this utility with authmfa -h.
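
For readers curious about what a utility like this does, the sketch below shows one way to request temporary MFA credentials and print them as export lines. It is not authmfa's actual implementation: the helper names fetch_mfa_credentials and format_exports are hypothetical, and the use of boto3's STS get_session_token call is an assumption made for illustration.

```python
import shlex


def fetch_mfa_credentials(mfa_serial: str, token_code: str, duration: int = 3600) -> dict:
    """Request temporary credentials from AWS STS (hypothetical helper)."""
    import boto3  # imported lazily so format_exports works without boto3 installed

    sts = boto3.client("sts")
    response = sts.get_session_token(
        DurationSeconds=duration,
        SerialNumber=mfa_serial,  # ARN of the user's MFA device
        TokenCode=token_code,     # current code shown by the MFA device
    )
    return response["Credentials"]


def format_exports(credentials: dict) -> str:
    """Render STS credentials as shell export statements (hypothetical helper)."""
    mapping = {
        "AWS_ACCESS_KEY_ID": credentials["AccessKeyId"],
        "AWS_SECRET_ACCESS_KEY": credentials["SecretAccessKey"],
        "AWS_SESSION_TOKEN": credentials["SessionToken"],
    }
    # shlex.quote keeps values with shell metacharacters safe to eval
    return "\n".join(f"export {key}={shlex.quote(value)}" for key, value in mapping.items())
```

Quoting each value matters because the output is meant to be passed to eval, and session tokens can contain characters the shell would otherwise interpret.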

Load a pandas dataframe into AWS Redshift

Large tables can be frustrating to load into Redshift. Alyeska reduces the process to a single statement:

>>> aly.redpandas.insert_pandas_into(cnxn, target_table, payload_df)

In practice, a full session might look like this:

>>> import alyeska as aly
>>> import alyeska.locksmith.redshift as rs
>>> import pandas as pd

>>> cnxn = rs.connect_with_environment("my-user")
>>> target_table = "db.natural_numbers"

>>> sql = f"CREATE TABLE {target_table}(n INT NOT NULL)"
>>> aly.sqlagent.execute_sql(cnxn, sql)  # create table

>>> natural_numbers_df = pd.DataFrame({"n": range(1, 1_000_001)})
>>> aly.redpandas.insert_pandas_into(cnxn, target_table, natural_numbers_df)
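
To give a feel for the kind of work a helper like this saves, here is a minimal sketch of a batched insert over a DB-API connection. It is not how redpandas is implemented: chunked and insert_rows are hypothetical names, and an in-memory sqlite3 database stands in for Redshift so the example runs anywhere.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, List, Sequence


def chunked(rows: Iterable[Sequence], size: int) -> Iterator[List[Sequence]]:
    """Yield successive lists of at most `size` rows."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def insert_rows(cnxn, table: str, columns: Sequence[str], rows, batch_size: int = 1000) -> int:
    """Insert rows in batches and return the number of rows written."""
    # sqlite3 uses "?" placeholders; psycopg2 would use "%s" instead.
    placeholders = ", ".join("?" for _ in columns)
    # Table and column names are interpolated directly: only pass trusted values.
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    cursor = cnxn.cursor()
    total = 0
    for batch in chunked(rows, batch_size):
        cursor.executemany(sql, batch)
        total += len(batch)
    cnxn.commit()
    return total


cnxn = sqlite3.connect(":memory:")  # stand-in for a Redshift connection
cnxn.execute("CREATE TABLE natural_numbers (n INT NOT NULL)")
inserted = insert_rows(cnxn, "natural_numbers", ["n"], ((n,) for n in range(1, 10_001)))
```

With a dataframe, the rows argument could be df.itertuples(index=False, name=None), which yields plain tuples in column order.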

Components

Tools are broken out into modules with niche purposes:

  1. compose is a workflow dependency management tool

  2. locksmith helps authorize AWS sessions and Redshift connections

  3. logging is a thin module that standardizes logging practices

  4. redpandas supports less verbose pandas/redshift functionality

  5. sqlagent supports SQL execution and runtime configuration
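
As an illustration of the idea behind compose, the sketch below orders tasks by their dependencies and emits a shell script. It does not reflect compose_sh's real interface or input format: compose_script, its arguments, and the example task names are hypothetical, and the ordering relies on the standard library's graphlib module (Python 3.9+).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def compose_script(tasks: dict, dependencies: dict) -> str:
    """Build a shell script that runs each task after the tasks it depends on.

    tasks maps a task name to its shell command; dependencies maps a task
    name to the set of task names that must finish first.
    """
    sorter = TopologicalSorter()
    for name in tasks:
        sorter.add(name, *dependencies.get(name, ()))
    # static_order raises graphlib.CycleError if the dependencies form a cycle
    ordered = sorter.static_order()
    lines = ["#!/bin/sh", "set -e"]  # set -e stops at the first failing task
    lines.extend(tasks[name] for name in ordered)
    return "\n".join(lines) + "\n"


tasks = {
    "load": "python load.py",
    "extract": "python extract.py",
    "transform": "python transform.py",
}
dependencies = {"transform": {"extract"}, "load": {"transform"}}
script = compose_script(tasks, dependencies)
```

Here the three tasks form a chain, so the script always runs extract, then transform, then load, regardless of the order in which they were declared.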

License

This project is licensed under the Apache v2.0 License - see the LICENSE file for details.

Contribute

Begin by reading our Code of Conduct.

Contributing to the repo requires a few development tools. Create a development environment and install pre-commit to run hooks on your code.

$ python3 -m venv .venv
$ source .venv/bin/activate
$ pip install -r requirements.txt
$ pip install -r requirements.dev.txt
$ pre-commit install
$ pre-commit autoupdate

Namesake

The Alyeska Pipeline Service Company maintains the Trans-Alaska Pipeline, a roughly 1,300 km (800 mile) pipeline connecting the oil-rich, subterranean earth of northern Alaska to a port on the North Pacific Ocean.
