
Common utility functions for data engineering use cases


hip-data-tools

© Hipages Group Pty Ltd 2019


Common Python tools and utilities for data engineering, ETL, exploration, etc. The package is uploaded to PyPI so that it can be dropped into various environments, such as (but not limited to):

  1. Running production workloads
  2. ML training in Jupyter-like notebooks
  3. Local machine for dev and exploration

Installation

Install from PyPI:

pip3 install hip-data-tools

Install from source (run from the root of the source checkout):

pip3 install .
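
To confirm that either install route worked, the installed version can be read back from the package metadata. This is a minimal sketch using only the standard library (`importlib.metadata`, Python 3.8+); the helper name `installed_version` is hypothetical and not part of hip-data-tools.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the version string, or None if the package is not installed
print(installed_version("hip-data-tools"))
```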

Connect to AWS

You will need to instantiate an AWS Connection:

from hip_data_tools.aws.common import AwsConnectionManager, AwsConnectionSettings, AwsSecretsManager

# Connect using an AWS CLI profile
conn = AwsConnectionManager(
    AwsConnectionSettings(region="ap-southeast-2", secrets_manager=None, profile="default")
)

# OR connect using the standard AWS environment variables
conn = AwsConnectionManager(
    settings=AwsConnectionSettings(
        region="ap-southeast-2", secrets_manager=AwsSecretsManager(), profile=None
    )
)

# OR connect using a custom set of environment variables
conn = AwsConnectionManager(
    settings=AwsConnectionSettings(
        region="ap-southeast-2",
        secrets_manager=AwsSecretsManager(
            access_key_id_var="SOME_CUSTOM_AWS_ACCESS_KEY_ID",
            secret_access_key_var="SOME_CUSTOM_AWS_SECRET_ACCESS_KEY",
            use_session_token=True,
            aws_session_token_var="SOME_CUSTOM_AWS_SESSION_TOKEN",
        ),
        profile=None,
    )
)
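
When connecting through custom environment variables, it can help to check that they are set before building the connection. The sketch below assumes the secrets manager reads the named variables from the process environment, which the parameter names suggest but this README does not confirm. The helper `missing_env_vars` is hypothetical and not part of hip-data-tools.

```python
import os
from typing import Iterable, List, Mapping


def missing_env_vars(
    names: Iterable[str], environ: Mapping[str, str] = os.environ
) -> List[str]:
    """Return the names that are unset or empty in the given environment."""
    return [name for name in names if not environ.get(name)]


# Variable names taken from the custom-variable example above
required = [
    "SOME_CUSTOM_AWS_ACCESS_KEY_ID",
    "SOME_CUSTOM_AWS_SECRET_ACCESS_KEY",
    "SOME_CUSTOM_AWS_SESSION_TOKEN",
]
missing = missing_env_vars(required)
if missing:
    print("Set these variables before connecting:", ", ".join(missing))
```

Passing the environment as a parameter keeps the check easy to exercise with a plain dictionary in tests.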

Using this connection object, you can use the AWS utilities, for example AWS Athena:

from hip_data_tools.aws.athena import AthenaUtil

au = AthenaUtil(database="default", conn=conn, output_bucket="example", output_key="tmp/scratch/")
result = au.run_query("SELECT * FROM temp limit 10", return_result=True)
print(result)
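
If the table name in a preview query like the one above comes from user input or configuration, it is worth validating it before building the SQL string. This is a minimal sketch, independent of hip-data-tools; the helper `preview_query` and its identifier rule (letters, digits and underscores, optionally qualified by a database name) are assumptions made for illustration.

```python
import re

# Plain or database-qualified identifier, e.g. "temp" or "default.temp"
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?")


def preview_query(table: str, limit: int = 10) -> str:
    """Build a SELECT * preview query after validating the table name and limit."""
    if not _IDENTIFIER.fullmatch(table):
        raise ValueError(f"unsafe table name: {table!r}")
    if limit <= 0:
        raise ValueError("limit must be a positive integer")
    return f"SELECT * FROM {table} LIMIT {int(limit)}"


print(preview_query("temp"))  # SELECT * FROM temp LIMIT 10
```

The resulting string can be passed to `au.run_query(...)` in the same way as the literal query in the example above.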

Connect to Cassandra

from cassandra.policies import DCAwareRoundRobinPolicy
from cassandra.cqlengine import columns
from cassandra.cqlengine.management import sync_table
from cassandra.cqlengine.models import Model

# Assumption: the Cassandra helpers live in hip_data_tools.apache.cassandra;
# adjust the import if your installed version differs
from hip_data_tools.apache.cassandra import (
    CassandraConnectionManager,
    CassandraConnectionSettings,
    CassandraSecretsManager,
)

load_balancing_policy = DCAwareRoundRobinPolicy(local_dc='AWS_VPC_AP_SOUTHEAST_2')

# Connect without an explicit secrets manager
conn = CassandraConnectionManager(
    settings=CassandraConnectionSettings(
        cluster_ips=["1.1.1.1", "2.2.2.2"],
        port=9042,
        load_balancing_policy=load_balancing_policy,
    )
)

# OR read the username from a custom environment variable
conn = CassandraConnectionManager(
    CassandraConnectionSettings(
        cluster_ips=["1.1.1.1", "2.2.2.2"],
        port=9042,
        load_balancing_policy=load_balancing_policy,
        secrets_manager=CassandraSecretsManager(
            username_var="MY_CUSTOM_USERNAME_ENV_VAR"
        ),
    )
)

# For running Cassandra model operations
conn.setup_connection("dev_space")


class ExampleModel(Model):
    example_type = columns.Integer(primary_key=True)
    created_at = columns.DateTime()
    description = columns.Text(required=False)


# Create or update the table so that it matches the model definition
sync_table(ExampleModel)
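
A malformed entry in `cluster_ips` is easier to diagnose before any connection attempt is made. The sketch below assumes the list holds literal IP addresses, as in the example above, rather than hostnames. The helper `validate_cluster_ips` is hypothetical and uses only the standard library.

```python
import ipaddress
from typing import Iterable, List


def validate_cluster_ips(cluster_ips: Iterable[str]) -> List[str]:
    """Return the addresses in normalised form, raising ValueError on a malformed one."""
    return [str(ipaddress.ip_address(ip)) for ip in cluster_ips]


print(validate_cluster_ips(["1.1.1.1", "2.2.2.2"]))  # ['1.1.1.1', '2.2.2.2']
```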


