
dci_utils
=========
This package collects utility classes that promote code reuse across the ETL jobs used in DCI.

Classes
-------------
#### Logger

- Provides a `log` method for recording milestones during job execution.

```python
# credentials (dict, optional): AWS credentials used to access CloudWatch.
# If not specified, the IAM role of the host machine is used.
# log_group_name (str): Name of the AWS log group.
# Must be the name of the job being executed.
# region (str, optional): AWS region where logs are recorded.
# Defaults to us-east-1.
def __init__(self, credentials=None, log_group_name=None, region='us-east-1'):


# message (str): Text to be logged.
def log(self, message):
```
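
The package does not publish its internals, but a `Logger` like this can be built on the CloudWatch Logs API. Below is a minimal, hypothetical sketch using `boto3` (`create_log_stream` / `put_log_events`); the stream-naming scheme and credential handling are assumptions, not the package's actual code.

```python
import time

import boto3


class Logger(object):
    """Hypothetical sketch: not the package's actual source."""

    def __init__(self, credentials=None, log_group_name=None, region='us-east-1'):
        kwargs = {'region_name': region}
        if credentials:
            # Credentials dict in the shape returned by sts.assume_role()
            kwargs['aws_access_key_id'] = credentials['AccessKeyId']
            kwargs['aws_secret_access_key'] = credentials['SecretAccessKey']
            kwargs['aws_session_token'] = credentials['SessionToken']
        self.client = boto3.client('logs', **kwargs)
        self.log_group_name = log_group_name
        # One stream per run; a timestamp keeps stream names unique (assumed scheme).
        # The log group itself is assumed to already exist.
        self.log_stream_name = str(int(time.time() * 1000))
        self.client.create_log_stream(
            logGroupName=self.log_group_name,
            logStreamName=self.log_stream_name,
        )

    def log(self, message):
        # Older CloudWatch Logs APIs also required a sequenceToken for every
        # call after the first; current versions of the service no longer do.
        self.client.put_log_events(
            logGroupName=self.log_group_name,
            logStreamName=self.log_stream_name,
            logEvents=[{
                'timestamp': int(time.time() * 1000),  # milliseconds since the epoch
                'message': message,
            }],
        )
```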

#### MetricRecorder

- Provides a `record` method for pushing custom metrics to CloudWatch during job execution.

```python
# credentials (dict): AWS credentials used to access CloudWatch.
# namespace (str): Name of the AWS custom metric namespace.
# region (str, optional): AWS region where metrics are recorded.
# Defaults to us-east-1.
def __init__(self, credentials, namespace, region='us-east-1'):


# metric_name (str): The name of the AWS metric.
# value (int or float): Value of the AWS metric.
# metric_dims (list, optional): A list of dimensions associated with the data.
# Each dimension is a dict with 'Name' and 'Value' keys.
# Defaults to an empty list.
# metric_unit (str, optional): Unit of the AWS metric.
# Defaults to 'Count'.
def record(self, metric_name, value, metric_dims=None, metric_unit='Count'):
```
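
A `record` method of this shape maps directly onto the CloudWatch `put_metric_data` call. The sketch below is a hypothetical implementation under that assumption, not the package's actual code.

```python
import boto3


class MetricRecorder(object):
    """Hypothetical sketch: not the package's actual source."""

    def __init__(self, credentials, namespace, region='us-east-1'):
        # Credentials dict in the shape returned by sts.assume_role()
        self.client = boto3.client(
            'cloudwatch',
            region_name=region,
            aws_access_key_id=credentials['AccessKeyId'],
            aws_secret_access_key=credentials['SecretAccessKey'],
            aws_session_token=credentials['SessionToken'],
        )
        self.namespace = namespace

    def record(self, metric_name, value, metric_dims=None, metric_unit='Count'):
        # One datapoint per call; dimensions default to an empty list
        self.client.put_metric_data(
            Namespace=self.namespace,
            MetricData=[{
                'MetricName': metric_name,
                'Dimensions': metric_dims or [],
                'Value': value,
                'Unit': metric_unit,
            }],
        )
```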

Usage
-------------

```python
from pyspark.sql import SparkSession

import boto3

spark = SparkSession.builder.enableHiveSupport() \
    .appName('<application_name>').getOrCreate()

spark.sparkContext.addPyFile('s3://path/to/file/aws_cloudwatch_utils.py')

import aws_cloudwatch_utils

job_name = '<job_name>'
role = 'arn:aws:iam::<aws_account>:role/<aws_role_name>'
dims = [{'Name': 'JobName', 'Value': job_name}]

# Assume the IAM role that is allowed to publish to CloudWatch
sts = boto3.client('sts')
credentials = sts.assume_role(RoleArn=role, RoleSessionName=job_name)['Credentials']

# The job name doubles as the log group name and the metric namespace
logger = aws_cloudwatch_utils.Logger(credentials, job_name)
metric_recorder = aws_cloudwatch_utils.MetricRecorder(credentials, job_name)

logger.log("Job Completed Successfully")
metric_recorder.record('Success', 1, dims, 'Count')
```
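
A common pattern, building on the objects created above, is to pair the `Success` metric with a `Failure` one so a CloudWatch alarm can trigger on failed runs. The `Failure` metric name and the error-handling shape below are an assumption, not something the package prescribes.

```python
try:
    # ... run the ETL steps here ...
    logger.log("Job Completed Successfully")
    metric_recorder.record('Success', 1, dims)
except Exception as exc:
    # Record the failure before letting the job die, so an alarm can fire
    logger.log("Job Failed: {}".format(exc))
    metric_recorder.record('Failure', 1, dims)
    raise
```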
