A Python Package for interacting with Cloudera Data Engineering Clusters
cdepy Package
cdepy is a package for interacting with Cloudera Data Engineering Virtual Clusters.
You can find out more about Cloudera Data Engineering in the Cloudera Documentation.
Usage
You can install this package using pip:
pip install cdepy
Features
- CDE Resources: create resources of type Files and Python-Environment
- CDE Jobs: create jobs of type Airflow and Spark
- Job Observability: monitor job status
Examples
import json
from cdepy import cdeconnection
from cdepy import cdejob
from cdepy import cdemanager
from cdepy import cderesource
Establish Connection to CDE Virtual Cluster
JOBS_API_URL = "https://<YOUR-CLUSTER>.cloudera.site/dex/api/v1"
WORKLOAD_USER = "<Your-CDP-Workload-User>"
WORKLOAD_PASSWORD = "<Your-CDP-Workload-Password>"
myCdeConnection = cdeconnection.CdeConnection(JOBS_API_URL, WORKLOAD_USER, WORKLOAD_PASSWORD)
myCdeConnection.setToken()
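Hardcoding the workload password in a script makes it easy to leak. The following is a minimal sketch that reads the three connection values from environment variables instead. The variable names (CDE_JOBS_API_URL, CDE_WORKLOAD_USER, CDE_WORKLOAD_PASSWORD) and the helper load_cde_credentials are hypothetical and not part of cdepy.

```python
import os


def load_cde_credentials(env=None):
    """Return (jobs_api_url, user, password) read from environment variables.

    The variable names below are illustrative assumptions, not cdepy conventions.
    """
    env = os.environ if env is None else env
    names = ("CDE_JOBS_API_URL", "CDE_WORKLOAD_USER", "CDE_WORKLOAD_PASSWORD")
    # Fail early with a clear message if any value is unset or empty.
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return tuple(env[name] for name in names)
```

The returned tuple follows the argument order used in the connection example, so it could be unpacked as cdeconnection.CdeConnection(*load_cde_credentials()).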
Create CDE Files Resource Definition
CDE_RESOURCE_NAME = "myFilesCdeResource"
myCdeFilesResource = cderesource.CdeFilesResource(CDE_RESOURCE_NAME)
myCdeFilesResourceDefinition = myCdeFilesResource.createResourceDefinition()
Create a CDE Spark Job Definition
CDE_JOB_NAME = "myCdeSparkJob"
APPLICATION_FILE_NAME = "pysparksql.py"
myCdeSparkJob = cdejob.CdeSparkJob(myCdeConnection)
myCdeSparkJobDefinition = myCdeSparkJob.createJobDefinition(CDE_JOB_NAME, CDE_RESOURCE_NAME, APPLICATION_FILE_NAME, executorMemory="2g", executorCores=2)
Create Resource and Job in CDE Cluster
LOCAL_FILE_PATH = "examples"
LOCAL_FILE_NAME = "pysparksql.py"
myCdeClusterManager = cdemanager.CdeClusterManager(myCdeConnection)
myCdeClusterManager.createResource(myCdeFilesResourceDefinition)
myCdeClusterManager.uploadFile(CDE_RESOURCE_NAME, LOCAL_FILE_PATH, LOCAL_FILE_NAME)
myCdeClusterManager.createJob(myCdeSparkJobDefinition)
Run Job with Default Configurations
myCdeClusterManager.runJob(CDE_JOB_NAME)
Update Runtime Configurations
overrideParams = {"spark": {"executorMemory": "4g"}}
myCdeClusterManager.runJob(CDE_JOB_NAME, SPARK_OVERRIDES=overrideParams)
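The override dictionary above nests Spark settings under a "spark" key. As a small sketch, a hypothetical helper (build_spark_overrides, not part of cdepy) can assemble that shape and skip settings left as None. Only executorMemory appears in the original example; whether other keys are accepted at run time is an assumption to verify against your cluster.

```python
def build_spark_overrides(**spark_settings):
    """Wrap Spark settings in the {"spark": {...}} shape, dropping None values."""
    return {"spark": {key: value for key, value in spark_settings.items() if value is not None}}


# executorCores is left as None here, so it is omitted from the result.
overrideParams = build_spark_overrides(executorMemory="4g", executorCores=None)
print(overrideParams)  # {'spark': {'executorMemory': '4g'}}
```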
Validate Job Runs
jobRuns = myCdeClusterManager.listJobRuns()
json.loads(jobRuns)
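Since listJobRuns returns a JSON string, one way to monitor a run is to poll it until the run reaches a final state. The sketch below is an illustration, not cdepy functionality: it assumes the decoded payload is a list of objects with "id" and "status" fields and that "succeeded", "failed" and "killed" are the terminal status names. The helper wait_for_run is hypothetical.

```python
import json
import time

# Assumed names of terminal run states; adjust to what your cluster reports.
TERMINAL_STATUSES = {"succeeded", "failed", "killed"}


def wait_for_run(list_runs, run_id, poll_seconds=30, max_polls=60, sleep=time.sleep):
    """Poll list_runs() until the run with run_id reaches a terminal status.

    list_runs is any zero-argument callable returning a JSON string that
    decodes to a list of {"id": ..., "status": ...} objects (assumed shape).
    """
    for _ in range(max_polls):
        runs = json.loads(list_runs())
        # Compare ids as strings so "1" and 1 are treated as the same run.
        status = next(
            (run.get("status") for run in runs if str(run.get("id")) == str(run_id)),
            None,
        )
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"Run {run_id} did not finish after {max_polls} polls")
```

With the objects defined earlier, a call might look like wait_for_run(myCdeClusterManager.listJobRuns, "1"), where "1" stands in for an actual run ID.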
Download Spark Event Logs
JOB_RUN_ID = "1"
logTypes = myCdeClusterManager.showAvailableLogTypes(JOB_RUN_ID)
json.loads(logTypes)
LOGS_TYPE = "driver/event"
sparkEventLogs = myCdeClusterManager.downloadJobRunLogs(JOB_RUN_ID, LOGS_TYPE)
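# sparkEventLogParser is not imported in the snippets above; it is assumed to be a helper provided by cdepy and must be imported before this call.
sparkEventLogsClean = sparkEventLogParser(sparkEventLogs)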
print(sparkEventLogsClean)
Delete Job and Validate Deletion
CDE_JOB_NAME = "myCdeSparkJob"
myCdeClusterManager.deleteJob(CDE_JOB_NAME)
myCdeClusterManager.listJobs()
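To validate the deletion programmatically rather than by reading the listing, the output of listJobs can be searched for the job name. This sketch assumes listJobs returns a JSON string that decodes to a list of objects with a "name" field, which the original does not confirm. The helper job_exists is hypothetical.

```python
import json


def job_exists(jobs_json, job_name):
    """Return True if a job called job_name appears in the JSON job listing.

    Assumes jobs_json decodes to a list of {"name": ...} objects.
    """
    jobs = json.loads(jobs_json)
    return any(job.get("name") == job_name for job in jobs)
```

After the delete call above, job_exists(myCdeClusterManager.listJobs(), CDE_JOB_NAME) would be expected to return False under these assumptions.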
Describe Resource
myCdeClusterManager.describeResource(CDE_RESOURCE_NAME)
Remove Files from Files Resource
RESOURCE_FILE_NAME = "pysparksql.py"
myCdeClusterManager.removeFileFromResource(CDE_RESOURCE_NAME, RESOURCE_FILE_NAME)
Upload File to Resource
myCdeClusterManager.uploadFileToResource(CDE_RESOURCE_NAME, LOCAL_FILE_PATH, LOCAL_FILE_NAME)
Download File from Resource
myPySparkScript = myCdeClusterManager.downloadFileFromResource(CDE_RESOURCE_NAME, RESOURCE_FILE_NAME)
from pprint import pprint
pprint(myPySparkScript)
Pause Single Job
myCdeClusterManager.pauseSingleJob(CDE_JOB_NAME)
Delete Resource
CDE_RESOURCE_NAME = "myFilesCdeResource"
myCdeClusterManager.deleteResource(CDE_RESOURCE_NAME)