spark_hdfs_tools
spark_hdfs_tools is a Python library that implements an HDFS filesystem in a sandbox environment.
Installation
The package is published on PyPI, so installation consists of running:
pip install spark-hdfs-tools
Usage
A wrapper for running the HDFS filesystem.
Sandbox
Installation
!yes | pip uninstall spark-hdfs-tools
pip install spark-hdfs-tools --user --upgrade
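After reinstalling, it can be useful to confirm which version is actually active in the environment. The following is a minimal sketch that uses only the standard library; the helper name `installed_version` is hypothetical and is not part of spark_hdfs_tools.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# "spark-hdfs-tools" is the distribution name used in the pip commands above.
print(installed_version("spark-hdfs-tools"))
```

If this prints `None`, the package is not visible to the interpreter that runs the notebook, which usually means pip installed it into a different environment.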
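Imports
import os
import pyspark
from spark_hdfs_tools import dq_path_workspace
from spark_hdfs_tools import dq_download_jar
from spark_hdfs_tools import dq_spark_session
# Assumption: the functions used in the later steps are exported by the same package
from spark_hdfs_tools import dq_validate_conf
from spark_hdfs_tools import dq_extract_parameters
from spark_hdfs_tools import dq_run_sandbox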
Variables
project_sda = "SDA_37036"
url_conf = "http://artifactory-gdt.central-02.nextgen.igrupobbva/artifactory/gl-datio-spark-libs-maven-local/com/datiobd/cdd-hammurabi/4.0.9/DQ_LOCAL_CONFS/KCOG/KCOG_branch_MRField.conf"
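Because every later step depends on `url_conf`, a quick shape check can catch a mistyped URL before any download is attempted. This is only a sketch: `check_conf_url` is a hypothetical helper, not a function of spark_hdfs_tools, and it inspects the string without contacting the server.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def check_conf_url(url: str) -> str:
    """Return the .conf file name in the URL, or raise ValueError if it looks wrong."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an absolute HTTP(S) URL: {url!r}")
    # The last path segment is expected to be the configuration file.
    name = PurePosixPath(parsed.path).name
    if not name.endswith(".conf"):
        raise ValueError(f"URL does not point to a .conf file: {url!r}")
    return name


url_conf = "http://artifactory-gdt.central-02.nextgen.igrupobbva/artifactory/gl-datio-spark-libs-maven-local/com/datiobd/cdd-hammurabi/4.0.9/DQ_LOCAL_CONFS/KCOG/KCOG_branch_MRField.conf"
print(check_conf_url(url_conf))  # prints KCOG_branch_MRField.conf
```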
Creating Workspace
dq_path_workspace(project_sda=project_sda)
Download haas jar
dq_download_jar(haas_version="4.8.0", force=True)
Spark Session
spark, sc = dq_spark_session()
Validate Conf
dq_validate_conf(url_conf=url_conf)
Extract Params
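# Assumption: the returned value is the parameter_conf_list used in the Run step
parameter_conf_list = dq_extract_parameters(url_conf=url_conf)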
Run
dq_run_sandbox(spark=spark,
               sc=sc,
               parameter_conf_list=parameter_conf_list,
               url_conf=url_conf)
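The validate, extract, and run steps above can be chained in one place so that they always execute in the same order. The sketch below takes the three steps as arguments, which lets it run without Spark or the library installed. `run_pipeline` is a hypothetical helper, and it assumes that the extract step returns the value the run step expects as `parameter_conf_list`; the source does not confirm this.

```python
from typing import Any, Callable


def run_pipeline(
    url_conf: str,
    validate: Callable[..., Any],
    extract: Callable[..., Any],
    run: Callable[..., Any],
    spark: Any = None,
    sc: Any = None,
) -> Any:
    """Validate the conf, extract its parameters, then run, in that order."""
    validate(url_conf=url_conf)
    # Assumption: the extract step returns the value that the run step
    # expects as `parameter_conf_list`.
    parameter_conf_list = extract(url_conf=url_conf)
    return run(
        spark=spark,
        sc=sc,
        parameter_conf_list=parameter_conf_list,
        url_conf=url_conf,
    )


# Stand-in callables so the sketch runs without Spark or the library.
calls = []
result = run_pipeline(
    "http://example.invalid/demo.conf",
    validate=lambda url_conf: calls.append("validate"),
    extract=lambda url_conf: ["param_a"],
    run=lambda spark, sc, parameter_conf_list, url_conf: parameter_conf_list,
)
print(calls, result)
```

In a real sandbox session, the stand-ins would presumably be replaced by `dq_validate_conf`, `dq_extract_parameters`, and `dq_run_sandbox`, with `spark` and `sc` taken from `dq_spark_session()`.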
License
New features v1.0
BugFix
- choco install visualcpp-build-tools
Reference
- Jonathan Quiza, GitHub.
- Jonathan Quiza, RumiMLSpark.