spark_datax_tools
spark_datax_tools is a Python library that implements helper functions for DataX: generating nomenclature, adapter and transfer tickets, and schemas.
Installation
The code is packaged for PyPI, so installation consists of running:
pip install spark-datax-tools
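To confirm that the install worked, you can query the installed distribution with the standard library. This is a minimal sketch: `installed_version` is a hypothetical helper name, and it only relies on `importlib.metadata` (Python 3.8+), not on anything inside spark_datax_tools.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "spark-datax-tools") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_version() or "spark-datax-tools is not installed")
```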
Usage
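Wrapper functions for DataX. The examples below assume the functions are exported at package level (assumed import path):
from spark_datax_tools import *
Nomenclature DataX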
================================
table_name = "t_pmfi_lcl_suppliers_purchases"
origen = "host"
destination = "hdfs"
datax_generated_nomenclature(table_name=table_name,
origen=origen,
destination=destination,
output=True)
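The nomenclature helper takes the table name together with the origin and destination. The example name `t_pmfi_lcl_suppliers_purchases` appears to follow a `<prefix>_<uuaa>_<description>` pattern (the job name `PMFITP4012` shares the `pmfi` code), but the library's actual rules are not documented here. The sketch below is a hypothetical illustration of splitting a name under that assumption; `split_table_name` and `TableNameParts` are not part of spark_datax_tools.

```python
from typing import NamedTuple


class TableNameParts(NamedTuple):
    prefix: str       # e.g. "t"
    uuaa: str         # e.g. "pmfi" (assumed application code)
    description: str  # e.g. "lcl_suppliers_purchases"


def split_table_name(table_name: str) -> TableNameParts:
    """Split a name such as 't_pmfi_lcl_suppliers_purchases' into its parts.

    Hypothetical helper: assumes the '<prefix>_<uuaa>_<description>' pattern.
    """
    parts = table_name.split("_", 2)  # split only on the first two underscores
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"unexpected table name format: {table_name!r}")
    return TableNameParts(*parts)


print(split_table_name("t_pmfi_lcl_suppliers_purchases"))
# TableNameParts(prefix='t', uuaa='pmfi', description='lcl_suppliers_purchases')
```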
List of adapters
================================
datax_list_adapters()
Generated Ticket Adapter
============================================================
adapter_id = "ADAPTER_HDFS_OUTSTAGING"
parameter = {"uuaa":"na8z"}
datax_generated_ticket_adapter(adapter_id=adapter_id,
parameter=parameter,
is_dev=True
)
Generated Ticket Transfer
============================================================
folder="CR-PEMFIMEN-T02"
job_name="PMFITP4012"
crq="CRQ100000"
periodicity="mensual"
hour="10AM"
weight="50MB"
origen="host"
destination="hdfs"
datax_generated_ticket_transfer(
folder=folder,
job_name=job_name,
crq=crq,
periodicity=periodicity,
hour=hour,
weight=weight,
table_name=table_name,
origen=origen,
destination=destination,
is_dev=True
)
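`datax_generated_ticket_transfer` receives `hour` and `weight` as free-form strings ("10AM", "50MB"), and `periodicity` as a Spanish value ("mensual", i.e. monthly). How the library parses these strings is not documented here. The following is a minimal, hypothetical pre-check that normalizes them before building a ticket, assuming binary size units (1 KB = 1024 bytes) and 12-hour AM/PM times; `weight_to_bytes` and `hour_to_24h` are illustrative names, not library functions.

```python
import re

# Assumption: weights use binary multiples (1 KB = 1024 bytes).
_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def weight_to_bytes(weight: str) -> int:
    """Convert a size string such as '50MB' to a byte count (hypothetical helper)."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMGT]?B)\s*", weight.upper())
    if match is None:
        raise ValueError(f"unrecognised weight: {weight!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit])


def hour_to_24h(hour: str) -> int:
    """Convert an hour such as '10AM' or '3PM' to 0-23 (hypothetical helper)."""
    match = re.fullmatch(r"\s*(\d{1,2})\s*(AM|PM)\s*", hour.upper())
    if match is None:
        raise ValueError(f"unrecognised hour: {hour!r}")
    value, period = int(match.group(1)), match.group(2)
    if not 1 <= value <= 12:
        raise ValueError(f"hour out of range: {hour!r}")
    # 12AM is midnight (0); 12PM is noon (12).
    return value % 12 + (12 if period == "PM" else 0)


print(weight_to_bytes("50MB"))  # 52428800
print(hour_to_24h("10AM"))      # 10
```

Rejecting malformed values early gives a clearer error than a ticket generated with an unreadable hour or size.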
Generated Schema JSON Artifactory
============================================================
path_json = "lclsupplierspurchases.output.schema"
is_schema_origen_in = True
schema_type = "host"
convert_string = False
datax_generated_schema_artifactory(
path_json=path_json,
is_schema_origen_in=is_schema_origen_in,
schema_type=schema_type,
convert_string=convert_string
)
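The `convert_string` flag suggests an option to type every field as a string, but its exact behavior and the format of the `.schema` file are not documented here. Purely as an illustration, assuming an Avro-style record schema (`{"fields": [{"name": ..., "type": ...}]}`), such a conversion could look like the hypothetical `convert_fields_to_string` below. It keeps nullability and does not adjust defaults or other field attributes.

```python
import json
from typing import Any, Dict


def convert_fields_to_string(schema: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of an Avro-style record schema with every field typed as string.

    Hypothetical illustration of what a 'convert_string' option might do;
    nullable unions such as ["null", "int"] become ["null", "string"].
    """
    converted = json.loads(json.dumps(schema))  # deep copy of plain JSON data
    for field in converted.get("fields", []):
        field_type = field.get("type")
        if isinstance(field_type, list) and "null" in field_type:
            field["type"] = ["null", "string"]
        else:
            field["type"] = "string"
    return converted


example = {
    "type": "record",
    "name": "example",
    "fields": [
        {"name": "amount", "type": "double"},
        {"name": "supplier_id", "type": ["null", "int"]},
    ],
}
print(json.dumps(convert_fields_to_string(example)["fields"]))
```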
Generated Schema JSON Datum
============================================================
from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local[*]").appName("SparkAPP").getOrCreate()
path="fields_pe_datum2.csv"
table_name="t_pmfi_lcl_suppliers_purchases"
origen="host"
destination="hdfs"
storage_zone="master"
datax_generated_schema_datum(
spark=spark,
path=path,
table_name=table_name,
origen=origen,
destination=destination,
storage_zone=storage_zone,
convert_string=False
)
License
New features v1.0
BugFix
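- If installation fails on Windows because a C/C++ compiler is missing, install the Visual C++ Build Tools with Chocolatey: choco install visualcpp-build-tools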
Reference
- Jonathan Quiza github.
- Jonathan Quiza RumiMLSpark.
- Jonathan Quiza linkedin.
Download files
Source Distribution
spark_datax_tools-0.7.0.tar.gz (15.0 kB)
Built Distribution
spark_datax_tools-0.7.0-py3-none-any.whl (17.3 kB)
File details
Details for the file spark_datax_tools-0.7.0.tar.gz.
File metadata
- Download URL: spark_datax_tools-0.7.0.tar.gz
- Upload date:
- Size: 15.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.9.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | 2285d4d891970032baff82807c07adf19c5e94bbe68fa4f3e2515fe1b2d2e489
MD5 | 754b9c4e51a6e1f404cd31be17c6f8d4
BLAKE2b-256 | ccf582de70567ca6c8db006176b9375ef37a4a78765f3031c751812d40d1d8c4
File details
Details for the file spark_datax_tools-0.7.0-py3-none-any.whl.
File metadata
- Download URL: spark_datax_tools-0.7.0-py3-none-any.whl
- Upload date:
- Size: 17.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.9.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | e070838e09be3ce342fb0d8751f3dd6f5f59b897dd4b483159f27b3da15206e0
MD5 | a61a87b75c8edbe72fd8e074e4d65234
BLAKE2b-256 | 5fa30884c84c36051491d140d7d401c01a710606564772b47bf72f5843e8cdc0
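A downloaded release file can be checked against the published SHA256 digests for 0.7.0 using only the standard library. This is a minimal sketch; `sha256_of` and `verify_download` are illustrative helper names, and the expected digests are the ones listed for the source and built distributions of this release.

```python
import hashlib
from pathlib import Path

# SHA256 digests as published for release 0.7.0.
EXPECTED_SHA256 = {
    "spark_datax_tools-0.7.0.tar.gz":
        "2285d4d891970032baff82807c07adf19c5e94bbe68fa4f3e2515fe1b2d2e489",
    "spark_datax_tools-0.7.0-py3-none-any.whl":
        "e070838e09be3ce342fb0d8751f3dd6f5f59b897dd4b483159f27b3da15206e0",
}


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path) -> bool:
    """Return True if the file's SHA256 matches the published digest for its name."""
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published digest for {path.name}")
    return sha256_of(path) == expected
```

pip can enforce the same check at install time through its hash-checking mode (`--hash=sha256:...` entries in a requirements file).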