
spark_datax_tools


spark_datax_tools is a Python library that implements helper tools for DataX schemas.

Installation

The package is published on PyPI, so installation consists of running:

pip install spark-datax-tools 
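
To confirm that the installation succeeded, the installed version can be read from the package metadata. The following is a minimal sketch using only the Python standard library; the helper name installed_version is hypothetical and not part of spark_datax_tools.

```python
from importlib import metadata


def installed_version(dist_name: str = "spark-datax-tools"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
print(version if version else "spark-datax-tools is not installed")
```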

Usage

The wrapper functions below cover common DataX tasks.

Nomenclature Datax
================================
table_name = "t_pmfi_lcl_suppliers_purchases"
origen = "host"
destination = "hdfs"
datax_generated_nomenclature(table_name=table_name,
                             origen=origen,
                             destination=destination,
                             output=True)

List of adapters
================================
datax_list_adapters()




Generated Ticket Adapter
============================================================
adapter_id = "ADAPTER_HDFS_OUTSTAGING"
parameter = {"uuaa":"na8z"}
datax_generated_ticket_adapter(adapter_id=adapter_id,
                               parameter=parameter,
                               is_dev=True)

Generated Ticket Transfer
============================================================
folder = "CR-PEMFIMEN-T02"
job_name = "PMFITP4012"
crq = "CRQ100000"
periodicity = "mensual"
hour = "10AM"
weight = "50MB"
table_name = "t_pmfi_lcl_suppliers_purchases"
origen = "host"
destination = "hdfs"

datax_generated_ticket_transfer(
    folder=folder,
    job_name=job_name,
    crq=crq,
    periodicity=periodicity,
    hour=hour,
    weight=weight,
    table_name=table_name,
    origen=origen,
    destination=destination,
    is_dev=True
)
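
Because the transfer call takes many keyword arguments, it can help to keep them together in one dictionary and unpack it with `**`. The sketch below only builds that dictionary. The helper name build_transfer_params is hypothetical, and the keyword names are copied from the call above rather than from the library's documentation.

```python
# Hypothetical helper, not part of spark_datax_tools: bundles the keyword
# arguments used in the ticket-transfer example into one reusable dict.
def build_transfer_params(is_dev: bool) -> dict:
    """Return the ticket-transfer arguments, varying only the is_dev flag."""
    return {
        "folder": "CR-PEMFIMEN-T02",
        "job_name": "PMFITP4012",
        "crq": "CRQ100000",
        "periodicity": "mensual",
        "hour": "10AM",
        "weight": "50MB",
        "table_name": "t_pmfi_lcl_suppliers_purchases",
        "origen": "host",
        "destination": "hdfs",
        "is_dev": is_dev,
    }


dev_params = build_transfer_params(is_dev=True)
prod_params = build_transfer_params(is_dev=False)

# With the library installed, the call would then be (assuming it accepts
# exactly the keywords shown in the example above):
# datax_generated_ticket_transfer(**dev_params)
```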
                               
     
                               
Generated Schema JSON Artifactory
============================================================
path_json = "lclsupplierspurchases.output.schema"
is_schema_origen_in = True
schema_type = "host"
convert_string = False

datax_generated_schema_artifactory(
    path_json=path_json,
    is_schema_origen_in=is_schema_origen_in,
    schema_type=schema_type,
    convert_string=convert_string
)
           
   
   
   
Generated Schema Json Datum
============================================================
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("SparkAPP").getOrCreate()
path="fields_pe_datum2.csv"
table_name="t_pmfi_lcl_suppliers_purchases"
origen="host"
destination="hdfs"
storage_zone="master"

datax_generated_schema_datum(
    spark=spark,
    path=path,
    table_name=table_name,
    origen=origen,
    destination=destination,
    storage_zone=storage_zone,
    convert_string=False
)
  

License

Apache License 2.0.

New features v1.0

BugFix

  • choco install visualcpp-build-tools

Reference
