Encapsulating Apache Spark for Easy Usage

Development Status
- 3 - Alpha
License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

Xursparks - XAIL's Apache Spark Framework

Overview

Welcome to the Xurpas AI Lab (XAIL) department's Apache Spark Framework. This framework is specifically designed to help XAIL developers implement Extract, Transform, Load (ETL) processes seamlessly and uniformly. Additionally, it includes integration capabilities with the Data Management and Configuration Tool (DMCT) to streamline your data workflows.

Table of Contents

Introduction
Prerequisites
Installation
Usage
Best Practices
Contributing
Support
License

Introduction

This framework aims to provide a robust and standardized approach for XAIL developers to handle ETL processes using Apache Spark. By leveraging this framework, you can ensure that your data pipelines are efficient, maintainable, and easily integrable with the DMCT tool.

Prerequisites

Before you begin, ensure you have met the following requirements:

Apache Spark 3.0 or higher
Python 3.10 or higher
Access to the DMCT tool and relevant API keys

Installation

To use the framework, follow these steps:

Install xursparks in your Python environment:

pip install xursparks

Check that it is properly installed:

pip list
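The same check can be done from inside Python, which is handy in a job's startup code. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of xursparks.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("xursparks")
    if found:
        print(f"xursparks version: {found}")
    else:
        print("xursparks is not installed in this environment")
```
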

Usage

Setting Up Your Spark Application

To start using the framework, create an ETL job as follows:

import xursparks

xursparks.initialize(args)
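The snippet above passes `args` to `xursparks.initialize` without saying what `args` holds. The sketch below shows one possible way to parse the job flags that appear in the "Running Xursparks Job" commands, using `argparse`. It is an assumption that the job script does something similar; the helper names `build_parser` and `parse_job_args` are hypothetical, and the form of `args` that `xursparks.initialize` actually expects is not documented here.

```python
import argparse
from datetime import date


def build_parser() -> argparse.ArgumentParser:
    """Parser for the job flags shown in the spark-submit examples."""
    parser = argparse.ArgumentParser(description="Example ETL job arguments")
    parser.add_argument("--client-id", required=True)
    parser.add_argument("--target-table", required=True)
    # Parse the ISO date early so a malformed value fails before any Spark work starts.
    parser.add_argument("--process-date", type=date.fromisoformat, required=True)
    parser.add_argument("--properties-file", required=True)
    # The meaning of --switch is not documented, so it is kept as a plain string.
    parser.add_argument("--switch")
    return parser


def parse_job_args(argv):
    """Parse known flags and ignore extras such as --master or --deploy-mode."""
    known, _unknown = build_parser().parse_known_args(argv)
    return known


if __name__ == "__main__":
    import sys

    job_args = parse_job_args(sys.argv[1:])
    print(job_args.target_table, job_args.process_date)
```

Using `parse_known_args` rather than `parse_args` keeps the script from failing on flags it does not declare.
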

ETL Process Implementation

The framework provides predefined templates and utility functions to facilitate your ETL processes.

sourceTables = xursparks.getSourceTables()
sourceDataStorage = sourceTables.get("scheduled_manhours_ELE")
processDate = xursparks.getProcessDate()
sourceDataset = xursparks.loadSourceTable(dataStorage=sourceDataStorage,
                                          processDate=processDate)
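In the snippet above, `sourceTables.get(...)` suggests a dict-like object, in which case a misspelled table name would yield `None` and only fail later inside `loadSourceTable`. The sketch below guards against that. It assumes `getSourceTables()` returns something that behaves like a mapping; the helper `require_source` is hypothetical and not part of xursparks.

```python
from typing import Any, Mapping


def require_source(source_tables: Mapping[str, Any], name: str) -> Any:
    """Return the storage entry for `name`, or raise a descriptive KeyError."""
    storage = source_tables.get(name)
    if storage is None:
        available = ", ".join(sorted(source_tables)) or "(none)"
        raise KeyError(
            f"source table {name!r} is not configured; available: {available}"
        )
    return storage


if __name__ == "__main__":
    # A plain dict stands in for the object returned by xursparks.getSourceTables().
    tables = {"scheduled_manhours_ELE": {"location": "placeholder"}}
    print(require_source(tables, "scheduled_manhours_ELE"))
```
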

Integration with DMCT

To integrate with the DMCT tool, ensure you have the required configurations set up in your application.properties file:

[default]
usage.logs=<usage logs>
global.config=<dmct global config api>
job.context=<dmct job context api>
api.token=<dmct api token>
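Before submitting a job, it can help to confirm that the properties file contains every key listed above. The following is a minimal sketch using the standard-library `configparser`, which can read this `[default]` section layout. Whether xursparks itself reads the file this way is an assumption, and `load_dmct_settings` is a hypothetical helper.

```python
import configparser

# Placeholder values copied from the template; replace them with real settings.
SAMPLE = """\
[default]
usage.logs=<usage logs>
global.config=<dmct global config api>
job.context=<dmct job context api>
api.token=<dmct api token>
"""

REQUIRED_KEYS = ("usage.logs", "global.config", "job.context", "api.token")


def load_dmct_settings(text: str) -> dict:
    """Parse properties text and return the required DMCT settings."""
    # interpolation=None keeps '%' characters in URLs or tokens from being expanded.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    if not parser.has_section("default"):
        raise ValueError("missing [default] section")
    section = parser["default"]
    missing = [key for key in REQUIRED_KEYS if key not in section]
    if missing:
        raise ValueError(f"missing keys in [default]: {missing}")
    # Strip optional surrounding double quotes from values.
    return {key: section[key].strip('"') for key in REQUIRED_KEYS}


if __name__ == "__main__":
    for key, value in load_dmct_settings(SAMPLE).items():
        print(f"{key} = {value}")
```
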

Best Practices

Always validate your data at each stage of the ETL process.
Leverage Spark's built-in functions and avoid excessive use of UDFs (User Defined Functions) for better performance.
Ensure proper error handling and logging to facilitate debugging.
Keep your ETL jobs modular and maintainable by adhering to the single responsibility principle.
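The points about validation, logging, error handling, and modularity can be sketched as small single-purpose stages run through one logging wrapper. This illustration uses lists of dicts in place of Spark DataFrames so that it runs without a cluster; every name in it is hypothetical and none of it is xursparks API.

```python
import logging
from typing import Any, Callable, Dict, List

logger = logging.getLogger("etl_example")  # hypothetical logger name

Row = Dict[str, Any]
Stage = Callable[[List[Row]], List[Row]]


def run_stage(name: str, stage: Stage, rows: List[Row]) -> List[Row]:
    """Run one stage, logging row counts and re-raising any failure."""
    logger.info("stage %s: start (%d rows)", name, len(rows))
    try:
        result = stage(rows)
    except Exception:
        # logger.exception records the traceback before the error propagates.
        logger.exception("stage %s: failed", name)
        raise
    logger.info("stage %s: done (%d rows)", name, len(result))
    return result


def drop_missing_ids(rows: List[Row]) -> List[Row]:
    """Validation stage: keep only rows that carry an id."""
    return [row for row in rows if row.get("id") is not None]


def normalize_names(rows: List[Row]) -> List[Row]:
    """Transform stage: trim and lowercase the name field."""
    return [{**row, "name": row["name"].strip().lower()} for row in rows]


def run_pipeline(rows: List[Row]) -> List[Row]:
    """Chain the stages; each one has a single responsibility."""
    for name, stage in (("validate", drop_missing_ids), ("transform", normalize_names)):
        rows = run_stage(name, stage, rows)
    return rows
```

Because each stage takes rows and returns rows, stages can be tested individually and reordered without touching the wrapper.
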

Contributing

We welcome contributions to improve this framework. Please refer to the CONTRIBUTING.md file for guidelines on how to get started.

Support

If you encounter any issues or have questions, please reach out to the XAIL support team at support@xail.com.

License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.

Running Xursparks Job

SPARK-SUBMIT

spark-submit XurSparkSMain.py \
--master=local[*] \
--client-id=trami-data-folder \
--target-table=talentsolutions.candidate_reports \
--process-date=2023-05-24 \
--properties-file=job-application.properties \
--switch=1
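In the command above, every flag follows `XurSparkSMain.py`, and spark-submit passes anything after the application file to the script as application arguments rather than interpreting it itself. The sketch below builds such a command from Python while keeping the two groups of options separate. The helper `build_spark_submit` is hypothetical, and treating `master` as a spark-submit option here is an illustrative choice.

```python
import shlex
from typing import Dict, List


def build_spark_submit(
    script: str, spark_options: Dict[str, str], app_args: Dict[str, str]
) -> List[str]:
    """Build a spark-submit argument list.

    spark_options go before the script and are read by spark-submit;
    app_args go after the script and are handed to the job itself.
    """
    command = ["spark-submit"]
    for flag, value in spark_options.items():
        command += [f"--{flag}", value]
    command.append(script)
    command += [f"--{flag}={value}" for flag, value in app_args.items()]
    return command


if __name__ == "__main__":
    cmd = build_spark_submit(
        "XurSparkSMain.py",
        {"master": "local[*]"},
        {
            "client-id": "trami-data-folder",
            "target-table": "talentsolutions.candidate_reports",
            "process-date": "2023-05-24",
            "properties-file": "job-application.properties",
            "switch": "1",
        },
    )
    # shlex.join quotes values such as local[*] so the shell does not expand them.
    print(shlex.join(cmd))
    # To launch the job: subprocess.run(cmd, check=True)
```
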

Hadoop Sir Andy Setup

python AiLabsCandidatesDatamart.py \
--master=local[*] \
--deploy-mode=cluster \
--client-id=trami-data-folder \
--target-table=ailabs.candidates_transformed \
--process-date=2023-11-15 \
--properties-file=job-application.properties \
--switch=1

Hadoop

spark-submit \
--name AiLabsCandidatesDatamart \
--master yarn \
--jars aws-java-sdk-bundle-1.12.262.jar,hadoop-aws-3.3.4.jar \
--conf spark.yarn.dist.files=job-application.properties \
AiLabsCandidatesDatamart.py \
--keytab=hive.keytab \
--principal=hive/hdfscluster.local@HDFSCLUSTER.LOCAL \
--master=yarn \
--deploy-mode=cluster \
--client-id=trami-data-folder \
--target-table=ailabs.candidates_transformed \
--process-date=2023-11-16 \
--properties-file=job-application.properties \
--switch=1

Hadoop 3.3.2

spark-submit \
--name AiLabsCandidatesDatamart \
--master yarn \
--keytab hive.keytab \
--principal hive/hdfscluster.local@HDFSCLUSTER.LOCAL \
--jars aws-java-sdk-bundle-1.12.262.jar,hadoop-aws-3.3.4.jar,hive-jdbc-3.1.3.jar \
--conf spark.yarn.dist.files=job-application.properties \
AiLabsCandidatesDatamart.py \
--keytab=hive.keytab \
--principal=hive/hdfscluster.local@HDFSCLUSTER.LOCAL \
--master=yarn \
--deploy-mode=client \
--client-id=trami-data-folder \
--target-table=ailabs.candidates_transformed \
--process-date=2023-11-17 \
--properties-file=job-application.properties \
--switch=1

Hadoop testhdfs 3.3.2

spark-submit \
--name HdfsTest \
--master yarn \
--deploy-mode client \
--keytab hive.keytab \
--principal hive/hdfscluster.local@HDFSCLUSTER.LOCAL \
--jars aws-java-sdk-bundle-1.12.262.jar,hadoop-aws-3.3.4.jar \
--conf spark.yarn.dist.files=job-application.properties \
--driver-memory 4g \
--executor-memory 4g \
--executor-cores 2 \
HdfsTest.py \
--keytab=hive.keytab \
--principal=hive/hdfscluster.local@HDFSCLUSTER.LOCAL \
--master=yarn \
--deploy-mode=cluster \
--client-id=trami-data-folder \
--target-table=ailabs.candidates_transformed \
--process-date=2023-11-16 \
--properties-file=job-application.properties \
--switch=1

Hadoop

spark-submit \
--name AiLabsCandidatesDatamart \
--master yarn \
--jars aws-java-sdk-bundle-1.12.262.jar,hadoop-aws-3.3.4.jar,hive-jdbc-3.1.3.jar \
--conf spark.yarn.dist.files=job-application.properties \
AiLabsCandidatesDatamart.py \
--master=yarn \
--deploy-mode=client \
--client-id=trami-data-folder \
--target-table=ailabs.candidates_transformed \
--process-date=2023-11-19 \
--properties-file=job-application.properties \
--switch=1

Hadoop Employees

spark-submit \
--name AiLabsEmployeeDatamart \
--master yarn \
--keytab hive.keytab \
--principal hive/hdfscluster.local@HDFSCLUSTER.LOCAL \
--jars aws-java-sdk-bundle-1.12.262.jar,hadoop-aws-3.3.4.jar,hive-jdbc-3.1.3.jar,spark-excel_2.12-3.5.0_0.20.1.jar \
--conf spark.yarn.dist.files=job-application.properties \
AiLabsEmployeeDatamart.py \
--keytab=hive.keytab \
--principal=hive/hdfscluster.local@HDFSCLUSTER.LOCAL \
--master=yarn \
--deploy-mode=client \
--client-id=trami-data-folder \
--target-table=ailab.employees \
--process-date=2023-11-30 \
--properties-file=job-application.properties \
--switch=1

Hadoop Candidates

spark-submit \
--name AiLabsHdfsDatamart \
--master yarn \
--keytab hive.keytab \
--principal hive/hdfscluster.local@HDFSCLUSTER.LOCAL \
--jars aws-java-sdk-bundle-1.12.262.jar,hadoop-aws-3.3.4.jar,hive-jdbc-3.1.3.jar,spark-excel_2.12-3.5.0_0.20.1.jar \
--conf spark.yarn.dist.files=job-application.properties \
AiLabsHdfsDatamart.py \
--keytab=hive.keytab \
--principal=hive/hdfscluster.local@HDFSCLUSTER.LOCAL \
--master=yarn \
--deploy-mode=client \
--client-id=trami-data-folder \
--target-table=ailab.candidates_transformed_hdfs \
--process-date=2023-11-19 \
--properties-file=job-application.properties \
--switch=1


Release history

Version	Release date
1.2.13.4	Apr 15, 2026
1.2.13.3	Apr 14, 2026
1.2.13.2	Apr 14, 2026
1.2.13.1	Mar 18, 2026
1.2.13.0	Mar 12, 2026
1.2.12.4.post6	Mar 11, 2026
1.2.12.4.post5	Mar 10, 2026
1.2.12.4.post1	Feb 25, 2026
1.2.12.1	Feb 12, 2026
1.2.12.post3	Feb 20, 2026
1.2.12.post2	Feb 20, 2026
1.2.12.post1	Feb 12, 2026
1.2.11	Nov 21, 2025
1.2.10	Nov 13, 2025
1.2.9	Nov 4, 2025
1.2.8	Oct 15, 2025
1.2.7	Aug 13, 2025
1.2.6	Jul 22, 2025
1.2.5	Jul 7, 2025
1.2.4	Jul 4, 2025
1.2.3	Jun 26, 2025
1.2.2	Jun 25, 2025
1.2.1	Jun 23, 2025
1.2.0	Jun 17, 2025
1.1.2	Nov 29, 2024
1.1.1 (this version)	Nov 28, 2024
1.1.0	Nov 28, 2024
1.0.17	Oct 10, 2024
1.0.16	Oct 9, 2024
1.0.15	Oct 4, 2024
1.0.14	Oct 4, 2024
1.0.13	Oct 4, 2024
1.0.12	Oct 4, 2024
1.0.11	Oct 2, 2024
1.0.10	Sep 20, 2024
1.0.9	May 17, 2024
1.0.8	May 17, 2024
1.0.7	May 17, 2024
1.0.6	May 15, 2024
1.0.5	May 6, 2024
1.0.4	May 6, 2024
1.0.3	May 2, 2024

Download files

Source Distribution

xursparks-1.1.1.tar.gz (25.5 kB)

Uploaded Nov 28, 2024 Source

Built Distribution

xursparks-1.1.1-py3-none-any.whl (30.4 kB)

Uploaded Nov 28, 2024 Python 3

File details

Details for the file xursparks-1.1.1.tar.gz.

File metadata

Download URL: xursparks-1.1.1.tar.gz
Upload date: Nov 28, 2024
Size: 25.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.12.3

File hashes

Hashes for xursparks-1.1.1.tar.gz
Algorithm	Hash digest
SHA256	`0778ce46d7b59977c0ee6c9ee7d3d53a684f2916d6d6cc2b7a0ce5f08108c8cd`
MD5	`2c40f84c380c99861493c4d3d083546f`
BLAKE2b-256	`4bd66f1383136e99f6d87df0ae472229d0dc45436cd205ca445b4036e9506a08`

File details

Details for the file xursparks-1.1.1-py3-none-any.whl.

File metadata

Download URL: xursparks-1.1.1-py3-none-any.whl
Upload date: Nov 28, 2024
Size: 30.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.12.3

File hashes

Hashes for xursparks-1.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`4f6800a1cebd2cc5e1f92734fff7571c961826efeb30fd83cbeb10bb29819cc5`
MD5	`bd3b5f6697e40361cf09497997556ef2`
BLAKE2b-256	`e851079e75886d9ea821bdfaf038e75ee162c3d4a5d2fad606f68c64054855a4`
