Library to help with ETL using PySpark
Sparta
Sparta is a simple library that helps you build ETL pipelines with PySpark.
Installation
Install the latest version with `pip install pysparta`.
Documentation
Modules
Extract
The `extract` module provides functions for extracting and reading data.
Example
```python
from sparta.extract import read_with_schema

# Column names and types as a Spark DDL string
schema = 'epidemiological_week LONG, date DATE, order_for_place INT, state STRING, city STRING, city_ibge_code LONG, place_type STRING, last_available_confirmed INT'
path = '/content/sample_data/covid19-e0534be4ad17411e81305aba2d9194d9.csv'
# Arguments: file path, schema, reader options, file format
df = read_with_schema(path, schema, {'header': 'true'}, 'csv')
```
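To show what reading with an explicit schema involves in plain PySpark, here is a minimal sketch built only on `DataFrameReader`. It is an assumption that `read_with_schema` works this way; the name `read_with_schema_sketch` and its extra `spark` parameter are hypothetical and not part of Sparta.

```python
from typing import Any, Dict


def read_with_schema_sketch(spark: Any, path: str, schema: str,
                            options: Dict[str, str], file_format: str) -> Any:
    """Hypothetical helper: read `path` using an explicit DDL schema.

    `spark` is assumed to be a pyspark.sql.SparkSession. This is a sketch,
    NOT Sparta's actual implementation.
    """
    return (
        spark.read.format(file_format)  # e.g. "csv", "parquet", "json"
        .options(**options)             # reader options such as header="true"
        .schema(schema)                 # DDL string like "id LONG, name STRING"
        .load(path)                     # returns a DataFrame
    )
```

With a real session: `spark = SparkSession.builder.getOrCreate()`, then `read_with_schema_sketch(spark, path, schema, {'header': 'true'}, 'csv')`. Supplying the schema gives typed columns without enabling `inferSchema`, which costs an extra pass over CSV input.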
Transformation
The `transformation` module provides data transformation functions.
Example
```python
from sparta.transformation import drop_duplicates

# Columns used to identify duplicate rows
cols = ['longitude', 'latitude']
df = drop_duplicates(df, 'population', cols)
```
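The example does not say what the second argument, `'population'`, controls. One common design, assumed here and not confirmed by this page, is that it decides which duplicate survives. The plain-Python sketch below makes that rule concrete: for each combination of the key columns it keeps the row with the largest value in the ordering column. `keep_max_per_key` is a hypothetical name, not part of Sparta.

```python
from typing import Any, Dict, Iterable, List, Sequence, Tuple


def keep_max_per_key(rows: Iterable[Dict[str, Any]], order_col: str,
                     key_cols: Sequence[str]) -> List[Dict[str, Any]]:
    """Keep one row per distinct key_cols combination: the one with the
    largest order_col value (the first such row wins on ties)."""
    best: Dict[Tuple[Any, ...], Dict[str, Any]] = {}
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        current = best.get(key)
        # Replace the stored row only if this one ranks strictly higher
        if current is None or row[order_col] > current[order_col]:
            best[key] = row
    # dicts keep the order in which each key was first seen
    return list(best.values())
```

In PySpark the same rule is usually written with `Window.partitionBy(*cols).orderBy(F.col(order_col).desc())` plus `F.row_number()`, keeping rows whose row number is 1. The built-in `DataFrame.dropDuplicates(cols)` also removes duplicates, but it does not let you choose which row is kept.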
Load
The `load` module provides functions for loading and writing data.
Example
```python
from sparta.load import create_hive_table

# Save df as a Hive table named "table_name"
create_hive_table(df, "table_name", 5, "col1", "col2", "col3")
```
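The example passes a number and several column names after the table name. Assuming (not confirmed here) that these are a bucket count and the bucketing columns, a plain PySpark equivalent could look like the sketch below. `create_bucketed_table_sketch` is hypothetical, and overwriting an existing table is also an assumption.

```python
from typing import Any


def create_bucketed_table_sketch(df: Any, table_name: str, num_buckets: int,
                                 *bucket_cols: str) -> None:
    """Hypothetical helper: save `df` as a bucketed table in the catalog.

    `df` is assumed to be a pyspark.sql.DataFrame. The meaning of
    num_buckets and bucket_cols is a guess, not Sparta's documented API.
    """
    if not bucket_cols:
        raise ValueError("at least one bucket column is required")
    (
        df.write.mode("overwrite")            # assumption: replace an existing table
        .bucketBy(num_buckets, *bucket_cols)  # hash rows on these columns into num_buckets buckets
        .saveAsTable(table_name)              # bucketBy is only supported with saveAsTable
    )
```

For example, `create_bucketed_table_sketch(df, "table_name", 5, "col1", "col2", "col3")`. The table is registered in the Hive metastore only when the `SparkSession` was built with `enableHiveSupport()`.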
Others
This group contains assorted helper functions for ETL work, such as retrieving a secret from AWS.
Example
```python
from sparta.secret import get_secret_aws

# Retrieve the secret named 'secret_name' from AWS in the sa-east-1 region
secret = get_secret_aws('secret_name', 'sa-east-1')
```
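Assuming `get_secret_aws` reads from AWS Secrets Manager (the page does not say so explicitly), the equivalent direct boto3 call is sketched below. `get_secret_sketch` and its optional `client` parameter are hypothetical; the parameter exists so a stub client can be passed in tests.

```python
from typing import Any, Optional


def get_secret_sketch(secret_name: str, region_name: str,
                      client: Optional[Any] = None) -> str:
    """Hypothetical helper: fetch a string secret from AWS Secrets Manager."""
    if client is None:
        import boto3  # imported lazily so a stub client can be used without boto3
        client = boto3.client("secretsmanager", region_name=region_name)
    response = client.get_secret_value(SecretId=secret_name)
    # String secrets are returned under "SecretString"
    return response["SecretString"]
```

Secrets stored as key/value pairs come back as a JSON string, so `json.loads(get_secret_sketch("secret_name", "sa-east-1"))` turns them into a dict. Binary secrets are returned under `SecretBinary` instead and are not handled by this sketch.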
Supported PySpark / Python versions
Sparta currently supports PySpark 3.0+ and Python 3.7+.
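A quick way to confirm an environment meets these minimums is to compare version numbers as integer tuples rather than as strings (as strings, "3.10" sorts before "3.7"). The helper names below are hypothetical.

```python
import re
import sys
from typing import Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn '3.5.1' or '3.0.0.dev0' into (3, 5, 1) or (3, 0, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"not a version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_requirements(pyspark_version: str,
                       python_version: Tuple[int, ...] = tuple(sys.version_info[:2])) -> bool:
    """Check the stated minimums: PySpark 3.0+ and Python 3.7+."""
    return version_tuple(pyspark_version)[:2] >= (3, 0) and python_version >= (3, 7)
```

With PySpark installed, `import pyspark; meets_requirements(pyspark.__version__)` checks the current interpreter and PySpark release.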
Download files
File details
Details for the file pysparta-0.5.6.tar.gz.
File metadata
- Download URL: pysparta-0.5.6.tar.gz
- Upload date:
- Size: 21.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 41dbbc9b00cc9d7fda7007f200db46a1cc1f4090b2a6d4b9682d2266c296740f |
| MD5 | 679e14a3665ad0e026351f33a870ad8c |
| BLAKE2b-256 | 1073db5eefadd41ee7713aa7aef2f4cd34e1ec371b12aaa5a9375874c5df7aac |
File details
Details for the file pysparta-0.5.6-py3-none-any.whl.
File metadata
- Download URL: pysparta-0.5.6-py3-none-any.whl
- Upload date:
- Size: 24.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8e8d0106bfed06873dcbd405eb43b323b21a9ddc50f21598fde0199ed2d0b171 |
| MD5 | 5c06767948c9ee67bbc40a279459d091 |
| BLAKE2b-256 | 4abf1ee2defecf34d6e9195410fe67073fe99ae2eaea738064e2a82a3a908dbd |