A library to help with ETL using PySpark.

Sparta

Sparta is a simple library that helps you build ETL pipelines with PySpark.

Installation

Install the latest version with pip:

pip install pysparta

Documentation

Sparta

Modules

Extract

The sparta.extract module provides functions for extracting and reading data.

Example

from sparta.extract import read_with_schema

schema = 'epidemiological_week LONG, date DATE, order_for_place INT, state STRING, city STRING, city_ibge_code LONG, place_type STRING, last_available_confirmed INT'
path = '/content/sample_data/covid19-e0534be4ad17411e81305aba2d9194d9.csv'
# Read the CSV file using the schema above, the reader options and the file format
df = read_with_schema(path, schema, {'header': 'true'}, 'csv')
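The schema argument above is a Spark DDL string: comma-separated "name TYPE" pairs. A long schema typed by hand is easy to get wrong, so one option is to keep the columns as a list and join them. This is a small plain-Python sketch, not part of Sparta; COLUMNS and to_ddl are hypothetical names. For reference, plain PySpark's DataFrameReader.schema() also accepts a DDL string of this form.

```python
from typing import List, Tuple

# Column names and Spark SQL types from the Extract example.
COLUMNS: List[Tuple[str, str]] = [
    ("epidemiological_week", "LONG"),
    ("date", "DATE"),
    ("order_for_place", "INT"),
    ("state", "STRING"),
    ("city", "STRING"),
    ("city_ibge_code", "LONG"),
    ("place_type", "STRING"),
    ("last_available_confirmed", "INT"),
]


def to_ddl(columns: List[Tuple[str, str]]) -> str:
    """Join (name, type) pairs into a DDL schema string such as 'a INT, b STRING'."""
    return ", ".join(f"{name} {dtype}" for name, dtype in columns)


# Produces the same string as the hand-written schema in the example.
schema = to_ddl(COLUMNS)
```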

Transformation

The sparta.transformation module provides data transformation functions.

Example

from sparta.transformation import drop_duplicates

cols = ['longitude','latitude']
df = drop_duplicates(df, 'population', cols)
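The README does not spell out what the 'population' argument does. Assuming drop_duplicates keeps one row per combination of cols and uses the second argument to decide which row survives (here, the row with the largest value), the logic can be sketched in plain Python. This is an illustration of that assumption, not Sparta's implementation; drop_duplicates_keep_max and the sample rows are hypothetical.

```python
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def drop_duplicates_keep_max(rows: Sequence[Row], order_col: str, cols: Sequence[str]) -> List[Row]:
    """Keep one row per distinct combination of `cols`.

    Assumption: the surviving row is the one with the largest `order_col`
    value; on ties the row seen first is kept.
    """
    best: Dict[Tuple[Any, ...], Row] = {}
    for row in rows:
        key = tuple(row[c] for c in cols)
        current = best.get(key)
        if current is None or row[order_col] > current[order_col]:
            best[key] = row
    return list(best.values())


# Hypothetical sample rows, not taken from the original dataset.
rows = [
    {"longitude": -46.6, "latitude": -23.5, "population": 100},
    {"longitude": -46.6, "latitude": -23.5, "population": 300},
    {"longitude": -43.2, "latitude": -22.9, "population": 50},
]
deduped = drop_duplicates_keep_max(rows, "population", ["longitude", "latitude"])
```

In PySpark the same idea is usually expressed with F.row_number() over Window.partitionBy(*cols).orderBy(F.col(order_col).desc()), keeping only the rows whose row number is 1.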

Load

The sparta.load module provides functions for loading and writing data.

Example

from sparta.load import create_hive_table

create_hive_table(df, "table_name", 5, "col1", "col2", "col3")
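The README does not document the arguments of create_hive_table. If the integer 5 is a bucket count and "col1", "col2", "col3" are the bucketing columns (an assumption), the call would correspond roughly to PySpark's df.write.bucketBy(5, "col1", "col2", "col3").saveAsTable("table_name"). The sketch below only illustrates the idea of bucketing: each row is routed to one of N buckets by hashing its bucket-column values. It uses zlib.crc32 as a stand-in hash; Spark itself uses Murmur3, so these bucket ids will not match Spark's. bucket_id and assign_buckets are hypothetical helpers.

```python
import zlib
from collections import defaultdict
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]


def bucket_id(values: Sequence[Any], num_buckets: int) -> int:
    """Map a row's bucket-column values to a bucket id in [0, num_buckets)."""
    # Join with a separator unlikely to appear in the data, then hash.
    # crc32 is a stand-in: Spark uses Murmur3, so ids differ from Spark's.
    key = "\x1f".join(str(v) for v in values).encode("utf-8")
    return zlib.crc32(key) % num_buckets


def assign_buckets(rows: Sequence[Row], num_buckets: int, *cols: str) -> Dict[int, List[Row]]:
    """Group rows by bucket id, mirroring the shape of bucketBy(num_buckets, *cols)."""
    buckets: Dict[int, List[Row]] = defaultdict(list)
    for row in rows:
        buckets[bucket_id([row[c] for c in cols], num_buckets)].append(row)
    return dict(buckets)
```

Rows with identical values in the bucket columns always land in the same bucket, which is what lets Spark skip a shuffle when joining two tables bucketed the same way.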

Others

This section covers helper modules, such as sparta.secret, with other functions that are useful in ETL work.

Example

from sparta.secret import get_secret_aws

# Retrieve the secret named 'secret_name' from the sa-east-1 AWS region
secret = get_secret_aws('secret_name', 'sa-east-1')
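get_secret_aws presumably reads from AWS Secrets Manager; the README does not say how. A minimal sketch of that kind of lookup follows, assuming the secret is stored as a JSON string. The client is passed in so the function can be exercised without AWS access. fetch_secret is a hypothetical name, not Sparta's API.

```python
import json
from typing import Any, Dict


def fetch_secret(client: Any, secret_id: str) -> Dict[str, Any]:
    """Return a JSON secret from AWS Secrets Manager as a dict.

    `client` is expected to behave like boto3's Secrets Manager client,
    i.e. to expose get_secret_value(SecretId=...).
    """
    response = client.get_secret_value(SecretId=secret_id)
    # Assumption: the secret was stored as a JSON string (SecretString),
    # not as binary data (SecretBinary).
    return json.loads(response["SecretString"])


# Usage with a real client (requires boto3 and AWS credentials):
# import boto3
# client = boto3.client("secretsmanager", region_name="sa-east-1")
# credentials = fetch_secret(client, "secret_name")
```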

Supported PySpark / Python versions

Sparta currently supports PySpark 3.0+ and Python 3.7+.
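To check an environment against these minimums before running a job, a quick plain-Python sketch follows. version_tuple, installed_pyspark_version and check_supported are hypothetical helpers; reading pyspark.__version__ assumes PySpark is importable in the current interpreter.

```python
import sys
from typing import List, Optional, Tuple

MIN_PYTHON = (3, 7)
MIN_PYSPARK = (3, 0)


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn '3.5.1' into (3, 5, 1); '3.0.0rc1' becomes (3, 0, 0)."""
    parts: List[int] = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) < len(piece):
            # A suffix such as 'rc1' or 'dev0' ends the numeric release.
            break
    return tuple(parts)


def installed_pyspark_version() -> Optional[str]:
    """Return the installed PySpark version, or None if it is not importable."""
    try:
        import pyspark
    except ImportError:
        return None
    return pyspark.__version__


def check_supported() -> bool:
    """True if both Python and PySpark meet the minimum versions above."""
    if sys.version_info[:2] < MIN_PYTHON:
        return False
    spark_version = installed_pyspark_version()
    return spark_version is not None and version_tuple(spark_version) >= MIN_PYSPARK
```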

Download files

Source Distribution

pysparta-0.5.6.tar.gz (21.8 kB)

Uploaded Source

Built Distribution

pysparta-0.5.6-py3-none-any.whl (24.9 kB)

Uploaded Python 3

File details

Details for the file pysparta-0.5.6.tar.gz.

File metadata

  • Download URL: pysparta-0.5.6.tar.gz
  • Size: 21.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.7

File hashes

Hashes for pysparta-0.5.6.tar.gz
Algorithm Hash digest
SHA256 41dbbc9b00cc9d7fda7007f200db46a1cc1f4090b2a6d4b9682d2266c296740f
MD5 679e14a3665ad0e026351f33a870ad8c
BLAKE2b-256 1073db5eefadd41ee7713aa7aef2f4cd34e1ec371b12aaa5a9375874c5df7aac
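To verify a downloaded file against the SHA256 digest listed above, hash it with Python's standard hashlib module and compare. sha256_of is a hypothetical helper, and the file path in the usage line assumes the archive was saved to the current directory.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 digest published for pysparta-0.5.6.tar.gz.
EXPECTED = "41dbbc9b00cc9d7fda7007f200db46a1cc1f4090b2a6d4b9682d2266c296740f"

# Usage, assuming the archive is in the current directory:
# assert sha256_of("pysparta-0.5.6.tar.gz") == EXPECTED
```

Reading in fixed-size chunks keeps memory use constant regardless of file size.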

File details

Details for the file pysparta-0.5.6-py3-none-any.whl.

File metadata

  • Download URL: pysparta-0.5.6-py3-none-any.whl
  • Size: 24.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.7

File hashes

Hashes for pysparta-0.5.6-py3-none-any.whl
Algorithm Hash digest
SHA256 8e8d0106bfed06873dcbd405eb43b323b21a9ddc50f21598fde0199ed2d0b171
MD5 5c06767948c9ee67bbc40a279459d091
BLAKE2b-256 4abf1ee2defecf34d6e9195410fe67073fe99ae2eaea738064e2a82a3a908dbd
