
Data Preparation Toolkit Library for Spark


Spark Data Processing Library

This library provides a Python framework for developing transforms on data stored in files (currently Parquet files are supported) and running them in a Spark cluster. Data files may be stored in the local file system or in COS/S3. For more details, see the documentation.
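The framework's actual transform classes are defined in the documentation mentioned above, and their names and signatures are not reproduced here. As a rough illustration of the general pattern only, the stdlib-only sketch below uses a hypothetical base class (`HypotheticalTransform`), an example transform (`DropShortTextTransform`) and a driver (`run_over_batches`). None of these are part of this library. Each batch of rows stands in for the contents of one Parquet file, and the sequential loop stands in for the Spark executors that would process files in parallel.

```python
from abc import ABC, abstractmethod
from typing import Any

# Hypothetical row representation: one dict per record.
Row = dict[str, Any]


class HypotheticalTransform(ABC):
    """Hypothetical base class illustrating the pattern; not this library's API."""

    def __init__(self, config: dict[str, Any]):
        self.config = config

    @abstractmethod
    def transform(self, rows: list[Row]) -> tuple[list[Row], dict[str, Any]]:
        """Return the transformed rows plus metadata about this batch."""


class DropShortTextTransform(HypotheticalTransform):
    """Example transform: drop rows whose text column is shorter than min_length."""

    def transform(self, rows: list[Row]) -> tuple[list[Row], dict[str, Any]]:
        column = self.config.get("column", "contents")
        min_length = self.config.get("min_length", 1)
        kept = [row for row in rows if len(row.get(column, "")) >= min_length]
        return kept, {"rows_in": len(rows), "rows_out": len(kept)}


def run_over_batches(transform: HypotheticalTransform, batches: list[list[Row]]):
    """Apply the transform batch by batch.

    Each batch stands in for one Parquet file; in a real Spark job the
    batches would be processed in parallel by executors, not in this loop.
    """
    outputs, stats = [], []
    for batch in batches:
        rows, meta = transform.transform(batch)
        outputs.append(rows)
        stats.append(meta)
    return outputs, stats


if __name__ == "__main__":
    batches = [
        [{"contents": "hi"}, {"contents": "hello world"}],
        [{"contents": "abcde"}],
    ]
    t = DropShortTextTransform({"column": "contents", "min_length": 5})
    outputs, stats = run_over_batches(t, batches)
    print(outputs)
    print(stats)
```

Returning per-batch metadata alongside the data makes it straightforward to aggregate statistics, such as row counts, across all files once the job finishes.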

Virtual Environment

The project uses pyproject.toml and a Makefile to drive its operations. For development, first create the virtual environment:

make venv

and then either activate it

source venv/bin/activate

or configure your IDE to use the venv directory when developing in this project.
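To confirm that the interpreter in use (from your shell or your IDE) is actually running from the project's virtual environment, a standard-library check like the one below can help. It relies on the documented behavior that `sys.prefix` differs from `sys.base_prefix` inside a venv. The helper names are hypothetical, and the default directory name `venv` is taken from the `make venv` step above.

```python
import sys
from pathlib import Path


def in_virtualenv() -> bool:
    # Inside a venv, sys.prefix points at the venv directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def using_project_venv(venv_dir_name: str = "venv") -> bool:
    # Assumes the environment was created by `make venv` in a directory named "venv".
    return in_virtualenv() and Path(sys.prefix).name == venv_dir_name


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("project venv active:", using_project_venv())
```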

Library Artifact Build and Publish

To test, build, and publish the library:

make test build publish

To bump the version number, change VERSION in the Makefile and rerun the command above. You will then need to commit both the Makefile and the automatically updated pyproject.toml file.


Download files


Source Distribution

data_prep_toolkit_spark-0.2.0.tar.gz (30.9 kB)

Built Distribution

data_prep_toolkit_spark-0.2.0-py3-none-any.whl (15.0 kB)

File details

Hashes for data_prep_toolkit_spark-0.2.0.tar.gz:

Algorithm     Hash digest
SHA256        facfbb35edb9c926b15bd4dc5450ca668d52cf5619e67338cd4b1fa2b6b3558f
MD5           380f5f1369a86f98e3afcb8693016fbf
BLAKE2b-256   3770a664b8cc3a1f97f81b68811df23fc25f281a0c106ba9e9a2236da3acc0e8


Hashes for data_prep_toolkit_spark-0.2.0-py3-none-any.whl:

Algorithm     Hash digest
SHA256        f11d2e5deab3453a2a2054b5501f7865c29b0aa2f93a160201842d52e433dd83
MD5           50750a9ea13aa85755ca6349e5022bee
BLAKE2b-256   28277d258edc44d6187f1686020907e492a0af110a1aa98b3a08c8fe90d511dd
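To check a downloaded file against the SHA256 digests published above, Python's standard `hashlib` module is sufficient. The sketch below reads the file in chunks so that large archives do not need to fit in memory. The helper names are illustrative, and the digests in `PUBLISHED_SHA256` are copied from the hash listings above.

```python
import hashlib
from pathlib import Path
from typing import Optional

# SHA256 digests as published for the 0.2.0 release files.
PUBLISHED_SHA256 = {
    "data_prep_toolkit_spark-0.2.0.tar.gz":
        "facfbb35edb9c926b15bd4dc5450ca668d52cf5619e67338cd4b1fa2b6b3558f",
    "data_prep_toolkit_spark-0.2.0-py3-none-any.whl":
        "f11d2e5deab3453a2a2054b5501f7865c29b0aa2f93a160201842d52e433dd83",
}


def sha256_of(path: Path, chunk_size: int = 64 * 1024) -> str:
    """Hash the file in fixed-size chunks so large files are not loaded at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected: Optional[str] = None) -> bool:
    """Compare a file's SHA256 with an explicit digest or the published one for its name."""
    if expected is None:
        expected = PUBLISHED_SHA256[path.name]  # KeyError for unknown file names
    return sha256_of(path) == expected.strip().lower()
```

For example, `verify_download(Path("data_prep_toolkit_spark-0.2.0-py3-none-any.whl"))` returns True only if the local wheel matches the published digest. pip can also enforce digests at install time through its hash-checking mode, using a requirements line such as `data_prep_toolkit_spark==0.2.0 --hash=sha256:<digest>`.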
