Dumb Spark Utility for personal use
# SeedSpark
[![BuildAndTest](https://github.com/ChethanUK/seedspark/actions/workflows/build_test.yml/badge.svg)](https://github.com/ChethanUK/seedspark/actions/workflows/build_test.yml) [![PreCommitChecks](https://github.com/ChethanUK/seedspark/actions/workflows/code_quality_lint_checkers.yml/badge.svg)](https://github.com/ChethanUK/seedspark/actions/workflows/code_quality_lint_checkers.yml) [![CodeQL](https://github.com/ChethanUK/seedspark/actions/workflows/codeql-analysis.yml/badge.svg)](https://github.com/ChethanUK/seedspark/actions/workflows/codeql-analysis.yml) [![codecov](https://codecov.io/gh/ChethanUK/seedspark/branch/main/graph/badge.svg?token=HRI9hoE5ru)](https://codecov.io/gh/ChethanUK/seedspark)
Dumb Spark Package
NOTE: This repo only contains curated utilities for personal use.
## TODO
1. Move the logwrap extension [built on top of loguru] out into a separate package.
1. Add test containers for [amundsen](https://www.amundsen.io/amundsen/), etc.
## Getting Started
1. Set up [SDKMAN](#setup-sdkman)
1. Set up [Java](#setup-java)
1. Set up [Apache Spark](#setup-apache-spark)
1. Install [Poetry](#poetry)
1. Install Pre-commit and [follow the instructions here](PreCommit.MD)
1. Run [tests locally](#running-tests-locally)
### Setup SDKMAN
SDKMAN! is a tool for managing parallel versions of multiple Software Development Kits on any Unix-based system. It provides a convenient command-line interface for installing, switching, removing, and listing candidates.
SDKMAN! installs smoothly on macOS, Linux, WSL, Cygwin, and similar environments, and supports the Bash and ZSH shells.
See documentation on the [SDKMAN! website](https://sdkman.io).
Open your favourite terminal and enter the following:
```bash
curl -s https://get.sdkman.io | bash
# If the environment needs tweaking for SDKMAN to be installed,
# the installer will prompt you accordingly and ask you to restart.
# Next, open a new terminal or enter:
source "$HOME/.sdkman/bin/sdkman-init.sh"
# Lastly, run the following command to confirm that the installation succeeded:
sdk version
```
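To avoid re-running the `source` line by hand in every new terminal, the init script can be loaded from a shell startup file. The following is a minimal sketch, not part of this repo: the helper name `sdkman_init_script` and its optional directory argument are assumptions made for illustration, while the `$HOME/.sdkman/bin/sdkman-init.sh` path is the one used above.

```bash
# Hypothetical helper: print the path of the SDKMAN init script if it exists
# and is non-empty under the given directory (default: $HOME/.sdkman).
sdkman_init_script() {
  sdkman_dir="${1:-$HOME/.sdkman}"
  if [ -s "$sdkman_dir/bin/sdkman-init.sh" ]; then
    echo "$sdkman_dir/bin/sdkman-init.sh"
  else
    return 1
  fi
}

# Load SDKMAN only when it is present, so the shell still starts cleanly
# on machines where it has not been installed.
if init_script="$(sdkman_init_script)"; then
  . "$init_script"
fi
```

Placed in `~/.bashrc` or `~/.zshrc`, this makes `sdk` available in new terminals and does nothing when SDKMAN is missing.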
### Setup Java
Open your favourite terminal and enter the following to install Java:
```bash
# List the available Java versions, including the AdoptOpenJDK builds
sdk list java
# To install Java 11
sdk install java 11.0.10.hs-adpt
# To install Java 8
sdk install java 8.0.292.hs-adpt
```
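With both Java 8 and Java 11 installed, it helps to confirm which one is active before starting Spark. The sketch below is illustrative only: `java_major_version` is a hypothetical helper, and it assumes the usual version strings printed by `java -version` (the legacy `1.8.0_292` form for Java 8 and the `11.0.10` form for later releases).

```bash
# Hypothetical helper: map a Java version string to its major version.
# Legacy strings start with "1." (1.8.0_292 is Java 8); newer ones lead
# with the major version (11.0.10 is Java 11).
java_major_version() {
  case "$1" in
    1.*) echo "$1" | cut -d. -f2 ;;
    *)   echo "$1" | cut -d. -f1 ;;
  esac
}

# Report the active JDK, if one is on PATH. `java -version` writes to stderr,
# and the version is the quoted string on its first line.
if command -v java >/dev/null 2>&1; then
  active="$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')"
  echo "Active Java major version: $(java_major_version "$active")"
fi
```

If the reported version is not the one you want, `sdk use java 11.0.10.hs-adpt` switches the current shell, and `sdk default java 11.0.10.hs-adpt` changes the default for new shells.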
### Setup Apache Spark
Open your favourite terminal and enter the following to install Apache Spark:
```bash
# List the available Apache Spark versions
sdk list spark
# To install Spark 3.0
sdk install spark 3.0.2
# To install Spark 3.1, pass a 3.1.x version from the list above instead of 3.0.2
```
### Poetry
Poetry [Commands](https://python-poetry.org/docs/cli/#search)
```bash
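# Install the dependencies declared in pyproject.toml
poetry install
# Update the dependencies to the latest versions allowed by pyproject.toml
poetry update
# List installed packages. Useful options for `poetry show`:
# --tree: List the dependencies as a tree.
# --latest (-l): Show the latest version.
# --outdated (-o): Show the latest version but only for packages that are outdated.
poetry show -o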
```
## Running Tests Locally
Take a look at the tests in `tests/dataquality` and `tests/jobs`, then run the suite:
```bash
poetry run pytest
# Example output: Ran 95 tests in 96.95s
```
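Because the tests depend on the tools installed in the steps above, a small wrapper can fail fast with a clear message when one is missing. This is a sketch under assumptions, not part of the repo: `missing_tools` and `run_tests` are hypothetical names, and it assumes that SDKMAN has put `java` and `spark-submit` on `PATH`.

```bash
# Hypothetical helper: print the name of every given tool that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Hypothetical wrapper: run pytest through Poetry only when all prerequisites
# are available; extra arguments are passed straight to pytest.
run_tests() {
  missing="$(missing_tools java spark-submit poetry)"
  if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Missing prerequisites:" $missing >&2
    return 1
  fi
  poetry run pytest "$@"
}
```

Calling `run_tests` runs the whole suite, while `run_tests tests/dataquality` limits the run to that directory.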