A set of utilities for creating and managing ETL pipelines with PySpark.
Jorvik
Jorvik is a collection of utilities for creating and managing ETL pipelines in PySpark.
Packages
How to Contribute
The Jorvik project welcomes your expertise and enthusiasm!
Writing code isn’t the only way to contribute. You can also:
- review pull requests
- suggest improvements through issues
- help us stay on top of new and old issues
- develop tutorials, presentations, and other educational materials
Contributing Code
You will need your own copy of jorvik (a fork) to work on the code. Clone the forked repository locally, add your changes, and create a Pull Request from the forked repo to jorvik.
To set up your machine:
- fork the repository: go to https://github.com/jorvik-io/jorvik and click the Fork button. This creates a copy of jorvik in your GitHub account at https://github.com/your-username/jorvik
- clone your fork to your machine
git clone https://github.com/your-username/jorvik.git
- add a reference to jorvik-io/jorvik so you can easily fetch updates
git remote add jorvik https://github.com/jorvik-io/jorvik.git
- check your setup
git remote -v
You should see two remotes: origin, which points to your account, and jorvik, which points to jorvik-io.
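The check above can also be automated. The following is a minimal sketch, not part of jorvik: the helper names `parse_remotes` and `missing_remotes` are hypothetical, and the sample text only imitates what `git remote -v` typically prints for the setup described here.

```python
from __future__ import annotations


def parse_remotes(remote_output: str) -> dict[str, str]:
    """Map each remote name to its fetch URL from `git remote -v` output."""
    remotes = {}
    for line in remote_output.splitlines():
        parts = line.split()
        # Each line looks like: <name> <url> (fetch) or <name> <url> (push)
        if len(parts) == 3 and parts[2] == "(fetch)":
            remotes[parts[0]] = parts[1]
    return remotes


def missing_remotes(remote_output: str, expected=("origin", "jorvik")) -> list[str]:
    """Return the expected remote names that are not configured."""
    remotes = parse_remotes(remote_output)
    return [name for name in expected if name not in remotes]


# Illustrative output for a fork owned by the placeholder account "your-username".
sample = """\
jorvik\thttps://github.com/jorvik-io/jorvik.git (fetch)
jorvik\thttps://github.com/jorvik-io/jorvik.git (push)
origin\thttps://github.com/your-username/jorvik.git (fetch)
origin\thttps://github.com/your-username/jorvik.git (push)
"""

print(missing_remotes(sample))  # prints []
```

In a real checkout, the text could come from `subprocess.run(["git", "remote", "-v"], capture_output=True, text=True).stdout` instead of the sample string.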
To create a Pull Request and submit code:
- check out the main branch
- pull the latest changes from jorvik (see also this article)
git pull jorvik main
- create a new branch
git checkout -b feature-branch
- commit and push your changes
git add .
git commit -m 'Your commit message'
git push --set-upstream origin feature-branch
- create a Pull Request from your fork to jorvik
Click here for more information about contributing to open source projects.
Development
NOTE: Java 11 or Java 17 is required. On a Mac you can install it with brew install openjdk@17.
Set up the package in editable mode, including the dependencies needed for testing.
pip install -e '.[tests]'
Editor
VS Code is the recommended editor, and the project ships with VS Code settings that follow the project guidelines. See .vscode/settings.json.
Recommended extensions:
- Python
- autopep8
- Flake8
- isort
- Code Spell Checker
Testing
You can run the tests with the command pytest test.
To run the tests in VS Code, you may need to point VS Code's Python context to the correct Java version. To do so, add a .env file in the root folder and set the JAVA_HOME environment variable in it, for example JAVA_HOME=/opt/homebrew/opt/openjdk@17.
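If the tests fail to start Spark, it can help to confirm which Java version is actually being picked up. The sketch below is an illustration only, not part of jorvik: `java_major_version` and `is_supported` are hypothetical helpers, and they assume the version banner contains a quoted string such as `version "17.0.9"`.

```python
from __future__ import annotations

import re

# Per the note in the Development section: Java 11 or Java 17 is required.
SUPPORTED_MAJORS = {11, 17}


def java_major_version(version_output: str) -> int | None:
    """Extract the major version from the first quoted version string, if any."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Legacy scheme: Java 8 and earlier report themselves as 1.8, 1.7, ...
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def is_supported(version_output: str) -> bool:
    """True when the banner reports one of the supported major versions."""
    return java_major_version(version_output) in SUPPORTED_MAJORS


print(is_supported('openjdk version "17.0.9" 2023-10-17'))  # prints True
print(is_supported('java version "1.8.0_292"'))  # prints False
```

Note that `java -version` writes its banner to standard error, so when capturing it with `subprocess.run(..., capture_output=True, text=True)` the text to pass in is the `stderr` attribute.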
Linting
The project enforces flake8 rules with the following exceptions:
- E302, E305: Expected 2 blank lines
- max line length: 127
To ignore a flake8 error, add the comment # noqa: ERRORCODE at the end of the affected line.
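As an illustration of the comment in place, here is a small sketch. The choice of E731 (assigning a lambda expression to a name) is an assumption made for the example, not a rule this project is known to trip over; any flake8 error code is used the same way.

```python
# flake8 normally reports E731 here ("do not assign a lambda expression, use a def").
# The trailing comment suppresses only that code, and only on this line.
square = lambda x: x * x  # noqa: E731

print(square(3))  # prints 9
```

Naming the error code keeps the suppression narrow: other problems on the same line are still reported.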
Spell checks
Sometimes spelling mistakes cannot be avoided, for example when the misspelled word is the name of a function from a dependency. You can ignore such words by adding the comment # cspell: words word1 word2 at the top of the file. You can also ignore words by adding them to cSpell.json.
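A short sketch of the per-file comment follows. The words listed are assumptions chosen for illustration (standard-library function names that a spell checker may not recognize), and `log_factorial` is a hypothetical function, not part of jorvik.

```python
# cspell: words lgamma fsum
import math


def log_factorial(n: int) -> float:
    """Return log(n!) using the log-gamma function, since gamma(n + 1) == n!."""
    return math.lgamma(n + 1)


def mean(values: list[float]) -> float:
    """Return the arithmetic mean using math.fsum for an accurate float sum."""
    return math.fsum(values) / len(values)


print(mean([1.0, 2.0, 3.0]))  # prints 2.0
```

The comment applies to the file it appears in, so the same names still need to be listed in any other file that uses them.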
Dev Container
File details
Details for the file jorvik-1.0.0-py3-none-any.whl.
File metadata
- Download URL: jorvik-1.0.0-py3-none-any.whl
- Upload date:
- Size: 19.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8b05cf487cba317b79e779187f54d82ecdafdec5568de9b846f1dcf5f6b8eeff |
| MD5 | 1f84331f5fd046a9d517efd90e23a53d |
| BLAKE2b-256 | de65621764aa5dcde33448f15c966b0a869a8b8a7cb2dafb06866af321ff1ea2 |