A set of utilities for creating and managing ETL Pipelines with pyspark.

Jorvik

Jorvik is a collection of utilities for creating and managing ETL pipelines in PySpark.

Packages

  • Storage: Interact with the storage layer
  • Pipelines: Build and test ETL pipelines with ease

How to Contribute

The Jorvik project welcomes your expertise and enthusiasm!

Writing code isn’t the only way to contribute. You can also:

  • review pull requests
  • suggest improvements through issues
  • help us stay on top of new and old issues
  • develop tutorials, presentations, and other educational materials

Contributing Code

You will need your own copy of jorvik (a fork) to work on the code. Clone the fork locally, add your changes, and create a Pull Request from the fork to jorvik.

To set up your machine:

  • fork the repository: go to https://github.com/jorvik-io/jorvik and click the Fork button. This creates a copy of jorvik in your GitHub account at https://github.com/your-username/jorvik

  • clone your fork to your machine

    git clone https://github.com/your-username/jorvik.git
    
  • add a reference to jorvik-io/jorvik to easily fetch updates

    git remote add jorvik https://github.com/jorvik-io/jorvik.git
    
  • check your setup

    git remote -v
    

    You should see two remotes: origin, which points to your account, and jorvik, which points to jorvik-io.
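As a rough illustration of what a correct setup looks like, the sketch below parses text in the format printed by git remote -v and checks that both remotes are present. The helper name parse_remotes and the sample output are hypothetical and not part of jorvik; the URLs are the ones used in the steps above.

```python
def parse_remotes(remote_output):
    """Map each remote name to its URL from `git remote -v` text."""
    remotes = {}
    for line in remote_output.splitlines():
        parts = line.split()
        # Each line has the shape: <name> <url> (fetch) or <name> <url> (push)
        if len(parts) >= 2:
            remotes[parts[0]] = parts[1]
    return remotes


# Illustrative output for a fork owned by "your-username" (an assumption).
sample_output = """\
jorvik\thttps://github.com/jorvik-io/jorvik.git (fetch)
jorvik\thttps://github.com/jorvik-io/jorvik.git (push)
origin\thttps://github.com/your-username/jorvik.git (fetch)
origin\thttps://github.com/your-username/jorvik.git (push)
"""

remotes = parse_remotes(sample_output)
# Names of the expected remotes that were not found; empty when the setup is complete.
missing = {"origin", "jorvik"} - set(remotes)
```

If missing is not empty, one of the git remote add or git clone steps was probably skipped.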

To create a Pull Request and submit code:

  • check out the main branch
  • pull the latest changes from jorvik, see also this article
    git pull jorvik main
    
  • create a new branch
    git checkout -b feature-branch
    
  • commit and push your changes
    git add .
    git commit -m 'Your commit message'
    git push --set-upstream origin feature-branch
    
  • create a Pull Request from your fork to jorvik

Click here for more information about contributing to open source projects.

Development

NOTE: Java 11 or Java 17 is required. On a Mac you can install it with brew install openjdk@17.

Set up the package in editable mode, including the dependencies needed for testing:

    pip install -e '.[tests]'
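A quick way to confirm that the installation worked is to check that the expected packages can be found by the interpreter. The sketch below is only an illustration: the helper missing_packages is hypothetical, and the assumption that the install makes jorvik, pyspark, and pytest importable is not confirmed by this README beyond the commands it mentions.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level package names that cannot be located."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if find_spec(name) is None]


# Assumed package list; adjust it to the project's actual dependencies.
not_found = missing_packages(["jorvik", "pyspark", "pytest"])
if not_found:
    print("Not installed:", ", ".join(not_found))
else:
    print("All expected packages were found.")
```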

Editor

VS Code is the recommended editor, and the project ships with VS Code settings that follow the project guidelines. See .vscode/settings.json.

Recommended extensions:

  • python
  • autopep8
  • Flake8
  • isort
  • Code Spell Checker

Testing

Run the tests with the command pytest test.

To run the tests in VS Code you may need to point VS Code's Python context to the correct Java version. To do so, add a .env file in the root folder and set the JAVA_HOME environment variable in it, for example JAVA_HOME=/opt/homebrew/opt/openjdk@17.
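If the tests still fail to start a Spark session, it can help to check what JAVA_HOME resolves to from inside Python. The following is a minimal sketch, not part of jorvik: check_java_home is a hypothetical helper, and it assumes a Unix-like layout where the executable lives at bin/java under JAVA_HOME.

```python
import os
from pathlib import Path


def check_java_home(env=None):
    """Describe whether JAVA_HOME is set and contains a java executable."""
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME")
    if not java_home:
        return "JAVA_HOME is not set"
    # Assumes a Unix-like JDK layout: <JAVA_HOME>/bin/java
    java_bin = Path(java_home) / "bin" / "java"
    if not java_bin.exists():
        return f"No java executable found at {java_bin}"
    return f"JAVA_HOME looks usable: {java_home}"


print(check_java_home())
```

Running this from the VS Code test context and from a terminal shows whether the two environments see the same Java installation.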

Linting

The project enforces flake8 rules with the following exceptions:

  • E302, E305: Expected 2 blank lines
  • max line length: 127

To ignore a flake8 error, add the comment # noqa: ERRORCODE to the affected line.
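For instance, a deliberately unused import would normally be reported under the code F401 (imported but unused). The snippet below is an illustrative sketch, not code from jorvik, showing the comment placed on the affected line only:

```python
import json  # noqa: F401  (unused import kept on purpose for this illustration)


def greet(name):
    """Return a greeting; unrelated to the suppressed warning above."""
    return "Hello, " + name


message = greet("jorvik")
```

Naming the error code keeps the suppression narrow, so other problems on the same line are still reported.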

Spell checks

Sometimes spelling mistakes cannot be avoided, for example when the flagged word is the name of a function from a dependency. You can ignore such words by adding the comment # cspell: words word1 word2 at the top of the file. You can also ignore words by adding them to cSpell.json.
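As an illustration, the file below tells the spell checker to accept two names that come from PySpark's Column API. The module content is a made-up sketch rather than code from jorvik; only the comment on the first line matters.

```python
# cspell: words rlike isin

# Method names taken from a dependency's API that a spell checker
# would otherwise flag as misspelled.
COLUMN_METHODS = ["rlike", "isin"]


def is_known_method(name):
    """Return True if the name is one of the listed column methods."""
    return name in COLUMN_METHODS
```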

Dev Container

Set up the development environment using a Dev Container in VS Code.
