Facilitate data engineering on the Ingenii Data Platform

Ingenii Data Engineering Package

Details

  • Current Version: 0.2.1

Overview

This package provides utilities for data engineering on Ingenii's Azure Data Platform. It can be used for local development and is also used in the Ingenii Databricks Runtime.

Usage

Import the package to use the functions within.

import ingenii_data_engineering
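
As a quick check before importing, the sketch below tests whether a module can be found in the current environment. The helper name is_importable is hypothetical and not part of this package; it relies only on the standard library's importlib.util.find_spec.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    # find_spec returns None when no top-level module of that name is found.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Prints False if the package has not been installed (e.g. via pip).
    print(is_importable("ingenii_data_engineering"))
```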

dbt

For details of how we validate dbt schemas, see the dbt README file.

Pre-processing

For details of working with pre-processing functions, see the pre-processing README file.

Development

Prerequisites

  1. A working knowledge of git SCM
  2. Installation of Python 3.7.3

Set up

  1. Complete the 'Getting Started > Prerequisites' section
  2. For Windows only:
  3. Run make setup to copy the .env file into place (.env-dist > .env)

Getting started

  1. Complete the 'Getting Started > Set up' section

  2. From the root of the repository, in a terminal (preferably in your IDE) run the following commands to set up a virtual environment:

    python -m venv venv
    . venv/bin/activate
    pip install -r requirements-dev.txt
    pre-commit install
    

    or for Windows:

    python -m venv venv
    . venv/Scripts/activate
    pip install -r requirements-dev.txt
    pre-commit install
    
  3. Note: if you get a permission denied error when executing the pre-commit install command, run chmod -R 775 venv/bin/ to recursively update permissions in the venv/bin/ directory

  4. The following checks are run as part of pre-commit hooks: flake8 (note: unit tests are not run as a hook)

Building

  1. Complete the 'Getting Started > Set up' section
  2. Run make build to create the package in ./dist
  3. Run make clean to remove dist files

Testing

  1. Complete the 'Getting Started > Set up' and 'Development' sections
  2. Run make test to run the unit tests using pytest
  3. Run flake8 to run lint checks using flake8
  4. Run make qa to run the unit tests and linting in a single command
  5. Run make clean to remove pytest files

Version History

  • 0.2.1: Handle JSON read UTF-8 BOM
  • 0.2.0: Pre-processing happens all in the 'archive' container
  • 0.1.5: Better functionality for column names in .csv files
  • 0.1.4: Handle JSON files
  • 0.1.3: Adding pre-processing utilities
  • 0.1.2: Rearrangement and better split of work with the Databricks Runtime. Better validation
  • 0.1.1: Minor bug fixes
  • 0.1.0: dbt schema validation, pre-processing class
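
To illustrate the kind of issue the 0.2.1 entry refers to, here is a minimal sketch of reading JSON that may start with a UTF-8 byte order mark (BOM). This is not the package's own implementation; the function name load_json_bytes is hypothetical, and the approach shown (decoding with Python's standard "utf-8-sig" codec) is just one common way to handle it.

```python
import json


def load_json_bytes(raw: bytes):
    """Parse JSON from bytes, tolerating an optional leading UTF-8 BOM."""
    # "utf-8-sig" strips a leading BOM if present and otherwise
    # decodes exactly like plain UTF-8.
    return json.loads(raw.decode("utf-8-sig"))


# The same document with and without the three-byte UTF-8 BOM prefix.
without_bom = b'{"id": 1}'
with_bom = b"\xef\xbb\xbf" + without_bom

parsed_plain = load_json_bytes(without_bom)
parsed_bom = load_json_bytes(with_bom)
```

Both calls return the same dictionary, so callers do not need to know whether the source file was written with a BOM.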

