
General-purpose data engineering Python package.


# Data Engine

Data Engine is a general-purpose Python package for data engineering, built on pandas, Apache Spark, and public cloud services.

[![codecov](https://codecov.io/gh/leealessandrini/dataengine/branch/main/graph/badge.svg)](https://codecov.io/gh/leealessandrini/dataengine)

## Usage Guide

### Installing the Package

To use dataengine, you can install it directly from PyPI:

```bash
pip install dataengine
```

### Adding to Your Project’s Requirements

If you want to include dataengine as a dependency for your project, add it to your requirements.txt file:

```text
dataengine
```

Then, install the updated requirements:

```bash
pip install -r requirements.txt
```

Once installed, you can start using dataengine in your Python scripts or applications.
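As a quick sanity check after installation, the sketch below looks up the installed version using only the Python standard library. The helper name `installed_version` is hypothetical and not part of dataengine; the sketch only assumes the distribution is registered under the name `dataengine`, as in the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "dataengine") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in the active environment.
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("dataengine is not installed in this environment")
    else:
        print(f"dataengine {found} is installed")
```

Running this inside the environment where you ran pip confirms that the interpreter you are using can actually see the package.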

### How to Use the Package

The primary class in dataengine is Engine, which drives everything else. For each Engine instance, you specify its subclasses through one or more configuration files. The subclasses are Database, Dataset, and Query:

  • An instance of Database defines a single database you will interact with.

  • An instance of Dataset defines where to load data from using Apache Spark (either locally or from S3).

  • An instance of Query defines which datasets are its input dependencies, which SQL statements to run, where the query output will be saved, and whether the result should be inserted into a database, along with custom parameters for each.
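The relationship between these pieces can be pictured with a small, purely illustrative model. The sketch below does not use dataengine's classes or its configuration schema: `DatabaseSpec`, `DatasetSpec`, `QuerySpec`, `find_unresolved`, and every field name, host, and bucket are hypothetical stand-ins chosen only to mirror the descriptions above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


# Hypothetical stand-ins: these are NOT dataengine's classes or config keys.
@dataclass
class DatabaseSpec:
    name: str
    host: str


@dataclass
class DatasetSpec:
    name: str
    location: str  # a local path or an S3 URI


@dataclass
class QuerySpec:
    name: str
    inputs: List[str]  # names of the datasets the query depends on
    sql: List[str]  # SQL statements to run, in order
    output_location: str  # where the query result is saved
    target_database: Optional[str] = None  # set only if the result is inserted


def find_unresolved(
    databases: Dict[str, DatabaseSpec],
    datasets: Dict[str, DatasetSpec],
    queries: List[QuerySpec],
) -> List[str]:
    """List every dataset or database a query names that is not defined."""
    problems = []
    for query in queries:
        for dep in query.inputs:
            if dep not in datasets:
                problems.append(f"{query.name}: unknown dataset '{dep}'")
        target = query.target_database
        if target is not None and target not in databases:
            problems.append(f"{query.name}: unknown database '{target}'")
    return problems


databases = {"warehouse": DatabaseSpec("warehouse", "db.example.internal")}
datasets = {"events": DatasetSpec("events", "s3://example-bucket/events/")}
queries = [
    QuerySpec(
        name="daily_counts",
        inputs=["events"],
        sql=["SELECT event_date, COUNT(*) AS n FROM events GROUP BY event_date"],
        output_location="s3://example-bucket/daily_counts/",
        target_database="warehouse",
    )
]
print(find_unresolved(databases, datasets, queries))  # prints []
```

The point of the model is the dependency direction: a query refers to datasets and, optionally, a database by name, so those definitions have to exist before the query can run. The package's real configuration format may differ, so check its source before relying on any of these names.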

TODO: Add explicit example here.

## Development

To contribute to the project, follow the instructions below.

### Setup Instructions

The following steps will help you clone the repository and set up your environment.

  1. Clone the repository:

```bash
git clone https://github.com/leealessandrini/dataengine.git
cd dataengine
```

  2. Create and activate a virtual environment:

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

  3. Install the required dependencies:

```bash
pip install -r requirements.txt
```

You’re now ready to start contributing to the project!

### Contribution Guide

  1. Create or check out a new branch:

```bash
git checkout -b your-feature-branch
```

Replace `your-feature-branch` with a descriptive name for your branch.

  2. Make changes to the code and stage the changes:

```bash
git add .
```

  3. Commit your changes with a meaningful message:

```bash
git commit -m "Add a concise description of your changes"
```

  4. Push your branch to the remote repository:

```bash
git push origin your-feature-branch
```

  5. Create a pull request:

  • Go to your repository on GitHub.

  • Navigate to the “Pull Requests” tab.

  • Click “New Pull Request” and select your branch to merge with the main branch.

  • Add a descriptive title and details about your changes, then submit the pull request.

  6. Merge the pull request (once reviewed and approved):

  • If you have the required permissions, merge the pull request.

  • Otherwise, wait for a project maintainer to review and merge it.

Congratulations! 🎉 You’ve successfully contributed to the project!


