General-purpose data engineering Python package.
# Data Engine
Data Engine is a general-purpose Python package for data engineering that uses pandas, Apache Spark, and public cloud services.
## Usage Guide
### Installing the Package
To use dataengine, you can install it directly from PyPI:
```bash
pip install dataengine
```
### Adding to Your Project’s Requirements
If you want to include dataengine as a dependency for your project, add it to your requirements.txt file:
```text
dataengine
```
Then, install the updated requirements:
```bash
pip install -r requirements.txt
```
Once installed, you can start using dataengine in your Python scripts or applications.
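To confirm that the installation succeeded, a short check like the following can help. It is a minimal sketch that relies only on the standard library's `importlib.metadata`; the helper name `installed_version` is made up for illustration and is not part of dataengine.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("dataengine")
    if found is None:
        print("dataengine is not installed in this environment")
    else:
        print(f"dataengine {found} is installed")
```

Running this inside the environment where you ran `pip install` shows whether the package is visible to that interpreter.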
### How to Use the Package
In dataengine, the primary class that drives everything is the Engine class. For each Engine instance, you specify the supporting classes it works with through one or more configuration files. These classes are Database, Dataset, and Query:
- An instance of Database defines a single database you will interact with.
- An instance of Dataset defines where to load data from using Apache Spark (either a local path or S3).
- An instance of Query defines which datasets are input dependencies, which SQL statements to run, where the output of the query will be saved, and whether the result should be inserted into a database, along with custom parameters for each.
TODO: Add explicit example here.
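Until an official example is available, the relationship between these classes can be illustrated with a rough, self-contained sketch. It does not use dataengine's actual API or configuration schema, neither of which is documented here. Every class name (`DatabaseSpec`, `DatasetSpec`, `QuerySpec`, `EngineSpec`), field, and the `validate` method below is a hypothetical stand-in that only mirrors the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DatabaseSpec:
    """Hypothetical stand-in: a single database to interact with."""
    name: str
    host: str


@dataclass
class DatasetSpec:
    """Hypothetical stand-in: where data is loaded from (local path or S3)."""
    name: str
    location: str

    @property
    def is_remote(self) -> bool:
        # Treat S3-style URIs as remote; anything else is assumed to be local.
        return self.location.startswith(("s3://", "s3a://"))


@dataclass
class QuerySpec:
    """Hypothetical stand-in: input datasets, SQL to run, and output targets."""
    name: str
    dependencies: List[str]
    sql: List[str]
    output_location: str
    load_into: Optional[str] = None  # name of a DatabaseSpec, if any


@dataclass
class EngineSpec:
    """Hypothetical stand-in for the object that ties everything together."""
    databases: Dict[str, DatabaseSpec] = field(default_factory=dict)
    datasets: Dict[str, DatasetSpec] = field(default_factory=dict)
    queries: Dict[str, QuerySpec] = field(default_factory=dict)

    def validate(self) -> List[str]:
        """Return a description of every dangling dataset or database reference."""
        problems = []
        for query in self.queries.values():
            for dep in query.dependencies:
                if dep not in self.datasets:
                    problems.append(
                        f"query {query.name!r} depends on unknown dataset {dep!r}"
                    )
            if query.load_into is not None and query.load_into not in self.databases:
                problems.append(
                    f"query {query.name!r} targets unknown database {query.load_into!r}"
                )
        return problems


# Example values are invented for illustration only.
engine = EngineSpec(
    databases={"warehouse": DatabaseSpec(name="warehouse", host="db.example.com")},
    datasets={"events": DatasetSpec(name="events", location="s3://example-bucket/events/")},
    queries={
        "daily_counts": QuerySpec(
            name="daily_counts",
            dependencies=["events"],
            sql=["SELECT event_date, COUNT(*) AS n FROM events GROUP BY event_date"],
            output_location="s3://example-bucket/daily_counts/",
            load_into="warehouse",
        )
    },
)
print(engine.validate())
```

The point of the sketch is the dependency structure: a query names the datasets it reads and, optionally, the database it writes to, and the engine is the place where those references are resolved.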
## Development
To contribute to the project, follow the instructions below.
### Setup Instructions
The following steps will help you clone the repository and set up your environment.
Clone the repository:
```bash
git clone https://github.com/leealessandrini/dataengine.git
cd dataengine
```
Create and activate a virtual environment:
```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```
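Before installing dependencies, it can be worth confirming that the virtual environment is actually active. The following is a minimal sketch using only the standard library; the function name `in_virtualenv` is an illustrative choice, not something the project provides.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv-style environment.

    Inside a virtual environment, sys.prefix points at the environment
    directory while sys.base_prefix still points at the base installation.
    """
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment detected; packages would install globally")
```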
Install the required dependencies:
```bash
pip install -r requirements.txt
```
You’re now ready to start contributing to the project!
### Contribution Guide
Create or check out a new branch:
```bash
git checkout -b your-feature-branch
```
Replace `your-feature-branch` with a descriptive name for your branch.
Make changes to the code and stage the changes:
```bash
git add .
```
Commit your changes with a meaningful message:
```bash
git commit -m "Add a concise description of your changes"
```
Push your branch to the remote repository:
```bash
git push origin your-feature-branch
```
Create a pull request:
- Go to your repository on GitHub.
- Navigate to the “Pull Requests” tab.
- Click “New Pull Request” and select your branch to merge with the main branch.
- Add a descriptive title and details about your changes, then submit the pull request.
Merge the pull request (once reviewed and approved):
- If you have the required permissions, merge the pull request.
- Otherwise, wait for a project maintainer to review and merge it.
Congratulations! 🎉 You’ve successfully contributed to the project!
File details
Details for the file dataengine-0.0.92.tar.gz.
File metadata
- Download URL: dataengine-0.0.92.tar.gz
- Upload date:
- Size: 173.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 1d28293665000f6a6a0b66c6d815f0be1c0472fcc93f4ecfff17c5f0c79fcce9 |
| MD5 | 8aa13c9a279dc157fcc1c9acee6bb8fb |
| BLAKE2b-256 | c640b7716307df69e973dd25639098e46c38d870433dc57f69a094128efd0802 |
File details
Details for the file dataengine-0.0.92-py3-none-any.whl.
File metadata
- Download URL: dataengine-0.0.92-py3-none-any.whl
- Upload date:
- Size: 180.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | b52ea3d56509d8a69d34b641ee8bca7ee6eaa90b8e569925fcd7f6e6de0d1df8 |
| MD5 | d03bca6b728ca3d2e8480f1f918196ea |
| BLAKE2b-256 | b0bd5481a1bc9a56a81fd9e68395f4b95ed787fd75c81d5a059883d6b8e3b53e |