Core ETL pipeline framework for mkpipe.
MkPipe
MkPipe is a modular, open-source ETL (Extract, Transform, Load) tool that allows you to integrate various data sources and sinks easily. It is designed to be extensible with a plugin-based architecture that supports extractors, transformers, and loaders.
Features
- Extract data from multiple sources (e.g., PostgreSQL, MongoDB).
- Transform data using custom Python logic and Apache Spark.
- Load data into various sinks (e.g., ClickHouse, PostgreSQL, Parquet).
- Plugin-based architecture that supports future extensions.
- Cloud-native architecture that can be deployed on Kubernetes and other environments.
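To make the plugin idea concrete, here is a minimal sketch of how extractors, transformers, and loaders could be wired together by name. It is illustrative only: the registries, the `register` decorator, and `run_pipeline` are hypothetical names and are not MkPipe's actual interfaces, and the in-memory source and sink stand in for real systems such as PostgreSQL or ClickHouse.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]

# Hypothetical registries; MkPipe's real plugin mechanism may differ.
EXTRACTORS: Dict[str, Callable[[], Iterable[Row]]] = {}
LOADERS: Dict[str, Callable[[Iterable[Row]], int]] = {}


def register(registry: dict, name: str):
    """Decorator that stores a plugin function in a registry under `name`."""
    def decorator(func):
        registry[name] = func
        return func
    return decorator


@register(EXTRACTORS, "in_memory")
def extract_in_memory() -> Iterable[Row]:
    # Stand-in for a real source such as PostgreSQL or MongoDB.
    return [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]


SINK: List[Row] = []


@register(LOADERS, "list_sink")
def load_to_list(rows: Iterable[Row]) -> int:
    # Stand-in for a real sink; returns the number of rows written.
    before = len(SINK)
    SINK.extend(rows)
    return len(SINK) - before


def upper_name(row: Row) -> Row:
    # Example transform step written as plain Python logic.
    return {**row, "name": str(row["name"]).upper()}


def run_pipeline(extractor: str, loader: str, transform=upper_name) -> int:
    """Look up plugins by name, then extract, transform, and load."""
    rows = EXTRACTORS[extractor]()
    return LOADERS[loader](transform(r) for r in rows)
```

Calling `run_pipeline("in_memory", "list_sink")` moves both sample rows into `SINK` with upper-cased names. Adding a new source or sink in this sketch only requires registering another function, which is the property a plugin-based design is after.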
Quick Setup
You can deploy MkPipe using one of the following strategies:
1. Using Docker Compose
This method sets up all required services automatically using Docker Compose.
Steps:
- Clone or copy the deploy folder from the repository.
- Modify the configuration files:
  - .env for environment variables.
  - mkpipe_project.yaml for your specific ETL configurations.
- Run the following command to start the services:
docker-compose up --build
This will set up the following services:
- PostgreSQL: Required for data storage.
- RabbitMQ: Required when run_coordinator=celery.
- Celery Worker: Required to run tasks when run_coordinator=celery.
- Flower UI: Optional; needed only if you want to monitor Celery tasks.
Note: If you use run_coordinator=single (without Celery), only PostgreSQL is necessary.
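The relationship between the run_coordinator setting and the services you need can be summarized as a small lookup. This is only a sketch of the rule stated above; the function `services_for` and its `with_monitoring` parameter are hypothetical helpers, not part of MkPipe.

```python
# Service names follow the list above; the helper itself is illustrative.
REQUIRED = {
    "single": ["PostgreSQL"],
    "celery": ["PostgreSQL", "RabbitMQ", "Celery Worker"],
}
# Flower only makes sense when Celery is in use.
OPTIONAL = {"single": [], "celery": ["Flower UI"]}


def services_for(run_coordinator: str, with_monitoring: bool = False) -> list:
    """Return the services to start for a given run_coordinator value."""
    if run_coordinator not in REQUIRED:
        raise ValueError(f"unknown run_coordinator: {run_coordinator!r}")
    services = list(REQUIRED[run_coordinator])
    if with_monitoring:
        services += OPTIONAL[run_coordinator]
    return services
```

For example, `services_for("single")` returns only PostgreSQL, while `services_for("celery", with_monitoring=True)` also includes RabbitMQ, the Celery worker, and Flower.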
2. Running Locally
You can also set up the environment manually and run MkPipe locally.
Steps:
- Set up and configure the following services:
  - RabbitMQ: Required for the Celery run_coordinator.
  - PostgreSQL: Required for data storage.
  - Flower UI: Optional; needed only if you want to monitor Celery tasks.
- Update the following configuration files in the deploy folder:
  - .env for environment variables.
  - mkpipe_project.yaml for your ETL configurations.
- Install the Python packages:
pip install mkpipe mkpipe-extractor-postgres mkpipe-loader-postgres
- Set the project directory environment variable:
export MKPIPE_PROJECT_DIR={YOUR_PROJECT_PATH}
- Start MkPipe using the following command:
mkpipe run
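Before starting a local run, it can help to check that the steps above were actually completed. The following is a minimal pre-flight sketch, not part of MkPipe: `preflight` is a hypothetical helper, and it assumes that mkpipe_project.yaml sits directly inside the directory named by MKPIPE_PROJECT_DIR, which may not match your layout.

```python
import os
import shutil
from pathlib import Path


def preflight(env=None, which=shutil.which) -> list:
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []

    project_dir = env.get("MKPIPE_PROJECT_DIR")
    if not project_dir:
        problems.append("MKPIPE_PROJECT_DIR is not set")
    elif not Path(project_dir).is_dir():
        problems.append(f"MKPIPE_PROJECT_DIR is not a directory: {project_dir}")
    elif not (Path(project_dir) / "mkpipe_project.yaml").is_file():
        # Assumption: the YAML file lives directly in the project directory.
        problems.append("mkpipe_project.yaml not found in MKPIPE_PROJECT_DIR")

    # The mkpipe command is installed by the pip install step.
    if which("mkpipe") is None:
        problems.append("mkpipe command not found on PATH")

    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("problem:", problem)
```

If the script prints nothing, the environment variable, the configuration file, and the command-line entry point are all in place, and `mkpipe run` can be started.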
Documentation
For more detailed documentation, please visit the GitHub repository.
License
This project is licensed under the Apache 2.0 License - see the LICENSE file for details.