Core ETL pipeline framework for mkpipe.
MkPipe
MkPipe is a modular, open-source ETL (Extract, Transform, Load) tool that allows you to integrate various data sources and sinks easily. It is designed to be extensible with a plugin-based architecture that supports extractors, transformers, and loaders.
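To illustrate the extractor, transformer, and loader split described above, here is a minimal sketch of how such a plugin chain can be wired together. All names in it (Extractor, Loader, Transformer, ListExtractor, ListLoader, run_pipeline) are hypothetical and are not MkPipe's actual plugin API. The real plugins ship as separate packages such as mkpipe-extractor-postgres.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Protocol

# Hypothetical record type for this sketch: one row as a plain dict.
Row = Dict[str, object]


# Hypothetical plugin interfaces; MkPipe's real plugin API may differ.
class Extractor(Protocol):
    def extract(self) -> Iterable[Row]: ...


class Loader(Protocol):
    def load(self, rows: Iterable[Row]) -> int: ...


# A transformer here is just a function from one row to a new row.
Transformer = Callable[[Row], Row]


@dataclass
class ListExtractor:
    """Toy extractor that yields rows from an in-memory list."""
    rows: List[Row]

    def extract(self) -> Iterable[Row]:
        return iter(self.rows)


@dataclass
class ListLoader:
    """Toy loader that appends rows to an in-memory list (the 'sink')."""
    sink: List[Row] = field(default_factory=list)

    def load(self, rows: Iterable[Row]) -> int:
        before = len(self.sink)
        self.sink.extend(rows)
        return len(self.sink) - before  # number of rows written


def run_pipeline(extractor: Extractor, transformers: List[Transformer], loader: Loader) -> int:
    """Chain extract, transforms, and load lazily; return the number of rows loaded."""
    rows: Iterable[Row] = extractor.extract()
    for transform in transformers:
        rows = map(transform, rows)  # applied lazily as the loader consumes rows
    return loader.load(rows)


loader = ListLoader()
count = run_pipeline(
    ListExtractor([{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]),
    [lambda r: {**r, "name": str(r["name"]).upper()}],
    loader,
)
print(count, loader.sink)  # → 2 [{'id': 1, 'name': 'A'}, {'id': 2, 'name': 'B'}]
```

With this structure, swapping a source or sink only means supplying another object that exposes the same method. This loosely mirrors how MkPipe distributes extractors and loaders as separate pip packages.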
Features
- Extract data from multiple sources (e.g., PostgreSQL, MongoDB).
- Transform data using custom Python logic and Apache Spark.
- Load data into various sinks (e.g., ClickHouse, PostgreSQL, Parquet).
- Plugin-based architecture that supports future extensions.
- Cloud-native architecture that can be deployed on Kubernetes and other environments.
Quick Setup
You can deploy MkPipe using one of the following strategies:
1. Using Docker Compose
This method sets up all required services automatically using Docker Compose.
Steps:
- Clone or copy the deploy folder from the repository.
- Modify the configuration files:
  - .env for environment variables.
  - mkpipe_project.yaml for your specific ETL configurations.
- Run the following command to start the services:
docker-compose up --build
This will set up the following services:
- PostgreSQL: Required for data storage.
- RabbitMQ: Required as the message broker when run_coordinator=celery.
- Celery Worker: Required to execute tasks when run_coordinator=celery.
- Flower UI: Optional; needed only if you want to monitor Celery tasks.
Note: If you only want to use run_coordinator=single without Celery, only PostgreSQL is necessary.
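The service requirements listed above can be summarized as a small lookup. This is a hypothetical helper and is not part of MkPipe. The function name required_services and the monitoring flag are illustrative, and the mapping only restates what this section says about run_coordinator=single and run_coordinator=celery.

```python
from __future__ import annotations

# Restates the service list above; not read from any MkPipe config.
REQUIRED = {
    "single": {"PostgreSQL"},
    "celery": {"PostgreSQL", "RabbitMQ", "Celery Worker"},
}
# Flower only monitors Celery, so it is never relevant for "single".
OPTIONAL = {"single": set(), "celery": {"Flower UI"}}


def required_services(run_coordinator: str, monitoring: bool = False) -> set[str]:
    """Return the services needed for a given run_coordinator value (hypothetical helper)."""
    try:
        services = set(REQUIRED[run_coordinator])
    except KeyError:
        raise ValueError(f"unknown run_coordinator: {run_coordinator!r}") from None
    if monitoring:
        services |= OPTIONAL[run_coordinator]
    return services


print(sorted(required_services("single")))  # → ['PostgreSQL']
print(sorted(required_services("celery", monitoring=True)))
# → ['Celery Worker', 'Flower UI', 'PostgreSQL', 'RabbitMQ']
```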
2. Running Locally
You can also set up the environment manually and run MkPipe locally.
Steps:
- Set up and configure the following services:
  - RabbitMQ: Required for the Celery run coordinator.
  - PostgreSQL: Required for data storage.
  - Flower UI: Optional; needed only if you want to monitor Celery tasks.
- Update the following configuration files in the deploy folder:
  - .env for environment variables.
  - mkpipe_project.yaml for your ETL configurations.
- Install the Python packages (this example installs the core package plus the PostgreSQL extractor and loader plugins):
pip install mkpipe mkpipe-extractor-postgres mkpipe-loader-postgres
- Set the project directory environment variable:
export MKPIPE_PROJECT_DIR={YOUR_PROJECT_PATH}
- Start MkPipe using the following command:
mkpipe run
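Before running mkpipe run, you can check that the environment matches the local setup steps. The following is a hypothetical preflight sketch that uses only the Python standard library. It assumes MKPIPE_PROJECT_DIR should point at the folder containing mkpipe_project.yaml, so confirm this against the MkPipe documentation. The function names are illustrative and are not part of MkPipe.

```python
import os
from importlib import metadata
from pathlib import Path
from typing import Iterable, List, Mapping, Optional

# Package names taken from the pip install step above.
PACKAGES = ["mkpipe", "mkpipe-extractor-postgres", "mkpipe-loader-postgres"]
# Assumption: the project directory contains this config file.
CONFIG_NAME = "mkpipe_project.yaml"


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the distributions from `names` that are not installed."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


def project_dir_problems(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Check MKPIPE_PROJECT_DIR; return human-readable problems (empty if OK)."""
    env = os.environ if env is None else env
    raw = env.get("MKPIPE_PROJECT_DIR")
    if not raw:
        return ["MKPIPE_PROJECT_DIR is not set"]
    path = Path(raw)
    if not path.is_dir():
        return [f"{path} is not a directory"]
    if not (path / CONFIG_NAME).is_file():
        return [f"{CONFIG_NAME} not found in {path}"]
    return []


def preflight() -> List[str]:
    """Collect every problem found; start MkPipe only if this list is empty."""
    problems = [f"package not installed: {p}" for p in missing_packages(PACKAGES)]
    return problems + project_dir_problems()
```

Call preflight() and print any messages it returns. If the list is empty, proceed with mkpipe run.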
Documentation
For more detailed documentation, please visit the GitHub repository.
License
This project is licensed under the Apache 2.0 License - see the LICENSE file for details.
Database Support Plan
For actively supported databases/plugins, please visit the MkPipe-hub repository!
Core Relational Databases
- PostgreSQL
- MySQL
- MariaDB
- SQL Server
- Oracle Database
- SQLite
- Snowflake
- Google BigQuery
- Amazon Redshift
- ClickHouse
- Amazon S3
NoSQL Databases
- MongoDB
- Cassandra
- DynamoDB
- Redis
- Azure Data Lake Storage (ADLS)
- Google Cloud Storage
- Elasticsearch
- TimescaleDB
- HDFS
- InfluxDB
ERP/CRM Systems
- Salesforce
- SAP
- Microsoft Dynamics
- NetSuite
- Workday
- HubSpot
- Zoho CRM
- Freshsales
- Zendesk
- Oracle NetSuite
Emerging Databases & Analytical Tools
- Apache Druid
- Vertica
- SingleStore (MemSQL)
- Exasol
- SAP HANA
- IBM Db2
- Neo4j (Graph Database)
- Greenplum
- CockroachDB
- AWS Athena
Streaming Systems
- Kafka
- RabbitMQ
- Pulsar
- Apache Flink
- Amazon Kinesis
- Google Pub/Sub
- Azure Event Hubs
- Apache NiFi
- ActiveMQ
- Redpanda
File Formats & Data Lakes
- Parquet
- Avro
- JSON
- CSV
- XML
- ORC
- Google Drive (for raw files)
- Dropbox
- Box
- FTP/SFTP Servers
Specialized Analytics Tools
- Metabase (Data Visualization)
- Tableau Data Extracts
- Power BI
- Looker
- Google Analytics (GA4)
- Mixpanel
- Amplitude
- Adobe Analytics
- Heap
- Klipfolio
Industry-Specific Databases
- Aerospike
- RocksDB
- FaunaDB
- ScyllaDB
- ArangoDB
- MarkLogic
- CrateDB
- TigerGraph
- HarperDB
- SAP ASE (Sybase)
Legacy Databases
- Teradata
- Netezza
- Informix
- Ingres
- Firebird
- Progress OpenEdge
- ParAccel
- MaxDB
- HP Vertica
- Sybase IQ
Emerging Cloud & Hybrid Databases
- PlanetScale (MySQL-based)
- YugabyteDB
- TiDB
- OceanBase
- Citus (PostgreSQL-based)
- Snowplow Analytics
- Spanner (Google Cloud)
- MariaDB ColumnStore
- CockroachDB Serverless
- Weaviate (Vector Search)
Download files
File details
Details for the file mkpipe-0.1.54.tar.gz.
File metadata
- Download URL: mkpipe-0.1.54.tar.gz
- Upload date:
- Size: 22.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1a2141bcd71588beb576b2a35b3534de15b6219d225926a839e1e277918cf14f |
| MD5 | 2cf13f115cf019ab5fc4f3d8ac33e074 |
| BLAKE2b-256 | 4d0246f86c31ed113d9c413e812748488a585d8012cb8d80067479c5ab5db766 |
File details
Details for the file mkpipe-0.1.54-py3-none-any.whl.
File metadata
- Download URL: mkpipe-0.1.54-py3-none-any.whl
- Upload date:
- Size: 28.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f7c6f9e438c69bc16120e46c429dcdce4a863adf7e4c0817e93e539c6b21d1f5 |
| MD5 | ad9bf68dfbacd43bbf68d4d622c426b5 |
| BLAKE2b-256 | b45e51a30d6a3b6ac21f8985db19fc15bd6aac42c1e614972747ff7f34d3b300 |
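To confirm that a downloaded file matches the published SHA256 digests for mkpipe 0.1.54, you can hash it locally with Python's hashlib. This is a minimal sketch. The helper names sha256_stream and verify_download are illustrative, and the file is assumed to keep its original name.

```python
import hashlib
from pathlib import Path
from typing import BinaryIO, Union

# SHA256 digests published for mkpipe 0.1.54 (copied from the file hash tables).
EXPECTED_SHA256 = {
    "mkpipe-0.1.54.tar.gz": "1a2141bcd71588beb576b2a35b3534de15b6219d225926a839e1e277918cf14f",
    "mkpipe-0.1.54-py3-none-any.whl": "f7c6f9e438c69bc16120e46c429dcdce4a863adf7e4c0817e93e539c6b21d1f5",
}


def sha256_stream(stream: BinaryIO, chunk_size: int = 1 << 16) -> str:
    """Hash a binary stream in fixed-size chunks so large files need little memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Union[str, Path]) -> bool:
    """Return True if the file's SHA256 matches the published digest for its name."""
    path = Path(path)
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published SHA256 for {path.name}")
    with path.open("rb") as fh:
        return sha256_stream(fh) == expected
```

For example, verify_download("mkpipe-0.1.54-py3-none-any.whl") returns True only for an untampered wheel. pip can also enforce this at install time through its hash-checking mode (--require-hashes together with --hash=sha256:... entries in a requirements file). That mode then requires hashes for every dependency.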