Core ETL pipeline framework for mkpipe.

MkPipe

MkPipe is a modular, open-source ETL (Extract, Transform, Load) tool that allows you to integrate various data sources and sinks easily. It is designed to be extensible with a plugin-based architecture that supports extractors, transformers, and loaders.

Features

  • Extract data from multiple sources (e.g., PostgreSQL, MongoDB).
  • Transform data using custom Python logic and Apache Spark.
  • Load data into various sinks (e.g., ClickHouse, PostgreSQL, Parquet).
  • Plugin-based architecture that supports future extensions.
  • Cloud-native architecture that can be deployed on Kubernetes and other environments.
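
To make the plugin idea concrete, here is a minimal sketch of how named extractors and loaders could be registered and combined with a plain Python transform. It is not MkPipe's actual API: the registries, the `register` decorator, and `run_pipeline` are hypothetical names chosen for illustration, and the in-memory source and sink stand in for real systems.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]

# Hypothetical registries; MkPipe's real plugin mechanism may differ.
EXTRACTORS: Dict[str, Callable[[], Iterable[Record]]] = {}
LOADERS: Dict[str, Callable[[Iterable[Record]], int]] = {}


def register(registry: dict, name: str):
    """Return a decorator that stores a function in `registry` under `name`."""
    def decorator(func):
        registry[name] = func
        return func
    return decorator


@register(EXTRACTORS, "memory")
def extract_from_memory() -> Iterable[Record]:
    # Stand-in for a real source such as PostgreSQL or MongoDB.
    return [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]


SINK: List[Record] = []


@register(LOADERS, "list")
def load_into_list(records: Iterable[Record]) -> int:
    # Stand-in for a real sink such as ClickHouse or Parquet.
    before = len(SINK)
    SINK.extend(records)
    return len(SINK) - before  # number of records written


def uppercase_names(records: Iterable[Record]) -> Iterable[Record]:
    # Example transform step written as plain Python logic.
    return [{**r, "name": str(r["name"]).upper()} for r in records]


def run_pipeline(extractor: str, loader: str, transform=uppercase_names) -> int:
    """Look up plugins by name, then extract, transform, and load."""
    rows = EXTRACTORS[extractor]()
    return LOADERS[loader](transform(rows))
```

With this layout, supporting a new source or sink means registering one more function under a new name; the pipeline runner itself does not change.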

Quick Setup

You can deploy MkPipe using one of the following strategies:

1. Using Docker Compose

This method sets up all required services automatically using Docker Compose.

Steps:

  1. Clone or copy the deploy folder from the repository.

  2. Modify the configuration files:
    • .env for environment variables.
    • mkpipe_project.yaml for your ETL configurations.

  3. Run the following command to start the services:

    docker-compose up --build
    

    This will set up the following services:

    • PostgreSQL: Required for data storage.
    • RabbitMQ: Required when run_coordinator=celery.
    • Celery Worker: Required to execute tasks when run_coordinator=celery.
    • Flower UI: Optional; used for monitoring Celery tasks.

    Note: If you only want to use run_coordinator=single (without Celery), only PostgreSQL is necessary.
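
The relationship between the run_coordinator setting and the services listed above can be summarized in a small lookup. This is only an illustrative sketch: the `services_for` helper is hypothetical and not part of MkPipe; the service names and the two coordinator values come from the list and note above.

```python
# Service names follow the list above; the helper itself is illustrative only.
REQUIRED_SERVICES = {
    "single": ["PostgreSQL"],
    "celery": ["PostgreSQL", "RabbitMQ", "Celery Worker"],
}
OPTIONAL_SERVICES = {
    "single": [],
    "celery": ["Flower UI"],  # monitoring only
}


def services_for(run_coordinator: str, include_optional: bool = False) -> list:
    """Return the services to start for a given run_coordinator value."""
    if run_coordinator not in REQUIRED_SERVICES:
        raise ValueError(f"unknown run_coordinator: {run_coordinator!r}")
    services = list(REQUIRED_SERVICES[run_coordinator])
    if include_optional:
        services += OPTIONAL_SERVICES[run_coordinator]
    return services
```

For example, `services_for("single")` yields only PostgreSQL, which matches the note above.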

2. Running Locally

You can also set up the environment manually and run MkPipe locally.

Steps:

  1. Set up and configure the following services:
    • RabbitMQ: Required for the Celery run_coordinator.
    • PostgreSQL: Required for data storage.
    • Flower UI: Optional; used for monitoring Celery tasks.
  2. Update the following configuration files in the deploy folder:
    • .env for environment variables.
    • mkpipe_project.yaml for your ETL configurations.
  3. Install the Python packages:
    pip install mkpipe mkpipe-extractor-postgres mkpipe-loader-postgres
    
  4. Set the project directory environment variable:
    export MKPIPE_PROJECT_DIR={YOUR_PROJECT_PATH}
    
  5. Start MkPipe using the following command:
    mkpipe run
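
If you prefer to launch MkPipe from a script rather than an interactive shell, steps 4 and 5 can be combined in a small Python wrapper. This is a sketch under the assumption that the `mkpipe` command is on your PATH after installation; `build_env` and `run_mkpipe` are hypothetical helper names, and only the `MKPIPE_PROJECT_DIR` variable and the `mkpipe run` command come from the steps above.

```python
import os
import subprocess
from pathlib import Path
from typing import Mapping, Optional


def build_env(project_dir: str, base_env: Optional[Mapping[str, str]] = None) -> dict:
    """Copy the environment and set MKPIPE_PROJECT_DIR to an absolute path."""
    env = dict(os.environ if base_env is None else base_env)
    env["MKPIPE_PROJECT_DIR"] = str(Path(project_dir).expanduser().resolve())
    return env


def run_mkpipe(project_dir: str) -> int:
    """Run `mkpipe run` with the project directory set; return its exit code."""
    result = subprocess.run(
        ["mkpipe", "run"],            # same command as step 5
        env=build_env(project_dir),   # same variable as step 4
        check=False,
    )
    return result.returncode
```

Calling `run_mkpipe` with the path to your project directory has the same effect as running the `export` and `mkpipe run` commands by hand, without changing the environment of the calling process.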
    

Documentation

For more detailed documentation, please visit the GitHub repository.

License

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.

DB Support Plan

For actively supported databases/plugins, please visit the MkPipe-hub repository!

Core Relational Databases

  • PostgreSQL
  • MySQL
  • MariaDB
  • SQL Server
  • Oracle Database
  • SQLite
  • Snowflake
  • Google BigQuery
  • Amazon Redshift
  • ClickHouse
  • Amazon S3

NoSQL Databases

  • MongoDB
  • Cassandra
  • DynamoDB
  • Redis
  • Azure Data Lake Storage (ADLS)
  • Google Cloud Storage
  • Elasticsearch
  • TimescaleDB
  • HDFS
  • InfluxDB

ERP/CRM Systems

  • Salesforce
  • SAP
  • Microsoft Dynamics
  • NetSuite
  • Workday
  • HubSpot
  • Zoho CRM
  • Freshsales
  • Zendesk
  • Oracle NetSuite

Emerging Databases & Analytical Tools

  • Apache Druid
  • Vertica
  • SingleStore (MemSQL)
  • Exasol
  • SAP HANA
  • IBM Db2
  • Neo4j (Graph Database)
  • Greenplum
  • CockroachDB
  • AWS Athena

Streaming Systems

  • Kafka
  • RabbitMQ
  • Pulsar
  • Apache Flink
  • Amazon Kinesis
  • Google Pub/Sub
  • Azure Event Hubs
  • Apache NiFi
  • ActiveMQ
  • Redpanda

File Formats & Data Lakes

  • Parquet
  • Avro
  • JSON
  • CSV
  • XML
  • ORC
  • Google Drive (for raw files)
  • Dropbox
  • Box
  • FTP/SFTP Servers

Specialized Analytics Tools

  • Metabase (Data Visualization)
  • Tableau Data Extracts
  • Power BI
  • Looker
  • Google Analytics (GA4)
  • Mixpanel
  • Amplitude
  • Adobe Analytics
  • Heap
  • Klipfolio

Industry-Specific Databases

  • Aerospike
  • RocksDB
  • FaunaDB
  • ScyllaDB
  • ArangoDB
  • MarkLogic
  • CrateDB
  • TigerGraph
  • HarperDB
  • SAP ASE (Sybase)

Legacy Databases

  • Teradata
  • Netezza
  • Informix
  • Ingres
  • Firebird
  • Progress OpenEdge
  • ParAccel
  • MaxDB
  • HP Vertica
  • Sybase IQ

Emerging Cloud & Hybrid Databases

  • PlanetScale (MySQL-based)
  • YugabyteDB
  • TiDB
  • OceanBase
  • Citus (PostgreSQL-based)
  • Snowplow Analytics
  • Spanner (Google Cloud)
  • MariaDB ColumnStore
  • CockroachDB Serverless
  • Weaviate (Vector Search)
