
Modern data transfer for cloud data lakes - High-performance pipelines via object storage

Project description

LakePipe

Modern data transfer for cloud data lakes

LakePipe is a high-performance data pipeline framework that moves data between data lakes and warehouses by staging it in object storage. Think of it as Sqoop for the cloud era: it is designed around cloud object stores and loads data with vendor-specific bulk loaders.


Why LakePipe?

  • Cloud-native: Uses object storage (S3/GCS/Azure/OBS) as the intermediate staging layer
  • Fast: Leverages vendor-optimized bulk loaders (TPT, Snowpipe, BigQuery Storage API)
  • Observable: Real-time progress, validation, and actionable error messages
  • Flexible: YAML configs, Python SDK, or CLI - your choice
  • Extensible: Plugin architecture for sources, targets, and transformations
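The "Extensible" point describes a plugin architecture for sources, targets, and transformations, and the quick-start config selects each component with a `type:` key. As a rough illustration only, a registry of that kind could look like the Python sketch here. `REGISTRY`, `register`, `build`, `HiveSource`, and `TeradataTarget` are hypothetical names invented for this example, not LakePipe's actual SDK; check the package source for the real plugin interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical plugin registry: kind ("source", "storage", "target") -> type name -> class.
# Type names mirror the `type:` keys in lakepipe.yml; this is NOT LakePipe's real API.
REGISTRY: Dict[str, Dict[str, type]] = {"source": {}, "storage": {}, "target": {}}


def register(kind: str, name: str) -> Callable[[type], type]:
    """Class decorator that records a plugin class under REGISTRY[kind][name]."""
    def decorator(cls: type) -> type:
        plugins = REGISTRY[kind]  # raises KeyError for an unknown plugin kind
        if name in plugins:
            raise ValueError(f"{kind} plugin {name!r} is already registered")
        plugins[name] = cls
        return cls
    return decorator


def build(kind: str, section: dict):
    """Instantiate the plugin named by section['type'], passing the other keys as kwargs."""
    plugins = REGISTRY[kind]
    cls = plugins.get(section["type"])
    if cls is None:
        known = ", ".join(sorted(plugins)) or "none"
        raise ValueError(f"unknown {kind} type {section['type']!r} (registered: {known})")
    options = {key: value for key, value in section.items() if key != "type"}
    return cls(**options)


# Hypothetical example plugins whose fields match the quick-start YAML sections.
@register("source", "hive")
@dataclass
class HiveSource:
    database: str
    table: str
    partition_by: Optional[str] = None


@register("target", "teradata")
@dataclass
class TeradataTarget:
    host: str
    database: str
    table: str
    loader: str = "tpt"
```

Mapping the YAML `type:` string to a registered class keeps the config declarative: adding a connector means registering one class, and a misspelled `type:` fails fast with the list of names that are actually available.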

Quick Start

Installation

pip install lakepipe

Simple Transfer

# lakepipe.yml
version: 1.0
name: my_pipeline

source:
  type: hive
  database: my_db
  table: my_table
  partition_by: date

storage:
  type: s3
  bucket: my-bucket
  path: /staging

target:
  type: teradata
  host: td-host
  database: target_db
  table: target_table
  loader: tpt

validation:
  row_count:
    enabled: true
    max_variance: 0.01

Then run the pipeline, passing runtime parameters on the command line:

lakepipe run lakepipe.yml --params date=2025-01-15
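To make the Simple Transfer example more concrete, here is a hedged Python sketch of three things the configuration implies: turning `--params date=2025-01-15` into a dictionary, deriving a staging location from the `storage` and `partition_by` settings, and checking `row_count.max_variance`. The functions `parse_params`, `staging_uri`, and `row_count_ok` are hypothetical. In particular, the Hive-style `column=value` staging layout and the reading of `max_variance: 0.01` as a 1% relative difference between source and target row counts are assumptions, not documented LakePipe behavior.

```python
from typing import Dict, List


def parse_params(pairs: List[str]) -> Dict[str, str]:
    """Hypothetical: turn ["date=2025-01-15"] into {"date": "2025-01-15"}."""
    params: Dict[str, str] = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        params[key] = value
    return params


def staging_uri(bucket: str, path: str, table: str,
                partition_by: str, params: Dict[str, str]) -> str:
    """ASSUMED layout: s3://<bucket>/<path>/<table>/<partition_col>=<value>/"""
    if partition_by not in params:
        raise ValueError(f"missing --params value for partition column {partition_by!r}")
    segments = [path.strip("/"), table, f"{partition_by}={params[partition_by]}"]
    # Skip empty segments so a root path like "/" does not produce "//".
    return f"s3://{bucket}/" + "/".join(s for s in segments if s) + "/"


def row_count_ok(source_rows: int, target_rows: int, max_variance: float) -> bool:
    """ASSUMED semantics: |source - target| / source must not exceed max_variance."""
    if source_rows == 0:
        # Relative difference is undefined for an empty source; require an empty target.
        return target_rows == 0
    return abs(source_rows - target_rows) / source_rows <= max_variance
```

With the example config, `staging_uri("my-bucket", "/staging", "my_table", "date", {"date": "2025-01-15"})` yields `s3://my-bucket/staging/my_table/date=2025-01-15/`. Under the assumed semantics, loading 995,000 rows from a 1,000,000-row source passes a 0.01 threshold (0.5% difference), while 980,000 rows (2%) fails.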

Documentation

Supported Connectors

Sources

  • Hive (beeline)
  • PostgreSQL (planned)
  • MySQL (planned)
  • MongoDB (planned)

Storage

  • S3 (AWS)
  • GCS (Google Cloud)
  • Azure Blob Storage
  • OBS (Huawei Cloud)

Targets

  • Teradata (TPT)
  • Snowflake (planned)
  • BigQuery (planned)
  • Redshift (planned)

Contributing

Contributions are welcome! Please read CONTRIBUTING.md for details.

License

Apache License 2.0 - See LICENSE for details.

Acknowledgments

Inspired by Apache Sqoop, built for the cloud era.


Author: Md. Rakibul Hasan
Status: Alpha - Active Development



Download files


Source Distribution

lakepipe-0.1.0.tar.gz (16.3 kB)

Built Distribution


lakepipe-0.1.0-py3-none-any.whl (24.1 kB, Python 3)

File details

Details for the file lakepipe-0.1.0.tar.gz.

File metadata

  • Download URL: lakepipe-0.1.0.tar.gz
  • Upload date:
  • Size: 16.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.14

File hashes

Hashes for lakepipe-0.1.0.tar.gz
Algorithm    Hash digest
SHA256       029bbedae166114fffc930d522bcf26cb639e605a0c2556a262cc4b2658b46aa
MD5          45d51057ac449861c0534352e3664c03
BLAKE2b-256  e976c31b9abaa9b6f9f7e051ed6435edccd80cae25a461419afa4d1559a4c411
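The file details publish a SHA256 digest for each distribution. A minimal standard-library Python sketch for checking a downloaded sdist against that digest before installing it; `sha256_of`, `matches`, and `EXPECTED_SDIST_SHA256` are illustrative names, not part of LakePipe.

```python
import hashlib

# SHA256 digest published on this page for lakepipe-0.1.0.tar.gz.
EXPECTED_SDIST_SHA256 = "029bbedae166114fffc930d522bcf26cb639e605a0c2556a262cc4b2658b46aa"


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Hash the file in fixed-size chunks so it is never read into memory all at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected_hex: str) -> bool:
    """Compare against a published hex digest, ignoring case and surrounding whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches("lakepipe-0.1.0.tar.gz", EXPECTED_SDIST_SHA256)` should return `True` for an untampered download. pip can also enforce digests itself in hash-checking mode, using `--hash=sha256:<digest>` entries in a requirements file.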


File details

Details for the file lakepipe-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: lakepipe-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 24.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.14

File hashes

Hashes for lakepipe-0.1.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       8f03e314f99a902295c78c89eff5081c2c7f4a2b25264e1a151e5372e73c2dad
MD5          965e2eac81c589be2df22bee71ce79ce
BLAKE2b-256  f078d84ac8a6881e771c925afe78d7cf6c085d6d06ee7afd583752b182aff1e6

