Modern data transfer for cloud data lakes - High-performance pipelines via object storage
LakePipe
LakePipe is a high-performance data pipeline framework for moving data between data lakes and warehouses via object storage. Think of it as Sqoop for the cloud era - optimized for modern cloud architectures with vendor-specific bulk loaders.
Why LakePipe?
- Cloud-native: Uses object storage (S3/GCS/Azure/OBS) as an intermediate layer
- Fast: Leverages vendor-optimized bulk loaders (TPT today; Snowpipe and BigQuery Storage API are planned)
- Observable: Real-time progress, validation, and actionable error messages
- Flexible: YAML configs, Python SDK, or CLI - your choice
- Extensible: Plugin architecture for sources, targets, and transformations
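The first bullet describes a two-hop design: data is exported from the source into object storage, then bulk-loaded into the target from there. The following is a minimal sketch of that flow, not LakePipe's actual API. A local temporary directory stands in for the bucket, a plain list stands in for the target table, and the function names (`export_to_staging`, `load_from_staging`, `run_transfer`) are hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def export_to_staging(rows, staging_dir, name):
    """Write source rows to a staged CSV file and return its path.

    Assumption: a local directory stands in for an object storage bucket.
    """
    staged = Path(staging_dir) / f"{name}.csv"
    with staged.open("w", newline="") as fh:
        csv.writer(fh).writerows(rows)
    return staged


def load_from_staging(staged, target):
    """Append every staged row to the target and return the number loaded.

    Assumption: `target` is a plain list standing in for a warehouse table;
    a real loader would hand the staged file to a bulk-load utility.
    """
    with staged.open(newline="") as fh:
        loaded = [tuple(row) for row in csv.reader(fh)]
    target.extend(loaded)
    return len(loaded)


def run_transfer(rows, target, name="my_table"):
    """Run both hops: source -> staging, then staging -> target."""
    with tempfile.TemporaryDirectory() as staging_dir:
        staged = export_to_staging(rows, staging_dir, name)
        return load_from_staging(staged, target)
```

Keeping the two hops as separate functions means the staged file is the only contract between source and target, which is what lets either side be swapped independently.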
Quick Start
Installation
pip install lakepipe
Simple Transfer
# lakepipe.yml
version: 1.0
name: my_pipeline
source:
  type: hive
  database: my_db
  table: my_table
  partition_by: date
storage:
  type: s3
  bucket: my-bucket
  path: /staging
target:
  type: teradata
  host: td-host
  database: target_db
  table: target_table
  loader: tpt
validation:
  row_count:
    enabled: true
    max_variance: 0.01
lakepipe run lakepipe.yml --params date=2025-01-15
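The `validation.row_count` block in the config sets `max_variance: 0.01`. The README does not define how variance is computed, so the sketch below assumes it is the relative difference between source and target row counts, measured against the source. The function names are hypothetical and this is not LakePipe's own implementation.

```python
def row_count_variance(source_rows: int, target_rows: int) -> float:
    """Relative difference between the counts, measured against the source.

    Assumption: variance = |source - target| / source. LakePipe may define
    it differently.
    """
    if source_rows == 0:
        # Nothing to load: any row in the target is an unbounded discrepancy.
        return 0.0 if target_rows == 0 else float("inf")
    return abs(source_rows - target_rows) / source_rows


def validate_row_count(source_rows: int, target_rows: int,
                       max_variance: float = 0.01) -> bool:
    """Return True when the counts agree within `max_variance`."""
    return row_count_variance(source_rows, target_rows) <= max_variance
```

Under this assumption, `max_variance: 0.01` tolerates a discrepancy of up to 1% of the source row count.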
Documentation
Supported Connectors
Sources
- Hive (beeline)
- PostgreSQL (planned)
- MySQL (planned)
- MongoDB (planned)
Storage
- S3 (AWS)
- GCS (Google Cloud)
- Azure Blob Storage
- OBS (Huawei Cloud)
Targets
- Teradata (TPT)
- Snowflake (planned)
- BigQuery (planned)
- Redshift (planned)
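Each connector above is selected by the `type:` key in the config. One common way to build such a plugin architecture is a registry that maps a type name to a class. The sketch below illustrates that pattern only: `register_source`, `build_source`, and this `HiveSource` class are hypothetical names, not part of LakePipe's SDK.

```python
from typing import Dict

# Hypothetical registry: maps a config `type` value to a connector class.
_SOURCES: Dict[str, type] = {}


def register_source(name: str):
    """Class decorator that registers a source connector under `name`."""
    def decorator(cls):
        _SOURCES[name] = cls
        return cls
    return decorator


@register_source("hive")
class HiveSource:
    """Illustrative stand-in for a Hive connector; it holds config only."""

    def __init__(self, database: str, table: str, **options):
        self.database = database
        self.table = table
        self.options = options  # e.g. partition_by


def build_source(config: dict):
    """Instantiate the connector named by config['type']."""
    cfg = dict(config)  # copy so the caller's dict is not mutated
    type_name = cfg.pop("type")
    try:
        cls = _SOURCES[type_name]
    except KeyError:
        raise ValueError(
            f"unknown source type {type_name!r}; registered: {sorted(_SOURCES)}"
        ) from None
    return cls(**cfg)
```

With this pattern, adding a new source means defining one decorated class; the code that reads the config does not change.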
Contributing
Contributions are welcome! Please read CONTRIBUTING.md for details.
License
Apache License 2.0 - See LICENSE for details.
Acknowledgments
Inspired by Apache Sqoop, built for the cloud era.
Author: Md. Rakibul Hasan
Status: Alpha - Active Development
File details
Details for the file lakepipe-0.1.0.tar.gz.
File metadata
- Download URL: lakepipe-0.1.0.tar.gz
- Upload date:
- Size: 16.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 029bbedae166114fffc930d522bcf26cb639e605a0c2556a262cc4b2658b46aa |
| MD5 | 45d51057ac449861c0534352e3664c03 |
| BLAKE2b-256 | e976c31b9abaa9b6f9f7e051ed6435edccd80cae25a461419afa4d1559a4c411 |
File details
Details for the file lakepipe-0.1.0-py3-none-any.whl.
File metadata
- Download URL: lakepipe-0.1.0-py3-none-any.whl
- Upload date:
- Size: 24.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8f03e314f99a902295c78c89eff5081c2c7f4a2b25264e1a151e5372e73c2dad |
| MD5 | 965e2eac81c589be2df22bee71ce79ce |
| BLAKE2b-256 | f078d84ac8a6881e771c925afe78d7cf6c085d6d06ee7afd583752b182aff1e6 |