Automated connector wrapper for streaming data securely from a private MinIO Data Lake
Project description
Terrafox Data Lake Connector
A simple, secure wrapper module to stream files out of your private data lake into remote notebook runtimes seamlessly.
terrafox-datalake
A lightweight, universal, stream-native connector wrapper designed to stream datasets securely from private MinIO and S3-compatible Data Lakes straight into Pandas dataframes.
By replacing file-system-style directory mapping wrappers (s3fs/fsspec) with direct object streaming via boto3, this package avoids the background directory scanning those wrappers perform, together with the network edge bottlenecks, Cloudflare proxy payload limits, and 403 Forbidden credential collisions that the scanning can cause.
Key Features
- Stream-Native Engine: Reads multi-gigabyte datasets (e.g., 1.3 GiB+ CSVs) sequentially in byte-stream chunks, keeping memory consumption low on your local machine or in Google Colab.
- Bypasses Proxy Blocks: Sidesteps standard reverse-proxy constraints (such as the 100 MiB client max body size upload limit behind a Cloudflare Tunnel) during reads.
- Fully Universal & Repurposable: No hardcoded endpoints. Works out of the box with your configured defaults, or targets any custom local or cloud data lake cluster at call time.
- Zero Configuration Conflict: Keeps botocore configuration arguments, addressing styles, and signature parameters out of your notebooks.
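To illustrate the stream-native idea in the first bullet, here is a minimal standard-library sketch of reading a byte stream in fixed-size chunks and parsing CSV rows as they arrive. It is not the package's actual implementation: `iter_lines` and `iter_csv_rows` are hypothetical helper names, and `body` stands for any object with a `read(size)` method, such as the `Body` returned by boto3's `get_object`. An in-memory `io.BytesIO` is used here in place of a real network stream.

```python
import csv
import io


def iter_lines(body, chunk_size=1024 * 1024):
    """Yield complete lines (as bytes) from a byte stream, one chunk at a time."""
    pending = b""
    while True:
        chunk = body.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        lines = pending.split(b"\n")
        pending = lines.pop()  # last piece may be an incomplete line
        for line in lines:
            yield line + b"\n"
    if pending:
        yield pending  # final line without a trailing newline


def iter_csv_rows(body, chunk_size=1024 * 1024, encoding="utf-8"):
    """Parse CSV rows lazily from a byte stream (hypothetical helper)."""
    text_lines = (line.decode(encoding) for line in iter_lines(body, chunk_size))
    return csv.reader(text_lines)


# Stand-in for a remote object body; a tiny chunk size forces several reads.
sample = io.BytesIO(b"id,make\n1,Audi\n2,Volvo\n")
rows = list(iter_csv_rows(sample, chunk_size=5))
print(rows)
```

Only the current chunk and at most one partial line are held in memory at a time, which is the property the feature list describes.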
Terrafox Data Lake
A lightweight Python package for securely connecting to and streaming data from private MinIO-based data lake environments.
Installation
pip install terrafox-datalake
Quick Start
1. Connecting Natively via Interactive Prompt
If no credentials are found in the environment, calling connect() prompts you interactively for your data lake credentials.
import terrafox_datalake as dl
# Initialize the data lake client context securely
dl.connect()
2. Silent Credentials Injection (Automated Workflows)
For automated scripts, CI/CD pipelines, headless environments, or to bypass the interactive login prompt in Google Colab, set your credentials as environment variables before initializing the connection.
import os
import terrafox_datalake as dl
# Pre-populate session credentials
os.environ["MINIO_USER"] = "admin"
os.environ["MINIO_PASSWORD"] = "your_secure_password"
os.environ["MINIO_ENDPOINT"] = "https://minio.terrafoxai.com"
# Initialize the connection
dl.connect()
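The lookup order described in sections 1 and 2 (environment variables first, interactive prompt as a fallback) can be sketched as follows. This is an illustration under assumptions, not the package's source: `resolve_credentials` is a hypothetical function, and only the three variable names are taken from the example above.

```python
import getpass
import os

# Variable names as used in the example above.
REQUIRED = ("MINIO_USER", "MINIO_PASSWORD", "MINIO_ENDPOINT")


def resolve_credentials(env=None, ask=input, ask_secret=getpass.getpass):
    """Return credentials from env, prompting only for missing values.

    Hypothetical helper: `ask` and `ask_secret` are injectable so the
    function can run without a terminal (for example in tests).
    """
    env = os.environ if env is None else env
    creds = {}
    for name in REQUIRED:
        value = env.get(name)
        if not value:
            # Hide the password while typing; echo the other fields.
            prompt = ask_secret if name == "MINIO_PASSWORD" else ask
            value = prompt(f"{name}: ")
        creds[name] = value
    return creds


# With every variable present, no prompt is shown.
demo = resolve_credentials(env={
    "MINIO_USER": "admin",
    "MINIO_PASSWORD": "your_secure_password",
    "MINIO_ENDPOINT": "https://minio.example.com",  # placeholder endpoint
})
print(sorted(demo))
```

Passing the environment as a parameter keeps the sketch testable and avoids writing secrets into `os.environ` from notebook cells.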
3. Advanced Usage: Connecting to Different Infrastructures
Terrafox Data Lake is designed to be dynamic and reusable. Switch seamlessly between production environments, staging clusters, or local development instances.
import terrafox_datalake as dl
# Connect to an alternate cluster or local MinIO instance
dl.connect(endpoint="https://local-testing-cluster.local:9000")
# Read data from a different environment
df = dl.read_csv(
    bucket="test-bucket",
    key="metrics.csv"
)
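For readers curious how an endpoint override might translate into a boto3 client, here is a rough sketch. It assumes path-style addressing and SigV4 signing, which MinIO deployments commonly use; the package's real settings are not documented here. `build_client_kwargs`, `make_client`, and `DEFAULT_ENDPOINT` are hypothetical names, and boto3 is imported lazily so the argument-building part runs without it.

```python
DEFAULT_ENDPOINT = "https://minio.example.com"  # placeholder, not the real default


def build_client_kwargs(user, password, endpoint=None):
    """Build keyword arguments for boto3.client (hypothetical helper)."""
    return {
        "service_name": "s3",
        "endpoint_url": endpoint or DEFAULT_ENDPOINT,
        "aws_access_key_id": user,
        "aws_secret_access_key": password,
    }


def make_client(user, password, endpoint=None):
    """Create an S3 client for a MinIO-style endpoint.

    Assumption: path-style addressing and SigV4 signatures.
    """
    import boto3  # imported here so the module loads without boto3 installed
    from botocore.config import Config

    config = Config(
        signature_version="s3v4",
        s3={"addressing_style": "path"},
    )
    return boto3.client(config=config, **build_client_kwargs(user, password, endpoint))


# Switching infrastructure only changes the endpoint argument.
local = build_client_kwargs("admin", "secret", "https://local-testing-cluster.local:9000")
default = build_client_kwargs("admin", "secret")
print(local["endpoint_url"], default["endpoint_url"])
```

Keeping the signature and addressing options inside one factory function is one way to keep them out of notebook code.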
Example: Reading Data from a Data Lake
import terrafox_datalake as dl
dl.connect()
df = dl.read_csv(
    bucket="bigdata",
    key="vehicles.csv"
)
print(df.head())
Architecture Requirements
- Python: 3.7 or higher
- Supported Storage: MinIO (S3-compatible object storage)
Dependencies
- pandas
- boto3
- s3fs
- fsspec
Features
- Secure interactive authentication
- Environment variable support for automation
- Native MinIO integration
- S3-compatible object storage access
- Simple DataFrame-based data retrieval
- Flexible infrastructure switching between environments
License
MIT License
Project details
File details
Details for the file terrafox_datalake-0.1.4.tar.gz.
File metadata
- Download URL: terrafox_datalake-0.1.4.tar.gz
- Upload date:
- Size: 3.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 09952fc83375bdafb726ceea43083f0ea0f172fa22f33917f721f7ce2867eec6 |
| MD5 | 2bf9e196c39d4695e8bdaa50e1479fd3 |
| BLAKE2b-256 | 434b6b1a0250d2e7215eea4ae2e45fc272bfbf8af171d65d0c51db953c9cbf7e |
File details
Details for the file terrafox_datalake-0.1.4-py3-none-any.whl.
File metadata
- Download URL: terrafox_datalake-0.1.4-py3-none-any.whl
- Upload date:
- Size: 3.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b51b8ef7d2709c799d021fe13fe269ba803ecdb8f3d8d1a529315dd7fc39bcbb |
| MD5 | fe58ee4989cbe651a6020e88e7dbc292 |
| BLAKE2b-256 | 7a2a8226e6ada29942fc42dcef0ee38cc0d1acfc537fe760c772308318410e0c |
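If you download one of these files manually, you can compare it against the published SHA256 digest. The sketch below uses only the standard library; `sha256_of` and `matches` are hypothetical helper names, and the file path in the usage note is whatever location you saved the download to.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches("terrafox_datalake-0.1.4.tar.gz", "<digest from the table>")` returns `True` only when the local file is byte-for-byte identical to the published one.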