Query parquet files using pyarrow or S3 Select by first gathering file metadata into a database

Project description

Lakeshack

A simplified data lakehouse, more of a data lakeshack, optimized for retrieving filtered records from Parquet files. Like the various lakehouse solutions (Iceberg, Hudi, Delta Lake), Lakeshack gathers the min/max values of specified columns from each Parquet file and stores them in a database (the Metastore). When you query for a set of records, it first checks the Metastore for the list of Parquet files that might contain the desired records, and then queries only those files. The files may be stored locally or in S3. You may query using either native pyarrow or S3 Select.
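The file-pruning idea described above can be illustrated with a minimal, standard-library-only sketch. This is not Lakeshack's API: the table name `file_stats`, the column `customer_id`, the file names, and both helper functions are hypothetical, and an in-memory SQLite database stands in for the Metastore.

```python
import sqlite3

# Hypothetical per-file statistics: (file path, min, max) of a "customer_id" column.
FILE_STATS = [
    ("part-000.parquet", 1, 100),
    ("part-001.parquet", 101, 200),
    ("part-002.parquet", 201, 300),
]


def build_metastore(stats):
    """Store each file's min/max values in an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE file_stats ("
        "filepath TEXT PRIMARY KEY, id_min INTEGER, id_max INTEGER)"
    )
    conn.executemany("INSERT INTO file_stats VALUES (?, ?, ?)", stats)
    conn.commit()
    return conn


def candidate_files(conn, low, high):
    """Return files whose [id_min, id_max] range overlaps [low, high]."""
    rows = conn.execute(
        "SELECT filepath FROM file_stats "
        "WHERE id_max >= ? AND id_min <= ? ORDER BY filepath",
        (low, high),
    )
    return [row[0] for row in rows]


metastore = build_metastore(FILE_STATS)
# Only files whose range could contain customer_id 150 need to be read.
print(candidate_files(metastore, 150, 150))
```

Only the files returned by `candidate_files` would then be opened and filtered; every other file is skipped without being read.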

To achieve optimal performance, the partitioning and clustering strategy (which determines how records are written to the Parquet files) should align with the main query pattern expected on the data. See the documentation for more information.
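To see why clustering matters for min/max pruning, consider the toy sketch below. It does not use Lakeshack; the helpers `file_ranges` and `files_to_scan` and the data are hypothetical. It compares how many "files" could contain a given id when rows are written in random order versus sorted by the queried column.

```python
import random


def file_ranges(values, rows_per_file):
    """Split values into consecutive 'files' and return each file's (min, max)."""
    chunks = [
        values[i:i + rows_per_file]
        for i in range(0, len(values), rows_per_file)
    ]
    return [(min(chunk), max(chunk)) for chunk in chunks]


def files_to_scan(ranges, target):
    """Count files whose min/max range could contain the target value."""
    return sum(1 for low, high in ranges if low <= target <= high)


random.seed(0)
ids = list(range(1000))
random.shuffle(ids)

# Unclustered: rows land in files in arrival order, so ranges overlap heavily.
unclustered = file_ranges(ids, rows_per_file=100)
# Clustered: rows are sorted by the queried column before being written.
clustered = file_ranges(sorted(ids), rows_per_file=100)

print("unclustered files to scan:", files_to_scan(unclustered, 421))
print("clustered files to scan:", files_to_scan(clustered, 421))
```

With sorted input each file covers a disjoint range, so exactly one file matches the lookup. With shuffled input nearly every file spans most of the id range, and the min/max statistics rule out few or no files.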

Installation

Lakeshack may be installed using pip:

pip install lakeshack

Documentation

Documentation can be found at https://mhendrey.github.io/lakeshack

Download files

Source Distribution

lakeshack-0.2.3.tar.gz (29.1 kB)

Uploaded Source

Built Distribution

lakeshack-0.2.3-py3-none-any.whl (32.9 kB)

Uploaded Python 3

File details

Details for the file lakeshack-0.2.3.tar.gz.

File metadata

  • Download URL: lakeshack-0.2.3.tar.gz
  • Upload date:
  • Size: 29.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.9

File hashes

Hashes for lakeshack-0.2.3.tar.gz
Algorithm Hash digest
SHA256 2adfc4838e5e691534e8a73e072c4c84f83d72f30a282a70a931463dc6abb0ef
MD5 cfd48ba7039db591de2c12655f40b106
BLAKE2b-256 75352b9b341df5282f8d821a60f3db1ea0ed8fe64cef83e2af3c1816dead13cf

File details

Details for the file lakeshack-0.2.3-py3-none-any.whl.

File metadata

  • Download URL: lakeshack-0.2.3-py3-none-any.whl
  • Upload date:
  • Size: 32.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.9

File hashes

Hashes for lakeshack-0.2.3-py3-none-any.whl
Algorithm Hash digest
SHA256 a373af67761f41f6a18f549913b8939267d294e9ff9ba8a62af58d453a6ecd76
MD5 9901ed7503c7a6f8539553bd7dab9f17
BLAKE2b-256 bf7d657b45890d8a57cacf244f458cdc37e1407851298fcf6929591baf3c9d2c
