Query parquet files using pyarrow or S3 Select by first gathering file metadata into a database

Project description

Lakeshack

A small rustic shack on the shores of a big lake

A simplified data lakehouse, more of a data lakeshack, optimized for retrieving filtered records from Parquet files. Similar to the various lakehouse solutions (Iceberg, Hudi, Delta Lake), Lakeshack gathers the min/max values for specified columns from each Parquet file and stores them in a database (the Metastore). When you query for a set of records, it first checks the Metastore to get the list of Parquet files that might contain the desired records, and then queries only those files. The files may be stored locally or in S3. You may query using either native pyarrow or S3 Select.
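The min/max pruning idea described above can be sketched with nothing but the Python standard library. This is an illustration of the general technique, not Lakeshack's actual API: the table layout, the file names, the `id` column, and the helper functions `build_metastore` and `candidate_files` are all hypothetical, and the statistics are hard-coded rather than read from real Parquet files.

```python
import sqlite3

# Hypothetical per-file statistics: (file path, min id, max id).
# In a real setup these would come from each Parquet file's metadata.
FILE_STATS = [
    ("part-0.parquet", 0, 99),
    ("part-1.parquet", 100, 199),
    ("part-2.parquet", 200, 299),
]


def build_metastore(stats):
    """Store one row of min/max statistics per file in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE files (path TEXT PRIMARY KEY, id_min INTEGER, id_max INTEGER)"
    )
    conn.executemany("INSERT INTO files VALUES (?, ?, ?)", stats)
    conn.commit()
    return conn


def candidate_files(conn, value):
    """Return the files whose [id_min, id_max] range could contain `value`."""
    rows = conn.execute(
        "SELECT path FROM files WHERE id_min <= ? AND id_max >= ? ORDER BY path",
        (value, value),
    )
    return [row[0] for row in rows]


metastore = build_metastore(FILE_STATS)
hits = candidate_files(metastore, 150)
print(hits)  # ['part-1.parquet']
```

Only the files returned by the lookup would then need to be opened and filtered, which is where the savings come from.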

To achieve optimal performance, the partitioning strategy that determines how records are written to the Parquet files must align with the main query pattern. See the documentation for more information on this.
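To see why the write layout matters, here is a small sketch under stated assumptions: records carry an integer `id`, each "file" holds ten records, and the main query is a lookup by a single `id`. The helpers `chunk_stats` and `count_candidates` are made up for this illustration and are not part of Lakeshack. When records are written in arrival order, most files' min/max ranges cover the queried value; when they are sorted by `id` before writing, only one does.

```python
def chunk_stats(ids, chunk_size):
    """Split ids into consecutive 'files' and return (min, max) for each one."""
    stats = []
    for start in range(0, len(ids), chunk_size):
        chunk = ids[start:start + chunk_size]
        stats.append((min(chunk), max(chunk)))
    return stats


def count_candidates(stats, value):
    """Count the files whose min/max range could contain `value`."""
    return sum(1 for lo, hi in stats if lo <= value <= hi)


# A deterministic shuffle of the ids 0..99 standing in for arrival order.
arrival_order = [(i * 37) % 100 for i in range(100)]

unsorted_stats = chunk_stats(arrival_order, 10)
sorted_stats = chunk_stats(sorted(arrival_order), 10)

unsorted_hits = count_candidates(unsorted_stats, 42)
sorted_hits = count_candidates(sorted_stats, 42)

print("files to scan, arrival order:", unsorted_hits)
print("files to scan, sorted by id: ", sorted_hits)
```

With the sorted layout the ranges do not overlap, so the Metastore lookup narrows the query to a single file.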

Installation

Lakeshack may be installed using pip:

pip install lakeshack

Documentation

Documentation can be found at https://mhendrey.github.io/lakeshack

Download files

Source Distribution

lakeshack-0.2.2.tar.gz (28.7 kB view details)

Uploaded Source

Built Distribution

lakeshack-0.2.2-py3-none-any.whl (32.6 kB view details)

Uploaded Python 3

File details

Details for the file lakeshack-0.2.2.tar.gz.

File metadata

  • Download URL: lakeshack-0.2.2.tar.gz
  • Upload date:
  • Size: 28.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.9

File hashes

Hashes for lakeshack-0.2.2.tar.gz
Algorithm Hash digest
SHA256 97b1d1062bd196d00809bf2b1d8427b2fdf31fd68faa6c80c02ac8ee2664187b
MD5 f3c196f63650efa0435fdd9536d663b9
BLAKE2b-256 d2ccd748a87f7f62f21b0d44e0495a524b7dbb8bb81cfba3edefa9eebc05fc37

File details

Details for the file lakeshack-0.2.2-py3-none-any.whl.

File metadata

  • Download URL: lakeshack-0.2.2-py3-none-any.whl
  • Upload date:
  • Size: 32.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.9

File hashes

Hashes for lakeshack-0.2.2-py3-none-any.whl
Algorithm Hash digest
SHA256 ea50a7474136e5fbd324bcf21df3595bd0e413bc3dd2dde1e221cc5902e4a7f9
MD5 dc682fbe93a1e2ce8350f1608ab5fbc1
BLAKE2b-256 68d202a092fd4a951082dd33f687432ea1f6ae01fd83b2be9bb9266e90e6c728
