plateau
flat files, flat land
plateau is a Python library to manage (create, read, update, delete) large
amounts of tabular data in a blob store. It stores data as datasets, which
it presents to the user as pandas DataFrames. A dataset is a collection of
files with the same schema that reside in a blob store. plateau uses a metadata
definition to handle these datasets efficiently. For distributed access to and
manipulation of datasets, plateau offers a Dask interface.
Storing data distributed over multiple files in a blob store (S3, ABS, GCS, etc.) allows for a fast, cost-efficient and highly scalable data infrastructure. The downside of storing data solely in an object store is that the store itself gives little to no guarantees beyond the consistency of a single file. In particular, it cannot guarantee the consistency of a dataset that spans many files. If we demand a consistent state of the dataset at all times, we need to track that state ourselves. plateau frees us from having to do this manually.
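To make the consistency problem concrete, here is a minimal, standard-library-only sketch of the general idea of tracking dataset state in a single metadata object. It is not plateau's implementation or API: `InMemoryBlobStore`, `MANIFEST_KEY`, `commit_partitions` and `list_dataset_files` are hypothetical names invented for this illustration, and the in-memory store only mimics the "one key is consistent, many keys are not" behaviour of an object store.

```python
import json
from typing import Dict, List


class InMemoryBlobStore:
    """Toy stand-in for an object store: only single-key writes are atomic."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]  # raises KeyError if the key is missing


# Hypothetical key name; plateau's real metadata layout may differ.
MANIFEST_KEY = "my_dataset/manifest.json"


def list_dataset_files(store: InMemoryBlobStore) -> List[str]:
    """Return only the files that the manifest declares part of the dataset."""
    try:
        return json.loads(store.get(MANIFEST_KEY))["files"]
    except KeyError:
        return []  # no manifest yet means an empty dataset


def commit_partitions(store: InMemoryBlobStore, new_files: Dict[str, bytes]) -> None:
    """Upload data files first, then publish them with one manifest write."""
    for key, data in new_files.items():
        store.put(key, data)  # not yet visible to readers of the manifest
    files = sorted(set(list_dataset_files(store)) | set(new_files))
    store.put(MANIFEST_KEY, json.dumps({"files": files}).encode())


store = InMemoryBlobStore()
store.put("my_dataset/part-0.bin", b"orphan")  # interrupted upload, never committed
commit_partitions(
    store,
    {"my_dataset/part-1.bin": b"a", "my_dataset/part-2.bin": b"b"},
)
print(list_dataset_files(store))
# ['my_dataset/part-1.bin', 'my_dataset/part-2.bin']
```

Readers that go through the manifest never see the orphaned `part-0.bin`, because the dataset's state changes only when the single manifest key is rewritten.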
The plateau.io module provides building blocks to create and modify these
datasets in data pipelines. plateau handles I/O, tracks dataset partitions
and selects subsets of data transparently.
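As a rough picture of what "selecting subsets of data" can mean for a partitioned dataset, the sketch below filters blob keys by partition values encoded in the key itself. The `column=value` key layout and the helper names `parse_partition_key` and `select_partitions` are assumptions made for this example only; plateau's actual key format and selection interface may differ.

```python
from typing import Dict, Iterable, List


def parse_partition_key(key: str) -> Dict[str, str]:
    """Extract `column=value` path segments from a blob key."""
    return dict(seg.split("=", 1) for seg in key.split("/") if "=" in seg)


def select_partitions(keys: Iterable[str], **equals: str) -> List[str]:
    """Keep only keys whose partition values match every given column."""
    return [
        key
        for key in keys
        if all(parse_partition_key(key).get(col) == val for col, val in equals.items())
    ]


# Hypothetical keys for a dataset partitioned on `country` and `year`.
keys = [
    "sales/country=DE/year=2023/part-0.parquet",
    "sales/country=DE/year=2024/part-0.parquet",
    "sales/country=FR/year=2024/part-0.parquet",
]

print(select_partitions(keys, country="DE", year="2024"))
# ['sales/country=DE/year=2024/part-0.parquet']
```

Because the filter works on keys alone, files in non-matching partitions never have to be downloaded or parsed.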
Installation
This project is managed by pixi. You can install the package in development mode using:
git clone https://github.com/data-engineering-collective/plateau
cd plateau
pixi run pre-commit-install
pixi run postinstall
pixi run test
plateau is also available on PyPI and
can be installed through pip:
pip install plateau
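After installing, one way to confirm that the package is visible to the current interpreter is to query its distribution metadata. This is a small sketch that uses only the standard library; `installed_version` is a helper name chosen for this example and is not part of plateau.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints a version string if plateau is installed, otherwise None.
print(installed_version("plateau"))
```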
Contributing
See the contributing guide for details on how to contribute.