A library for reading and writing partitioned data

Partitioneer

Partitioneer is a Python library for managing date-partitioned Parquet data files. It provides functions for writing data to partitions, reading data back from partitions with filters and a date range, and looking up the first or latest partition date.

Installation

You can install Partitioneer using pip:

pip install partitioneer

Usage

Writing Data to Partitions

To write data to partitioned Parquet files:

from partitioneer import write_data_to_partitions
import pandas as pd

df = pd.DataFrame(...)  # Your data
write_data_to_partitions(
    df,
    base_path="/path/to/data",
    date_col="date_column",
    override_existing=False
)
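
To make the idea of date partitioning concrete, the sketch below groups rows by a date column and writes each group to its own directory. It is not Partitioneer's implementation: the `write_partitions_sketch` helper, the `date=YYYY-MM-DD` directory naming, and the use of CSV (chosen only to stay dependency-free) are assumptions made for illustration. The library itself writes Parquet, and its on-disk layout and handling of `override_existing` may differ.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path


def write_partitions_sketch(rows, base_path, date_col, override_existing=False):
    """Hypothetical helper: write one CSV file per distinct date value.

    Assumed layout (not confirmed by Partitioneer):
    <base_path>/date=<value>/data.csv
    """
    # Group rows by the value of the date column.
    groups = defaultdict(list)
    for row in rows:
        groups[row[date_col]].append(row)

    written = []
    for date_value, group in sorted(groups.items()):
        target = Path(base_path) / f"date={date_value}" / "data.csv"
        if target.exists() and not override_existing:
            continue  # assumed behavior: leave an existing partition untouched
        target.parent.mkdir(parents=True, exist_ok=True)
        with target.open("w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=list(group[0].keys()))
            writer.writeheader()
            writer.writerows(group)
        written.append(date_value)
    return written


rows = [
    {"date_column": "2024-01-01", "category": "A", "value": 150},
    {"date_column": "2024-01-01", "category": "B", "value": 50},
    {"date_column": "2024-01-02", "category": "A", "value": 200},
]
base = tempfile.mkdtemp()
first_run = write_partitions_sketch(rows, base, "date_column")
second_run = write_partitions_sketch(rows, base, "date_column")
```

In this sketch the first call writes both partitions and the second call writes nothing, because the partitions already exist and `override_existing` is left at `False`.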

Reading Data from Partitions

To read data from partitioned Parquet files:

from partitioneer import read_data_from_partitions, PartitionFilter

df = read_data_from_partitions(
    base_path="/path/to/data",
    filters=[
        PartitionFilter("category", "in", ["A", "B"]),
        PartitionFilter("value", "greater_than", 100)
    ],
    add_partition_date=True,
    start_date="2024-01-01",
    end_date="2024-12-31"
)
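
Each filter above reads as a (column, operator, value) triple. As a rough illustration of how such triples could be evaluated, the sketch below applies them to plain dictionaries. `RowFilter` and `apply_filters` are hypothetical names, not part of Partitioneer. Only the operator names `"in"` and `"greater_than"` come from the example above, and combining filters with AND is an assumption.

```python
from dataclasses import dataclass
from typing import Any

# Operator names come from the README example; this mapping is an assumption.
OPERATORS = {
    "in": lambda cell, target: cell in target,
    "greater_than": lambda cell, target: cell > target,
}


@dataclass
class RowFilter:
    """Hypothetical stand-in for a (column, operator, value) filter."""

    column: str
    operator: str
    value: Any

    def matches(self, row):
        # Look up the comparison by name and apply it to the row's cell.
        return OPERATORS[self.operator](row[self.column], self.value)


def apply_filters(rows, filters):
    """Keep rows that satisfy every filter (AND semantics, assumed)."""
    return [row for row in rows if all(f.matches(row) for f in filters)]


rows = [
    {"category": "A", "value": 150},
    {"category": "B", "value": 50},
    {"category": "C", "value": 300},
    {"category": "A", "value": 100},
]
filtered = apply_filters(
    rows,
    [
        RowFilter("category", "in", ["A", "B"]),
        RowFilter("value", "greater_than", 100),
    ],
)
```

Here only the row with category `"A"` and value `150` passes both filters; the row with value `100` is dropped because the comparison is strict.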

Getting Partition Date Information

To get the latest or first partition date:

from partitioneer import get_latest_partition_date, get_first_partition_date

latest_date = get_latest_partition_date("/path/to/data")
first_date = get_first_partition_date("/path/to/data")
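
Conceptually, finding the first or latest partition date means listing the partitions and taking the minimum or maximum date. The sketch below shows that idea under an assumed directory naming of `date=YYYY-MM-DD`. The helper `list_partition_dates_sketch` is hypothetical, and Partitioneer's actual layout and return types may differ.

```python
import tempfile
from datetime import date
from pathlib import Path


def list_partition_dates_sketch(base_path, prefix="date="):
    """Hypothetical helper: parse dates from directory names like date=2024-01-31."""
    dates = []
    for entry in Path(base_path).iterdir():
        if entry.is_dir() and entry.name.startswith(prefix):
            try:
                dates.append(date.fromisoformat(entry.name[len(prefix):]))
            except ValueError:
                continue  # skip directories that do not hold an ISO date
    return sorted(dates)


# Build a throwaway directory tree to run the sketch against.
base = Path(tempfile.mkdtemp())
for name in ("date=2024-03-15", "date=2024-01-01", "date=2024-12-31", "date=not-a-date"):
    (base / name).mkdir()

dates = list_partition_dates_sketch(base)
first_date = dates[0] if dates else None
latest_date = dates[-1] if dates else None
```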

Build Instructions

To build the package:

python setup.py sdist bdist_wheel

To upload to PyPI:

pip install twine
twine upload dist/*

Automated build and publish script:

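python setup.py sdist bdist_wheel
pip install twine
# PyPI API tokens are used with the literal username __token__
twine upload dist/* --username __token__ --password <add_pypi_token_here>
# Remove build artifacts
rm -r ./build
rm -r ./dist
rm -r ./partitioneer.egg-info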

License

MIT License
