
A simple class to aid in defining flexible schemas for PyArrow datasets.


Flexible Schemas


This package provides a simple metaclass mixin for specifying PyArrow schemas that can be validated while still allowing optional columns, column re-ordering, and extra columns.

Installation

pip install flexible_schema

Usage

>>> from flexible_schema import PyArrowSchema
>>> import pyarrow as pa
>>> import datetime
>>> class Data(PyArrowSchema):
...     subject_id: int
...     time: datetime.datetime
...     code: str
...     numeric_value: float | None = None
...     text_value: str | None = None
>>> Data.subject_id_name
'subject_id'
>>> Data.subject_id_dtype
DataType(int64)
>>> Data.time_name
'time'
>>> Data.time_dtype
TimestampType(timestamp[us])
>>> data_tbl = pa.Table.from_pydict({
...     "time": [
...         datetime.datetime(2021, 3, 1),
...         datetime.datetime(2021, 4, 1),
...         datetime.datetime(2021, 5, 1),
...     ],
...     "subject_id": [1, 2, 3],
...     "code": ["A", "B", "C"],
... })
>>> Data.validate(data_tbl)
pyarrow.Table
subject_id: int64
time: timestamp[us]
code: string
numeric_value: float
text_value: string
----
subject_id: [[1,2,3]]
time: [[2021-03-01 00:00:00.000000,2021-04-01 00:00:00.000000,2021-05-01 00:00:00.000000]]
code: [["A","B","C"]]
numeric_value: [[null,null,null]]
text_value: [[null,null,null]]
>>> data_tbl_with_extra = pa.Table.from_pydict({
...     "time": [
...         datetime.datetime(2021, 3, 1),
...         datetime.datetime(2021, 4, 1),
...     ],
...     "subject_id": [4, 5],
...     "extra_1": ["extra1", "extra2"],
...     "extra_2": [452, 11],
...     "code": ["D", "E"],
... })
>>> Data.validate(data_tbl_with_extra)
pyarrow.Table
subject_id: int64
time: timestamp[us]
code: string
numeric_value: float
text_value: string
extra_1: string
extra_2: int64
----
subject_id: [[4,5]]
time: [[2021-03-01 00:00:00.000000,2021-04-01 00:00:00.000000]]
code: [["D","E"]]
numeric_value: [[null,null]]
text_value: [[null,null]]
extra_1: [["extra1","extra2"]]
extra_2: [[452,11]]
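
The usage example above suggests three rules: required columns must be present, missing optional columns are filled with nulls, and extra columns are kept after the declared ones. The sketch below models those rules in plain Python on a dict of column lists. It is only an illustration of the observed behaviour, not the package's implementation; `align_columns` and its parameters are hypothetical names, and type checking and casting are left out.

```python
from typing import Any


def align_columns(
    columns: dict[str, list[Any]],
    required: list[str],
    optional: list[str],
) -> dict[str, list[Any]]:
    """Hypothetical helper (not part of flexible_schema) that mimics the
    column handling seen in the usage example."""
    # Required columns must all be present, in any order.
    missing = [name for name in required if name not in columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")

    # Row count is taken from the first column; 0 if there are no columns.
    n_rows = len(next(iter(columns.values()))) if columns else 0

    aligned: dict[str, list[Any]] = {}
    # Declared columns come first, in schema order.
    for name in required:
        aligned[name] = columns[name]
    # Missing optional columns are filled with nulls.
    for name in optional:
        aligned[name] = columns.get(name, [None] * n_rows)
    # Extra columns are kept, after the declared ones, in their input order.
    for name, values in columns.items():
        if name not in aligned:
            aligned[name] = values
    return aligned


aligned = align_columns(
    {
        "time": ["2021-03-01", "2021-04-01"],
        "subject_id": [4, 5],
        "extra_1": ["extra1", "extra2"],
        "code": ["D", "E"],
    },
    required=["subject_id", "time", "code"],
    optional=["numeric_value", "text_value"],
)
print(list(aligned))
# ['subject_id', 'time', 'code', 'numeric_value', 'text_value', 'extra_1']
```

The resulting column order matches the validated tables shown above: declared columns first, then any extras.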
