
A simple class to aid in defining flexible schemas for PyArrow datasets.

Flexible Schemas


This package provides a simple metaclass mixin for defining PyArrow schemas that can be validated against a table while allowing optional columns, column re-ordering, and extra columns.

Installation

pip install flexible_schema

Usage

>>> from flexible_schema import PyArrowSchema
>>> import pyarrow as pa
>>> import datetime
>>> class Data(PyArrowSchema):
...     subject_id: int
...     time: datetime.datetime
...     code: str
...     numeric_value: float | None = None
...     text_value: str | None = None
>>> Data.subject_id_name
'subject_id'
>>> Data.subject_id_dtype
DataType(int64)
>>> Data.time_name
'time'
>>> Data.time_dtype
TimestampType(timestamp[us])
>>> data_tbl = pa.Table.from_pydict({
...     "time": [
...         datetime.datetime(2021, 3, 1),
...         datetime.datetime(2021, 4, 1),
...         datetime.datetime(2021, 5, 1),
...     ],
...     "subject_id": [1, 2, 3],
...     "code": ["A", "B", "C"],
... })
>>> Data.validate(data_tbl)
pyarrow.Table
subject_id: int64
time: timestamp[us]
code: string
numeric_value: float
text_value: string
----
subject_id: [[1,2,3]]
time: [[2021-03-01 00:00:00.000000,2021-04-01 00:00:00.000000,2021-05-01 00:00:00.000000]]
code: [["A","B","C"]]
numeric_value: [[null,null,null]]
text_value: [[null,null,null]]
>>> data_tbl_with_extra = pa.Table.from_pydict({
...     "time": [
...         datetime.datetime(2021, 3, 1),
...         datetime.datetime(2021, 4, 1),
...     ],
...     "subject_id": [4, 5],
...     "extra_1": ["extra1", "extra2"],
...     "extra_2": [452, 11],
...     "code": ["D", "E"],
... })
>>> Data.validate(data_tbl_with_extra)
pyarrow.Table
subject_id: int64
time: timestamp[us]
code: string
numeric_value: float
text_value: string
extra_1: string
extra_2: int64
----
subject_id: [[4,5]]
time: [[2021-03-01 00:00:00.000000,2021-04-01 00:00:00.000000]]
code: [["D","E"]]
numeric_value: [[null,null]]
text_value: [[null,null]]
extra_1: [["extra1","extra2"]]
extra_2: [[452,11]]
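
The two `validate` calls above suggest three behaviours: schema columns come back in schema order, missing optional columns are filled with nulls, and extra columns are kept after the schema columns. The sketch below imitates that on plain dicts of lists, without PyArrow. The function `validate_columns` and its parameters are hypothetical, and raising `ValueError` for a missing required column is an assumption rather than documented behaviour of the package.

```python
def validate_columns(
    table: dict[str, list],
    columns: list[str],
    optional: set[str],
) -> dict[str, list]:
    """Return `table` in schema order, null-filling optional columns.

    Hypothetical helper working on dict-of-lists data, not a PyArrow table.
    """
    missing = [c for c in columns if c not in table and c not in optional]
    if missing:
        # Assumption: a missing required column is treated as an error.
        raise ValueError(f"missing required columns: {missing}")

    n_rows = len(next(iter(table.values()))) if table else 0
    out: dict[str, list] = {}
    for name in columns:
        # Schema columns first, in schema order; absent optionals become nulls.
        out[name] = table[name] if name in table else [None] * n_rows
    for name, values in table.items():
        # Extra columns are appended in their original order.
        if name not in out:
            out[name] = values
    return out


result = validate_columns(
    {
        "time": ["2021-03-01", "2021-04-01"],
        "subject_id": [4, 5],
        "extra_1": ["extra1", "extra2"],
        "code": ["D", "E"],
    },
    columns=["subject_id", "time", "code", "numeric_value", "text_value"],
    optional={"numeric_value", "text_value"},
)
print(list(result))
# ['subject_id', 'time', 'code', 'numeric_value', 'text_value', 'extra_1']
```

Type checking and casting, which the real `validate` output implies (for example `int64` and `timestamp[us]`), are left out of this sketch.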
