A simple class to aid in defining flexible schemas for PyArrow datasets.

Flexible Schemas

This package provides a simple metaclass mixin for specifying validatable PyArrow schemas that permit optional columns, column re-ordering, and extra columns.

Installation

pip install flexible_schema

Usage

>>> from flexible_schema import PyArrowSchema
>>> import pyarrow as pa
>>> import datetime
>>> class Data(PyArrowSchema):
...     subject_id: int
...     time: datetime.datetime
...     code: str
...     numeric_value: float | None = None
...     text_value: str | None = None
>>> # Each field's name and Arrow dtype are exposed as class attributes.
>>> Data.subject_id_name
'subject_id'
>>> Data.subject_id_dtype
DataType(int64)
>>> Data.time_name
'time'
>>> Data.time_dtype
TimestampType(timestamp[us])
>>> # Columns may be given in any order, and optional columns may be omitted.
>>> data_tbl = pa.Table.from_pydict({
...     "time": [
...         datetime.datetime(2021, 3, 1),
...         datetime.datetime(2021, 4, 1),
...         datetime.datetime(2021, 5, 1),
...     ],
...     "subject_id": [1, 2, 3],
...     "code": ["A", "B", "C"],
... })
>>> # The validated table follows the schema's column order; omitted optional columns are null.
>>> Data.validate(data_tbl)
pyarrow.Table
subject_id: int64
time: timestamp[us]
code: string
numeric_value: float
text_value: string
----
subject_id: [[1,2,3]]
time: [[2021-03-01 00:00:00.000000,2021-04-01 00:00:00.000000,2021-05-01 00:00:00.000000]]
code: [["A","B","C"]]
numeric_value: [[null,null,null]]
text_value: [[null,null,null]]
>>> # Columns that are not in the schema are kept and placed after the schema columns.
>>> data_tbl_with_extra = pa.Table.from_pydict({
...     "time": [
...         datetime.datetime(2021, 3, 1),
...         datetime.datetime(2021, 4, 1),
...     ],
...     "subject_id": [4, 5],
...     "extra_1": ["extra1", "extra2"],
...     "extra_2": [452, 11],
...     "code": ["D", "E"],
... })
>>> Data.validate(data_tbl_with_extra)
pyarrow.Table
subject_id: int64
time: timestamp[us]
code: string
numeric_value: float
text_value: string
extra_1: string
extra_2: int64
----
subject_id: [[4,5]]
time: [[2021-03-01 00:00:00.000000,2021-04-01 00:00:00.000000]]
code: [["D","E"]]
numeric_value: [[null,null]]
text_value: [[null,null]]
extra_1: [["extra1","extra2"]]
extra_2: [[452,11]]
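
The column handling visible in the output above (schema columns first in schema order, omitted optional columns filled with nulls, extra columns kept at the end) can be sketched in plain Python. This is only an illustration of the observed behavior on dictionaries of lists, not the package's implementation: `SCHEMA_COLUMNS` and `reconcile_columns` are hypothetical names, and the sketch does no dtype checking or casting.

```python
from typing import Any

# Hypothetical schema description: column name -> is the column optional?
# (Illustrative only; this is not how flexible_schema stores its schema.)
SCHEMA_COLUMNS = {
    "subject_id": False,
    "time": False,
    "code": False,
    "numeric_value": True,
    "text_value": True,
}


def reconcile_columns(table: dict[str, list[Any]]) -> dict[str, list[Any]]:
    """Return a copy of `table` with schema columns first, then extras."""
    # Required columns must be present; optional ones may be absent.
    missing = [
        name
        for name, optional in SCHEMA_COLUMNS.items()
        if not optional and name not in table
    ]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")

    n_rows = len(next(iter(table.values())))
    out: dict[str, list[Any]] = {}
    # Schema columns come first, in schema order; absent optionals become nulls.
    for name in SCHEMA_COLUMNS:
        out[name] = list(table[name]) if name in table else [None] * n_rows
    # Extra columns are preserved after the schema columns, in input order.
    for name, values in table.items():
        if name not in SCHEMA_COLUMNS:
            out[name] = list(values)
    return out


raw = {
    "time": ["2021-03-01", "2021-04-01"],
    "subject_id": [4, 5],
    "extra_1": ["extra1", "extra2"],
    "code": ["D", "E"],
}
result = reconcile_columns(raw)
print(list(result))
# ['subject_id', 'time', 'code', 'numeric_value', 'text_value', 'extra_1']
```

A table lacking a required column (for example, one without `code`) raises a `ValueError` in this sketch; how the package itself reports that case is not shown in the example above.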
