
Schema Quarry

Features

  • Convert OpenAPI definitions to Parquet schemas
  • Use Schema Quarry as a CLI or as a Python library
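
To make the first feature concrete, here is a minimal sketch of what an OpenAPI-to-Parquet type mapping can look like. It is not Schema Quarry's implementation: the `PRIMITIVE_TYPES` table, the `map_schema` function, and the Arrow-style type names are all illustrative assumptions, and the sketch ignores `$ref`, nullability, and composition keywords.

```python
from __future__ import annotations

from typing import Any

# Hypothetical lookup table: (OpenAPI type, OpenAPI format) -> Arrow-style type name.
# These pairings are illustrative assumptions, not Schema Quarry's actual rules.
PRIMITIVE_TYPES: dict[tuple[str, str | None], str] = {
    ("string", None): "string",
    ("string", "date"): "date32",
    ("string", "date-time"): "timestamp[us]",
    ("integer", None): "int64",
    ("integer", "int32"): "int32",
    ("integer", "int64"): "int64",
    ("number", None): "double",
    ("number", "float"): "float",
    ("number", "double"): "double",
    ("boolean", None): "bool",
}


def map_schema(schema: dict[str, Any]) -> Any:
    """Translate an inline OpenAPI schema object into a nested type description."""
    kind = schema.get("type")
    if kind == "object":
        # Objects become struct-like dicts: one entry per declared property.
        return {
            name: map_schema(sub)
            for name, sub in schema.get("properties", {}).items()
        }
    if kind == "array":
        # Arrays become single-element lists holding the mapped item type.
        return [map_schema(schema["items"])]
    fmt = schema.get("format")
    if (kind, fmt) in PRIMITIVE_TYPES:
        return PRIMITIVE_TYPES[(kind, fmt)]
    # Unknown format: fall back to the format-less entry for this type.
    # An unknown or missing type raises KeyError.
    return PRIMITIVE_TYPES[(kind, None)]


pet = {
    "type": "object",
    "properties": {
        "id": {"type": "integer", "format": "int64"},
        "name": {"type": "string"},
        "photoUrls": {"type": "array", "items": {"type": "string"}},
    },
}
pet_types = map_schema(pet)
```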

Requirements

  • Python 3.12+

Installation

You can install Schema Quarry via pip from PyPI:

pip install schema-quarry

For local development with uv:

uv sync --dev

Usage

CLI:

schema-quarry generate \
  --source "https://petstore3.swagger.io/api/v3/openapi.json" \
  --root-schema Pet \
  --output-file pet.parquet \
  --print-schema

Skattemelding example:

schema-quarry generate \
  --source "https://app.swaggerhub.com/apiproxy/registry/skatteetaten/skattemelding-api/4.2.0" \
  --root-schema Skattemelding \
  --output-file skattemelding.parquet \
  --print-schema

Write directly to Google Cloud Storage:

schema-quarry generate \
  --source "https://app.swaggerhub.com/apiproxy/registry/skatteetaten/skattemelding-api/4.2.0" \
  --root-schema Skattemelding \
  --output-file "gs://my-bucket/schemas/skattemelding.parquet"
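
If the CLI needs to be driven from a Python script (for example in a scheduled job), the commands above can be assembled with `subprocess`. This is a sketch: the helper names `build_generate_command` and `run_generate` are hypothetical, and only the flags shown in the examples above are used.

```python
import subprocess


def build_generate_command(source, root_schema, output_file, print_schema=False):
    """Build the argument list for `schema-quarry generate` (hypothetical helper)."""
    command = [
        "schema-quarry",
        "generate",
        "--source", source,
        "--root-schema", root_schema,
        "--output-file", output_file,
    ]
    if print_schema:
        # --print-schema is used as a bare flag in the examples above.
        command.append("--print-schema")
    return command


def run_generate(source, root_schema, output_file, print_schema=False):
    """Run the CLI and return the completed process; raises on a non-zero exit."""
    command = build_generate_command(source, root_schema, output_file, print_schema)
    return subprocess.run(command, check=True, capture_output=True, text=True)
```

Passing the arguments as a list avoids shell quoting issues with URLs that contain special characters.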

Library:

from schema_quarry import build_parquet_schema

result = build_parquet_schema(
    source="https://petstore3.swagger.io/api/v3/openapi.json",
    root_schema="Pet",
    output_file="pet.parquet",
)

print(result.schema)
print(result.parquet_path)

output_file also accepts gs://... URIs; those files are written to Google Cloud Storage with gcsfs.
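
How Schema Quarry decides between a local path and a bucket is not described here. As a rough illustration of the idea only, a caller could classify an output target like this; `split_output_target` is a hypothetical helper, not part of the library.

```python
from urllib.parse import urlsplit


def split_output_target(output_file: str) -> tuple[str, str]:
    """Classify an output target as local or Google Cloud Storage (illustrative only)."""
    parts = urlsplit(output_file)
    if parts.scheme == "gs":
        # For gs://bucket/key, netloc is the bucket and path is "/key".
        return "gcs", f"{parts.netloc}{parts.path}"
    # Anything else is treated as a local filesystem path.
    return "local", output_file


kind, location = split_output_target("gs://my-bucket/schemas/skattemelding.parquet")
```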

Please see the Reference Guide for details.

Tests

Run the full test suite locally with:

uv run pytest

The tests are split into two groups:

  • tests/unit/ checks the building blocks of the project, such as the CLI, the Python library API, and specific OpenAPI-to-Parquet mapping rules
  • tests/golden/ checks that real example inputs still produce exactly the same output as the checked-in reference files

The checked-in test data is stored in tests/resources/:

  • tests/resources/master/openapi/ contains OpenAPI documents used as regression inputs
  • tests/resources/master/parquet/ contains the expected Parquet schemas for those inputs
  • tests/resources/snapshots/schema-text/ contains expected text output for format_schema(...)

In other words:

  • if you change the converter logic, the golden tests will tell you whether the generated Parquet schema changed for any of the reference APIs
  • if you change schema formatting, the snapshot tests will tell you whether the human-readable schema text changed
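
The snapshot idea can be sketched in a few lines: compare freshly generated text with a checked-in file and report a diff when they disagree. The helper name `diff_against_snapshot` and the file layout are assumptions for illustration; the project's own tests may be organised differently.

```python
import difflib
from pathlib import Path


def diff_against_snapshot(actual: str, snapshot_path: Path) -> list[str]:
    """Return unified-diff lines between a stored snapshot and new output.

    An empty list means the output still matches the snapshot.
    """
    expected = snapshot_path.read_text(encoding="utf-8")
    return list(
        difflib.unified_diff(
            expected.splitlines(),
            actual.splitlines(),
            fromfile=str(snapshot_path),
            tofile="generated",
            lineterm="",
        )
    )
```

In a pytest test, `assert not diff_against_snapshot(text, path)` fails with the list of diff lines, which shows which schema lines changed.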

You can also run the local quality checks with:

uv run ruff check tests
uv run mypy tests

Contributing

Contributions are very welcome. To learn more, see the Contributor Guide.

License

Schema Quarry is free and open source software, distributed under the terms of the MIT license.

Issues

If you encounter any problems, please file an issue along with a detailed description.

Credits

This project was generated from Statistics Norway's SSB PyPI Template.

Download files

Source Distribution

schema_quarry-0.0.1.tar.gz (9.8 kB)

Uploaded Source

Built Distribution

schema_quarry-0.0.1-py3-none-any.whl (13.0 kB)

Uploaded Python 3

File details

Details for the file schema_quarry-0.0.1.tar.gz.

File metadata

  • Download URL: schema_quarry-0.0.1.tar.gz
  • Upload date:
  • Size: 9.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for schema_quarry-0.0.1.tar.gz
Algorithm Hash digest
SHA256 353552c521a623fe4bbff6c53aaec33ebbf0aaebcba091e2f8b4258ba5148d3d
MD5 107f8ec872cbff4dad6165cb5a0e7f35
BLAKE2b-256 b3e429c88c22eea79e9870ea11291e1b78d0f324f8bebe24d5a187e5ddde4560

Provenance

The following attestation bundles were made for schema_quarry-0.0.1.tar.gz:

Publisher: release.yml on statisticsnorway/schema-quarry

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file schema_quarry-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: schema_quarry-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 13.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for schema_quarry-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 506c0c55eb95239060a172f164f86633fc60a463a615764122f1f2b25f095617
MD5 1603bac6c83b7b89763c40503834632e
BLAKE2b-256 9817a7c8409252a07f072367e04ee8018de7a7c5ef5b02ac6d4126dbb8fc36c5

Provenance

The following attestation bundles were made for schema_quarry-0.0.1-py3-none-any.whl:

Publisher: release.yml on statisticsnorway/schema-quarry

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
