Schema Quarry
Features
- Convert OpenAPI definitions to Parquet schemas
- Use Schema Quarry as a CLI or as a Python library
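To give a feel for what converting an OpenAPI definition to a Parquet schema involves, here is a minimal sketch using only the standard library. The mapping table and the helper names (`parquet_type_for`, `flat_fields`) are illustrative assumptions, not Schema Quarry's actual rules or API; the real converter may choose different types and also handles nested schemas.

```python
from __future__ import annotations

from typing import Any

# Hypothetical mapping from (OpenAPI type, format) to a Parquet type label.
# These choices are assumptions for illustration, not Schema Quarry's rules.
_PRIMITIVES: dict[tuple[str, str | None], str] = {
    ("boolean", None): "BOOLEAN",
    ("integer", "int32"): "INT32",
    ("integer", "int64"): "INT64",
    ("integer", None): "INT64",
    ("number", "float"): "FLOAT",
    ("number", "double"): "DOUBLE",
    ("number", None): "DOUBLE",
    ("string", "date"): "INT32 (DATE)",
    ("string", "date-time"): "INT64 (TIMESTAMP)",
    ("string", None): "BYTE_ARRAY (STRING)",
}


def parquet_type_for(prop: dict[str, Any]) -> str:
    """Look up a primitive property, falling back to the format-less entry."""
    kind = prop.get("type")
    fmt = prop.get("format")
    for key in ((kind, fmt), (kind, None)):
        if key in _PRIMITIVES:
            return _PRIMITIVES[key]
    raise ValueError(f"unsupported OpenAPI type: {kind!r} (format {fmt!r})")


def flat_fields(schema: dict[str, Any]) -> list[tuple[str, str, str]]:
    """Turn a flat OpenAPI object schema into (name, type, repetition) tuples."""
    required = set(schema.get("required", []))
    return [
        (name, parquet_type_for(prop), "REQUIRED" if name in required else "OPTIONAL")
        for name, prop in schema.get("properties", {}).items()
    ]


# A trimmed-down, hand-written object schema used only as sample input.
pet = {
    "type": "object",
    "required": ["name"],
    "properties": {
        "id": {"type": "integer", "format": "int64"},
        "name": {"type": "string"},
    },
}
fields = flat_fields(pet)
for field in fields:
    print(field)
```

Properties listed under `required` become REQUIRED fields and all others OPTIONAL, which is one common way to carry OpenAPI nullability into Parquet.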
Requirements
- Python 3.12+
Installation
You can install Schema Quarry via pip from PyPI:
pip install schema-quarry
For local development with uv:
uv sync --dev
Usage
CLI:
schema-quarry generate \
  --source "https://petstore3.swagger.io/api/v3/openapi.json" \
  --root-schema Pet \
  --output-file pet.parquet \
  --print-schema
Skattemelding example:
schema-quarry generate \
  --source "https://app.swaggerhub.com/apiproxy/registry/skatteetaten/skattemelding-api/4.2.0" \
  --root-schema Skattemelding \
  --output-file skattemelding.parquet \
  --print-schema
Write directly to Google Cloud Storage:
schema-quarry generate \
  --source "https://app.swaggerhub.com/apiproxy/registry/skatteetaten/skattemelding-api/4.2.0" \
  --root-schema Skattemelding \
  --output-file "gs://my-bucket/schemas/skattemelding.parquet"
Library:
from schema_quarry import build_parquet_schema

result = build_parquet_schema(
    source="https://petstore3.swagger.io/api/v3/openapi.json",
    root_schema="Pet",
    output_file="pet.parquet",
)

print(result.schema)
print(result.parquet_path)
output_file also accepts gs://... URIs; these are written to Google Cloud Storage using gcsfs.
Please see the Reference Guide for details.
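As an illustration of how a gs:// output target can be told apart from a local path, the sketch below splits the URI with the standard library. The helper name `split_output_target` is hypothetical and this is not necessarily how Schema Quarry does it internally.

```python
from urllib.parse import urlsplit


def split_output_target(output_file: str) -> tuple[str, str]:
    """Return ("gcs", "bucket/key") for gs:// URIs, otherwise ("local", path)."""
    parts = urlsplit(output_file)
    if parts.scheme == "gs":
        # netloc holds the bucket name, path holds the object key with a leading "/".
        return "gcs", f"{parts.netloc}{parts.path}"
    return "local", output_file


remote = split_output_target("gs://my-bucket/schemas/skattemelding.parquet")
local = split_output_target("pet.parquet")
print(remote)
print(local)
```

With gcsfs installed, a "bucket/key" path of this form could then be opened for writing through `gcsfs.GCSFileSystem().open(path, "wb")`, while local paths go through the normal file API.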
Tests
Run the full test suite locally with:
uv run pytest
The tests are split into two groups:
- tests/unit/ checks the building blocks of the project, such as the CLI, the Python library API, and specific OpenAPI-to-Parquet mapping rules
- tests/golden/ checks that real example inputs still produce exactly the same output as the checked-in reference files
The checked-in test data is stored in tests/resources/:
- tests/resources/master/openapi/ contains OpenAPI documents used as regression inputs
- tests/resources/master/parquet/ contains the expected Parquet schemas for those inputs
- tests/resources/snapshots/schema-text/ contains expected text output for format_schema(...)
In other words:
- if you change the converter logic, the golden tests will tell you whether the generated Parquet schema changed for any of the reference APIs
- if you change schema formatting, the snapshot tests will tell you whether the human-readable schema text changed
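The golden and snapshot checks described above follow a common pattern: produce the output, then compare it byte-for-byte with a checked-in reference file. The following is a minimal sketch of that pattern, not the project's actual test code; the helper `matches_golden`, its `update` flag, and the sample schema text are assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def matches_golden(actual: str, golden_path: Path, update: bool = False) -> bool:
    """Compare text with a reference file; with update=True, rewrite the file first."""
    if update:
        golden_path.parent.mkdir(parents=True, exist_ok=True)
        golden_path.write_text(actual, encoding="utf-8")
    return golden_path.read_text(encoding="utf-8") == actual


# Demonstration in a temporary directory standing in for tests/resources/.
with tempfile.TemporaryDirectory() as tmp:
    golden = Path(tmp) / "snapshots" / "pet.txt"
    # First run records the reference output.
    recorded = matches_golden("id: int64\nname: string\n", golden, update=True)
    # Unchanged output still matches the reference.
    unchanged = matches_golden("id: int64\nname: string\n", golden)
    # A changed type is detected as a mismatch.
    changed = matches_golden("id: int32\nname: string\n", golden)

print(recorded, unchanged, changed)
```

Keeping the update step explicit means a reference file only changes when someone deliberately regenerates it, so an accidental change in converter or formatting logic shows up as a failing comparison.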
You can also run the local quality checks with:
uv run ruff check tests
uv run mypy tests
Contributing
Contributions are very welcome. To learn more, see the Contributor Guide.
License
Distributed under the terms of the MIT license, Schema Quarry is free and open source software.
Issues
If you encounter any problems, please file an issue along with a detailed description.
Credits
This project was generated from Statistics Norway's SSB PyPI Template.