Recap reads and writes schemas from web services, databases, and schema registries in a standard format
What is Recap?
Recap reads and writes schemas from web services, databases, and schema registries in a standard format.
⭐️ If you like this project, please give it a star! It helps the project get more visibility.
Supported Formats
Format | Read | Write |
---|---|---|
Avro | ✅ | ✅ |
Protobuf | ✅ | ✅ |
JSON Schema | ✅ | ✅ |
Snowflake | ✅ | |
PostgreSQL | ✅ | |
MySQL | ✅ | |
BigQuery | ✅ | |
Confluent Schema Registry | ✅ | |
Hive Metastore | ✅ | |
Install
Install Recap and all of its optional dependencies:
pip install 'recap-core[all]'
You can also select specific dependencies:
pip install 'recap-core[avro,kafka]'
See pyproject.toml for a list of optional dependencies.
Usage
CLI
Recap comes with a command line interface that can list and read schemas from external systems.
List the children of a URL:
recap ls postgresql://user:pass@host:port/testdb
[
"pg_toast",
"pg_catalog",
"public",
"information_schema"
]
Keep drilling down:
recap ls postgresql://user:pass@host:port/testdb/public
[
"test_types"
]
Read the schema for the test_types table as a Recap struct:
recap schema postgresql://user:pass@host:port/testdb/public/test_types
{
"type": "struct",
"fields": [
{
"type": "int64",
"name": "test_bigint",
"optional": true
}
]
}
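The JSON printed by `recap schema` can be consumed directly from a script. The following is a minimal sketch, assuming the output has the shape shown above (a `struct` with a `fields` list). The helper name `describe_fields` is hypothetical and not part of Recap.

```python
import json

# Sample output copied from the `recap schema` example above.
cli_output = """
{
  "type": "struct",
  "fields": [
    {"type": "int64", "name": "test_bigint", "optional": true}
  ]
}
"""


def describe_fields(schema_json: str) -> list[str]:
    """Return one 'name: type' line per struct field (hypothetical helper)."""
    schema = json.loads(schema_json)
    if schema.get("type") != "struct":
        raise ValueError("expected a struct schema")
    lines = []
    for field in schema.get("fields", []):
        suffix = " (optional)" if field.get("optional") else ""
        lines.append(f"{field.get('name')}: {field['type']}{suffix}")
    return lines


for line in describe_fields(cli_output):
    print(line)
```

In practice the string would come from the CLI's standard output, for example via `subprocess.run(..., capture_output=True)`.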
Gateway
Recap comes with a stateless HTTP/JSON gateway that can list and read schemas.
Start the server at http://localhost:8000:
recap serve
List the schemas in a PostgreSQL database:
curl http://localhost:8000/gateway/ls/postgresql://user:pass@host:port/testdb
["pg_toast","pg_catalog","public","information_schema"]
And read a schema:
curl http://localhost:8000/gateway/schema/postgresql://user:pass@host:port/testdb/public/test_types
{"type":"struct","fields":[{"type":"int64","name":"test_bigint","optional":true}]}
The gateway fetches schemas from external systems in real time and returns them as Recap schemas.
An OpenAPI schema is available at http://localhost:8000/docs.
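The same gateway calls can be made from Python with only the standard library. This is a sketch, assuming the gateway is running at `http://localhost:8000` and that the system URL is appended to the route unescaped, as in the curl examples above. The function names `gateway_url` and `gateway_get` are illustrative and not part of Recap.

```python
import json
from urllib.request import urlopen

# Assumes `recap serve` is running locally on the default port.
GATEWAY = "http://localhost:8000/gateway"


def gateway_url(action: str, system_url: str) -> str:
    """Build a gateway URL for the `ls` or `schema` route."""
    if action not in ("ls", "schema"):
        raise ValueError(f"unsupported action: {action}")
    return f"{GATEWAY}/{action}/{system_url}"


def gateway_get(action: str, system_url: str):
    """Fetch and decode the JSON body. Requires a running gateway."""
    with urlopen(gateway_url(action, system_url)) as response:
        return json.load(response)


# Only builds the URL; call gateway_get(...) once the server is up.
print(gateway_url("ls", "postgresql://user:pass@host:port/testdb"))
```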
Registry
You can store schemas in Recap's schema registry.
Start the server at http://localhost:8000:
recap serve
Put a schema in the registry:
curl -X POST \
-H "Content-Type: application/x-recap+json" \
-d '{"type":"struct","fields":[{"type":"int64","name":"test_bigint","optional":true}]}' \
http://localhost:8000/registry/some_schema
Get the schema (and version) from the registry:
curl http://localhost:8000/registry/some_schema
[{"type":"struct","fields":[{"type":"int64","name":"test_bigint","optional":true}]},1]
Put a new version of the schema in the registry:
curl -X POST \
-H "Content-Type: application/x-recap+json" \
-d '{"type":"struct","fields":[{"type":"int32","name":"test_int","optional":true}]}' \
http://localhost:8000/registry/some_schema
List schema versions:
curl http://localhost:8000/registry/some_schema/versions
[1,2]
Get a specific version of the schema:
curl http://localhost:8000/registry/some_schema/versions/1
[{"type":"struct","fields":[{"type":"int64","name":"test_bigint","optional":true}]},1]
The registry uses fsspec to store schemas in a variety of filesystems like S3, GCS, ABS, and the local filesystem. See the registry docs for more details.
An OpenAPI schema is available at http://localhost:8000/docs.
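The registry calls above translate to Python's standard library as follows. This is a sketch under the assumptions shown in the curl examples: schemas are POSTed with the `application/x-recap+json` content type, and a GET returns a two-element JSON array of schema and version. The helper names are hypothetical.

```python
import json
from urllib.request import Request

# Assumes `recap serve` is running locally on the default port.
REGISTRY = "http://localhost:8000/registry"


def build_put_request(name: str, schema: dict) -> Request:
    """Prepare (but do not send) the POST that stores a schema."""
    return Request(
        f"{REGISTRY}/{name}",
        data=json.dumps(schema).encode("utf-8"),
        headers={"Content-Type": "application/x-recap+json"},
        method="POST",
    )


def parse_get_response(body: str) -> tuple[dict, int]:
    """Split a registry response into (schema, version)."""
    schema, version = json.loads(body)
    return schema, version


# Response body copied from the `GET /registry/some_schema` example above.
body = '[{"type":"struct","fields":[{"type":"int64","name":"test_bigint","optional":true}]},1]'
schema, version = parse_get_response(body)
print(version, [field["name"] for field in schema["fields"]])
```

Pass the prepared request to `urllib.request.urlopen` to send it once the server is running.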
API
Recap has recap.converters and recap.clients packages.
- Converters convert schemas to and from Recap schemas.
- Clients read schemas from external systems (databases, schema registries, and so on) and use converters to return Recap schemas.
Read a schema from PostgreSQL:
from recap.clients import create_client

with create_client("postgresql://user:pass@host:port/testdb") as c:
    # Keep the result so it can be passed to a converter.
    struct = c.schema("testdb", "public", "test_types")
Convert the schema to Avro, Protobuf, and JSON schemas:
from recap.converters.avro import AvroConverter
from recap.converters.protobuf import ProtobufConverter
from recap.converters.json_schema import JSONSchemaConverter
avro_schema = AvroConverter().from_recap(struct)
protobuf_schema = ProtobufConverter().from_recap(struct)
json_schema = JSONSchemaConverter().from_recap(struct)
Transpile schemas from one format to another:
from recap.converters.json_schema import JSONSchemaConverter
from recap.converters.avro import AvroConverter
json_schema = """
{
"type": "object",
"$id": "https://recap.build/person.schema.json",
"properties": {
"name": {"type": "string"}
}
}
"""
# Use Recap as an intermediate format to convert JSON schema to Avro
struct = JSONSchemaConverter().to_recap(json_schema)
avro_schema = AvroConverter().from_recap(struct)
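The two-step pattern above generalizes to any pair of converters that expose `to_recap` and `from_recap`. The sketch below captures that composition in one function. The `transpile` helper and the `TaggedConverter` stand-in are hypothetical; the stand-in exists only so the example runs without Recap installed and does not model real schema conversion.

```python
from typing import Any, Protocol


class Converter(Protocol):
    """Shape assumed for converters: the two methods used in the examples above."""

    def to_recap(self, schema: Any) -> Any: ...

    def from_recap(self, recap_type: Any) -> Any: ...


def transpile(schema: Any, source: Converter, target: Converter) -> Any:
    """Convert via the intermediate Recap type (hypothetical helper)."""
    return target.from_recap(source.to_recap(schema))


class TaggedConverter:
    """Toy stand-in, not part of Recap: a 'schema' is just '<tag>:<payload>'."""

    def __init__(self, tag: str) -> None:
        self.tag = tag

    def to_recap(self, schema: str) -> str:
        # Strip this format's tag to obtain the intermediate value.
        return schema[len(self.tag) + 1:]

    def from_recap(self, recap_type: str) -> str:
        # Re-tag the intermediate value in this format.
        return f"{self.tag}:{recap_type}"


print(transpile("json:person", TaggedConverter("json"), TaggedConverter("avro")))
```

With Recap installed, the same function should accept real converter instances such as `JSONSchemaConverter()` and `AvroConverter()` in place of the stand-ins.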
Store schemas in Recap's schema registry:
from recap.storage.registry import RegistryStorage
from recap.types import StructType, IntType
storage = RegistryStorage("file:///tmp/recap-registry-storage")
version = storage.put(
    "postgresql://localhost:5432/testdb/public/test_table",
    StructType(fields=[IntType(32)])
)
storage.get("postgresql://localhost:5432/testdb/public/test_table")
# Get all versions of a schema
versions = storage.versions("postgresql://localhost:5432/testdb/public/test_table")
# List all schemas in the registry
schemas = storage.ls()
Docker
Recap's gateway and registry are also available as a Docker image:
docker run \
-p 8000:8000 \
-e 'RECAP_URLS=["postgresql://user:pass@localhost:5432/testdb"]' \
ghcr.io/recap-build/recap:latest
See Recap's Docker documentation for more details.
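Shell quoting of the `RECAP_URLS` value is easy to get wrong by hand. The following sketch generates the `docker run` command from a Python list, assuming (as the example above suggests) that `RECAP_URLS` holds a JSON-style list of URLs. It only prints the command and does not run Docker.

```python
import json
import shlex

# URLs the container should be able to reach (placeholder credentials).
urls = ["postgresql://user:pass@localhost:5432/testdb"]

# Serialize the list once, then let shlex handle the shell quoting.
env_value = "RECAP_URLS=" + json.dumps(urls)
args = [
    "docker", "run",
    "-p", "8000:8000",
    "-e", env_value,
    "ghcr.io/recap-build/recap:latest",
]

print(shlex.join(args))
```

Passing `args` to `subprocess.run` avoids shell quoting altogether, since each element is delivered to Docker as a separate argument.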
Schema
See Recap's type spec for details on Recap's type system.
Documentation
Recap's documentation is available at recap.build.