recap

What is Recap?

Recap reads and writes schemas from web services, databases, and schema registries in a standard format.

⭐️ If you like this project, please give it a star! It helps the project get more visibility.

Supported Formats

Format Read Write
Avro
BigQuery
Confluent Schema Registry
Hive Metastore
JSON Schema
MySQL
PostgreSQL
Protobuf
Snowflake
SQLite

Install

Install Recap and all of its optional dependencies:

pip install 'recap-core[all]'

You can also select specific dependencies:

pip install 'recap-core[avro,kafka]'

See pyproject.toml for a list of optional dependencies.

Usage

CLI

Recap comes with a command line interface that can list and read schemas from external systems.

List the children of a URL:

recap ls postgresql://user:pass@host:port/testdb
[
  "pg_toast",
  "pg_catalog",
  "public",
  "information_schema"
]

Keep drilling down:

recap ls postgresql://user:pass@host:port/testdb/public
[
  "test_types"
]

Read the schema for the test_types table as a Recap struct:

recap schema postgresql://user:pass@host:port/testdb/public/test_types
{
  "type": "struct",
  "fields": [
    {
      "type": "int64",
      "name": "test_bigint",
      "optional": true
    }
  ]
}
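
Because `recap schema` prints plain JSON, any JSON parser can consume its output. The following is a minimal sketch that summarizes a struct using only Python's standard library. The `describe_fields` helper is hypothetical (it is not part of Recap), and the sample text is copied from the output above.

```python
import json

# Output of `recap schema ...`, copied from the example above.
SCHEMA_JSON = """
{
  "type": "struct",
  "fields": [
    {
      "type": "int64",
      "name": "test_bigint",
      "optional": true
    }
  ]
}
"""


def describe_fields(schema: dict) -> list[str]:
    """Return one 'name: type' line per field (hypothetical helper)."""
    lines = []
    for field in schema.get("fields", []):
        suffix = " (optional)" if field.get("optional") else ""
        lines.append(f"{field['name']}: {field['type']}{suffix}")
    return lines


schema = json.loads(SCHEMA_JSON)
print("\n".join(describe_fields(schema)))
```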

Gateway

Recap comes with a stateless HTTP/JSON gateway that can list and read schemas from data catalogs and databases.

Start the server at http://localhost:8000:

recap serve

List the schemas in a PostgreSQL database:

curl http://localhost:8000/gateway/ls/postgresql://user:pass@host:port/testdb
["pg_toast","pg_catalog","public","information_schema"]

And read a schema:

curl http://localhost:8000/gateway/schema/postgresql://user:pass@host:port/testdb/public/test_types
{"type":"struct","fields":[{"type":"int64","name":"test_bigint","optional":true}]}

The gateway fetches schemas from external systems in real time and returns them as Recap schemas.

An OpenAPI schema is available at http://localhost:8000/docs.
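
The gateway paths shown in the curl examples place the system URL directly after `/gateway/ls/` or `/gateway/schema/`. Here is a minimal sketch of calling them from Python with the standard library. `gateway_url` and `fetch_json` are hypothetical helper names, and the base URL assumes the server started with `recap serve`.

```python
import json
from urllib.request import urlopen

# Assumes `recap serve` is running locally on the default port.
GATEWAY_BASE = "http://localhost:8000/gateway"


def gateway_url(action: str, system_url: str, base: str = GATEWAY_BASE) -> str:
    """Build a gateway URL; action is 'ls' or 'schema', as in the curl examples."""
    if action not in ("ls", "schema"):
        raise ValueError(f"unsupported action: {action!r}")
    return f"{base}/{action}/{system_url}"


def fetch_json(url: str, timeout: float = 10.0):
    """GET a URL and decode the JSON body. Requires a running gateway."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


url = gateway_url("schema", "postgresql://user:pass@host:port/testdb/public/test_types")
print(url)
# Once the placeholders are replaced with real values, fetch_json(url)
# would return the decoded JSON response.
```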

Registry

You can store schemas in Recap's schema registry.

Start the server at http://localhost:8000:

recap serve

Put a schema in the registry:

curl -X POST \
    -H "Content-Type: application/x-recap+json" \
    -d '{"type":"struct","fields":[{"type":"int64","name":"test_bigint","optional":true}]}' \
    http://localhost:8000/registry/some_schema

Get the schema (and version) from the registry:

curl http://localhost:8000/registry/some_schema
[{"type":"struct","fields":[{"type":"int64","name":"test_bigint","optional":true}]},1]

Put a new version of the schema in the registry:

curl -X POST \
    -H "Content-Type: application/x-recap+json" \
    -d '{"type":"struct","fields":[{"type":"int32","name":"test_int","optional":true}]}' \
    http://localhost:8000/registry/some_schema

List schema versions:

curl http://localhost:8000/registry/some_schema/versions
[1,2]

Get a specific version of the schema:

curl http://localhost:8000/registry/some_schema/versions/1
[{"type":"struct","fields":[{"type":"int64","name":"test_bigint","optional":true}]},1]

The registry uses fsspec to store schemas on a variety of filesystems, such as S3, GCS, ABS, and the local filesystem. See the registry docs for more details.

An OpenAPI schema is available at http://localhost:8000/docs.
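
The registry calls can be scripted as well. This sketch builds the same POST request as the curl example using `urllib.request` and unpacks the `[schema, version]` pair that the GET examples return. `build_put_request` and `parse_registry_response` are hypothetical helpers, and nothing is sent over the network until the request is passed to `urllib.request.urlopen`.

```python
import json
from urllib.request import Request

# Assumes `recap serve` is running locally on the default port.
REGISTRY_BASE = "http://localhost:8000/registry"


def build_put_request(name: str, schema: dict, base: str = REGISTRY_BASE) -> Request:
    """Build the POST request that stores `schema` under `name`."""
    return Request(
        f"{base}/{name}",
        data=json.dumps(schema).encode("utf-8"),
        headers={"Content-Type": "application/x-recap+json"},
        method="POST",
    )


def parse_registry_response(body: str) -> tuple[dict, int]:
    """Split a '[schema, version]' response body into its two parts."""
    schema, version = json.loads(body)
    return schema, version


schema = {
    "type": "struct",
    "fields": [{"type": "int64", "name": "test_bigint", "optional": True}],
}
request = build_put_request("some_schema", schema)
print(request.get_method(), request.full_url)

# Response body copied from the GET example in this section.
stored, version = parse_registry_response(
    '[{"type":"struct","fields":[{"type":"int64","name":"test_bigint","optional":true}]},1]'
)
print(stored == schema, version)
```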

API

Recap has recap.converters and recap.clients packages.

  • Converters convert schemas to and from Recap schemas.
  • Clients read schemas from external systems (databases, schema registries, and so on) and use converters to return Recap schemas.

Read a schema from PostgreSQL:

from recap.clients import create_client

with create_client("postgresql://user:pass@host:port/testdb") as c:
    struct = c.schema("testdb", "public", "test_types")

Convert the schema to Avro, Protobuf, and JSON schemas:

from recap.converters.avro import AvroConverter
from recap.converters.protobuf import ProtobufConverter
from recap.converters.json_schema import JSONSchemaConverter

avro_schema = AvroConverter().from_recap(struct)
protobuf_schema = ProtobufConverter().from_recap(struct)
json_schema = JSONSchemaConverter().from_recap(struct)

Transpile schemas from one format to another:

from recap.converters.json_schema import JSONSchemaConverter
from recap.converters.avro import AvroConverter

json_schema = """
{
    "type": "object",
    "$id": "https://recap.build/person.schema.json",
    "properties": {
        "name": {"type": "string"}
    }
}
"""

# Use Recap as an intermediate format to convert JSON schema to Avro
struct = JSONSchemaConverter().to_recap(json_schema)
avro_schema = AvroConverter().from_recap(struct)
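
To illustrate why an intermediate format helps, here is a toy, dictionary-based version of the same two-step conversion. It does not use Recap and does not reflect how Recap's converters are implemented. `json_schema_to_struct`, `struct_to_avro`, and the small type table are hypothetical and cover only strings and integers.

```python
import json

# Toy type table: JSON Schema type -> (intermediate type, Avro type).
# Illustrative only; this is not Recap's type system.
TYPE_MAP = {"string": ("string", "string"), "integer": ("int64", "long")}


def json_schema_to_struct(json_schema: str) -> dict:
    """Parse a flat JSON Schema object into a struct-shaped dict."""
    doc = json.loads(json_schema)
    required = set(doc.get("required", []))
    fields = [
        {"name": name, "type": TYPE_MAP[prop["type"]][0], "optional": name not in required}
        for name, prop in doc.get("properties", {}).items()
    ]
    return {"type": "struct", "fields": fields}


def struct_to_avro(struct: dict, record_name: str) -> dict:
    """Render the struct-shaped dict as an Avro record schema."""
    to_avro = {intermediate: avro for intermediate, avro in TYPE_MAP.values()}
    fields = []
    for field in struct["fields"]:
        avro_type = to_avro[field["type"]]
        if field["optional"]:
            # Optional fields become a union with null and default to null.
            fields.append({"name": field["name"], "type": ["null", avro_type], "default": None})
        else:
            fields.append({"name": field["name"], "type": avro_type})
    return {"type": "record", "name": record_name, "fields": fields}


person = '{"type": "object", "properties": {"name": {"type": "string"}}}'
avro = struct_to_avro(json_schema_to_struct(person), "person")
print(json.dumps(avro))
```

Each format only needs a converter to and from the intermediate representation, so supporting a new format does not require a converter for every other format.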

Store schemas in Recap's schema registry:

from recap.storage.registry import RegistryStorage
from recap.types import StructType, IntType

storage = RegistryStorage("file:///tmp/recap-registry-storage")
version = storage.put(
    "postgresql://localhost:5432/testdb/public/test_table",
    StructType(fields=[IntType(32)])
)
storage.get("postgresql://localhost:5432/testdb/public/test_table")

# Get all versions of a schema
versions = storage.versions("postgresql://localhost:5432/testdb/public/test_table")

# List all schemas in the registry
schemas = storage.ls()

Docker

Recap's gateway and registry are also available as a Docker image:

docker run \
    -p 8000:8000 \
    -e 'RECAP_URLS=["postgresql://user:pass@localhost:5432/testdb"]' \
    ghcr.io/recap-build/recap:latest

See Recap's Docker documentation for more details.
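
The `RECAP_URLS` value in the `docker run` command is written as a JSON-style array, which is easy to misquote in a shell. One way to avoid that is to generate the `-e` argument programmatically. This is a minimal sketch using only the standard library. `recap_urls_env` is a hypothetical helper, not something Recap provides.

```python
import json
import shlex


def recap_urls_env(urls: list[str]) -> str:
    """Return a shell-quoted `RECAP_URLS=<json array>` argument for `docker run -e`."""
    value = json.dumps(urls, separators=(",", ":"))
    return shlex.quote(f"RECAP_URLS={value}")


arg = recap_urls_env(["postgresql://user:pass@localhost:5432/testdb"])
print(f"docker run -p 8000:8000 -e {arg} ghcr.io/recap-build/recap:latest")
```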

Schema

See Recap's type spec for details on Recap's type system.

Documentation

Recap's documentation is available at recap.build.
