A CLI and library to convert Singer catalogs to data warehouse schemas

Singer to Schema

A Python library to convert Singer catalog JSON to BigQuery table schema format.

Installation

pip install singer-to-schema

Or run it directly with uvx, without installing it first:

uvx singer-to-schema --help

Usage

The SingerToSchema class takes a Singer catalog JSON string and converts it to BigQuery table schema format.

Command Line Interface

The package provides a command-line interface for easy conversion:

# Convert catalog.json to BigQuery schema and print to stdout
singer-to-schema catalog.json

# Convert and save to output file
singer-to-schema catalog.json -o bigquery_schema.json

# Read from stdin and output to file
cat catalog.json | singer-to-schema - -o schema.json

# Pretty print the output
singer-to-schema catalog.json --pretty

# Show help
singer-to-schema --help

Library Usage

from singer_to_schema import SingerToSchema

# Example Singer catalog JSON
catalog_json = '''{
  "streams": [
    {
      "tap_stream_id": "users",
      "stream": "users",
      "schema": {
        "type": ["null", "object"],
        "additionalProperties": false,
        "properties": {
          "id": {
            "type": ["null", "string"]
          },
          "name": {
            "type": ["null", "string"]
          },
          "date_modified": {
            "type": ["null", "string"],
            "format": "date-time"
          }
        }
      }
    }
  ]
}'''

# Create converter instance
converter = SingerToSchema(catalog_json)

# Convert to BigQuery schema format
bigquery_schema = converter.to_bigquery()
print(bigquery_schema)

# Or get as JSON string
json_schema = converter.to_bigquery_json()
print(json_schema)

Output

The to_bigquery() method returns a dictionary with one entry per stream. For the catalog above, it has the following structure:

{
  "users": {
    "fields": [
      {
        "name": "id",
        "type": "STRING",
        "mode": "NULLABLE"
      },
      {
        "name": "name",
        "type": "STRING",
        "mode": "NULLABLE"
      },
      {
        "name": "date_modified",
        "type": "TIMESTAMP",
        "mode": "NULLABLE"
      }
    ]
  }
}
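
BigQuery tooling such as the bq command-line tool accepts a table schema as a JSON array of field objects, which is the shape of each stream's "fields" list. The following is a minimal sketch, assuming the dictionary structure shown above, that writes one schema file per stream. The helper name write_schema_files and the <stream>.json file naming are illustrative and not part of the package.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def write_schema_files(schema: Dict[str, Any], out_dir: str) -> List[Path]:
    """Write each stream's "fields" list to <out_dir>/<stream>.json."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for stream_name, table in schema.items():
        path = target / f"{stream_name}.json"
        # A BigQuery schema file is a JSON array of field definitions.
        path.write_text(json.dumps(table["fields"], indent=2), encoding="utf-8")
        written.append(path)
    return written


# Same shape as the to_bigquery() output shown above (shortened).
example_schema = {
    "users": {
        "fields": [
            {"name": "id", "type": "STRING", "mode": "NULLABLE"},
            {"name": "date_modified", "type": "TIMESTAMP", "mode": "NULLABLE"},
        ]
    }
}
```

With the example catalog, write_schema_files(converter.to_bigquery(), "schemas") would produce a single file, schemas/users.json.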

Type Mapping

The library maps Singer types to BigQuery types as follows:

Singer Type    BigQuery Type
string         STRING
integer        INT64
number         FLOAT64
boolean        BOOL
object         JSON
array          JSON
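
Singer schemas commonly express nullability by listing "null" alongside the real type, as in the catalog example. The sketch below shows one way the table could be applied to such a "type" value. It is an illustration, not the package's actual implementation: the function name map_singer_type, the REQUIRED mode for types without "null", and the rejection of multi-type unions are all assumptions.

```python
from typing import List, Tuple, Union

# Mapping copied from the type table above.
SINGER_TO_BIGQUERY = {
    "string": "STRING",
    "integer": "INT64",
    "number": "FLOAT64",
    "boolean": "BOOL",
    "object": "JSON",
    "array": "JSON",
}


def map_singer_type(singer_type: Union[str, List[str]]) -> Tuple[str, str]:
    """Return (bigquery_type, mode) for a Singer "type" value."""
    types = [singer_type] if isinstance(singer_type, str) else list(singer_type)
    non_null = [t for t in types if t != "null"]
    if len(non_null) != 1:
        # Assumption: unions such as ["string", "integer"] are rejected here.
        raise ValueError(f"expected exactly one non-null type, got {types!r}")
    # Assumption: a type that does not allow "null" becomes REQUIRED.
    mode = "NULLABLE" if "null" in types else "REQUIRED"
    return SINGER_TO_BIGQUERY[non_null[0]], mode
```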

Date/Time Formats

When a string field has a format property, it is mapped to the corresponding BigQuery type instead of STRING:

Format       BigQuery Type
date-time    TIMESTAMP
date         DATE
time         TIME
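
The format lookup can be sketched as a dictionary with a STRING fallback. This is an illustration under assumptions, not the package's code: string_field_type is a hypothetical helper, and keeping unlisted formats (for example "uri") as STRING is a guess at the behaviour.

```python
from typing import Any, Dict

# Mapping copied from the format table above.
FORMAT_TO_BIGQUERY = {
    "date-time": "TIMESTAMP",
    "date": "DATE",
    "time": "TIME",
}


def string_field_type(prop: Dict[str, Any]) -> str:
    """Return the BigQuery type for a Singer property whose type is string."""
    # Assumption: a missing or unlisted format leaves the field as STRING.
    return FORMAT_TO_BIGQUERY.get(prop.get("format"), "STRING")
```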

API Reference

SingerToSchema

__init__(catalog_json: str)

Initialize the converter with a Singer catalog JSON string.

Parameters:

  • catalog_json: A JSON string containing Singer catalog data

Raises:

  • ValueError: If the catalog structure is invalid
  • json.JSONDecodeError: If the JSON is malformed
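
Because json.JSONDecodeError is a subclass of ValueError, an except ValueError clause placed first would also catch malformed JSON. The sketch below separates the two cases by catching the more specific exception first. build_converter is a hypothetical helper; the constructor is passed in as an argument so the example runs without the package installed, and in practice SingerToSchema would be passed as converter_cls.

```python
import json
from typing import Any, Callable, Optional, Tuple


def build_converter(
    converter_cls: Callable[[str], Any], catalog_json: str
) -> Tuple[Optional[Any], Optional[str]]:
    """Return (converter, None) on success or (None, message) on failure."""
    try:
        return converter_cls(catalog_json), None
    except json.JSONDecodeError as exc:
        # Must come first: JSONDecodeError is a subclass of ValueError.
        return None, f"malformed JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"
    except ValueError as exc:
        return None, f"invalid catalog: {exc}"
```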

to_bigquery() -> Dict[str, Any]

Convert the Singer catalog to BigQuery table schema format.

Returns:

  • Dictionary containing BigQuery schema for each stream

to_bigquery_json() -> str

Convert the Singer catalog to BigQuery table schema format as a JSON string.

Returns:

  • JSON string containing BigQuery schema

Development

Running Tests

uv run pytest

License

This project is licensed under the MIT License.

Download files

Source Distribution

singer_to_schema-0.1.0.tar.gz (4.8 kB)

Uploaded Source

Built Distribution

singer_to_schema-0.1.0-py3-none-any.whl (6.4 kB)

Uploaded Python 3

File details

Details for the file singer_to_schema-0.1.0.tar.gz.

File metadata

  • Download URL: singer_to_schema-0.1.0.tar.gz
  • Size: 4.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.4

File hashes

Hashes for singer_to_schema-0.1.0.tar.gz
Algorithm Hash digest
SHA256 33c34ed5700f0179a8cfd5e67e17ff48bd9fba31e112550085f6f7c1f852e91e
MD5 507f491ba21cc9704c254d5ab61ee8c5
BLAKE2b-256 e7c5ba3bfc7d120c0a71210b1d1b25d881dcc4b819da4945d06af29c3efaa9d1

File details

Details for the file singer_to_schema-0.1.0-py3-none-any.whl.

File hashes

Hashes for singer_to_schema-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 3c3c5204ceef4cfd4e883db20abb5bdc6125d74d066c8699128a8ab2c2e8b97e
MD5 6112a254d9a23effcdffee56b3fe075d
BLAKE2b-256 c9bb0810177a192062fccf4ac26fd70f43f0d716ba99c9406d253da9cf8d3800
