A CLI and library to convert Singer catalogs to data warehouse schemas

Singer to Schema

A Python library to convert Singer catalog JSON to BigQuery table schema format.

Installation

pip install singer-to-schema

Or run it directly with uvx:

uvx singer-to-schema --help

Usage

The SingerToSchema class takes a Singer catalog JSON string and converts it to BigQuery table schema format.

Command Line Interface

The package provides a command-line interface for easy conversion:

# Convert catalog.json to BigQuery schema and print to stdout
singer-to-schema catalog.json

# Convert and save to output file
singer-to-schema catalog.json -o bigquery_schema.json

# Read from stdin and output to file
cat catalog.json | singer-to-schema - -o schema.json

# Pretty print the output
singer-to-schema catalog.json --pretty

# Convert object/array fields to STRING instead of JSON
singer-to-schema catalog.json --no-json-fields

# Show help
singer-to-schema --help

Library Usage

from singer_to_schema import SingerToSchema

# Example Singer catalog JSON
catalog_json = '''{
  "streams": [
    {
      "tap_stream_id": "users",
      "stream": "users",
      "schema": {
        "type": ["null", "object"],
        "additionalProperties": false,
        "properties": {
          "id": {
            "type": ["null", "string"]
          },
          "name": {
            "type": ["null", "string"]
          },
          "date_modified": {
            "type": ["null", "string"],
            "format": "date-time"
          }
        }
      }
    }
  ]
}'''

# Create converter instance (default: use JSON fields)
converter = SingerToSchema(catalog_json)

# Or disable JSON fields to use STRING instead
converter_no_json = SingerToSchema(catalog_json, use_json_fields=False)

# Convert to BigQuery schema format
bigquery_schema = converter.to_bigquery()
print(bigquery_schema)

# Or get as JSON string
json_schema = converter.to_bigquery_json()
print(json_schema)

Output

The to_bigquery() method returns a dictionary with the following structure:

{
  "users": {
    "fields": [
      {
        "name": "id",
        "type": "STRING",
        "mode": "NULLABLE"
      },
      {
        "name": "name",
        "type": "STRING",
        "mode": "NULLABLE"
      },
      {
        "name": "date_modified",
        "type": "TIMESTAMP",
        "mode": "NULLABLE"
      }
    ]
  }
}
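As a minimal sketch of consuming this structure, the helper below writes one JSON file per stream, each containing only that stream's list of fields. The function name `write_stream_schemas` and the `<stream>.json` file naming are assumptions for illustration and are not part of the library. The `example` dictionary is copied from the output above; in real use it would be the return value of `converter.to_bigquery()`.

```python
import json
import tempfile
from pathlib import Path


def write_stream_schemas(schemas, out_dir):
    """Write one <stream>.json file per stream, holding that stream's field list.

    Hypothetical helper: `schemas` is expected to have the shape shown above,
    i.e. {stream_name: {"fields": [...]}}.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for stream_name, table in schemas.items():
        path = out_dir / f"{stream_name}.json"
        path.write_text(json.dumps(table["fields"], indent=2), encoding="utf-8")
        written.append(path)
    return written


# Same shape as the example output above.
example = {
    "users": {
        "fields": [
            {"name": "id", "type": "STRING", "mode": "NULLABLE"},
            {"name": "name", "type": "STRING", "mode": "NULLABLE"},
            {"name": "date_modified", "type": "TIMESTAMP", "mode": "NULLABLE"},
        ]
    }
}

with tempfile.TemporaryDirectory() as tmp:
    paths = write_stream_schemas(example, tmp)
    print([p.name for p in paths])  # ['users.json']
```

A flat list of field objects is the shape commonly used for BigQuery JSON schema files, but check what your loading tool expects before relying on it.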

Type Mapping

The library maps Singer types to BigQuery types as follows:

Singer Type    BigQuery Type
string         STRING
integer        INT64
number         FLOAT64
boolean        BOOL
object         JSON
array          REPEATED (with item type)

Date/Time Formats

When a string field has a format property, it is mapped to the corresponding BigQuery type:

Format       BigQuery Type
date-time    TIMESTAMP
date         DATE
time         TIME
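The two tables can be read as a two-step lookup: strip "null" from the type list, then let a recognized format override the plain string mapping. The sketch below illustrates that reading. It is not the library's implementation: `TYPE_MAP`, `FORMAT_MAP`, and `scalar_bigquery_type` are hypothetical names, and rejecting properties with several non-null types is an assumption made to keep the example small.

```python
# Hypothetical lookup tables mirroring the two mapping tables above.
TYPE_MAP = {
    "string": "STRING",
    "integer": "INT64",
    "number": "FLOAT64",
    "boolean": "BOOL",
    "object": "JSON",
}
FORMAT_MAP = {"date-time": "TIMESTAMP", "date": "DATE", "time": "TIME"}


def scalar_bigquery_type(prop):
    """Return the BigQuery type name for a non-array Singer property."""
    types = prop.get("type", [])
    if isinstance(types, str):  # "string" and ["null", "string"] are both accepted
        types = [types]
    non_null = [t for t in types if t != "null"]
    if len(non_null) != 1:
        # Assumption: this sketch does not handle multi-type unions.
        raise ValueError(f"expected exactly one non-null type, got {types!r}")
    singer_type = non_null[0]
    # A recognized format on a string field takes precedence over STRING.
    if singer_type == "string" and prop.get("format") in FORMAT_MAP:
        return FORMAT_MAP[prop["format"]]
    if singer_type not in TYPE_MAP:
        raise ValueError(f"unsupported Singer type: {singer_type!r}")
    return TYPE_MAP[singer_type]


print(scalar_bigquery_type({"type": ["null", "string"]}))  # STRING
print(scalar_bigquery_type({"type": ["null", "string"], "format": "date-time"}))  # TIMESTAMP
print(scalar_bigquery_type({"type": "integer"}))  # INT64
```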

Array Fields

Array fields are converted to REPEATED mode, with the field type taken from the array's item type. For example, this Singer property:

{
  "tags": {
    "type": "array",
    "items": {
      "type": "string"
    }
  }
}

Becomes:

{
  "name": "tags",
  "type": "STRING",
  "mode": "REPEATED"
}
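The conversion above can be sketched in a few lines. This is an illustration of the documented input and output only, not the library's code: `ITEM_TYPES` and `array_field` are hypothetical names, and the sketch assumes the array's items have a single scalar type.

```python
# Hypothetical item-type lookup, limited to the scalar types from the Type Mapping table.
ITEM_TYPES = {
    "string": "STRING",
    "integer": "INT64",
    "number": "FLOAT64",
    "boolean": "BOOL",
}


def array_field(name, prop):
    """Build a REPEATED field definition from a Singer array property."""
    item_types = prop.get("items", {}).get("type", [])
    if isinstance(item_types, str):
        item_types = [item_types]
    non_null = [t for t in item_types if t != "null"]
    if len(non_null) != 1 or non_null[0] not in ITEM_TYPES:
        # Assumption: nested objects and mixed item types are out of scope here.
        raise ValueError(f"unsupported array item type: {item_types!r}")
    return {"name": name, "type": ITEM_TYPES[non_null[0]], "mode": "REPEATED"}


tags = {"type": "array", "items": {"type": "string"}}
print(array_field("tags", tags))
# {'name': 'tags', 'type': 'STRING', 'mode': 'REPEATED'}
```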

API Reference

SingerToSchema

__init__(catalog_json: str, use_json_fields: bool = True)

Initialize the converter with a Singer catalog JSON string.

Parameters:

  • catalog_json: A JSON string containing Singer catalog data
  • use_json_fields: If True, object and array fields use JSON type. If False, they use STRING type.

Raises:

  • ValueError: If the catalog structure is invalid
  • json.JSONDecodeError: If the JSON is malformed
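Because json.JSONDecodeError is a subclass of ValueError in the Python standard library, a handler that wants to tell the two cases apart has to catch json.JSONDecodeError first. The sketch below shows that ordering. `safe_convert` and `fake_convert` are hypothetical names; `fake_convert` is only a stand-in that raises the same two exception types so the example runs without the package installed.

```python
import json


def safe_convert(convert, catalog_json):
    """Call convert(catalog_json) and return (result, error_message)."""
    try:
        return convert(catalog_json), None
    except json.JSONDecodeError as exc:
        # Must come before ValueError: JSONDecodeError subclasses it.
        return None, f"malformed JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"
    except ValueError as exc:
        return None, f"invalid catalog: {exc}"


def fake_convert(text):
    """Stand-in converter (not the library): parses JSON and checks for 'streams'."""
    data = json.loads(text)
    if not isinstance(data, dict) or "streams" not in data:
        raise ValueError("catalog has no 'streams' key")
    return {s["stream"]: {"fields": []} for s in data["streams"]}


print(safe_convert(fake_convert, "{not json")[1])
print(safe_convert(fake_convert, "{}")[1])  # invalid catalog: catalog has no 'streams' key
```

With the package installed, the stand-in could be replaced by `lambda text: SingerToSchema(text).to_bigquery()`.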

to_bigquery() -> Dict[str, Any]

Convert the Singer catalog to BigQuery table schema format.

Returns:

  • Dictionary containing BigQuery schema for each stream

to_bigquery_json() -> str

Convert the Singer catalog to BigQuery table schema format as a JSON string.

Returns:

  • JSON string containing BigQuery schema

Development

Running Tests

uv run pytest

License

This project is licensed under the MIT License.
