
A CLI and library to convert Singer catalogs to data warehouse schemas

Singer to Schema

A Python library to convert Singer catalog JSON to BigQuery table schema format.

Installation

pip install singer-to-schema

Or run it directly with uvx:

uvx singer-to-schema --help

Usage

The SingerToSchema class takes a Singer catalog JSON string and converts it to BigQuery table schema format.

Command Line Interface

The package provides a command-line interface for easy conversion:

# Convert catalog.json to BigQuery schema and print to stdout
singer-to-schema catalog.json

# Convert and save to output file
singer-to-schema catalog.json -o bigquery_schema.json

# Read from stdin and output to file
cat catalog.json | singer-to-schema - -o schema.json

# Pretty print the output
singer-to-schema catalog.json --pretty

# Convert object/array fields to STRING instead of JSON
singer-to-schema catalog.json --no-json-fields

# Show help
singer-to-schema --help

Library Usage

from singer_to_schema import SingerToSchema

# Example Singer catalog JSON
catalog_json = '''{
  "streams": [
    {
      "tap_stream_id": "users",
      "stream": "users",
      "schema": {
        "type": ["null", "object"],
        "additionalProperties": false,
        "properties": {
          "id": {
            "type": ["null", "string"]
          },
          "name": {
            "type": ["null", "string"]
          },
          "date_modified": {
            "type": ["null", "string"],
            "format": "date-time"
          }
        }
      }
    }
  ]
}'''

# Create converter instance (default: use JSON fields)
converter = SingerToSchema(catalog_json)

# Or disable JSON fields to use STRING instead
converter_no_json = SingerToSchema(catalog_json, use_json_fields=False)

# Convert to BigQuery schema format
bigquery_schema = converter.to_bigquery()
print(bigquery_schema)

# Or get as JSON string
json_schema = converter.to_bigquery_json()
print(json_schema)

Output

For the catalog above, the to_bigquery() method returns a dictionary with one entry per stream, each holding a list of field definitions:

{
  "users": {
    "fields": [
      {
        "name": "id",
        "type": "STRING",
        "mode": "NULLABLE"
      },
      {
        "name": "name",
        "type": "STRING",
        "mode": "NULLABLE"
      },
      {
        "name": "date_modified",
        "type": "TIMESTAMP",
        "mode": "NULLABLE"
      }
    ]
  }
}
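
Each per-stream "fields" list appears to have the same shape as the JSON schema files accepted by the bq command-line tool (an array of objects with name, type, and mode), so one way to use the result is to write one file per stream. The following is a minimal sketch under that assumption: write_schema_files and the one-file-per-stream layout are hypothetical and not part of singer-to-schema, and the dictionary literal stands in for the return value of to_bigquery().

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def write_schema_files(schema: Dict[str, Any], out_dir: Path) -> List[Path]:
    """Write one <stream>.json file per stream, containing only its field list.

    Hypothetical helper, not part of singer-to-schema.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for stream_name, table in schema.items():
        path = out_dir / f"{stream_name}.json"
        path.write_text(json.dumps(table["fields"], indent=2))
        written.append(path)
    return written


# Stand-in for converter.to_bigquery(), copied from the output shown above.
bigquery_schema = {
    "users": {
        "fields": [
            {"name": "id", "type": "STRING", "mode": "NULLABLE"},
            {"name": "name", "type": "STRING", "mode": "NULLABLE"},
            {"name": "date_modified", "type": "TIMESTAMP", "mode": "NULLABLE"},
        ]
    }
}

out_dir = Path(tempfile.mkdtemp())
paths = write_schema_files(bigquery_schema, out_dir)
print([p.name for p in paths])  # ['users.json']
```

Writing only the field list, rather than the whole per-stream object, keeps each file usable as a standalone table schema.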

Type Mapping

The library maps Singer types to BigQuery types as follows:

Singer Type    BigQuery Type
string         STRING
integer        INT64
number         FLOAT64
boolean        BOOL
object         JSON
array          item type, with mode REPEATED
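
As a rough illustration of this mapping, the lookup can be sketched as a dictionary plus a helper that skips the "null" entry in a Singer type list. This is a sketch under stated assumptions, not the library's implementation: TYPE_MAP, singer_type, and to_bigquery_type are hypothetical names, and because the handling of types outside the table is not documented here, the sketch simply raises on them. Arrays are left out because they change the field's mode rather than its type.

```python
from typing import List, Union

# Non-array rows of the mapping table (hypothetical constant name).
TYPE_MAP = {
    "string": "STRING",
    "integer": "INT64",
    "number": "FLOAT64",
    "boolean": "BOOL",
    "object": "JSON",
}


def singer_type(type_decl: Union[str, List[str]]) -> str:
    """Return the first non-null entry of a Singer "type" declaration."""
    if isinstance(type_decl, str):
        return type_decl
    non_null = [t for t in type_decl if t != "null"]
    if not non_null:
        raise ValueError(f"no non-null type in {type_decl!r}")
    return non_null[0]


def to_bigquery_type(type_decl: Union[str, List[str]]) -> str:
    """Look up the BigQuery type for a non-array Singer type declaration."""
    name = singer_type(type_decl)
    if name not in TYPE_MAP:
        # Assumption: unmapped types are treated as an error in this sketch.
        raise ValueError(f"unmapped Singer type: {name!r}")
    return TYPE_MAP[name]


print(to_bigquery_type(["null", "string"]))  # STRING
print(to_bigquery_type("integer"))           # INT64
```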

Date/Time Formats

When a string field has one of the following format values, it is mapped to the corresponding BigQuery type instead of STRING:

Format       BigQuery Type
date-time    TIMESTAMP
date         DATE
time         TIME
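
The format lookup can be pictured as a second table consulted before the plain STRING mapping. The sketch below is illustrative only: FORMAT_TYPES and string_field_type are hypothetical names, and the fallback to STRING for formats outside the table is an assumption, since that case is not documented here.

```python
from typing import Any, Dict

# The three documented formats (hypothetical constant name).
FORMAT_TYPES = {
    "date-time": "TIMESTAMP",
    "date": "DATE",
    "time": "TIME",
}


def string_field_type(prop: Dict[str, Any]) -> str:
    """Pick a BigQuery type for a Singer property whose type is string."""
    # Assumption: a missing or unlisted format falls back to plain STRING.
    return FORMAT_TYPES.get(prop.get("format"), "STRING")


print(string_field_type({"type": ["null", "string"], "format": "date-time"}))  # TIMESTAMP
print(string_field_type({"type": ["null", "string"]}))                         # STRING
```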

Array Fields

Array fields are converted to BigQuery REPEATED mode with the appropriate item type:

{
  "tags": {
    "type": "array",
    "items": {
      "type": "string"
    }
  }
}

Becomes:

{
  "name": "tags",
  "type": "STRING",
  "mode": "REPEATED"
}
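
A minimal sketch of this conversion, assuming arrays whose items are scalar types, might look like the following. The names ITEM_TYPES and array_to_field are hypothetical and the code is not taken from the library; arrays of objects or nested arrays are not handled here.

```python
from typing import Any, Dict

# Scalar item types from the type mapping table (hypothetical constant name).
ITEM_TYPES = {
    "string": "STRING",
    "integer": "INT64",
    "number": "FLOAT64",
    "boolean": "BOOL",
}


def array_to_field(name: str, prop: Dict[str, Any]) -> Dict[str, str]:
    """Build a REPEATED field from a Singer array property with scalar items."""
    item_type = prop["items"]["type"]
    if isinstance(item_type, list):
        # Drop "null" from declarations such as ["null", "string"].
        item_type = next(t for t in item_type if t != "null")
    return {"name": name, "type": ITEM_TYPES[item_type], "mode": "REPEATED"}


field = array_to_field("tags", {"type": "array", "items": {"type": "string"}})
print(field)  # {'name': 'tags', 'type': 'STRING', 'mode': 'REPEATED'}
```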

API Reference

SingerToSchema

__init__(catalog_json: str, use_json_fields: bool = True)

Initialize the converter with a Singer catalog JSON string.

Parameters:

  • catalog_json: A JSON string containing Singer catalog data
  • use_json_fields: If True, object and array fields use JSON type. If False, they use STRING type.

Raises:

  • ValueError: If the catalog structure is invalid
  • json.JSONDecodeError: If the JSON is malformed
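
In the Python standard library, json.JSONDecodeError is a subclass of ValueError, so callers who want to tell the two failures apart need to catch json.JSONDecodeError first. The sketch below shows that ordering. To stay self-contained it uses parse_catalog, a hypothetical stand-in that mimics the documented contract (malformed JSON raises json.JSONDecodeError, an invalid structure raises ValueError); in real code the call inside the try block would be SingerToSchema(catalog_json). The specific validation rule and error messages are assumptions, not the library's behavior.

```python
import json


def parse_catalog(catalog_json: str) -> dict:
    """Hypothetical stand-in for SingerToSchema(catalog_json)."""
    catalog = json.loads(catalog_json)  # raises json.JSONDecodeError if malformed
    # Assumed validation rule: the catalog is an object with a "streams" list.
    if not isinstance(catalog, dict) or not isinstance(catalog.get("streams"), list):
        raise ValueError('catalog must be an object with a "streams" list')
    return catalog


def check_catalog(catalog_json: str) -> str:
    """Classify a catalog string as ok, malformed JSON, or structurally invalid."""
    try:
        parse_catalog(catalog_json)
    except json.JSONDecodeError as exc:
        # Must come first: JSONDecodeError is a subclass of ValueError.
        return f"malformed JSON at line {exc.lineno}, column {exc.colno}"
    except ValueError as exc:
        return f"invalid catalog: {exc}"
    return "ok"


print(check_catalog('{"streams": []}'))
print(check_catalog('{"streams": '))
print(check_catalog('{"tables": []}'))
```

If the distinction does not matter to the caller, a single except ValueError clause covers both cases.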

to_bigquery() -> Dict[str, Any]

Convert the Singer catalog to BigQuery table schema format.

Returns:

  • Dictionary containing BigQuery schema for each stream

to_bigquery_json() -> str

Convert the Singer catalog to BigQuery table schema format as a JSON string.

Returns:

  • JSON string containing BigQuery schema

Development

Running Tests

uv run pytest

License

This project is licensed under the MIT License.
