
A CLI and library to convert Singer catalogs to data warehouse schemas


Singer to Schema

A Python library to convert Singer catalog JSON to BigQuery table schema format.

Installation

pip install singer-to-schema

Or run it directly with uvx, without installing it first:

uvx singer-to-schema --help

Usage

The SingerToSchema class takes a Singer catalog JSON string and converts it to BigQuery table schema format.

Command Line Interface

The package provides a command-line interface for easy conversion:

# Convert catalog.json to BigQuery schema and print to stdout
singer-to-schema catalog.json

# Convert and save to output file
singer-to-schema catalog.json -o bigquery_schema.json

# Read from stdin and output to file
cat catalog.json | singer-to-schema - -o schema.json

# Pretty print the output
singer-to-schema catalog.json --pretty

# Show help
singer-to-schema --help

Library Usage

from singer_to_schema import SingerToSchema

# Example Singer catalog JSON
catalog_json = '''{
  "streams": [
    {
      "tap_stream_id": "users",
      "stream": "users",
      "schema": {
        "type": ["null", "object"],
        "additionalProperties": false,
        "properties": {
          "id": {
            "type": ["null", "string"]
          },
          "name": {
            "type": ["null", "string"]
          },
          "date_modified": {
            "type": ["null", "string"],
            "format": "date-time"
          }
        }
      }
    }
  ]
}'''

# Create converter instance
converter = SingerToSchema(catalog_json)

# Convert to BigQuery schema format
bigquery_schema = converter.to_bigquery()
print(bigquery_schema)

# Or get as JSON string
json_schema = converter.to_bigquery_json()
print(json_schema)

Output

For the example catalog above, the to_bigquery() method returns a dictionary with the following structure:

{
  "users": {
    "fields": [
      {
        "name": "id",
        "type": "STRING",
        "mode": "NULLABLE"
      },
      {
        "name": "name",
        "type": "STRING",
        "mode": "NULLABLE"
      },
      {
        "name": "date_modified",
        "type": "TIMESTAMP",
        "mode": "NULLABLE"
      }
    ]
  }
}
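
As one way to consume this structure, the sketch below writes each stream's fields list to its own JSON file, which is the array-of-fields layout that the bq command-line tool accepts as a schema file. The helper name write_stream_schemas and the one-file-per-stream layout are assumptions made for illustration; they are not part of singer-to-schema.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def write_stream_schemas(schema: Dict[str, Any], out_dir: str) -> List[Path]:
    """Write one <stream>.json file per stream, containing only its field list.

    Hypothetical helper, not part of singer-to-schema.
    """
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for stream_name, table in schema.items():
        path = target / f"{stream_name}.json"
        path.write_text(json.dumps(table["fields"], indent=2))
        written.append(path)
    return written


# Same shape as the to_bigquery() output shown above
example_schema = {
    "users": {
        "fields": [
            {"name": "id", "type": "STRING", "mode": "NULLABLE"},
            {"name": "date_modified", "type": "TIMESTAMP", "mode": "NULLABLE"},
        ]
    }
}
```

Calling write_stream_schemas(example_schema, "schemas") would create schemas/users.json holding the two field entries.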

Type Mapping

The library maps Singer types to BigQuery types as follows:

Singer Type   BigQuery Type
string        STRING
integer       INT64
number        FLOAT64
boolean       BOOL
object        JSON
array         JSON
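
The table above can be read as a plain lookup. The sketch below is a minimal illustration and not the library's implementation: the names SINGER_TO_BIGQUERY and map_type are hypothetical, and it assumes that a Singer type list such as ["null", "string"] resolves to its first non-null entry.

```python
from typing import List, Union

# Mirrors the mapping table above
SINGER_TO_BIGQUERY = {
    "string": "STRING",
    "integer": "INT64",
    "number": "FLOAT64",
    "boolean": "BOOL",
    "object": "JSON",
    "array": "JSON",
}


def map_type(singer_type: Union[str, List[str]]) -> str:
    """Return the BigQuery type for a Singer type or type list (hypothetical helper)."""
    types = [singer_type] if isinstance(singer_type, str) else list(singer_type)
    # Assumption: "null" only signals nullability, so it is skipped here
    non_null = [t for t in types if t != "null"]
    if not non_null:
        raise ValueError(f"no concrete type in {singer_type!r}")
    return SINGER_TO_BIGQUERY[non_null[0]]


print(map_type(["null", "string"]))  # STRING
print(map_type("integer"))           # INT64
```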

Date/Time Formats

When a string field has a format property, it is mapped to the corresponding BigQuery type instead of STRING:

Format      BigQuery Type
date-time   TIMESTAMP
date        DATE
time        TIME
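
Building a single field entry then amounts to checking format before falling back to STRING. The following sketch is an illustration under stated assumptions rather than the library's code: to_field is a hypothetical name, and treating a type list without "null" as REQUIRED is an assumption, since the example output here only shows NULLABLE fields.

```python
from typing import Any, Dict

# Mirrors the format table above
FORMAT_TO_BIGQUERY = {
    "date-time": "TIMESTAMP",
    "date": "DATE",
    "time": "TIME",
}


def to_field(name: str, prop: Dict[str, Any]) -> Dict[str, str]:
    """Build one BigQuery field entry for a string property (hypothetical helper)."""
    types = prop.get("type", [])
    types = [types] if isinstance(types, str) else list(types)
    if "string" not in types:
        raise ValueError("this sketch only handles string properties")
    # A recognised format overrides the plain STRING mapping
    bq_type = FORMAT_TO_BIGQUERY.get(prop.get("format"), "STRING")
    # Assumption: a missing "null" entry means REQUIRED; the source only shows NULLABLE
    mode = "NULLABLE" if "null" in types else "REQUIRED"
    return {"name": name, "type": bq_type, "mode": mode}


print(to_field("date_modified", {"type": ["null", "string"], "format": "date-time"}))
# {'name': 'date_modified', 'type': 'TIMESTAMP', 'mode': 'NULLABLE'}
```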

API Reference

SingerToSchema

__init__(catalog_json: str)

Initialize the converter with a Singer catalog JSON string.

Parameters:

  • catalog_json: A JSON string containing Singer catalog data

Raises:

  • ValueError: If the catalog structure is invalid
  • json.JSONDecodeError: If the JSON is malformed
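
When handling both documented exceptions, note that json.JSONDecodeError is a subclass of ValueError in the Python standard library, so it has to be caught first. The sketch below shows that ordering. classify_catalog_error and stand_in are hypothetical names: stand_in only imitates the documented behaviour so the example runs without the package installed, and in real use SingerToSchema would be passed as build instead.

```python
import json
from typing import Any, Callable


def classify_catalog_error(build: Callable[[str], Any], catalog_json: str) -> str:
    """Call build(catalog_json) and report which documented error, if any, was raised."""
    try:
        build(catalog_json)
    except json.JSONDecodeError as exc:
        # Must come before ValueError, because JSONDecodeError subclasses it
        return f"malformed JSON: {exc.msg}"
    except ValueError as exc:
        return f"invalid catalog: {exc}"
    return "ok"


def stand_in(catalog_json: str) -> Any:
    """Hypothetical stand-in for SingerToSchema; the real validation rules may differ."""
    data = json.loads(catalog_json)
    if "streams" not in data:
        raise ValueError("missing 'streams'")
    return data


print(classify_catalog_error(stand_in, "{}"))  # invalid catalog: missing 'streams'
```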

to_bigquery() -> Dict[str, Any]

Convert the Singer catalog to BigQuery table schema format.

Returns:

  • Dictionary containing BigQuery schema for each stream

to_bigquery_json() -> str

Convert the Singer catalog to BigQuery table schema format as a JSON string.

Returns:

  • JSON string containing BigQuery schema

Development

Running Tests

uv run pytest

License

This project is licensed under the MIT License.


