A CLI and library to convert Singer catalogs to data warehouse schemas
Singer to Schema
A Python library to convert Singer catalog JSON to BigQuery table schema format.
Installation
pip install singer-to-schema
Or run directly with uvx.
uvx singer-to-schema --help
Usage
The SingerToSchema class takes a Singer catalog JSON string and converts it to BigQuery table schema format.
Command Line Interface
The package provides a command-line interface for easy conversion:
# Convert catalog.json to BigQuery schema and print to stdout
singer-to-schema catalog.json
# Convert and save to output file
singer-to-schema catalog.json -o bigquery_schema.json
# Read from stdin and output to file
cat catalog.json | singer-to-schema - -o schema.json
# Pretty print the output
singer-to-schema catalog.json --pretty
# Show help
singer-to-schema --help
Library Usage
from singer_to_schema import SingerToSchema
# Example Singer catalog JSON
catalog_json = '''{
  "streams": [
    {
      "tap_stream_id": "users",
      "stream": "users",
      "schema": {
        "type": ["null", "object"],
        "additionalProperties": false,
        "properties": {
          "id": {
            "type": ["null", "string"]
          },
          "name": {
            "type": ["null", "string"]
          },
          "date_modified": {
            "type": ["null", "string"],
            "format": "date-time"
          }
        }
      }
    }
  ]
}'''
# Create converter instance
converter = SingerToSchema(catalog_json)
# Convert to BigQuery schema format
bigquery_schema = converter.to_bigquery()
print(bigquery_schema)
# Or get as JSON string
json_schema = converter.to_bigquery_json()
print(json_schema)
Output
The to_bigquery() method returns a dictionary with the following structure:
{
  "users": {
    "fields": [
      {
        "name": "id",
        "type": "STRING",
        "mode": "NULLABLE"
      },
      {
        "name": "name",
        "type": "STRING",
        "mode": "NULLABLE"
      },
      {
        "name": "date_modified",
        "type": "TIMESTAMP",
        "mode": "NULLABLE"
      }
    ]
  }
}
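As an illustration of consuming this structure, the sketch below writes each stream's `fields` list to its own JSON file, which is the array-of-fields layout that BigQuery schema files use. The `write_schema_files` helper and the `<stream>.json` file naming are assumptions made for this example; they are not part of the package.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def write_schema_files(schemas: Dict[str, Any], out_dir: str) -> List[Path]:
    """Write one <stream>.json file per stream, holding that stream's fields.

    Hypothetical helper for illustration, not part of singer-to-schema.
    """
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for stream_name, table in schemas.items():
        path = target / f"{stream_name}.json"
        # Each file contains only the list of field definitions.
        path.write_text(json.dumps(table["fields"], indent=2))
        written.append(path)
    return written


# Same shape as the to_bigquery() output shown above.
example = {
    "users": {
        "fields": [
            {"name": "id", "type": "STRING", "mode": "NULLABLE"},
            {"name": "name", "type": "STRING", "mode": "NULLABLE"},
            {"name": "date_modified", "type": "TIMESTAMP", "mode": "NULLABLE"},
        ]
    }
}
```

Calling `write_schema_files(example, "schemas")` would create `schemas/users.json` containing the three field definitions.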
Type Mapping
The library maps Singer types to BigQuery types as follows:
| Singer Type | BigQuery Type |
|---|---|
| string | STRING |
| integer | INT64 |
| number | FLOAT64 |
| boolean | BOOL |
| object | JSON |
| array | JSON |
Date/Time Formats
When a string field has a format property, it is mapped to the corresponding BigQuery type:
| Format | BigQuery Type |
|---|---|
| date-time | TIMESTAMP |
| date | DATE |
| time | TIME |
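The two mappings can be expressed as a small lookup, sketched below. This is an illustration of the documented rules and not the library's actual implementation: the `bigquery_type` function is hypothetical, and taking the first non-null entry of a `type` list is an assumption about how union types are resolved.

```python
from typing import Any, Dict

# Mappings as documented in the Type Mapping and Date/Time Formats sections.
TYPE_MAP = {
    "string": "STRING",
    "integer": "INT64",
    "number": "FLOAT64",
    "boolean": "BOOL",
    "object": "JSON",
    "array": "JSON",
}
FORMAT_MAP = {"date-time": "TIMESTAMP", "date": "DATE", "time": "TIME"}


def bigquery_type(prop: Dict[str, Any]) -> str:
    """Return the BigQuery type for one Singer property schema (illustrative)."""
    types = prop.get("type", [])
    if isinstance(types, str):
        types = [types]
    # Assumption: "null" is ignored and the first remaining type decides.
    non_null = [t for t in types if t != "null"]
    if not non_null:
        raise ValueError("property has no non-null type")
    singer_type = non_null[0]
    # A recognized format on a string field overrides the plain STRING mapping.
    if singer_type == "string" and prop.get("format") in FORMAT_MAP:
        return FORMAT_MAP[prop["format"]]
    return TYPE_MAP[singer_type]  # KeyError for types outside the table
```

For instance, `bigquery_type({"type": ["null", "string"], "format": "date-time"})` yields `"TIMESTAMP"`, matching the `date_modified` field in the Output section.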
API Reference
SingerToSchema
__init__(catalog_json: str)
Initialize the converter with a Singer catalog JSON string.
Parameters:
catalog_json: A JSON string containing Singer catalog data
Raises:
ValueError: If the catalog structure is invalid
json.JSONDecodeError: If the JSON is malformed
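Because `json.JSONDecodeError` is a subclass of `ValueError` in the Python standard library, the order of `except` clauses matters when the two cases need distinct handling. The sketch below uses only the standard library: `parse_catalog` is a hypothetical stand-in that fails in the two documented ways, and its check for a `streams` list is an assumption about what counts as an invalid structure.

```python
import json
from typing import Any, Dict


def parse_catalog(catalog_json: str) -> Dict[str, Any]:
    """Hypothetical stand-in for the constructor's validation."""
    catalog = json.loads(catalog_json)  # raises json.JSONDecodeError if malformed
    # Assumed structural check; the library's real rules may differ.
    if not isinstance(catalog, dict) or not isinstance(catalog.get("streams"), list):
        raise ValueError("catalog must be an object with a 'streams' list")
    return catalog


def describe_failure(catalog_json: str) -> str:
    """Report which of the two documented error cases occurred, if any."""
    try:
        parse_catalog(catalog_json)
    except json.JSONDecodeError as exc:
        # Must come first: JSONDecodeError is itself a ValueError.
        return f"malformed JSON: {exc.msg}"
    except ValueError as exc:
        return f"invalid catalog: {exc}"
    return "ok"
```

The same clause ordering would apply when wrapping `SingerToSchema(catalog_json)` in a `try` block. Catching `ValueError` alone handles both cases at once.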
to_bigquery() -> Dict[str, Any]
Convert the Singer catalog to BigQuery table schema format.
Returns:
- Dictionary containing BigQuery schema for each stream
to_bigquery_json() -> str
Convert the Singer catalog to BigQuery table schema format as a JSON string.
Returns:
- JSON string containing BigQuery schema
Development
Running Tests
uv run pytest
License
This project is licensed under the MIT License.