Skip to main content

A tool to synchronize BigQuery table schemas with local definitions

Project description

bq-schema-sync

bq-schema-sync is a Python package designed to help synchronize Google BigQuery table schemas with local schema definitions. It provides tools for comparing schemas, applying changes, validating schema definitions, generating migration scripts, managing schema versions, and enforcing schema validation rules.

Features

  • Schema Comparison: Identify differences between local schema definitions and BigQuery table schemas.
  • Schema Synchronization: Apply changes to BigQuery table schemas based on local definitions.
  • Validation: Ensure that local schema definitions adhere to BigQuery constraints and best practices.
  • Migration Script Generation: Generate SQL scripts to manually apply schema changes.
  • Schema Versioning: Track schema changes over time, save versions, list versions, and apply specific versions.
  • Dry Run Mode: Preview changes without applying them to the BigQuery table.
  • Schema Validation Rules: Enforce custom validation rules to ensure schema definitions meet specific criteria.

Installation

You can install bq-schema-sync using pip:

pip install bq-schema-sync

Usage

Command-Line Interface (CLI)

Initialize Configuration

Generate a template configuration file:

bq-schema-sync init

Compare Schemas

Compare the local schema with the BigQuery table schema:

bq-schema-sync compare --config config.yaml --dry-run

Apply Changes

Sync the local schema with the BigQuery table schema:

bq-schema-sync apply --config config.yaml --dry-run

Generate Migration Script

Generate a SQL script for manual schema migration:

bq-schema-sync generate-script --config config.yaml --output migration.sql

Validate Schema

Validate the local schema against BigQuery constraints and custom validation rules:

bq-schema-sync validate --config config.yaml

Save Schema Version

Save the current schema version with a description:

bq-schema-sync save-version --config config.yaml --description "Added new field 'email'"

List Schema Versions

List all saved schema versions:

bq-schema-sync list-versions --config config.yaml

Apply Schema Version

Apply a specific schema version:

bq-schema-sync apply-version --config config.yaml --version 1

Python API

You can also use bq-schema-sync as a Python module:

from bq_schema_sync import SchemaSync
from google.cloud import bigquery

# Initialize SchemaSync with configuration details
client = bigquery.Client(project='my-gcp-project')
schema_sync = SchemaSync(
    project_id='my-gcp-project',
    dataset_id='my_dataset',
    table_id='my_table',
    schema={'fields': [{'name': 'id', 'type': 'STRING', 'mode': 'REQUIRED', 'description': 'Unique identifier'}]},
    client=client
)

# Compare schemas
differences = schema_sync.compare_schemas()
print("Schema Differences:", differences)

# Apply changes to sync the BigQuery table schema with the local schema
schema_sync.apply_changes()

# Generate migration script
schema_sync.generate_migration_script('migration_scripts/update_my_table_schema.sql')

# Validate the local schema
schema_sync.validate_schema()

# Save the current schema version
schema_sync.save_version("Initial schema definition")

# List all schema versions
versions = schema_sync.list_versions()
for version in versions:
    print(f"Version: {version['version']}, Timestamp: {version['timestamp']}, Description: {version['description']}")

# Apply a specific schema version
schema_sync.apply_version(1)

Configuration File

The configuration file should be in YAML format and include the following details:

project_id: your-gcp-project-id
dataset_id: your-dataset-id
table_id: your-table-id
schema:
  fields:
    - name: id
      type: STRING
      mode: REQUIRED
      description: "Unique identifier"
    - name: created_at
      type: TIMESTAMP
      mode: NULLABLE
      description: "Record creation timestamp"

Example Schema

Here is an example schema file in YAML format (config.yaml):

project_id: your-gcp-project-id
dataset_id: your-dataset-id
table_id: your-table-id
schema:
  fields:
    - name: id
      type: STRING
      mode: REQUIRED
      description: "Unique identifier"
    - name: created_at
      type: TIMESTAMP
      mode: NULLABLE
      description: "Record creation timestamp"

Testing

You can run tests using unittest:

python -m unittest discover tests

Contributing

Contributions are welcome! Please feel free to submit a pull request or open an issue.

Author

This project is developed and maintained by Victor Hasim Elexpe Ahamri. You can follow me on Twitter @victorelexpe and visit my website elexpe.dev.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

bq_schema_sync-0.2.1.tar.gz (10.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

bq_schema_sync-0.2.1-py3-none-any.whl (10.1 kB view details)

Uploaded Python 3

File details

Details for the file bq_schema_sync-0.2.1.tar.gz.

File metadata

  • Download URL: bq_schema_sync-0.2.1.tar.gz
  • Upload date:
  • Size: 10.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.0 CPython/3.12.4

File hashes

Hashes for bq_schema_sync-0.2.1.tar.gz
Algorithm Hash digest
SHA256 1247df50d40b21299c89f31523052cdc23616e3ca1edd7a789486bb20460cdf4
MD5 0fedfef01c6f69b8ebd29a0c5d84e00a
BLAKE2b-256 bf78431ebb09b94f113a32f588a04398e47e6a50da7186cd3127012117613944

See more details on using hashes here.

File details

Details for the file bq_schema_sync-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: bq_schema_sync-0.2.1-py3-none-any.whl
  • Upload date:
  • Size: 10.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.0 CPython/3.12.4

File hashes

Hashes for bq_schema_sync-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 202d8f80c394f2edbec2a556bacaf6101165eff233aa9dffae6b603e9cbc9fa4
MD5 a2029403767ab133d92b57abed48b515
BLAKE2b-256 19906ea62130b6a36cff1ac78288aa8f868c5328f49080c248656af622210ed0

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page