
Simple tool for writing Google BigQuery migrations


python-bigquery-migrations

The python-bigquery-migrations package provides a streamlined way to create and manage BigQuery datasets and tables using intuitive CLI commands, such as the following:

bigquery-migrations run

What are the benefits of using migrations?

Migrations are like version control for your database, allowing you to define and share the application's datasets and table schema definitions.

Getting Started

0. Prerequisites

  • A Google Cloud project with billing enabled
  • The Google Cloud BigQuery API enabled in that project
  • A Google Cloud service account JSON key file

1. Install

pip install bigquery-migrations

2. Create the project folder structure

Create two subdirectories:

  1. credentials
  2. migrations
your-project-root-folder
├── credentials
├── migrations
└── ...
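If you prefer to script this step, the following minimal sketch creates both subdirectories with Python's standard pathlib module. The helper name create_project_dirs is hypothetical and not part of the package; the directory names are the defaults used throughout this README.

```python
from pathlib import Path


def create_project_dirs(root: str = ".") -> list[Path]:
    """Create the credentials and migrations subdirectories under root.

    Hypothetical helper, not part of bigquery-migrations. Directories
    that already exist are left untouched (exist_ok=True).
    """
    created = []
    for name in ("credentials", "migrations"):
        path = Path(root) / name
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling create_project_dirs() from your project root is safe to repeat, because existing folders are not recreated or emptied.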

3. Create the necessary files in the folders

Google Cloud Service Account JSON file

Put your Google Cloud Service Account JSON file in the credentials subdirectory. For more information, see the Authorize BigQuery Client section.

your-project
├── credentials
│   ├── gcp-sa.json
├── migrations
└── ...

You can use a different folder name and file name, but in that case you must specify them with command arguments, as in the following example:

bigquery-migrations run --gcp-sa-json-dir my-creds --gcp-sa-json-fname my-service-account.json

  • --gcp-sa-json-dir: name of the directory that contains the service account JSON file (optional, default: credentials)
  • --gcp-sa-json-fname: name of the service account JSON file (optional, default: gcp-sa.json)
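The following sketch illustrates how two optional flags like these can be combined into a file path with Python's argparse module, falling back to the default credentials/gcp-sa.json layout. It is a hypothetical illustration, not the package's actual CLI code; build_parser and resolve_sa_path are assumed names, and only the documented run, rollback and reset commands are modeled.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the documented flags; the defaults
    # match the folder layout described in this README.
    parser = argparse.ArgumentParser(prog="bigquery-migrations")
    parser.add_argument("command", choices=["run", "rollback", "reset"])
    parser.add_argument("--gcp-sa-json-dir", default="credentials")
    parser.add_argument("--gcp-sa-json-fname", default="gcp-sa.json")
    return parser


def resolve_sa_path(args: argparse.Namespace, root: str = ".") -> Path:
    # argparse stores "--gcp-sa-json-dir" as args.gcp_sa_json_dir and
    # "--gcp-sa-json-fname" as args.gcp_sa_json_fname; each flag must be
    # read from its own attribute.
    return Path(root) / args.gcp_sa_json_dir / args.gcp_sa_json_fname
```

For example, parsing ["run"] resolves to credentials/gcp-sa.json, while passing both flags as in the command above resolves to my-creds/my-service-account.json.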

IMPORTANT!
Never check the Google Cloud Service Account JSON file into version control. This file contains sensitive credentials that could compromise your Google Cloud account if exposed.

To prevent accidental commits, make sure to add the file to your .gitignore configuration. For example:

# .gitignore
gcp-sa.json

By ignoring this file, you reduce the risk of unintentional leaks and maintain secure practices in your repository.

Migrations

Create your own migrations and put them in the migrations directory. For more information, see the Migration structure and Migration naming conventions sections.

your-project
├── credentials
│   ├── gcp-sa.json
├── migrations
│   ├── 2024_12_01_120000_create_users_table.py
└── ...

You can use a different folder name, but in that case you must specify it with a command argument:

bigquery-migrations run --migrations-dir bq-migrations

  • --migrations-dir: name of the migrations directory (optional, default: migrations)

Running migrations

IMPORTANT!
You have to create your own migrations first! Jump to the Creating migrations section.

To run all of your outstanding migrations, execute the run command:

bigquery-migrations run

You can specify the Google Cloud project ID with the --gcp-project-id argument:

bigquery-migrations run --gcp-project-id your-gcp-id

Migration log

IMPORTANT!
It's crucial to keep the migration_log.json file in place and not to modify it manually!

After the first successful run, a migration_log.json file is created in the migrations directory.

your-project
├── migrations
│   ├── 2024_12_01_120000_create_users_table.py
│   ├── migration_log.json
...

The migration_log.json file content should look like this:

{
    "last_migration": "2024_12_10_121000_create_users_table",
    "timestamp": "2024-12-18T12:25:54.318426+00:00"
}
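To illustrate the role of last_migration, here is a minimal sketch of how a tool could decide which migrations are still outstanding: because migration file names start with a sortable timestamp, every migration whose name sorts after last_migration has not run yet. This is an assumption about the mechanism, not the package's documented implementation, and pending_migrations is a hypothetical name.

```python
import json
from pathlib import Path


def pending_migrations(migrations_dir: str = "migrations") -> list[str]:
    """Return migration names (without .py) that sort after last_migration.

    Hypothetical helper: assumes the yyyy_mm_dd_hhmmss_ prefix makes
    lexical order equal to chronological order.
    """
    directory = Path(migrations_dir)
    # Only consider files whose names start with a year, e.g. "2024_...".
    names = sorted(p.stem for p in directory.glob("*.py") if p.stem[:4].isdigit())
    log_file = directory / "migration_log.json"
    if not log_file.exists():
        return names  # nothing has run yet
    last = json.loads(log_file.read_text())["last_migration"]
    return [name for name in names if name > last]
```

With the log shown above, only files whose names sort after 2024_12_10_121000_create_users_table would be reported as outstanding.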

Rolling Back Migrations

Roll back the last migration

To reverse the last migration, execute the rollback command and pass last as the value of the --migration-name argument:

bigquery-migrations rollback --migration-name last

Roll back a specific migration

To reverse a specific migration, execute the rollback command and pass the migration name with the --migration-name argument:

bigquery-migrations rollback --migration-name 2024_12_10_121000_create_users_table

Roll back all migrations

To reverse all of your migrations, execute the reset command:

bigquery-migrations reset
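Conceptually, reversing all migrations means calling each migration's down method in reverse chronological order, so that later changes are undone before the changes they depend on. The sketch below shows only that ordering, using a stand-in class instead of real BigQuery calls. It assumes, but the README does not state, that reset works this way; FakeMigration, reset_all and the add_email_column migration name are hypothetical.

```python
class FakeMigration:
    """Stand-in for a migration class; records each call to down()."""

    def __init__(self, name: str, journal: list[str]):
        self.name = name
        self.journal = journal

    def down(self) -> None:
        self.journal.append(self.name)


def reset_all(migrations: dict[str, FakeMigration]) -> None:
    # Keys follow the yyyy_mm_dd_hhmmss_ naming convention, so sorting
    # them in reverse yields newest-first order.
    for name in sorted(migrations, reverse=True):
        migrations[name].down()
```

Running reset_all on two migrations dated 2024_12_01 and 2024_12_10 calls down() on the 2024_12_10 migration first.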

Authorize BigQuery Client

Put your service account JSON file in the credentials subdirectory in the root of your project.

your-project
├── credentials
│   ├── gcp-sa.json
...

Creating a Service Account for Google BigQuery

You can connect to BigQuery with a user account or a service account. A service account is a special kind of account designed to be used by applications or compute workloads, rather than a person. Service accounts don’t have passwords and use a unique email address for identification.

To create a BigQuery service account key:

  1. Sign in to the Google Cloud console.
  2. Make sure that the API is enabled on the BigQuery API page. If you don’t see API Enabled, choose Enable.
  3. On the Service accounts page, choose your BigQuery project, and then choose Create service account.
  4. On the Service account details page, enter a descriptive value for Service account name. Choose Create and continue. The Grant this service account access to the project page opens.
  5. For Select a role, choose BigQuery, and then choose BigQuery Admin.
  6. Choose Continue, and then choose Done.
  7. On the Service account page, choose the service account that you created.
  8. Choose Keys, Add key, Create new key.
  9. Choose JSON, and then choose Create. Choose the folder in which to save your private key, or check your browser's default downloads folder.
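Before moving the downloaded key to credentials/gcp-sa.json, you may want to sanity-check it. The sketch below uses only the standard library to confirm that the file parses as a JSON object and contains fields that Google service account key files normally include (type, project_id, client_email, private_key). The helper name check_sa_key is hypothetical, and the check does not verify that the key is valid or that the account has the BigQuery Admin role.

```python
import json
from pathlib import Path

# Fields normally present in a Google service account key file.
REQUIRED_FIELDS = ("type", "project_id", "client_email", "private_key")


def check_sa_key(path: str) -> list[str]:
    """Return a list of problems found in a service account key file.

    Hypothetical helper: it only inspects the JSON structure and does not
    contact Google Cloud or verify the key's permissions.
    """
    try:
        data = json.loads(Path(path).read_text())
    except (OSError, json.JSONDecodeError) as exc:
        return [f"cannot read key file: {exc}"]
    if not isinstance(data, dict):
        return ["key file is not a JSON object"]
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in data]
    if data.get("type") != "service_account":
        problems.append("type is not 'service_account'")
    return problems
```

An empty list means the file looks like a service account key; any entries describe what is missing.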

Creating migrations

Put your migration files in the migrations subdirectory in the root of your project.

your-project
├── migrations
│   ├── 2024_12_01_120000_create_users_table.py
...

Migration structure

The migration class must contain two methods: up and down.

The up method is used to add new datasets, tables, columns, etc. to your BigQuery project, while the down method should reverse the operations performed by the up method.

from google.cloud import bigquery
from bigquery_migrations import Migration

class CreateUsersTable(Migration):
    """
    See:
    https://github.com/googleapis/python-bigquery/tree/main/samples
    """

    def up(self):
        # TODO: Set table_id to the ID of the table to create.
        table_id = "your_project.your_dataset.example_table"
        
        # TODO: Define table schema
        schema = [
            bigquery.SchemaField("id", "INTEGER", mode="REQUIRED"),
            bigquery.SchemaField("name", "STRING", mode="REQUIRED"),
            bigquery.SchemaField("created_at", "TIMESTAMP", mode="NULLABLE"),
        ]
        table = bigquery.Table(table_id, schema=schema)
        table = self.client.create_table(table)
        print(
            "Created table {}.{}.{}".format(table.project, table.dataset_id, table.table_id)
        )

    def down(self):
        # TODO: Set table_id to the ID of the table to delete.
        table_id = "your_project.your_dataset.example_table"
        
        # If the table does not exist, delete_table raises
        # google.api_core.exceptions.NotFound unless not_found_ok is True.
        self.client.delete_table(table_id, not_found_ok=True)
        print("Deleted table '{}'.".format(table_id))

Migration naming conventions

File names start with a timestamp, followed by the class name in snake_case:

  • Pattern: yyyy_mm_dd_hhmmss_your_class_name.py
  • Example file name: 2024_12_10_120000_create_users_table.py
  • Example class name: CreateUsersTable
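The relationship between the file name and the class name can be expressed in a few lines: strip the timestamp prefix and convert the remaining snake_case words to CamelCase. The regular expression and the helper class_name_for below are a hypothetical sketch of this convention, not code taken from the package.

```python
import re

# yyyy_mm_dd_hhmmss_ prefix, then a snake_case name, then ".py"
MIGRATION_FILE = re.compile(r"^(\d{4}_\d{2}_\d{2}_\d{6})_([a-z0-9_]+)\.py$")


def class_name_for(filename: str) -> str:
    """Map '2024_12_10_120000_create_users_table.py' to 'CreateUsersTable'."""
    match = MIGRATION_FILE.match(filename)
    if match is None:
        raise ValueError(f"not a valid migration file name: {filename}")
    words = match.group(2).split("_")
    return "".join(word.capitalize() for word in words if word)
```

A file name without the timestamp prefix, such as create_users_table.py, raises ValueError.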

Changelog

0.5.3

Security

  • Upgraded protobuf from 5.29.1 to 6.33.6 to fix two high-severity CVEs:
    • JSON recursion depth bypass (protobuf < 5.29.6)
    • Denial of Service via recursive groups in pure-Python backend (protobuf < 5.29.5)
  • Upgraded google-cloud-bigquery from 3.27.0 to 3.40.1
  • Upgraded google-auth from 2.37.0 to 2.49.1
  • Upgraded google-api-core from 2.24.0 to 2.30.0
  • Upgraded grpcio / grpcio-status from 1.68.1 to 1.78.0
  • Added explicit protobuf>=5.29.6 lower-bound in pyproject.toml to protect downstream consumers

Fix

  • migration_cli.py: the --gcp-sa-json-dir and --gcp-sa-json-fname CLI arguments were silently ignored because both were read from args.migrations_dir instead of from their own argument attributes

Removed

  • Dropped Python 3.9 support (EOL since October 2025). The updated security dependency chain (cryptography>=46, cffi>=2.0, pycparser==3.0) requires Python >=3.10. Minimum supported version is now Python 3.10

0.5.2

Feature

  • Rollback the last migration

Documentation

  • README.md
    • Modified sections
      • Getting Started
      • Rolling Back Migrations

0.5.1

Documentation

  • README.md
    • Modified sections
      • Create the necessary files in the folders

0.5.0

Feature

  • Rollback to a specific migration

Documentation

  • README.md
    • New sections
      • Migration log
    • Modified sections
      • Running migrations
      • Rollback migrations

0.4.3

Documentation

  • README.md
    • GCP Service account creation process updated

0.4.2

Documentation

  • README.md
    • Sample code: import correction
    • New sections:
      • GCP Service account creation process
      • Migration naming convention

0.4.1

Documentation

  • README.md sample code: removed unnecessary lines of code

0.4.0

This is the first release which uses the CHANGELOG file.



Download files


Source Distribution

bigquery_migrations-0.5.3.tar.gz (14.7 kB)

Built Distribution


bigquery_migrations-0.5.3-py3-none-any.whl (14.0 kB)

File details

Details for the file bigquery_migrations-0.5.3.tar.gz.

File metadata

  • Download URL: bigquery_migrations-0.5.3.tar.gz
  • Size: 14.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for bigquery_migrations-0.5.3.tar.gz
Algorithm Hash digest
SHA256 677093649ed15949a48120c35628b5a2517d0880ab2ff5c87c06039472bd1d5e
MD5 415921d2658e1a81950d001ff17a810c
BLAKE2b-256 be205e295224f48e7593dfe49b799eb8baab7cad687beaef86a53941b0c777bd


File details

Details for the file bigquery_migrations-0.5.3-py3-none-any.whl.


File hashes

Hashes for bigquery_migrations-0.5.3-py3-none-any.whl
Algorithm Hash digest
SHA256 78349f477fa3af368d55f4663de257af4e2cd83adbc3e7d3b74f642059b241cd
MD5 e850fc464ae5ef0d66a5e3e63c2ea451
BLAKE2b-256 49590118598cf174a6f417adc52fab36819819d0cb80607a345520431be5ee75

