Simple tool for writing Google BigQuery migrations

python-bigquery-migrations

The bigquery-migrations Python package helps you create and modify BigQuery datasets and tables.

Migrations are like version control for your database, allowing you to define and share the application's datasets and table schema definitions.

Getting Started

Install

pip install bigquery-migrations

Create the project folder structure

Create two subdirectories in your project root:

  1. credentials
  2. migrations

your-project-root-folder
├── credentials
├── migrations
└── ...
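
If you prefer to script this step, the layout above can be created with a few lines of Python. This is only a convenience sketch: the helper name create_project_layout is hypothetical and is not part of bigquery-migrations.

```python
from pathlib import Path


def create_project_layout(root: str) -> list[Path]:
    """Create the credentials and migrations subdirectories under root."""
    created = []
    for name in ("credentials", "migrations"):
        path = Path(root) / name
        # exist_ok=True makes the call safe to repeat on an existing project.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


# Usage (hypothetical root folder name):
# create_project_layout("your-project-root-folder")
```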

Create the necessary files in the folders

Put your Google Cloud service account JSON file in the credentials subdirectory. See the Authorize BigQuery Client section for more info.

Create your own migrations and put them in the migrations directory. See the Migration structure section and Migration naming conventions section for more info.

your-project
├── credentials
│   ├── gcp-sa.json
├── migrations
│   ├── 2024_12_01_120000_create_users_table.py
└── ...

Running migrations

IMPORTANT!
You have to create your own migrations first. See the Creating migrations section.

To run all of your outstanding migrations, execute the run command:

bigquery-migrations run

You can specify the Google Cloud project ID with the --gcp-project-id argument:

bigquery-migrations run --gcp-project-id your-gcp-project-id
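
When migrations run from a deployment script rather than a shell, the same command can be invoked through subprocess. The sketch below is not part of the package. It assumes the bigquery-migrations executable is on PATH and that --gcp-project-id takes the project ID as its next argument; the function names are hypothetical.

```python
import subprocess
from typing import Optional


def build_run_command(gcp_project_id: Optional[str] = None) -> list[str]:
    """Return the argument list for `bigquery-migrations run`."""
    command = ["bigquery-migrations", "run"]
    if gcp_project_id:
        # Assumption: the flag takes the project ID as a separate argument.
        command += ["--gcp-project-id", gcp_project_id]
    return command


def run_migrations(gcp_project_id: Optional[str] = None) -> int:
    """Run outstanding migrations and return the CLI's exit code."""
    # check=False leaves the handling of a non-zero exit code to the caller.
    completed = subprocess.run(build_run_command(gcp_project_id), check=False)
    return completed.returncode
```

Passing the arguments as a list avoids shell quoting problems if the project ID comes from an environment variable.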

Rolling Back Migrations

To reverse all of your migrations, execute the reset command:

bigquery-migrations reset

Authorize BigQuery Client

Put your service account JSON file in the credentials subdirectory in the root of your project.

your-project
├── credentials
│   ├── gcp-sa.json
...
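
Before running migrations, it can help to confirm that the key file is in place and looks like a service account key. The check below is a sketch that uses only the standard library. It does not show how bigquery-migrations itself loads the file; the function name is hypothetical, and the field list covers only a few of the fields found in a Google Cloud service account key.

```python
import json
from pathlib import Path

# A subset of the fields present in a Google Cloud service account key file.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_file(path: str) -> str:
    """Return the project_id from a key file, or raise ValueError."""
    key_path = Path(path)
    if not key_path.is_file():
        raise ValueError(f"Key file not found: {key_path}")
    # json.JSONDecodeError is a subclass of ValueError.
    data = json.loads(key_path.read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        raise ValueError("Key file must contain a JSON object")
    missing = [field for field in REQUIRED_FIELDS if field not in data]
    if missing:
        raise ValueError(f"Key file is missing fields: {', '.join(missing)}")
    if data["type"] != "service_account":
        raise ValueError(f"Unexpected credential type: {data['type']!r}")
    return data["project_id"]


# Usage: check_service_account_file("credentials/gcp-sa.json")
```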

Creating a Service Account for Google BigQuery

You can connect to BigQuery with a user account or a service account. A service account is a special kind of account designed to be used by applications or compute workloads, rather than a person.

Service accounts don’t have passwords and use a unique email address for identification. You can associate each service account with a service account key, which is a public/private RSA key pair. In this walkthrough, you create a service account key that bigquery-migrations uses to access your BigQuery project.

To create a BigQuery service account key

  1. Sign in to the Google Cloud management console.
  2. Make sure that you have API enabled on your BigQuery API page. If you don’t see API Enabled, choose Enable.
  3. On the Service accounts page, choose your BigQuery project, and then choose Create service account.
  4. On the Service account details page, enter a descriptive value for Service account name. Choose Create and continue. The Grant this service account access to the project page opens.
  5. For Select a role, choose BigQuery, and then choose BigQuery Admin. This role grants permissions to manage all resources within the project, which covers creating and deleting datasets and tables.
  6. Choose Add another role. For Select a role, choose Cloud Storage, and then choose Storage Admin. This role grants full control of objects and buckets in Cloud Storage.
  7. Choose Continue, and then choose Done.
  8. On the Service account page, choose the service account that you created.
  9. Choose Keys, Add key, Create new key.
  10. Choose JSON, and then choose Create. Choose the folder to save your private key or check the default folder for downloads in your browser.

Creating migrations

Put your migration files in the migrations subdirectory in the root of your project.

your-project
├── migrations
│   ├── 2024_12_01_120000_create_users_table.py
...
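
Because each filename starts with a zero-padded timestamp, sorting the filenames as plain strings also sorts them from oldest to newest. The sketch below lists migration files in that order. Whether bigquery-migrations applies them in exactly this order is an assumption, and list_migration_files is a hypothetical helper, not part of the package.

```python
from pathlib import Path


def list_migration_files(migrations_dir: str) -> list[str]:
    """Return migration filenames sorted by name, which is also oldest first."""
    files = [
        path.name
        for path in Path(migrations_dir).glob("*.py")
        if path.name != "__init__.py"
    ]
    # Zero-padded yyyy_mm_dd_hhmmss prefixes sort chronologically as strings.
    return sorted(files)


# Usage: list_migration_files("migrations")
```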

Migration structure

The migration class must contain two methods: up and down.

The up method adds new datasets, tables, columns, etc. to your BigQuery project, while the down method should reverse the operations performed by the up method.

from google.cloud import bigquery
from bigquery_migrations import Migration

class CreateUsersTable(Migration):
    """
    See:
    https://github.com/googleapis/python-bigquery/tree/main/samples
    """

    def up(self):
        # TODO: Set table_id to the ID of the table to create.
        table_id = "your_project.your_dataset.example_table"
        
        # TODO: Define table schema
        schema = [
            bigquery.SchemaField("id", "INTEGER", mode="REQUIRED"),
            bigquery.SchemaField("name", "STRING", mode="REQUIRED"),
            bigquery.SchemaField("created_at", "TIMESTAMP", mode="NULLABLE"),
        ]
        table = bigquery.Table(table_id, schema=schema)
        table = self.client.create_table(table)
        print(
            "Created table {}.{}.{}".format(table.project, table.dataset_id, table.table_id)
        )

    def down(self):
        # TODO: Set table_id to the ID of the table to delete.
        table_id = "your_project.your_dataset.example_table"
        
        # If the table does not exist, delete_table raises
        # google.api_core.exceptions.NotFound unless not_found_ok is True.
        self.client.delete_table(table_id, not_found_ok=True)
        print("Deleted table '{}'.".format(table_id))

Migration naming conventions

Pattern: yyyy_mm_dd_hhmmss_your_class_name.py
Example filename: 2024_12_10_120000_create_users_table.py
Example class name: CreateUsersTable
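
The convention can be checked mechanically. The sketch below validates a filename against the pattern and derives the class name by converting the snake_case part to CamelCase. The mapping is inferred from the single example above, so treat it as an assumption about how the package resolves class names; the regular expression and function name are hypothetical.

```python
import re

# yyyy_mm_dd_hhmmss prefix, then a snake_case name, then the .py extension.
MIGRATION_FILENAME = re.compile(r"^\d{4}_\d{2}_\d{2}_\d{6}_([a-z0-9_]+)\.py$")


def class_name_from_filename(filename: str) -> str:
    """Return the CamelCase class name expected for a migration filename."""
    match = MIGRATION_FILENAME.match(filename)
    if match is None:
        raise ValueError(f"Not a valid migration filename: {filename!r}")
    snake_name = match.group(1)
    # create_users_table -> CreateUsersTable
    return "".join(part.capitalize() for part in snake_name.split("_") if part)


print(class_name_from_filename("2024_12_10_120000_create_users_table.py"))
# prints CreateUsersTable
```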

Changelog

0.4.2

Documentation

  • README.md
    • sample code: import correction
    • new sections:
      • GCP Service account creation process
      • Migration naming convention

0.4.1

Documentation

  • README.md sample code: removed unnecessary lines of code

0.4.0

This is the first release which uses the CHANGELOG file.
