BigQuery-DatasetManager
BigQuery-DatasetManager is a simple file-based CLI management tool for BigQuery Datasets.
Requirements
Python
CPython 2.7, 3.4, 3.5, 3.6
Installation
$ pip install BigQuery-DatasetManager
Resource representation
The resource representation of the dataset and the table is described in YAML format.
Dataset
name: dataset1
friendly_name: null
description: null
default_table_expiration_ms: null
location: US
access_entries:
- role: OWNER
  entity_type: specialGroup
  entity_id: projectOwners
- role: WRITER
  entity_type: specialGroup
  entity_id: projectWriters
- role: READER
  entity_type: specialGroup
  entity_id: projectReaders
- role: OWNER
  entity_type: userByEmail
  entity_id: aaa@bbb.gserviceaccount.com
- role: null
  entity_type: view
  entity_id:
    datasetId: view1
    projectId: project1
    tableId: table1
labels:
  foo: bar
| Key name | | Value | Description |
|---|---|---|---|
| dataset_id | | str | ID of the dataset. |
| friendly_name | | str | Title of the dataset. |
| description | | str | Description of the dataset. |
| default_table_expiration_ms | | int | Default expiration time for tables in the dataset. |
| location | | str | Location in which the dataset is hosted. |
| access_entries | | seq | Represents grant of an access role to an entity. |
| | role | str | Role granted to the entity. One of `OWNER`, `WRITER`, `READER`. May also be null if the entity_type is view. |
| | entity_type | str | Type of entity being granted the role. One of `userByEmail`, `groupByEmail`, `domain`, `specialGroup`, `view`. |
| | entity_id | str/map | If the entity_type is not 'view', the entity_id is the str ID of the entity being granted the role. If the entity_type is 'view', the entity_id is a map representing the view from a different dataset to grant access to. |
| | entity_id.datasetId | str | ID of the dataset containing this table. (Specify when entity_type is view.) |
| | entity_id.projectId | str | ID of the project containing this table. (Specify when entity_type is view.) |
| | entity_id.tableId | str | ID of the table. (Specify when entity_type is view.) |
| labels | | map | Labels for the dataset. |
NOTE: See the official documentation of BigQuery Datasets for details of key names.
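As the table above implies, the shape of `entity_id` depends on `entity_type`, and `role` may only be null for view entries. A minimal sketch of that rule in Python (the helper name `validate_access_entry` is hypothetical, not part of this tool):

```python
def validate_access_entry(entry):
    """Check the entity_id shape rule: a 'view' entry takes a mapping
    with datasetId/projectId/tableId, while every other entity type
    takes a plain string ID and a non-null role."""
    if entry["entity_type"] == "view":
        required = {"datasetId", "projectId", "tableId"}
        if not isinstance(entry["entity_id"], dict):
            raise ValueError("view entries need a mapping entity_id")
        missing = required - entry["entity_id"].keys()
        if missing:
            raise ValueError("missing keys: %s" % sorted(missing))
    else:
        if entry["role"] is None:
            raise ValueError("role may only be null for view entries")
        if not isinstance(entry["entity_id"], str):
            raise ValueError("entity_id must be a string ID")
    return entry
```

For example, the `specialGroup`/`projectOwners` entry from the YAML above passes, while a non-view entry with a null role is rejected.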
Table
table_id: table1
friendly_name: null
description: null
expires: null
partitioning_type: null
view_use_legacy_sql: null
view_query: null
schema:
- name: column1
  field_type: STRING
  mode: REQUIRED
  description: null
  fields: null
- name: column2
  field_type: RECORD
  mode: NULLABLE
  description: null
  fields:
  - name: column2_1
    field_type: STRING
    mode: NULLABLE
    description: null
    fields: null
  - name: column2_2
    field_type: INTEGER
    mode: NULLABLE
    description: null
    fields: null
  - name: column2_3
    field_type: RECORD
    mode: REPEATED
    description: null
    fields:
    - name: column2_3_1
      field_type: BOOLEAN
      mode: NULLABLE
      description: null
      fields: null
labels:
  foo: bar

table_id: view1
friendly_name: null
description: null
expires: null
partitioning_type: null
view_use_legacy_sql: false
view_query: |
  select
    *
  from
    `project1.dataset1.table1`
schema: null
labels: null
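Nested RECORD schemas like the one above are easier to compare when flattened into dotted column paths. A small illustrative sketch (`flatten_schema` is a hypothetical helper, not part of bqdm):

```python
def flatten_schema(fields, prefix=""):
    """Yield (dotted_name, field_type, mode) for every field in a
    nested schema definition, recursing into RECORD fields."""
    for field in fields:
        name = prefix + field["name"]
        yield name, field["field_type"], field["mode"]
        if field["field_type"] == "RECORD" and field["fields"]:
            # Nested fields get their parent's name as a dotted prefix.
            for item in flatten_schema(field["fields"], name + "."):
                yield item
```

On the table1 example this produces paths such as `column2.column2_3.column2_3_1` alongside the top-level columns.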
| Key name | | Value | Description |
|---|---|---|---|
| table_id | | str | ID of the table. |
| friendly_name | | str | Title of the table. |
| description | | str | Description of the table. |
| expires | | str | Datetime at which the table will be deleted. (ISO 8601 format: %Y-%m-%dT%H:%M:%S.%f%z) |
| partitioning_type | | str | Time partitioning of the table if it is partitioned. The only partitioning type that is currently supported is DAY. |
| view_use_legacy_sql | | bool | Specifies whether to use BigQuery's legacy SQL for this view. |
| view_query | | str | SQL query defining the table as a view. |
| schema | | seq | The schema of the table. |
| | name | str | The name of the field. |
| | field_type | str | The type of the field. One of `STRING`, `BYTES`, `INTEGER`, `FLOAT`, `BOOLEAN`, `TIMESTAMP`, `DATE`, `TIME`, `DATETIME`, `RECORD`. |
| | mode | str | The mode of the field. One of `NULLABLE`, `REQUIRED`, `REPEATED`. |
| | description | str | Description for the field. |
| | fields | seq | Describes the nested schema fields if the field_type is set to RECORD. |
| labels | | map | Labels for the table. |
NOTE: See the official documentation of BigQuery Tables for details of key names.
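As a quick sanity check on the `expires` format, a timestamp in that pattern round-trips through Python's `datetime` (purely illustrative):

```python
from datetime import datetime

# The strftime pattern quoted in the table above.
FMT = "%Y-%m-%dT%H:%M:%S.%f%z"

# Parse an expiry timestamp; %z makes the result timezone-aware.
expires = datetime.strptime("2020-01-01T00:00:00.000000+0000", FMT)
print(expires.isoformat())  # 2020-01-01T00:00:00+00:00
```

Note that `%z` requires a numeric UTC offset such as `+0000`; a bare timestamp without an offset will not parse with this pattern.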
Directory structure
.
├── dataset1          # Directory storing the table configuration files of dataset1.
│   ├── table1.yml    # Configuration file of table1 in dataset1.
│   └── table2.yml    # Configuration file of table2 in dataset1.
├── dataset1.yml      # Configuration file of dataset1.
├── dataset2          # Directory storing the table configuration files of dataset2.
│   └── .gitkeep      # Keeps the empty directory under version control; dataset2 has no managed tables.
├── dataset2.yml      # Configuration file of dataset2.
└── dataset3.yml      # Configuration file of dataset3. Tables in this dataset are not managed.
NOTE: If you do not want to manage tables, delete the directory with the same name as the dataset.
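One way to picture the layout rule: a dataset's tables are managed exactly when a directory with the dataset's name exists next to its YAML file. A sketch under that assumption (`managed_datasets` is a hypothetical helper, not part of bqdm):

```python
import os

def managed_datasets(conf_dir):
    """Map each dataset name (taken from its *.yml file) to True if a
    sibling directory of the same name exists (tables are managed),
    else False (tables are left alone)."""
    result = {}
    for entry in sorted(os.listdir(conf_dir)):
        if entry.endswith(".yml"):
            name = entry[: -len(".yml")]
            result[name] = os.path.isdir(os.path.join(conf_dir, name))
    return result
```

Applied to the tree above, dataset1 and dataset2 map to True and dataset3 maps to False.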
Usage
Usage: bqdm [OPTIONS] COMMAND [ARGS]...
Options:
-c, --credential-file PATH  Location of the credential file for service accounts.
-p, --project TEXT          ID of the project you'd like to manage.
--color / --no-color        Enable colored output.
--parallelism INTEGER       Limit the number of concurrent operations.
--debug                     Enable debug output.
-h, --help Show this message and exit.
Commands:
apply    Build or change datasets.
destroy  Destroy managed datasets (subcommands: `plan`, `apply`).
export   Export existing datasets to files in YAML format.
plan     Generate and show an execution plan.
Export
Usage: bqdm export [OPTIONS] [OUTPUT_DIR]
Export existing datasets to files in YAML format.
Options:
-d, --dataset TEXT Specify the ID of the dataset to manage.
-e, --exclude-dataset TEXT Specify the ID of the dataset to exclude from managed.
-h, --help Show this message and exit.
Plan
Usage: bqdm plan [OPTIONS] [CONF_DIR]
Generate and show an execution plan.
Options:
--detailed_exitcode         Return a detailed exit code when the command exits.
                            When provided, this argument changes the exit codes
                            and their meanings to provide more granular
                            information about what the resulting plan contains:
                            0 = Succeeded with empty diff
                            1 = Error
                            2 = Succeeded with non-empty diff
-d, --dataset TEXT Specify the ID of the dataset to manage.
-e, --exclude-dataset TEXT Specify the ID of the dataset to exclude from managed.
-h, --help Show this message and exit.
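With the detailed exit code enabled, a wrapper script can branch on the three values above; a minimal illustration of that mapping (the helper `describe_plan_exit` is hypothetical, not part of the tool):

```python
def describe_plan_exit(code):
    """Translate a detailed exit code from `bqdm plan` into a label:
    0 = empty diff, 1 = error, 2 = non-empty diff."""
    return {0: "no changes", 1: "error", 2: "changes pending"}.get(code, "unknown")
```

In a CI pipeline, exit code 2 could gate a manual approval step before `bqdm apply` runs.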
Apply
Usage: bqdm apply [OPTIONS] [CONF_DIR]
Builds or changes datasets.
Options:
-d, --dataset TEXT Specify the ID of the dataset to manage.
-e, --exclude-dataset TEXT Specify the ID of the dataset to exclude from managed.
-m, --mode [select_insert|select_insert_backup|replace|replace_backup|drop_create|drop_create_backup]
                            Specify the migration mode when changing the schema.
                            Choose from `select_insert`, `select_insert_backup`,
                            `replace`, `replace_backup`, `drop_create`,
                            `drop_create_backup`. [required]
-b, --backup-dataset TEXT   Specify the ID of the dataset in which to store backups during migration.
-h, --help                  Show this message and exit.
NOTE: See the Migration mode section below for details.
Destroy
Usage: bqdm destroy [OPTIONS] COMMAND [ARGS]...
Specify subcommand `plan` or `apply`
Options:
-h, --help Show this message and exit.
Commands:
apply Destroy managed datasets.
plan Generate and show an execution plan for datasets destruction.
Destroy plan
Usage: bqdm destroy plan [OPTIONS] [CONF_DIR]
Generate and show an execution plan for datasets destruction.
Options:
--detailed-exitcode         Return a detailed exit code when the command exits.
                            When provided, this argument changes the exit codes
                            and their meanings to provide more granular
                            information about what the resulting plan contains:
                            0 = Succeeded with empty diff
                            1 = Error
                            2 = Succeeded with non-empty diff
-d, --dataset TEXT Specify the ID of the dataset to manage.
-e, --exclude-dataset TEXT Specify the ID of the dataset to exclude from managed.
-h, --help Show this message and exit.
Destroy apply
Usage: bqdm destroy apply [OPTIONS] [CONF_DIR]
Destroy managed datasets.
Options:
-d, --dataset TEXT Specify the ID of the dataset to manage.
-e, --exclude-dataset TEXT Specify the ID of the dataset to exclude from managed.
-h, --help Show this message and exit.
Migration mode
select_insert
TODO
LIMITATIONS: TODO
select_insert_backup
TODO
LIMITATIONS: TODO
replace
TODO
LIMITATIONS: TODO
replace_backup
TODO
LIMITATIONS: TODO
drop_create
TODO
drop_create_backup
TODO
Authentication
See authentication section in the official documentation of google-cloud-python.
If you’re running in Compute Engine or App Engine, authentication should “just work”.
If you’re developing locally, the easiest way to authenticate is using the Google Cloud SDK:
$ gcloud auth application-default login
Note that this command generates credentials for client libraries. To authenticate the CLI itself, use:
$ gcloud auth login
Previously, gcloud auth login was used for both use cases. If your gcloud installation does not support the new command, please update it:
$ gcloud components update
If you’re running your application elsewhere, you should download a service account JSON keyfile and point to it using an environment variable:
$ export GOOGLE_APPLICATION_CREDENTIALS="/path/to/keyfile.json"
Testing
The tests depend on the following environment variables:
$ export GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json
$ export GOOGLE_CLOUD_PROJECT=YOUR_PROJECT_ID
Run test
$ pip install pipenv
$ pipenv install --dev
$ pipenv run pytest
Run test multiple Python versions
$ pip install pipenv
$ pipenv install --dev
$ pyenv local 3.6.5 3.5.5 3.4.8 2.7.14
$ pipenv run tox
TODO
Support encryption configuration for table
Support external data configuration for table
Schema replication
Integration tests