BigQuery-DatasetManager
BigQuery-DatasetManager is a simple file-based CLI management tool for BigQuery Datasets.
Requirements
Python
CPython 2.7, 3.4, 3.5, 3.6
Installation
$ pip install BigQuery-DatasetManager
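After installation the bqdm command should be available on your PATH; you can check it by printing the built-in help described under Usage below:
$ bqdm -h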
Resource representation
Resource representations of datasets and tables are described in YAML format.
Dataset
name: dataset1
friendly_name: null
description: null
default_table_expiration_ms: null
location: US
access_entries:
  - role: OWNER
    entity_type: specialGroup
    entity_id: projectOwners
  - role: WRITER
    entity_type: specialGroup
    entity_id: projectWriters
  - role: READER
    entity_type: specialGroup
    entity_id: projectReaders
  - role: OWNER
    entity_type: userByEmail
    entity_id: aaa@bbb.gserviceaccount.com
  - role: null
    entity_type: view
    entity_id:
      datasetId: view1
      projectId: project1
      tableId: table1
labels:
  foo: bar
Key name | Value | Description
---|---|---
dataset_id | str | ID of the dataset.
friendly_name | str | Title of the dataset.
description | str | Description of the dataset.
default_table_expiration_ms | int | Default expiration time for tables in the dataset.
location | str | Location in which the dataset is hosted.
access_entries | seq | Represents grants of an access role to entities.
access_entries.role | str | Role granted to the entity. The following string values are supported: `OWNER`, `WRITER`, `READER`. It may also be null if the entity_type is `view`.
access_entries.entity_type | str | Type of entity being granted the role. One of `userByEmail`, `groupByEmail`, `domain`, `specialGroup` or `view`.
access_entries.entity_id | str/map | If the entity_type is not `view`, the entity_id is the str ID of the entity being granted the role. If the entity_type is `view`, the entity_id is a map representing the view from a different dataset to grant access to.
access_entries.entity_id.datasetId | str | ID of the dataset containing this table. (Specified when entity_type is `view`.)
access_entries.entity_id.projectId | str | ID of the project containing this table. (Specified when entity_type is `view`.)
access_entries.entity_id.tableId | str | ID of the table. (Specified when entity_type is `view`.)
labels | map | Labels for the dataset.
NOTE: See the official documentation of BigQuery Datasets for details of key names.
Table
table_id: table1
friendly_name: null
description: null
expires: null
partitioning_type: null
view_use_legacy_sql: null
view_query: null
schema:
  - name: column1
    field_type: STRING
    mode: REQUIRED
    description: null
    fields: null
  - name: column2
    field_type: RECORD
    mode: NULLABLE
    description: null
    fields:
      - name: column2_1
        field_type: STRING
        mode: NULLABLE
        description: null
        fields: null
      - name: column2_2
        field_type: INTEGER
        mode: NULLABLE
        description: null
        fields: null
      - name: column2_3
        field_type: RECORD
        mode: REPEATED
        description: null
        fields:
          - name: column2_3_1
            field_type: BOOLEAN
            mode: NULLABLE
            description: null
            fields: null
labels:
  foo: bar

table_id: view1
friendly_name: null
description: null
expires: null
partitioning_type: null
view_use_legacy_sql: false
view_query: |
  select
    *
  from
    `project1.dataset1.table1`
schema: null
labels: null
Key name | Value | Description
---|---|---
table_id | str | ID of the table.
friendly_name | str | Title of the table.
description | str | Description of the table.
expires | str | Datetime at which the table will be deleted, in ISO 8601 format (`%Y-%m-%dT%H:%M:%S.%f%z`).
partitioning_type | str | Time partitioning of the table if it is partitioned. The only partitioning type that is currently supported is `DAY`.
view_use_legacy_sql | bool | Specifies whether to use BigQuery’s legacy SQL for this view.
view_query | str | SQL query defining the table as a view.
schema | seq | Schema of the table.
schema.name | str | Name of the field.
schema.field_type | str | Type of the field. One of the BigQuery field types, such as `STRING`, `INTEGER`, `FLOAT`, `BOOLEAN`, `TIMESTAMP`, `DATE` or `RECORD`.
schema.mode | str | Mode of the field. One of `NULLABLE`, `REQUIRED` or `REPEATED`.
schema.description | str | Description for the field.
schema.fields | seq | Describes the nested schema fields if the field_type is set to `RECORD`.
labels | map | Labels for the table.
NOTE: See the official documentation of BigQuery Tables for details of key names.
Directory structure
.
├── dataset1          # Directory storing the table configuration files of dataset1.
│   ├── table1.yml    # Configuration file of table1 in dataset1.
│   └── table2.yml    # Configuration file of table2 in dataset1.
├── dataset1.yml      # Configuration file of dataset1.
├── dataset2          # Directory storing the table configuration files of dataset2.
│   └── .gitkeep      # Keeps the directory under version control; dataset2 has no tables.
├── dataset2.yml      # Configuration file of dataset2.
└── dataset3.yml      # Configuration file of dataset3. Tables in this dataset are not managed.
NOTE: If you do not want to manage tables in a dataset, delete the directory with the same name as the dataset.
Usage
Usage: bqdm [OPTIONS] COMMAND [ARGS]...
Options:
-c, --credential-file PATH Location of credential file for service accounts.
-p, --project TEXT Project ID for the project which you’d like to manage.
--color / --no-color Enables output with coloring.
--parallelism INTEGER Limit the number of concurrent operations.
--debug Debug output management.
-h, --help Show this message and exit.
Commands:
apply Builds or changes datasets.
destroy Specify subcommand `plan` or `apply`
export Export existing datasets into file in YAML format.
plan Generate and show an execution plan.
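A typical workflow is to export the current state, review the execution plan, and then apply it. The following is a minimal sketch; the project ID `your-project-id` and the configuration directory `conf` are placeholders:
$ bqdm -p your-project-id export conf                     # export existing datasets into conf/
$ bqdm -p your-project-id plan conf                       # review the execution plan
$ bqdm -p your-project-id apply -m select_insert conf     # build or change the datasets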
Export
Usage: bqdm export [OPTIONS] [OUTPUT_DIR]
Export existing datasets into file in YAML format.
Options:
-d, --dataset TEXT Specify the ID of the dataset to manage.
-e, --exclude-dataset TEXT Specify the ID of the dataset to exclude from managed.
-h, --help Show this message and exit.
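For example, exporting only dataset1 into a local conf directory might look like this (both names are placeholders for illustration):
$ bqdm export -d dataset1 conf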
Plan
Usage: bqdm plan [OPTIONS] [CONF_DIR]
Generate and show an execution plan.
Options:
--detailed_exitcode         Return a detailed exit code when the command exits.
                            When provided, this argument changes the exit codes
                            and their meanings to provide more granular
                            information about what the resulting plan contains:
                            0 = Succeeded with empty diff
                            1 = Error
                            2 = Succeeded with non-empty diff
-d, --dataset TEXT Specify the ID of the dataset to manage.
-e, --exclude-dataset TEXT Specify the ID of the dataset to exclude from managed.
-h, --help Show this message and exit.
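For example, the exit code of a plan run with --detailed_exitcode can be used to detect whether there is anything to apply (conf is a placeholder directory):
$ bqdm plan --detailed_exitcode conf
$ echo $?    # 0 = empty diff, 1 = error, 2 = non-empty diff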
Apply
Usage: bqdm apply [OPTIONS] [CONF_DIR]
Builds or changes datasets.
Options:
-d, --dataset TEXT Specify the ID of the dataset to manage.
-e, --exclude-dataset TEXT Specify the ID of the dataset to exclude from managed.
-m, --mode [select_insert|select_insert_backup|replace|replace_backup|drop_create|drop_create_backup]
                            Specify the migration mode when changing the schema.
                            Choice from `select_insert`, `select_insert_backup`,
                            `replace`, `replace_backup`, `drop_create`,
                            `drop_create_backup`.  [required]
-b, --backup-dataset TEXT   Specify the ID of the dataset to store the backup at migration.
-h, --help Show this message and exit.
NOTE: See the Migration mode section below for details of each mode.
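For example, applying with a backup-enabled migration mode might look like the following sketch, where `backup` is a placeholder for the backup dataset ID:
$ bqdm apply -m select_insert_backup -b backup conf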
Destroy
Usage: bqdm destroy [OPTIONS] COMMAND [ARGS]...
Specify subcommand `plan` or `apply`
Options:
-h, --help Show this message and exit.
Commands:
apply Destroy managed datasets.
plan Generate and show an execution plan for...
Destroy plan
Usage: bqdm destroy plan [OPTIONS] [CONF_DIR]
Generate and show an execution plan for datasets destruction.
Options:
--detailed-exitcode         Return a detailed exit code when the command exits.
                            When provided, this argument changes the exit codes
                            and their meanings to provide more granular
                            information about what the resulting plan contains:
                            0 = Succeeded with empty diff
                            1 = Error
                            2 = Succeeded with non-empty diff
-d, --dataset TEXT Specify the ID of the dataset to manage.
-e, --exclude-dataset TEXT Specify the ID of the dataset to exclude from managed.
-h, --help Show this message and exit.
Destroy apply
Usage: bqdm destroy apply [OPTIONS] [CONF_DIR]
Destroy managed datasets.
Options:
-d, --dataset TEXT Specify the ID of the dataset to manage.
-e, --exclude-dataset TEXT Specify the ID of the dataset to exclude from managed.
-h, --help Show this message and exit.
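As with plan and apply, a destroy is usually previewed first (again assuming a conf configuration directory):
$ bqdm destroy plan conf
$ bqdm destroy apply conf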
Migration mode
select_insert
TODO
LIMITATIONS: TODO
select_insert_backup
TODO
LIMITATIONS: TODO
replace
TODO
LIMITATIONS: TODO
replace_backup
TODO
LIMITATIONS: TODO
drop_create
TODO
drop_create_backup
TODO
Authentication
See authentication section in the official documentation of google-cloud-python.
If you’re running in Compute Engine or App Engine, authentication should “just work”.
If you’re developing locally, the easiest way to authenticate is using the Google Cloud SDK:
$ gcloud auth application-default login
Note that this command generates credentials for client libraries. To authenticate the CLI itself, use:
$ gcloud auth login
Previously, gcloud auth login was used for both use cases. If your gcloud installation does not support the new command, please update it:
$ gcloud components update
If you’re running your application elsewhere, you should download a service account JSON keyfile and point to it using an environment variable:
$ export GOOGLE_APPLICATION_CREDENTIALS="/path/to/keyfile.json"
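Alternatively, the same keyfile can be passed to the CLI directly through the -c/--credential-file option described under Usage; the path and project ID below are placeholders:
$ bqdm -c /path/to/keyfile.json -p your-project-id plan conf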
Testing
The tests depend on the following environment variables:
$ export GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json
$ export GOOGLE_CLOUD_PROJECT=YOUR_PROJECT_ID
Run tests
$ pip install pipenv
$ pipenv install --dev
$ pipenv run pytest
Run tests on multiple Python versions
$ pip install pipenv
$ pipenv install --dev
$ pyenv local 3.6.5 3.5.5 3.4.8 2.7.14
$ pipenv run tox
TODO
Support encryption configuration for table
Support external data configuration for table
Schema replication
Integration tests