Generate LookML view files from dbt models for BigQuery
Project description
dbt2looker-bigquery (active as of 13.9.2024)
Use dbt2looker-bigquery to generate Looker view files automatically from dbt models in BigQuery.
This is a fork of dbt2looker that is specific to BigQuery. The intention is to let you define most of the simple and tedious LookML settings in dbt, so that the LookML code is less bloated and can focus on advanced metrics and explores.
Want a deeper integration between dbt and your BI tool? You should also check out Lightdash, the open-source alternative to Looker.
Features
- Warehouses: BigQuery
- Column descriptions synced to Looker
- A dimension for each column in the dbt model
- Dimensions: define common LookML settings such as label, group label, and hidden in dbt
- Opinionated primary key: automatically sets the first column as the primary key and hides it
- Explores for structs: automatically generates explores for complex tables containing structs and arrays
- Dimension groups for datetime/timestamp/date columns
- Measures defined through dbt column metadata (see below)
Quickstart
Run dbt2looker in the root of your dbt project after generating the dbt docs (dbt2looker-bigquery uses the docs artifacts to infer column types and other metadata).

Generate Looker view files for all models:

```
dbt docs generate
dbt2looker
```

Generate Looker view files for all models tagged prod:

```
dbt2looker --tag prod
```

Generate Looker view files for all models listed in dbt exposures:

```
dbt2looker --exposures_only
```
Install
Install from the PyPI repository into a fresh virtual environment:

```
# Create virtual env
python3.7 -m venv dbt2looker-venv
source dbt2looker-venv/bin/activate

# Install
pip install dbt2looker-bigquery

# Run
dbt2looker
```
Or, if you use Poetry:

```
# Add as a dependency (Poetry manages the virtual env)
poetry add dbt2looker-bigquery

# Run
poetry run dbt2looker
```
Defining measures
You can define Looker measures in your dbt schema.yml files. For example:
```yaml
models:
  - name: pages
    columns:
      - name: url
        description: "Page url"
      - name: event_id
        description: unique event id for page view
        meta:
          looker:
            hidden: True
            label: event
            group_label: identifiers
            value_format_name: id
          looker_measures:
            - type: count_distinct
              sql_distinct_key: ${url}
            - type: count
              value_format_name: decimal_1
```
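For reference, a view generated from the example above would look roughly like this (a sketch: the measure names are illustrative, and the exact output depends on the dbt2looker-bigquery version):

```lookml
view: pages {
  dimension: url {
    type: string
    sql: ${TABLE}.url ;;
    description: "Page url"
    # first column in the model: set as primary key and hidden automatically
    primary_key: yes
    hidden: yes
  }

  dimension: event_id {
    type: string
    sql: ${TABLE}.event_id ;;
    description: "unique event id for page view"
    label: "event"
    group_label: "identifiers"
    value_format_name: id
    hidden: yes
  }

  measure: count_distinct_of_event_id {
    type: count_distinct
    sql: ${event_id} ;;
    sql_distinct_key: ${url} ;;
  }

  measure: count_of_event_id {
    type: count
    value_format_name: decimal_1
  }
}
```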
How to use in a production setting:
dbt2looker can be used in a production setting, but it is a bit of a project, so prepare to spend some time on it.
This explanation assumes you use some kind of pipeline tool for CI/CD on your dbt project. I am using GitHub Actions and Google Cloud Platform, so I will describe it in those terms.
After continuous deployment has run in the pipeline, you can run the following (assuming manifest.json and catalog.json are in the target folder):

```
dbt docs generate
dbt2looker --target-dir ./target/ --output-dir ./looker --exposures_only
```
At this point you have a set of LookML files in a folder named /looker inside your GitHub Actions runner. You now need to store these files somewhere. Example using gsutil:

```
gsutil -m cp -r ./looker/views/ "gs://{{dbt2looker-backend}}/${{ github.ref_name }}/"
```
This will put the files in a Google Cloud Storage bucket, in a folder named after the branch you deploy from. So if that branch was main, you now have a LookML state stored in a folder called /main/ in your bucket.
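Concretely, after deploying from main with the command above, the bucket layout looks something like this (file names are illustrative):

```
gs://dbt2looker-backend/
└── main/
    └── views/
        ├── pages.view.lkml
        └── ...
```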
The next step is to create a Looker repository that exposes these autogenerated files in Looker. In this repo, you want a GitHub Action that syncs the LookML files from the bucket into the project.
Here is an example script (using some environment variables). It also allows "locking" certain files so they are not synced: add #nosync anywhere in a file to exclude it.
```yaml
- name: Sync script
  id: find_script
  run: |
    #!/bin/bash
    set -e

    if [ -z "${GCS_AUTOGEN_BUCKET}" ]; then
      echo "🗝️ The GCS_AUTOGEN_BUCKET repository variable is not set. Skipping dbt file sync. 😊"
      exit 1
    fi

    filename="manifest.lkml"
    if [ -z "${DBT_SYNC_BRANCH}" ]; then
      echo "🗝️ The DBT_SYNC_BRANCH variable is not set in lkml or as a repository variable. Skipping dbt file sync. 😊"
      exit 1
    fi

    if [ -z "${SYNC_TO_TOP_LEVEL}" ]; then
      echo "Sync to top level is not set. Syncing to ./views/_autogen_dbt"
      destination_dir="views/_autogen_dbt"
    else
      echo "Sync to top level is set. Syncing to ."
      destination_dir="."
    fi

    source_dir=".tmp/_autogen_dbt"
    mkdir -p "$source_dir"
    gsutil -m cp -r "gs://${GCS_AUTOGEN_BUCKET}/${DBT_SYNC_BRANCH}/views/" "$source_dir"

    # Walk the source tree iteratively (with a stack instead of recursion)
    # and copy every file into the destination, preserving directory structure.
    copy_recursive() {
      local source="$1"
      local destination="$2"
      local -a stack=("$source")
      mkdir -p "$destination"  # create destination directory if it doesn't exist
      while [ ${#stack[@]} -gt 0 ]; do
        local last_index=$(( ${#stack[@]} - 1 ))  # index of the last item
        local current="${stack[last_index]}"      # take the last item off the stack
        unset 'stack[last_index]'
        stack=("${stack[@]}")                     # re-pack the array to keep indexing contiguous
        echo "iterating over folder $current"
        for item in "$current"/*; do
          if [ -d "$item" ] && [ ! -L "$item" ]; then
            local rel_path="${item#$source}"      # remove source path prefix
            rel_path="${rel_path#/}"              # remove leading slash
            mkdir -p "$destination/$rel_path"
            stack+=("$item")                      # push the subdirectory onto the stack
          elif [ -f "$item" ]; then
            local rel_path="${item#$source}"      # remove source path prefix
            rel_path="${rel_path#/}"              # remove leading slash
            local dest_path="$(dirname "$destination/$rel_path")"
            # skip files whose existing destination copy is tagged #nosync
            if [ -f "$destination/$rel_path" ] && grep -q "#nosync" "$destination/$rel_path"; then
              echo "'$destination/$rel_path' contains '#nosync' tag and is not synced."
            else
              mkdir -p "$dest_path"
              echo "copying $item to $dest_path"
              cp -f "$item" "$dest_path/"
            fi
          fi
        done
      done
    }

    copy_recursive "$source_dir" "$destination_dir"
```
Now you have the autogenerated, auto-updating files available in Looker! But how do you keep them fresh?
One option is to trigger the GitHub Action in the Looker repository from the dbt repository, for example by using a personal access token (PAT) to trigger a workflow in one repo from another.
Another option is to refresh the files in the Looker repository on a schedule, by running the sync GitHub Action on a cron trigger.
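As a sketch, the sync workflow in the Looker repository could declare both triggers like this (the dbt-sync event type and the cron expression are placeholders you would choose yourself):

```yaml
on:
  # option 1: triggered from the dbt repository via the GitHub API,
  # e.g. POST /repos/<owner>/<looker-repo>/dispatches sent with a PAT
  repository_dispatch:
    types: [dbt-sync]
  # option 2: refresh on a schedule, here every night at 03:00 UTC
  schedule:
    - cron: "0 3 * * *"
```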
So how do you use this automatically refreshing repository? My recommendation is to import the project as a dependency into your main Looker project and reference the files from the autogenerated Looker project. That way you avoid merge conflicts and similar issues, and you can roll back to a previous version of the sync project if a problem arises.
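A sketch of what that import could look like in the main Looker project's manifest.lkml, assuming the autogenerated project is named dbt_autogen (the project name, URL, and ref are placeholders):

```lookml
# manifest.lkml in the main Looker project
remote_dependency: dbt_autogen {
  url: "https://github.com/my-org/looker-dbt-autogen"
  ref: "main"
}
```

Views from the imported project can then be pulled into your models with an include such as `include: "//dbt_autogen/views/_autogen_dbt/*.view.lkml"`.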
Project details
Download files
- Source Distribution: dbt2looker_bigquery-0.13.1.tar.gz
- Built Distribution: dbt2looker_bigquery-0.13.1-py3-none-any.whl
File details
Details for the file dbt2looker_bigquery-0.13.1.tar.gz.
File metadata
- Download URL: dbt2looker_bigquery-0.13.1.tar.gz
- Upload date:
- Size: 18.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4cc9cc0be32c77dfb18c2f2a4ef927275de37837a9d94db6133dbe7c47ce002b |
| MD5 | ca88c4b6705467cce67a26f6ad193c20 |
| BLAKE2b-256 | 2a353437df9b9d465c07e0229499729f664925415af84012828ec5c0086151f0 |
File details
Details for the file dbt2looker_bigquery-0.13.1-py3-none-any.whl.
File metadata
- Download URL: dbt2looker_bigquery-0.13.1-py3-none-any.whl
- Upload date:
- Size: 17.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8fe006a575f78bd09ac46ee514c4a5f0198ebd7a4e5bf9c8a2cab38f12bde100 |
| MD5 | f0a66eeebb97f57de59c351d3d961f1d |
| BLAKE2b-256 | 479bb55205e280b647eeee8e5353f52e42089c38372648e3c9be5a8a309ebd61 |