
Generate LookML view files from dbt models for BigQuery

dbt2looker-bigquery (active as of 13.9.2024)

Use dbt2looker-bigquery to generate Looker view files automatically from dbt models in BigQuery.

This is a fork of dbt2looker that is specific to BigQuery. The intention is to let you define most of the simple, tedious LookML settings in dbt, so the LookML code stays leaner and can focus on advanced metrics and explores.

Want a deeper integration between dbt and your BI tool? You should also check out Lightdash - the open source alternative to Looker.

Features

  • Warehouses: BigQuery

  • Column descriptions synced to Looker

  • A dimension for each column in the dbt model

  • Dimensions: define common LookML settings in dbt such as label, group_label, and hidden

  • Opinionated primary key: the first column is automatically set as the primary key and hidden

  • Explores for structs: explores are automatically generated for complex tables containing structs and arrays

  • Dimension groups for datetime/timestamp/date columns

  • Measures defined through dbt column metadata (see below)

Quickstart

Run dbt2looker in the root of your dbt project after generating dbt docs (dbt2looker-bigquery uses the docs artifacts to infer column types).

Generate Looker view files for all models:

dbt docs generate
dbt2looker

Generate Looker view files for all models tagged prod:

dbt2looker --tag prod

Generate Looker view files for all models referenced in dbt exposures (see the dbt docs on exposures):

dbt2looker --exposures_only
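
For reference, a minimal sketch of a dbt exposure; models reachable through depends_on are the ones picked up. The exposure, owner, and model names below are placeholders:

exposures:
  - name: pages_dashboard
    type: dashboard
    owner:
      name: Analytics team
      email: analytics@example.com
    depends_on:
      - ref('pages')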

Install

Install from the PyPI repository

Install from PyPI into a fresh virtual environment.

# Create virtual env
python3.7 -m venv dbt2looker-venv
source dbt2looker-venv/bin/activate

# Install
pip install dbt2looker-bigquery

# Run
dbt2looker

Or, if you use Poetry:

# Create virtual env
poetry add dbt2looker-bigquery

# Run
dbt2looker

Defining measures

You can define Looker measures in your dbt schema.yml files. For example:

models:
  - name: pages
    columns:
      - name: url
        description: "Page url"
      - name: event_id
        description: unique event id for page view
        meta:
            looker:
              hidden: True
              label: event
              group_label: identifiers
              value_format_name: id
              
            looker_measures:
              - type: count_distinct
                sql_distinct_key: ${url}
              - type: count
                value_format_name: decimal_1

How to use in a production setting

dbt2looker can be used in a production setting. Setting it up is a bit of a project, though, so be prepared to spend some time on it.

This explanation assumes you use some kind of pipeline tool for CI/CD on your dbt project. I am using GitHub Actions and Google Cloud Platform, so I will describe the setup in those terms.

After the continuous deployment step in your pipeline, run the following (assuming manifest.json and catalog.json are in the target folder):

dbt docs generate
dbt2looker --target-dir ./target/ --output-dir ./looker --exposures_only

At this point you have a set of LookML files in a folder named ./looker on your GitHub Actions runner. You now need to store these files somewhere.

Example using gsutil:

gsutil -m cp -r ./looker/views/ "gs://{{dbt2looker-backend}}/${{ github.ref_name }}/"

This puts the files in a Google Cloud Storage bucket, in a folder named after the branch you deploy from. So if that branch was main, you now have a LookML state stored in a folder called /main/ in your bucket.
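
Put together, a minimal sketch of how these steps could sit in a GitHub Actions job in the dbt repository. The step names and the GCS_AUTOGEN_BUCKET repository variable are assumptions, and you should reuse whatever GCP authentication your pipeline already has:

          - name: Generate LookML from dbt
            run: |
                dbt docs generate
                dbt2looker --target-dir ./target/ --output-dir ./looker --exposures_only

          - name: Upload LookML to GCS
            run: |
                # GCS_AUTOGEN_BUCKET is a hypothetical repository variable holding the bucket name
                gsutil -m cp -r ./looker/views/ "gs://${{ vars.GCS_AUTOGEN_BUCKET }}/${{ github.ref_name }}/"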

The next step is to create a Looker repository for exposing these autogenerated files in Looker. In this repo, you want a GitHub Action that syncs the LookML files from the bucket into the project.

Here is an example script (using some environment variables). It also allows "locking" certain files so they are not overwritten by the sync: add #nosync anywhere in the file.

          - name: Sync script
            id: find_script
            run: |
                #!/bin/bash
                set -e

                if [ -z "${GCS_AUTOGEN_BUCKET}" ]; then
                    echo "🗝️ The $ GCS_AUTOGEN_BUCKET repository variable is not set, Skipping dbt file sync. 😊"
                    exit 1
                fi

                filename="manifest.lkml"

                if [ -z "${DBT_SYNC_BRANCH}" ]; then
                    echo "🗝️ The DBT_SYNC_BRANCH variable is not set in lkml or as a repository variable. Skipping dbt file sync. 😊"
                    exit 1
                fi

                if [ -z "${SYNC_TO_TOP_LEVEL}" ]; then
                    echo "Sync to top level is not set. syncing to ./views/_autogen_dbt"
                    destination_dir="views/_autogen_dbt"
                else
                    echo "Sync to top level is set. syncing to ."
                    destination_dir="." 
                fi

                source_dir=".tmp/_autogen_dbt"
                mkdir .tmp

                mkdir $source_dir
                gsutil -m cp -r gs://${GCS_AUTOGEN_BUCKET}/${DBT_SYNC_BRANCH}/views/ $source_dir

                copy_recursive() {
                    local source="$1"
                    local destination="$2"
                    local -a stack=("$source")

                    mkdir -p "$destination" # create destination directory if it doesn't exist

                    while [ ${#stack[@]} -gt 0 ]; do
                        local last_index=$(( ${#stack[@]} - 1 ))  # Get the last index
                        local current="${stack[last_index]}"       # Get the last item in the stack
                        unset stack[last_index]                    # Remove the last item from the stack
                        stack=("${stack[@]}")                     # Rebuild the array to maintain contiguous indexing

                        echo "iterating over folder $current"
                        for item in "$current"/*; do
                            if [ -d "$item" ] && [ ! -L "$item" ]; then
                                local rel_path="${item#$source}"  # Remove source path prefix
                                rel_path="${rel_path#/}"          # Remove leading slash
                                mkdir -p "$destination/$rel_path"
                                stack+=("$item")                  # Add subdirectory to the stack
                            elif [ -f "$item" ]; then
                                local rel_path="${item#$source}"  # Remove source path prefix
                                rel_path="${rel_path#/}"          # Remove leading slash
                                local dest_path="$(dirname "$destination/$rel_path")"

                                if grep -q "#nosync" "$destination/$item"; then
                                    echo "'$item' contains '#nosync' tag and is not synced."
                                else
                                    mkdir -p "$dest_path"
                                    echo "copying $item to $dest_path"
                                    cp -f "$item" "$dest_path/"
                                fi
                            fi
                        done
                    done
                }

                copy_recursive "$source_dir" "$destination_dir"

Now you have the autogenerated, auto-updating files available in Looker! But how do you keep them fresh?

One option is to trigger the GitHub Action in the Looker repository from the dbt repository, for example by using a PAT token to call the GitHub API from one repo and start a workflow in the other.
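
A minimal sketch of that approach, assuming a PAT stored as a secret called LOOKER_REPO_PAT, a placeholder repository your-org/your-looker-repo, and a custom event type dbt-lookml-updated:

          # In the dbt repository: notify the Looker repository after deployment
          - name: Trigger LookML sync
            run: |
                curl -X POST \
                  -H "Authorization: Bearer ${{ secrets.LOOKER_REPO_PAT }}" \
                  -H "Accept: application/vnd.github+json" \
                  https://api.github.com/repos/your-org/your-looker-repo/dispatches \
                  -d '{"event_type": "dbt-lookml-updated"}'

# In the Looker repository: run the sync workflow when that event arrives
on:
  repository_dispatch:
    types: [dbt-lookml-updated]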

Another option is to refresh the files in the Looker repository on a schedule, by running the sync GitHub Action on a cron trigger.
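
For the scheduled approach, the sync workflow in the Looker repository only needs a cron trigger; the schedule below is just an example:

on:
  schedule:
    - cron: '0 5 * * *'    # every day at 05:00 UTC
  workflow_dispatch:        # keep a manual trigger as a fallback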

So how do you use this automatically refreshing repository? My recommendation is to import the project as a dependency in your main Looker project and reference the files from the autogenerated project. That way you avoid merge conflicts and similar issues, and you can roll back to previous versions of the sync project if a problem arises.

