Data Catalogs Made Easy
Build a custom data-catalog in minutes
🔍️ 1. What is CatalogBuilder?
- CatalogBuilder is a simple tool to generate & deploy a documentation website for your data assets.
- It enables anyone at your company to quickly find the trusted data they are looking for.
💡 2. Why CatalogBuilder?
There are many open-source projects (amundsen, open-metadata, datahub, metacat, atlas) for building such a catalog in-house. But because they offer many advanced features, they are hard to deploy and manage if you are not a technical expert. They can be even harder to customize.
dbt docs is great for generating a documentation website on top of your dbt assets, but:
- it focuses on dbt only (while you may be interested in other sources and metadata)
- it is very hard to customize (unless you're an Angular expert)
- it can be slow.
👉 CatalogBuilder aims to offer a lightweight alternative for generating a documentation website on top of your data assets. It focuses on read-only data discovery and:
- ✔️ can be easily customized and deployed by people with little technical background
- ✔️ can therefore handle the very specific needs of your company
- ✔️ is fast and lightweight
- ✔️ is built on top of the widely used mkdocs-material python library, which many projects (such as fastapi) use to deploy their documentation.
💥 3. Getting Started with catalog CLI
catalog is the CLI (command-line interface) of CatalogBuilder, used to generate, show & deploy the documentation.
3.1 Install catalog CLI 🛠️
pip install catalog-builder
3.2 Create your first documentation configuration 👨💻
catalog download dbt_gitlab_data_team
To get started, let's download a catalog configuration example from the GitHub repo and play with it. The above command will download the catalogs/dbt_gitlab_data_team folder to your laptop.
You will find in the folder:
- assets file: a file containing the list of the assets you want to put in your documentation. It can be a parquet file named assets.parquet or a JSON Lines file named assets.jsonl. Each asset in the file must have the following fields:
  - asset_type: for example table.
  - documentation_path: the path of the asset page in the generated documentation. For example dataset_name/table_name.
  - data: a dict of attributes used to generate the documentation. For example {"name": "foo"}.
- generate_assets_file.py: the python script used to (re)generate the assets file.
- requirements.txt: the python requirements needed by generate_assets_file.py.
- templates: a folder which includes a jinja-template markdown file for each asset_type. These templates are used to generate a markdown documentation file for each asset.
- source_docs: a folder which includes files to include as-is in the documentation.
- mkdocs.yml: the mkdocs configuration file used by mkdocs to build the documentation website from the generated markdown files.
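As an illustration of the assets file format described above, here is a minimal sketch that writes and reads an assets.jsonl file using only the Python standard library. The three field names come from the description above; the sample assets and the helper names write_assets_file and read_assets_file are hypothetical and are not part of CatalogBuilder.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical assets for illustration; only the three field names
# (asset_type, documentation_path, data) come from the description above.
assets = [
    {
        "asset_type": "table",
        "documentation_path": "dataset_name/table_name",
        "data": {"name": "foo"},
    },
    {
        "asset_type": "table",
        "documentation_path": "dataset_name/other_table",
        "data": {"name": "bar"},
    },
]

REQUIRED_FIELDS = {"asset_type", "documentation_path", "data"}


def write_assets_file(assets, path):
    """Write one JSON object per line (JSON Lines), checking required fields."""
    with open(path, "w", encoding="utf-8") as f:
        for asset in assets:
            missing = REQUIRED_FIELDS - asset.keys()
            if missing:
                raise ValueError(f"asset is missing fields: {sorted(missing)}")
            f.write(json.dumps(asset) + "\n")


def read_assets_file(path):
    """Read a JSON Lines file back into a list of dicts."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Written to a temporary folder here; in a catalog folder the file would be
# named assets.jsonl and sit next to mkdocs.yml.
assets_path = Path(tempfile.mkdtemp()) / "assets.jsonl"
write_assets_file(assets, assets_path)
loaded_assets = read_assets_file(assets_path)
```

Checking the required fields at write time is a design choice of this sketch: it surfaces a malformed asset before the build step runs.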
3.3 Build your catalog website 👾
catalog build dbt_gitlab_data_team
- For each asset of the assets file, the jinja template of its asset_type will be rendered using the asset data to generate a markdown file, which will be written into catalogs/dbt_gitlab_data_team/docs/ at documentation_path.
- All files in catalogs/dbt_gitlab_data_team/source_docs/ are copied into catalogs/dbt_gitlab_data_team/docs/.
- Mkdocs will then build the documentation website from the markdown files into catalogs/dbt_gitlab_data_team/site (using the mkdocs.yml configuration file).
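The rendering step above can be sketched roughly as follows. This is not CatalogBuilder's code: to stay dependency-free it uses the standard-library string.Template as a stand-in for jinja, and the function name render_assets, the sample asset, and the ".md" suffix appended to documentation_path are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path
from string import Template

# Stand-in templates keyed by asset_type. The real tool reads jinja templates
# from the templates folder; string.Template is used here only to avoid
# a third-party dependency.
templates = {
    "table": Template("# $name\n\n$description\n"),
}

# Hypothetical asset following the assets file format.
assets = [
    {
        "asset_type": "table",
        "documentation_path": "dataset_name/table_name",
        "data": {"name": "foo", "description": "An example table."},
    },
]


def render_assets(assets, templates, docs_dir):
    """Render one markdown file per asset under docs_dir."""
    written = []
    for asset in assets:
        # Pick the template matching the asset type and fill it with the data.
        template = templates[asset["asset_type"]]
        markdown = template.substitute(asset["data"])
        # Assumption: the page is stored as <documentation_path>.md
        target = Path(docs_dir) / (asset["documentation_path"] + ".md")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(markdown, encoding="utf-8")
        written.append(target)
    return written


docs_dir = Path(tempfile.mkdtemp()) / "docs"
written_files = render_assets(assets, templates, docs_dir)
```

With real jinja templates, the same loop would apply; only the template loading and the render call would change.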
3.4 Run your catalog website locally ⚡
catalog serve dbt_gitlab_data_team
You can now see the generated documentation website at http://localhost:8000.
3.5 Deploy the documentation website! 🚀
A. To deploy on GitHub pages:
catalog deploy github-pages dbt_gitlab_data_team
Mkdocs will deploy the site on GitHub Pages (this only works if you are inside a GitHub repository).
B. To deploy to a Google Cloud Storage bucket:
catalog deploy gcs dbt_gitlab_data_team
Mkdocs will copy all the files in catalogs/dbt_gitlab_data_team/site to the bucket defined by the site_url value of catalogs/dbt_gitlab_data_team/mkdocs.yml. For instance, if the site URL is http://catalogs.unytics.io/dbt_gitlab_data_team/, it will copy all files under catalogs/dbt_gitlab_data_team/site to gs://catalogs.unytics.io/dbt_gitlab_data_team/.
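To make the site_url to bucket mapping concrete, here is a small sketch that derives the gs:// destination from a site URL, assuming the rule is simply "keep the host and path, swap the scheme" as the example above suggests. The function name site_url_to_gcs_uri is hypothetical, and the actual implementation may handle more cases.

```python
from urllib.parse import urlparse


def site_url_to_gcs_uri(site_url):
    """Map an http(s) site_url to a gs:// URI with the same host and path."""
    parsed = urlparse(site_url)
    if not parsed.netloc:
        raise ValueError(f"site_url has no host: {site_url!r}")
    # The host becomes the bucket name and the URL path becomes the prefix.
    return f"gs://{parsed.netloc}{parsed.path}"


gcs_uri = site_url_to_gcs_uri("http://catalogs.unytics.io/dbt_gitlab_data_team/")
print(gcs_uri)  # gs://catalogs.unytics.io/dbt_gitlab_data_team/
```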
C. To deploy elsewhere:
You can follow these instructions from mkdocs.
💎 4. Generate your dbt documentation
To generate a documentation website for your own dbt project, do the following:
- Change directory to your dbt project directory.
- Download the catalogs/dbt documentation example by running catalog download dbt.
- Run dbt docs generate to compute target/manifest.json and target/catalog.json.
- Generate the assets file by running python catalogs/dbt/generate_assets_file.py. The script will parse target/manifest.json and target/catalog.json to generate the assets file in the expected format.
- Run catalog serve dbt to build the website and show it locally.
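To give an idea of what parsing the dbt artifacts might involve, here is a hedged sketch that turns tiny in-memory stand-ins for manifest.json and catalog.json into assets. It is not the shipped generate_assets_file.py: the function name build_assets, the choice to map dbt models to asset_type "table", the schema/name documentation path, and the contents of data are all assumptions made for illustration.

```python
import json

# Tiny in-memory stand-ins for target/manifest.json and target/catalog.json.
manifest = {
    "nodes": {
        "model.my_project.orders": {
            "resource_type": "model",
            "schema": "analytics",
            "name": "orders",
            "description": "One row per order.",
        },
        "test.my_project.not_null_orders_id": {
            "resource_type": "test",
            "schema": "analytics",
            "name": "not_null_orders_id",
            "description": "",
        },
    }
}
catalog = {
    "nodes": {
        "model.my_project.orders": {
            "columns": {
                "id": {"name": "id", "type": "INTEGER"},
                "amount": {"name": "amount", "type": "NUMERIC"},
            }
        }
    }
}


def build_assets(manifest, catalog):
    """Build one asset per dbt model, enriched with catalog column types."""
    assets = []
    for unique_id, node in manifest["nodes"].items():
        if node["resource_type"] != "model":
            continue  # skip tests, seeds, etc. in this sketch
        columns = catalog["nodes"].get(unique_id, {}).get("columns", {})
        assets.append({
            "asset_type": "table",  # assumption: models are documented as tables
            "documentation_path": f"{node['schema']}/{node['name']}",
            "data": {
                "name": node["name"],
                "description": node["description"],
                "columns": [
                    {"name": c["name"], "type": c["type"]}
                    for c in columns.values()
                ],
            },
        })
    return assets


assets = build_assets(manifest, catalog)
# One JSON object per line, as expected for an assets.jsonl file.
jsonl = "\n".join(json.dumps(asset) for asset in assets)
```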
Keep in touch 🧑💻
Join our Slack if you have a question, need help getting started, want to report a bug or suggest improvements, or simply want to have a chat 🙂.
👋 Contribute
Any contribution is more than welcome 🤗!
- Add a ⭐ on the repo to show your support
- Join our Slack and talk with us
- Open an issue to report a bug or suggest improvements
- Open a PR!