data_dictionary_builder

A Python library that automates database documentation — extract live schema metadata, generate dbt-compatible YAML, compare schemas across environments, and deliver PDF reports, all in a single import.

Installation

# Core library (SQLite works out of the box)
pip install data-dictionary-builder

# With the connectors you need
pip install "data-dictionary-builder[postgres]"
pip install "data-dictionary-builder[mysql]"
pip install "data-dictionary-builder[clickhouse]"
pip install "data-dictionary-builder[spanner]"

# Everything at once
pip install "data-dictionary-builder[all]"

Or use the CLI to install connectors after the fact:

ddgen install postgres
ddgen install clickhouse
ddgen install all

Supported Databases

Database               Extra          Driver
---------------------  -------------  -------------------------------
SQLite (built-in)      (none)         sqlite3 (stdlib)
PostgreSQL             [postgres]     psycopg2-binary
MySQL / MariaDB        [mysql]        PyMySQL
ClickHouse             [clickhouse]   clickhouse-connect (HTTP/HTTPS)
Google Cloud Spanner   [spanner]      google-cloud-spanner

Quick Start

from data_dictionary_builder import MetadataExtractor, YAMLGenerator, DDHelper, ExecutionTimer

timer  = ExecutionTimer()
helper = DDHelper(".")      # creates models/, reports/json/, reports/pdf/

with timer.task("Extract"):
    with MetadataExtractor(
        db_type="postgres", host="localhost", port=5432,
        database="mydb", user="readonly", password="secret",
    ) as ext:
        db_meta = ext.extract_all_schemas(
            schema_filter=["public", "analytics"],
            parallel_workers=8,
        )

with timer.task("Generate YAML"):
    YAMLGenerator(output_dir=str(helper.models_dir)).generate_yaml_files(db_meta)

timer.summary()
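The `timer.task(...)` blocks above follow the standard context-manager timing pattern. As a point of reference, a minimal stdlib stand-in for that pattern (this is an illustration, not the library's actual `ExecutionTimer`) looks like:

```python
import time
from contextlib import contextmanager

class SimpleTimer:
    """Collects named task durations, mimicking the ExecutionTimer usage above."""
    def __init__(self):
        self.durations = {}

    @contextmanager
    def task(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record elapsed time even if the block raises
            self.durations[name] = time.perf_counter() - start

    def summary(self):
        for name, secs in self.durations.items():
            print(f"{name:<20} {secs:8.3f}s")

timer = SimpleTimer()
with timer.task("Extract"):
    time.sleep(0.01)  # stand-in for real extraction work
timer.summary()
```

The `try/finally` is what makes the summary reliable: a failed extraction still gets its elapsed time recorded.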

CLI

# Check which connectors are installed
ddgen connectors

# Install a connector
ddgen install postgres
ddgen install clickhouse
ddgen install all

# Show library version and connector summary
ddgen info

# Show version number
ddgen --version

Schema Comparison

from data_dictionary_builder import SchemaComparator, DDHelper

helper = DDHelper(".")
report = SchemaComparator(
    source_config={"db_type": "postgres", "host": "prod-db", ...},
    destination_config={"db_type": "postgres", "host": "staging-db", ...},
).compare_and_generate_report("public", include_yaml_gaps=True)

json_path = helper.save_report(report)
pdf_path  = helper.compile_pdf(source_json=json_path)
helper.send_report_email(report=report, pdf_path=pdf_path, email_to="team@example.com")
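At its core, a schema comparison diffs the column inventories of two environments. A toy sketch of that idea using plain dicts (illustrative only; the library's `SchemaComparator` also normalises type aliases and handles far more cases):

```python
# Each schema is modelled as {table: {column: type}}; the data here is made up.
source = {"users": {"id": "integer", "email": "text", "created_at": "timestamp"}}
dest   = {"users": {"id": "integer", "email": "text"}}

def diff_columns(src, dst):
    """Return columns present in src but missing from dst, keyed by table."""
    missing = {}
    for table, cols in src.items():
        gap = set(cols) - set(dst.get(table, {}))
        if gap:
            missing[table] = sorted(gap)
    return missing

print(diff_columns(source, dest))  # {'users': ['created_at']}
```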

Airflow Integration

DatabaseMetadata serialises to/from plain dicts for XCom:

from airflow.decorators import task

@task
def extract():
    from data_dictionary_builder import MetadataExtractor
    with MetadataExtractor(**config) as ext:  # config: your connection kwargs dict
        return ext.extract_all_schemas(parallel_workers=8).to_dict()

@task
def generate_yaml(db_meta_dict):
    from data_dictionary_builder import DatabaseMetadata, YAMLGenerator
    YAMLGenerator("./models").generate_yaml_files(DatabaseMetadata.from_dict(db_meta_dict))

See tests/airflow_dag_example.py for a complete DAG.
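The `to_dict()`/`from_dict()` round-trip matters because XCom values must be serialisable. The pattern can be illustrated with a plain dataclass (names here are hypothetical, not the library's `DatabaseMetadata`):

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class TableMeta:
    name: str
    columns: list

    def to_dict(self):
        return asdict(self)

    @classmethod
    def from_dict(cls, d):
        return cls(**d)

meta = TableMeta(name="users", columns=["id", "email"])
payload = json.dumps(meta.to_dict())                  # what XCom stores
restored = TableMeta.from_dict(json.loads(payload))   # what the next task rebuilds
assert restored == meta
```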


Key Features

  • Parallel extraction — ThreadPoolExecutor with configurable workers; ClickHouse needs 2 queries and PostgreSQL 5 queries per schema, regardless of table count
  • Schema filtering — exact, glob, prefix, suffix, contains, regex — mix freely
  • Smart YAML merge — re-running never overwrites descriptions you've written by hand
  • Cross-database comparison — compare any two database types; type aliases normalised before diffing
  • PDF reports — paginated, no row limits, table of contents (requires reportlab)
  • Email delivery — SMTP with env-var credential fallback; PDF attached automatically
  • ExecutionTimer — named task timing with a formatted summary table
  • Server mode — omit the database argument to scan every database on a MySQL, ClickHouse, or PostgreSQL server

Environment Variables

Set these in a .env file (see tests/.env.example) or in your shell:

# SMTP — used by DDHelper.send_report_email() when no credentials are passed explicitly
SMTP_HOST=smtp.gmail.com
SMTP_PORT=587
SMTP_USER=you@gmail.com
SMTP_PASSWORD=xxxx xxxx xxxx xxxx
EMAIL_TO=recipient@example.com
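The fallback behaviour — explicit arguments win, the environment fills the gaps — can be sketched like this. The variable names match the table above; the helper function itself is hypothetical, not part of the library:

```python
import os

def smtp_settings(host=None, port=None, user=None, password=None):
    """Explicit arguments take precedence; otherwise fall back to the environment."""
    return {
        "host": host or os.environ.get("SMTP_HOST", "localhost"),
        "port": int(port or os.environ.get("SMTP_PORT", 587)),
        "user": user or os.environ.get("SMTP_USER"),
        "password": password or os.environ.get("SMTP_PASSWORD"),
    }

os.environ["SMTP_HOST"] = "smtp.gmail.com"  # simulate a loaded .env file
cfg = smtp_settings(port=2525)              # explicit port, host from the environment
print(cfg["host"], cfg["port"])             # smtp.gmail.com 2525
```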

Documentation

Full user guide, API reference, and troubleshooting: DOCUMENTATION.md


License

MIT — free to use, modify, and distribute in personal and commercial projects.
