Collection of custom DataHub transformers for metadata enhancement

DataHub Custom Transformers

A collection of custom DataHub transformers that enrich metadata during ingestion.

Features

  • 🏗️ Modular Design: Easy to add new transformers
  • 🔧 Production Ready: Tested and documented transformers
  • 📦 Easy Installation: Simple pip install
  • 🔌 Auto-Discovery: Transformers are automatically registered with DataHub

Installation

uv add datahub-custom-transformers

Or, with pip:

pip install datahub-custom-transformers

Available Transformers

Domain Structured Properties Transformer

Assigns structured properties whose values are domain URNs to every dataset in an ingestion run.

Use Case: Organizational data classification where all datasets from a source belong to the same environment, team, or department.

transformers:
  - type: "simple_add_dataset_domain_structured_properties"
    config:
      properties:
        environment: "production_environment"
        team: "data_engineering_team"
        department: "engineering_department"
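
The mapping from this config to URNs can be sketched in a few lines of Python. This is an illustration only, not the package's implementation: the function name to_structured_properties is hypothetical, and the URN shapes are taken from the Result section of this README.

```python
def to_structured_properties(properties: dict[str, str]) -> list[dict]:
    """Expand {property_id: domain_id} pairs into URN-based assignments.

    Assumption: each key becomes a structured property URN and each value
    becomes a domain URN, as in the Result section of this README.
    """
    return [
        {
            "propertyUrn": f"urn:li:structuredProperty:{prop_id}",
            "values": [f"urn:li:domain:{domain_id}"],
        }
        for prop_id, domain_id in properties.items()
    ]


config = {
    "environment": "production_environment",
    "team": "data_engineering_team",
}

for assignment in to_structured_properties(config):
    print(assignment["propertyUrn"], "->", assignment["values"][0])
```

Each key in properties names a structured property and each value names a domain, which is why both must already exist in DataHub before ingestion.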

Quick Start

1. Prerequisites

Create one structured property in DataHub for each property id used in the recipe. The example below defines environment; team needs an equivalent entry:

# structured_properties.yaml
- id: environment
  type: urn
  description: "Data environment assignment"
  display_name: "Environment"
  entity_types: [dataset]
  cardinality: SINGLE
  type_qualifier:
    allowed_types: ["urn:li:entityType:datahub.domain"]

Create the domain entities that the recipe references as values:

  • production_environment
  • data_engineering_team

2. Use in Ingestion Recipe

source:
  type: postgres
  config:
    host_port: "localhost:5432"
    database: "analytics_db"

transformers:
  - type: "simple_add_dataset_domain_structured_properties"
    config:
      properties:
        environment: "production_environment"
        team: "data_engineering_team"

sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"

3. Run Ingestion

datahub ingest -c config.yaml
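
The same recipe can also be run from Python rather than the CLI. The sketch below assumes acryl-datahub and this package are installed in the same environment; build_recipe and run_recipe are hypothetical helper names, and the connection values are the placeholders from the recipe above.

```python
def build_recipe(server: str = "http://localhost:8080") -> dict:
    """Return the Quick Start recipe as a plain dictionary."""
    return {
        "source": {
            "type": "postgres",
            "config": {
                "host_port": "localhost:5432",
                "database": "analytics_db",
            },
        },
        "transformers": [
            {
                "type": "simple_add_dataset_domain_structured_properties",
                "config": {
                    "properties": {
                        "environment": "production_environment",
                        "team": "data_engineering_team",
                    }
                },
            }
        ],
        "sink": {"type": "datahub-rest", "config": {"server": server}},
    }


def run_recipe(recipe: dict) -> None:
    """Run the recipe through DataHub's ingestion pipeline."""
    # Imported lazily so this module loads even without acryl-datahub.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(recipe)
    pipeline.run()
    # Raises if the source or sink reported failures.
    pipeline.raise_from_status()
```

Calling run_recipe(build_recipe()) should behave like the CLI command above, provided the PostgreSQL source and the DataHub server are reachable.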

Result

Every dataset ingested by the recipe will carry a structuredProperties aspect like this:

{
  "structuredProperties": {
    "properties": [
      {
        "propertyUrn": "urn:li:structuredProperty:environment",
        "values": ["urn:li:domain:production_environment"]
      },
      {
        "propertyUrn": "urn:li:structuredProperty:team",
        "values": ["urn:li:domain:data_engineering_team"]
      }
    ]
  }
}
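
To check an ingested dataset against the recipe, the aspect can be reduced back to plain property and domain ids. This is a minimal sketch: domains_by_property is a hypothetical helper, and the JSON is the example above, however it was retrieved from DataHub.

```python
import json

ASPECT_JSON = """
{
  "structuredProperties": {
    "properties": [
      {
        "propertyUrn": "urn:li:structuredProperty:environment",
        "values": ["urn:li:domain:production_environment"]
      },
      {
        "propertyUrn": "urn:li:structuredProperty:team",
        "values": ["urn:li:domain:data_engineering_team"]
      }
    ]
  }
}
"""

PROPERTY_PREFIX = "urn:li:structuredProperty:"
DOMAIN_PREFIX = "urn:li:domain:"


def domains_by_property(aspect: dict) -> dict[str, list[str]]:
    """Map each structured property id to the domain ids assigned to it."""
    result: dict[str, list[str]] = {}
    for entry in aspect["structuredProperties"]["properties"]:
        prop_id = entry["propertyUrn"].removeprefix(PROPERTY_PREFIX)
        result[prop_id] = [v.removeprefix(DOMAIN_PREFIX) for v in entry["values"]]
    return result


assigned = domains_by_property(json.loads(ASPECT_JSON))
print(assigned)
# {'environment': ['production_environment'], 'team': ['data_engineering_team']}
```

The resulting dictionary has the same shape as the properties block in the recipe, so the two can be compared directly.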

Supported DataHub Sources

The transformer acts on dataset entities, so it is intended to work with any DataHub source that emits datasets, including:

  • BigQuery, Snowflake, PostgreSQL, MySQL, Redshift
  • dbt, Airflow, Kafka, S3

Requirements

  • Python 3.11+
  • acryl-datahub >= 0.12.0

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Add your transformer with tests
  4. Submit a pull request

License

MIT License - see LICENSE file.
