Collection of custom DataHub transformers for metadata enhancement
DataHub Custom Transformers
A collection of custom DataHub transformers for various metadata enhancement tasks.
Features
- 🏗️ Modular Design: Easy to add new transformers
- 🔧 Production Ready: Tested and documented transformers
- 🔌 Auto-Discovery: Transformers are automatically registered with DataHub
Installation
uv add datahub-custom-transformers
Available Transformers
Domain Structured Properties Transformer
Adds domain-type structured properties to all datasets in an ingestion.
Use Case: Organizational data classification where all datasets from a source belong to the same environment, team, or department.
transformers:
  - type: "simple_add_dataset_domain_structured_properties"
    config:
      properties:
        environment: "production_environment"
        team: "data_engineering_team"
        department: "engineering_department"
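Judging from the output shown under Result, each `properties` entry appears to map a structured property ID to a domain ID. The sketch below illustrates that mapping in plain Python. The function name `build_structured_properties` is hypothetical and this is not the package's actual implementation, only an illustration of the apparent input-to-output relationship.

```python
def build_structured_properties(properties: dict[str, str]) -> dict:
    """Map {property_id: domain_id} to the aspect shape shown under Result.

    Hypothetical helper for illustration; not part of the package's API.
    """
    return {
        "structuredProperties": {
            "properties": [
                {
                    # The config key becomes the structured property URN.
                    "propertyUrn": f"urn:li:structuredProperty:{prop_id}",
                    # The config value becomes a domain URN.
                    "values": [f"urn:li:domain:{domain_id}"],
                }
                for prop_id, domain_id in properties.items()
            ]
        }
    }


if __name__ == "__main__":
    import json

    config = {"environment": "production_environment", "team": "data_engineering_team"}
    print(json.dumps(build_structured_properties(config), indent=2))
```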
Quick Start
1. Prerequisites
Create one structured property in DataHub for each property key used in the recipe (only `environment` is shown here; `team` follows the same pattern):
# structured_properties.yaml
- id: environment
  type: urn
  description: "Data environment assignment"
  display_name: "Environment"
  entity_types: [dataset]
  cardinality: SINGLE
  type_qualifier:
    allowed_types: ["urn:li:entityType:datahub.domain"]
Create domain entities:
- production_environment
- data_engineering_team
2. Use in Ingestion Recipe
source:
  type: postgres
  config:
    host_port: "localhost:5432"
    database: "analytics_db"
transformers:
  - type: "simple_add_dataset_domain_structured_properties"
    config:
      properties:
        environment: "production_environment"
        team: "data_engineering_team"
sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"
3. Run Ingestion
datahub ingest -c config.yaml
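The same recipe can also be run from Python instead of the CLI. This is a sketch under the assumption that `acryl-datahub` is installed and that the transformer type is registered as described above. `build_recipe` and `run_recipe` are hypothetical helper names; `Pipeline.create`, `run`, and `raise_from_status` come from DataHub's ingestion API.

```python
def build_recipe(server: str = "http://localhost:8080") -> dict:
    """Return the Quick Start recipe as a plain dictionary."""
    return {
        "source": {
            "type": "postgres",
            "config": {"host_port": "localhost:5432", "database": "analytics_db"},
        },
        "transformers": [
            {
                "type": "simple_add_dataset_domain_structured_properties",
                "config": {
                    "properties": {
                        "environment": "production_environment",
                        "team": "data_engineering_team",
                    }
                },
            }
        ],
        "sink": {"type": "datahub-rest", "config": {"server": server}},
    }


def run_recipe(recipe: dict) -> None:
    """Run an ingestion recipe and raise if the pipeline reports failures."""
    # Imported lazily so build_recipe stays usable without acryl-datahub installed.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(recipe)
    pipeline.run()
    pipeline.raise_from_status()
```

Calling `run_recipe(build_recipe())` requires a reachable PostgreSQL instance and DataHub server at the configured addresses.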
Result
All datasets will have structured properties:
{
  "structuredProperties": {
    "properties": [
      {
        "propertyUrn": "urn:li:structuredProperty:environment",
        "values": ["urn:li:domain:production_environment"]
      },
      {
        "propertyUrn": "urn:li:structuredProperty:team",
        "values": ["urn:li:domain:data_engineering_team"]
      }
    ]
  }
}
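To confirm that a dataset received the expected values, a payload of the shape above can be compared against the recipe's `properties` mapping. The helper below is a hypothetical sketch in plain Python. It assumes you have already obtained the JSON by whatever means you normally use, and that it has the shape shown above.

```python
def missing_properties(aspect: dict, expected: dict[str, str]) -> dict[str, str]:
    """Return the expected {property_id: domain_id} pairs absent from `aspect`.

    Hypothetical verification helper; assumes the JSON shape shown under Result.
    """
    # Index the payload by property URN for quick lookup.
    found = {
        entry["propertyUrn"]: set(entry.get("values", []))
        for entry in aspect.get("structuredProperties", {}).get("properties", [])
    }
    return {
        prop_id: domain_id
        for prop_id, domain_id in expected.items()
        if f"urn:li:domain:{domain_id}"
        not in found.get(f"urn:li:structuredProperty:{prop_id}", set())
    }
```

An empty dictionary means every configured property is present with the expected domain URN.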
Supported DataHub Sources
The transformer runs inside the ingestion pipeline, so it is not tied to a particular source and applies to the datasets that source emits. Examples:
- BigQuery, Snowflake, PostgreSQL, MySQL, Redshift
- dbt, Airflow, Kafka, S3
Requirements
- Python 3.11+
- acryl-datahub >= 0.12.0
Contributing
- Fork the repository
- Create a feature branch
- Add your transformer with tests
- Submit a pull request
Support
- 📖 Documentation
- 🐛 Issues
- 💬 Discussions
License
MIT License - see LICENSE file.