Data.Rentgen REST API + Kafka consumer
Project description
What is Data.Rentgen?
Data.Rentgen is a Data Motion Lineage service compatible with the OpenLineage specification.
Currently we support consuming lineage from:
- Apache Spark
- Apache Airflow
- Apache Hive
- Apache Flink
- dbt
Note: the service is under active development, so its API is not yet stable.
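Data.Rentgen consumes events that follow the OpenLineage specification, and the integrations listed above normally emit them automatically. For orientation, here is a minimal sketch of such an event (a `RunEvent`) built with the Python standard library only. The producer URI, job name and dataset names are hypothetical, and the spec version in `schemaURL` is an assumption. How the event is delivered to Data.Rentgen depends on the deployment and is not shown.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Optional

# Hypothetical URI of the tool that emits the event.
PRODUCER = "https://example.com/my-etl-tool"
# RunEvent schema URL; the spec version (2-0-2) is an assumption.
SCHEMA_URL = "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent"


def make_run_event(
    event_type: str,
    job_namespace: str,
    job_name: str,
    inputs: list[tuple[str, str]],
    outputs: list[tuple[str, str]],
    run_id: Optional[str] = None,
) -> dict:
    """Build a minimal OpenLineage RunEvent; datasets are (namespace, name) pairs."""
    return {
        # One of START, RUNNING, COMPLETE, ABORT, FAIL, OTHER
        "eventType": event_type,
        "eventTime": datetime.now(timezone.utc).isoformat(),
        # runId must be a UUID; generate one if the caller did not pass it
        "run": {"runId": run_id or str(uuid.uuid4())},
        "job": {"namespace": job_namespace, "name": job_name},
        "inputs": [{"namespace": ns, "name": name} for ns, name in inputs],
        "outputs": [{"namespace": ns, "name": name} for ns, name in outputs],
        "producer": PRODUCER,
        "schemaURL": SCHEMA_URL,
    }


# Example: a hypothetical job reading a Hive table and writing to HDFS.
event = make_run_event(
    "COMPLETE",
    job_namespace="my_namespace",
    job_name="daily_etl",
    inputs=[("hive://metastore-host:9083", "raw.events")],
    outputs=[("hdfs://namenode:8020", "/warehouse/stage/events")],
)
print(json.dumps(event, indent=2))
```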
Goals
- Collect lineage events produced by OpenLineage clients and integrations.
- Store events at the level of individual operations, for finer-grained lineage.
- Provide an API for fetching both job/run ↔ dataset lineage and dataset ↔ dataset lineage.
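The two lineage views can be related with a small sketch. Job/run ↔ dataset lineage records which datasets each run read and wrote, and dataset ↔ dataset lineage can be derived from it by linking every input of a run to every output of the same run. The `RunIO` record and the dataset names below are hypothetical. This illustrates the idea and is not Data.Rentgen's internal implementation or API.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class RunIO:
    """Hypothetical record: one run plus the datasets it read and wrote."""
    run_id: str
    inputs: frozenset[str]
    outputs: frozenset[str]


def dataset_edges(runs: list[RunIO]) -> set[tuple[str, str]]:
    """Derive dataset -> dataset edges from run <-> dataset edges.

    Each input of a run is linked to each output of the same run,
    so the intermediate run node disappears from the graph.
    """
    edges: set[tuple[str, str]] = set()
    for run in runs:
        edges.update(product(run.inputs, run.outputs))
    return edges


# Two chained runs: r1 writes stage.events, which r2 then reads.
runs = [
    RunIO("r1", frozenset({"raw.events"}), frozenset({"stage.events"})),
    RunIO("r2", frozenset({"stage.events"}), frozenset({"dwh.daily"})),
]
print(sorted(dataset_edges(runs)))
# [('raw.events', 'stage.events'), ('stage.events', 'dwh.daily')]
```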
Features
- Support consuming large volumes of lineage events, using Apache Kafka as an event buffer.
- Store data in tables partitioned by event timestamp to speed up lineage graph resolution.
- Build the lineage graph within user-specified time boundaries.
- Build the lineage graph at different levels of granularity, e.g. merge all individual Spark commands into a Spark applicationId or Spark applicationName.
- Column-level lineage support.
- Authentication support.
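The time-boundary and granularity features interact as follows. Operations are first filtered to a user-specified time window and then merged into one node per Spark applicationId or applicationName. The sketch below shows this under stated assumptions: the `Operation` record, its fields and `build_graph` are hypothetical and do not correspond to Data.Rentgen's actual schema or REST API.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable


@dataclass(frozen=True)
class Operation:
    """Hypothetical record for one Spark command seen in lineage events."""
    application_id: str
    application_name: str
    created_at: datetime
    inputs: frozenset[str]
    outputs: frozenset[str]


def build_graph(
    ops: Iterable[Operation],
    since: datetime,
    until: datetime,
    granularity: str = "application_id",  # or "application_name"
) -> dict[str, dict[str, set[str]]]:
    """Keep operations in [since, until) and merge them into one node per granularity key."""
    graph: dict[str, dict[str, set[str]]] = defaultdict(
        lambda: {"inputs": set(), "outputs": set()}
    )
    for op in ops:
        if not (since <= op.created_at < until):
            continue  # outside the user-specified time boundaries
        node = graph[getattr(op, granularity)]
        node["inputs"] |= op.inputs
        node["outputs"] |= op.outputs
    return dict(graph)


ops = [
    Operation("app-1", "daily_etl", datetime(2024, 1, 1, 10), frozenset({"raw.a"}), frozenset({"stage.a"})),
    Operation("app-1", "daily_etl", datetime(2024, 1, 1, 11), frozenset({"stage.a"}), frozenset({"dwh.a"})),
    Operation("app-2", "daily_etl", datetime(2024, 1, 2, 10), frozenset({"raw.a"}), frozenset({"stage.a"})),
    Operation("app-3", "daily_etl", datetime(2024, 1, 1, 9), frozenset({"raw.b"}), frozenset({"stage.b"})),
]
day = (datetime(2024, 1, 1), datetime(2024, 1, 2))
by_app_id = build_graph(ops, *day)  # app-1 and app-3; app-2 falls outside the window
by_app_name = build_graph(ops, *day, granularity="application_name")  # one "daily_etl" node
```

Coarser keys such as the application name yield smaller graphs that are easier to read, at the cost of hiding which individual command touched which dataset.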
Non-goals
- This is not a data catalog. DataRentgen doesn’t track dataset schema changes, owners and so on. Use DataHub or OpenMetadata instead.
- Static data lineage, such as view → table, is not supported.
Limitations
- OpenLineage has integrations with Trino, Debezium and some other lineage sources; DataRentgen support for these may be added later.
- DataRentgen parses only a limited set of OpenLineage facets and doesn’t store custom facets. This may change in the future.
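The facet limitation works like a whitelist: facets with recognized names are kept and everything else, including custom facets, is discarded. The sketch below assumes a hypothetical whitelist. `schema`, `dataSource`, `symlinks` and `columnLineage` are standard OpenLineage dataset facet names, but this page does not say which facets Data.Rentgen actually parses. The custom facet name in the example is also hypothetical.

```python
import copy

# Hypothetical whitelist. These are standard OpenLineage dataset facet names,
# but the exact set parsed by Data.Rentgen is an assumption here.
KNOWN_DATASET_FACETS = frozenset({"schema", "dataSource", "symlinks", "columnLineage"})


def keep_known_facets(dataset: dict) -> dict:
    """Return a copy of an OpenLineage dataset dict without unrecognized (e.g. custom) facets."""
    result = copy.deepcopy(dataset)  # leave the caller's dict untouched
    result["facets"] = {
        name: facet
        for name, facet in result.get("facets", {}).items()
        if name in KNOWN_DATASET_FACETS
    }
    return result


dataset = {
    "namespace": "hive://metastore-host:9083",
    "name": "raw.events",
    "facets": {
        "schema": {"fields": [{"name": "id", "type": "bigint"}]},
        "myCompany_qualityScore": {"score": 0.97},  # hypothetical custom facet
    },
}
print(sorted(keep_known_facets(dataset)["facets"]))  # ['schema']
```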
Documentation
Screenshots
Lineage graph
Dataset-level lineage graph
Dataset column-level lineage graph
Job-level lineage graph
Run-level lineage graph
Datasets
Runs
Spark application
Spark run
Spark command
Hive query
Airflow DagRun
Airflow TaskInstance
Download files
File details
Details for the file data_rentgen-0.4.3.tar.gz.
File metadata
- Download URL: data_rentgen-0.4.3.tar.gz
- Upload date:
- Size: 149.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f70fa1929ed1e36af57563075e5b6aa7ab9bd986cf7bc80d58deebe59efb0db0 |
| MD5 | c11097c72a0b6f23f0da81fbba23b58b |
| BLAKE2b-256 | 2b1ace34ee12bd0ffe87353c89956b08de4315a8ca94d772459aa1162a518185 |
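The SHA256 digest in the table above can be used to check a downloaded archive before installing it. Below is a minimal sketch that uses only `hashlib`. It assumes the sdist was saved locally; the wheel has its own digest, listed further down.

```python
import hashlib

# SHA256 digest of data_rentgen-0.4.3.tar.gz, copied from the table above.
EXPECTED_SHA256 = "f70fa1929ed1e36af57563075e5b6aa7ab9bd986cf7bc80d58deebe59efb0db0"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large archives are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Compare the file's digest with the published one (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()
```

For example, `verify("data_rentgen-0.4.3.tar.gz")` should return `True` for an untampered download in the current directory. `pip hash <file>` computes the same SHA256 digest from the command line.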
Provenance
The following attestation bundles were made for data_rentgen-0.4.3.tar.gz:
Publisher: release.yml on MobileTeleSystems/data-rentgen
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: data_rentgen-0.4.3.tar.gz
- Subject digest: f70fa1929ed1e36af57563075e5b6aa7ab9bd986cf7bc80d58deebe59efb0db0
- Sigstore transparency entry: 713990815
- Sigstore integration time:
- Permalink: MobileTeleSystems/data-rentgen@04d73bb580806cc82c18792d2432581449872d71
- Branch / Tag: refs/tags/0.4.3
- Owner: https://github.com/MobileTeleSystems
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@04d73bb580806cc82c18792d2432581449872d71
- Trigger Event: push
File details
Details for the file data_rentgen-0.4.3-py3-none-any.whl.
File metadata
- Download URL: data_rentgen-0.4.3-py3-none-any.whl
- Upload date:
- Size: 288.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e70d64cfdb67f0cc4293df93b225be6b99b859a3aa0f1dab15218cd96c3830fd |
| MD5 | 8837687bf6cc1e5d870915c688a3d9b6 |
| BLAKE2b-256 | 0f6d65af555cc50f5b74dd31a379cff774ab0a4bd0d32ff63643053b7fd474cf |
Provenance
The following attestation bundles were made for data_rentgen-0.4.3-py3-none-any.whl:
Publisher: release.yml on MobileTeleSystems/data-rentgen
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: data_rentgen-0.4.3-py3-none-any.whl
- Subject digest: e70d64cfdb67f0cc4293df93b225be6b99b859a3aa0f1dab15218cd96c3830fd
- Sigstore transparency entry: 713990817
- Sigstore integration time:
- Permalink: MobileTeleSystems/data-rentgen@04d73bb580806cc82c18792d2432581449872d71
- Branch / Tag: refs/tags/0.4.3
- Owner: https://github.com/MobileTeleSystems
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@04d73bb580806cc82c18792d2432581449872d71
- Trigger Event: push