Data.Rentgen REST API + Kafka consumer

What is Data.Rentgen?

Data.Rentgen is a Data Motion Lineage service, compatible with the OpenLineage specification.

Currently we support consuming lineage from:

  • Apache Spark

  • Apache Airflow

  • Apache Hive

  • Apache Flink

  • dbt

  • SyncMaster (proprietary integration, part of MWS Data Bridge)

  • StarRocks (proprietary integration, part of MWS Data Engine)

Note: the service is under active development, so the API may be unstable.

Goals

  • Collect lineage events produced by OpenLineage clients & integrations.

  • Store events at operation granularity for finer detail.

  • Provide an API for fetching both job/run ↔ dataset lineage and dataset ↔ dataset lineage.
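
The two kinds of lineage named in the goals above can be illustrated with a small sketch. It builds a minimal OpenLineage-style run event as a plain dict and derives job ↔ dataset and dataset ↔ dataset edges from it. The helper names (`make_run_event`, `job_dataset_edges`, `dataset_dataset_edges`) and the `example` namespace are assumptions made for illustration; this is not the Data.Rentgen API, and real OpenLineage events also carry `producer` and `schemaURL` fields that are omitted here.

```python
import json
from datetime import datetime, timezone


def make_run_event(run_id, job_name, inputs, outputs):
    """Build a minimal OpenLineage-style run event (illustrative, not validated).

    Real events also include `producer` and `schemaURL`; they are left out
    to keep the sketch short.
    """
    return {
        "eventType": "COMPLETE",
        "eventTime": datetime(2024, 1, 1, tzinfo=timezone.utc).isoformat(),
        "run": {"runId": run_id},
        "job": {"namespace": "example", "name": job_name},
        "inputs": [{"namespace": "example", "name": n} for n in inputs],
        "outputs": [{"namespace": "example", "name": n} for n in outputs],
    }


def job_dataset_edges(event):
    """Edges input dataset -> job and job -> output dataset."""
    job = event["job"]["name"]
    edges = [(d["name"], job) for d in event["inputs"]]
    edges += [(job, d["name"]) for d in event["outputs"]]
    return edges


def dataset_dataset_edges(event):
    """Edges input dataset -> output dataset, with the job collapsed away."""
    return [
        (i["name"], o["name"])
        for i in event["inputs"]
        for o in event["outputs"]
    ]


event = make_run_event(
    "0190c1a0-0000-7000-8000-000000000001",  # hypothetical run id
    "etl_job",
    inputs=["raw.orders"],
    outputs=["mart.orders"],
)
print(json.dumps(event, indent=2))
print(job_dataset_edges(event))
print(dataset_dataset_edges(event))
```

The dataset ↔ dataset view is simply the job ↔ dataset view with the job node removed, which is why both can be served from the same stored events.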

Features

  • Consume large volumes of lineage events, using Apache Kafka as an event buffer.

  • Store data in tables partitioned by event timestamp, to speed up lineage graph resolution.

  • The lineage graph is built within user-specified time boundaries.

  • The lineage graph can be built at different granularities, e.g. by merging all individual Spark commands into a Spark applicationId or Spark applicationName.

  • Column-level lineage support.

  • Authentication support.
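
As a rough illustration of the time-boundary and granularity features above, the sketch below filters operations by a time window and merges them into coarser nodes. The `Operation` record, the `build_edges` function, and the granularity names are hypothetical and exist only in this example; they do not describe how Data.Rentgen stores or queries lineage.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Operation:
    """Hypothetical record of one operation, e.g. a single Spark command."""
    app_name: str
    app_id: str
    command: str
    inputs: tuple
    outputs: tuple
    at: datetime


def build_edges(operations, since, until, granularity="command"):
    """Return sorted, de-duplicated (source, target) edges within [since, until)."""
    # Each granularity maps an operation to the graph node that represents it.
    node_key = {
        "command": lambda op: f"{op.app_id}/{op.command}",
        "application_id": lambda op: op.app_id,
        "application_name": lambda op: op.app_name,
    }[granularity]
    edges = set()
    for op in operations:
        if not (since <= op.at < until):
            continue  # outside the user-specified time boundaries
        node = node_key(op)
        edges.update((src, node) for src in op.inputs)
        edges.update((node, dst) for dst in op.outputs)
    return sorted(edges)


ops = [
    Operation("daily_etl", "app-1", "cmd-1", ("raw.orders",), ("stage.orders",),
              datetime(2024, 1, 1, 10)),
    Operation("daily_etl", "app-1", "cmd-2", ("stage.orders",), ("mart.orders",),
              datetime(2024, 1, 1, 11)),
    Operation("daily_etl", "app-2", "cmd-1", ("raw.orders",), ("stage.orders",),
              datetime(2024, 1, 2, 10)),
]

window = (datetime(2024, 1, 1), datetime(2024, 1, 3))
for level in ("command", "application_id", "application_name"):
    print(level, build_edges(ops, *window, granularity=level))
```

Coarser granularity yields fewer nodes and edges for the same events, and narrowing the window drops operations before any merging happens.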

Non-goals

  • This is not a Data Catalog. Data.Rentgen doesn’t track dataset schema changes, owners and so on. Use DataHub or OpenMetadata instead.

  • Static data lineage, such as view → table, is not supported.

Limitations

  • OpenLineage has integrations with Trino, Debezium and some other lineage sources. Data.Rentgen support for them may be added later.

  • Data.Rentgen parses only a limited set of OpenLineage facets, and doesn’t store custom facets. This may change in the future.
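
The facet limitation above amounts to an allow-list applied to each event. A minimal sketch, assuming a hypothetical allow-list: `KNOWN_FACETS` and `split_facets` are invented for this example, and the set of facets Data.Rentgen actually parses is not stated here.

```python
# Hypothetical allow-list, for illustration only. "schema" and "columnLineage"
# are standard OpenLineage dataset facet keys; whether a given service parses
# them is an assumption of this sketch.
KNOWN_FACETS = {"schema", "columnLineage"}


def split_facets(facets):
    """Separate facets into those that would be parsed and those ignored."""
    known = {k: v for k, v in facets.items() if k in KNOWN_FACETS}
    ignored = sorted(k for k in facets if k not in KNOWN_FACETS)
    return known, ignored


dataset_facets = {
    "schema": {"fields": [{"name": "id", "type": "bigint"}]},
    "myCompany_owner": {"team": "analytics"},  # custom facet, not stored
}
known, ignored = split_facets(dataset_facets)
print(sorted(known))
print(ignored)
```

Anything that lands in `ignored` is lost downstream, so custom facets should not be relied on for information that must be queryable later.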

Documentation

See https://data-rentgen.readthedocs.io/

Screenshots

  • Lineage graph: dataset-level, dataset column-level, job-level and run-level lineage graphs

  • Hierarchy graph: job hierarchy

  • Datasets: datasets list

  • Runs: runs list

  • Spark application, Spark run and Spark command details

  • Hive query details

  • Airflow DagRun and Airflow TaskInstance details
