Skip to main content

Python library to extract lineage from Sql Server stored procedures

Project description

sql-server-lineage

I worked many times on sql server data warehouses or or sql server application databases with almost
no documentation and lots of stored procedures transforming data. At some point I wanted a tool that would
analyse all the stored procedures and generate a lineage that I would be able to send to a Data Catalog or
visualise immediately to get an idea of the structure and debug much more easily. So I decided to develop it.

This is a library available in Golang and Python that would enable you to do 2 things:

  • Get the lineage of all the stored procedures in a database as python dictionary.
  • Generate an html file with the lineage visualised (That you can share on Google Pages or Github Pages or wherever you like).

CTEs (deeply nested as well), Temp Tables, Table Variables are all handled gracefully. You'll always see the original source tables.

The resulting lineage structure is the following: sink_table -> stored_procedure (1 or more) -> list_of_sources

Dictionary Example:

{
    'db_name.schema.table_sink': {
        'schema.stored_procedure_1': [
            'schema.table_source_1',
            'schema.table_source_2',
            'schema.table_source_3',
        ],
        'schema.stored_procedure_2': [
            'schema.table_source_a',
            'schema.table_source_b',
            'schema.table_source_c',
        ],
    },
    'db_name.schema.table_sink_2': {
        'schema.stored_procedure_3': [
            'schema.table_source',
        ],
    },
}

Html Example:

Sample Image

Therefore, everything is centered around the sink_table. Because most likely you want to see for each table where does the data come from. If you need it different (for a Data Catalog) you can transform the objects and adapt them to your requirement. In the html you will see the sink table as reference in a different color (green) and you will see it again in the lineage.

I decided to generate html files instead of starting a local server (like DBT does) because I wanted to keep it simple and make it easy to host somewhere as a static website and therefore sharing it with other Data Engineers, Data Analysts, etc.

This project is named sql-server-lineage because I want to keep it specialised only on Sql Server. To generate the lineage you need to develop a custom implementation of a Sql parser. I did it for Sql Server as I've been using it recently but I do not intend to develop parsers for other sql dialects.

DISCLAIMER

Since this library analyses stored procedures, if you transform data externally and then write it to the database there is nothing you can do for the lineage. Also if the stored procedures rely heavily on dynamic sql especially with table names passed as parameters then the library would not be able to determine the table names for the lineage.

Implementation

All the core logic is developed in Go, then using CGO, C bindings and some adapter functions I made it available also as a Python library. The main reason to do that was because Data Engineers work mainly in Python.

Get Started

Installation

pip install sql-server-lineage

Example to generate the lineage directly from sql server

In this example we get the stored procedures definitions from sql server but you can also read the definitions from files.

from sql_server_lineage import get_lineage, generate_html_lineage
from typing import Dict, List
import pyodbc

host: str = "host"
port: str = "port"
driver: str = "driver"
user: str = "user"
password: str = "password"
database: str = "database"
otherparams: str = "otherparams"

db_conn_string: str = f"DRIVER={{{driver}}};SERVER={host};PORT={port};DATABASE={database};UID={user};PWD={password};{otherparams}"
conn: pyodbc.Connection = pyodbc.connect(db_conn_string)

sql_query: str = """
    SELECT OBJECT_DEFINITION(object_id) as sp_definition
    FROM sys.procedures
    WHERE object_id not in (select major_id from sys.extended_properties);
"""

cursor = conn.cursor()
cursor.execute(sql_query)
records: List = cursor.fetchall()
stored_procedures: List[str] = [r.sp_definition for r in records]

# get lineage and then send it to a Data Catalog, or transform and then send.
result_lineage: Dict = get_lineage(stored_procedures)

# generate the html representation of the lineage
# providing the filename is optional
lineage_file: str = "lineage_generated.html"
generate_html_lineage(stored_procedures, lineage_file)

This project operates under the MIT License.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

sql_server_lineage-1.13.0-cp312-cp312-win_amd64.whl (1.5 MB view hashes)

Uploaded CPython 3.12 Windows x86-64

sql_server_lineage-1.13.0-cp312-cp312-win32.whl (1.5 MB view hashes)

Uploaded CPython 3.12 Windows x86

sql_server_lineage-1.13.0-cp312-cp312-musllinux_1_1_x86_64.whl (1.5 MB view hashes)

Uploaded CPython 3.12 musllinux: musl 1.1+ x86-64

sql_server_lineage-1.13.0-cp312-cp312-musllinux_1_1_i686.whl (1.4 MB view hashes)

Uploaded CPython 3.12 musllinux: musl 1.1+ i686

sql_server_lineage-1.13.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (1.4 MB view hashes)

Uploaded CPython 3.12 manylinux: glibc 2.17+ ARM64

sql_server_lineage-1.13.0-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.5 MB view hashes)

Uploaded CPython 3.12 manylinux: glibc 2.17+ x86-64 manylinux: glibc 2.5+ x86-64

sql_server_lineage-1.13.0-cp312-cp312-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl (1.4 MB view hashes)

Uploaded CPython 3.12 manylinux: glibc 2.17+ i686 manylinux: glibc 2.5+ i686

sql_server_lineage-1.13.0-cp312-cp312-macosx_11_0_arm64.whl (767.2 kB view hashes)

Uploaded CPython 3.12 macOS 11.0+ ARM64

sql_server_lineage-1.13.0-cp312-cp312-macosx_10_9_x86_64.whl (803.0 kB view hashes)

Uploaded CPython 3.12 macOS 10.9+ x86-64

sql_server_lineage-1.13.0-cp311-cp311-win_amd64.whl (1.5 MB view hashes)

Uploaded CPython 3.11 Windows x86-64

sql_server_lineage-1.13.0-cp311-cp311-win32.whl (1.5 MB view hashes)

Uploaded CPython 3.11 Windows x86

sql_server_lineage-1.13.0-cp311-cp311-musllinux_1_1_x86_64.whl (1.5 MB view hashes)

Uploaded CPython 3.11 musllinux: musl 1.1+ x86-64

sql_server_lineage-1.13.0-cp311-cp311-musllinux_1_1_i686.whl (1.4 MB view hashes)

Uploaded CPython 3.11 musllinux: musl 1.1+ i686

sql_server_lineage-1.13.0-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (1.4 MB view hashes)

Uploaded CPython 3.11 manylinux: glibc 2.17+ ARM64

sql_server_lineage-1.13.0-cp311-cp311-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.5 MB view hashes)

Uploaded CPython 3.11 manylinux: glibc 2.17+ x86-64 manylinux: glibc 2.5+ x86-64

sql_server_lineage-1.13.0-cp311-cp311-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl (1.4 MB view hashes)

Uploaded CPython 3.11 manylinux: glibc 2.17+ i686 manylinux: glibc 2.5+ i686

sql_server_lineage-1.13.0-cp311-cp311-macosx_11_0_arm64.whl (767.2 kB view hashes)

Uploaded CPython 3.11 macOS 11.0+ ARM64

sql_server_lineage-1.13.0-cp311-cp311-macosx_10_9_x86_64.whl (803.0 kB view hashes)

Uploaded CPython 3.11 macOS 10.9+ x86-64

sql_server_lineage-1.13.0-cp310-cp310-win_amd64.whl (1.5 MB view hashes)

Uploaded CPython 3.10 Windows x86-64

sql_server_lineage-1.13.0-cp310-cp310-win32.whl (1.5 MB view hashes)

Uploaded CPython 3.10 Windows x86

sql_server_lineage-1.13.0-cp310-cp310-musllinux_1_1_x86_64.whl (1.5 MB view hashes)

Uploaded CPython 3.10 musllinux: musl 1.1+ x86-64

sql_server_lineage-1.13.0-cp310-cp310-musllinux_1_1_i686.whl (1.4 MB view hashes)

Uploaded CPython 3.10 musllinux: musl 1.1+ i686

sql_server_lineage-1.13.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (1.4 MB view hashes)

Uploaded CPython 3.10 manylinux: glibc 2.17+ ARM64

sql_server_lineage-1.13.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.5 MB view hashes)

Uploaded CPython 3.10 manylinux: glibc 2.17+ x86-64 manylinux: glibc 2.5+ x86-64

sql_server_lineage-1.13.0-cp310-cp310-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl (1.4 MB view hashes)

Uploaded CPython 3.10 manylinux: glibc 2.17+ i686 manylinux: glibc 2.5+ i686

sql_server_lineage-1.13.0-cp310-cp310-macosx_11_0_arm64.whl (767.2 kB view hashes)

Uploaded CPython 3.10 macOS 11.0+ ARM64

sql_server_lineage-1.13.0-cp310-cp310-macosx_10_9_x86_64.whl (803.0 kB view hashes)

Uploaded CPython 3.10 macOS 10.9+ x86-64

sql_server_lineage-1.13.0-cp39-cp39-win_amd64.whl (1.5 MB view hashes)

Uploaded CPython 3.9 Windows x86-64

sql_server_lineage-1.13.0-cp39-cp39-win32.whl (1.5 MB view hashes)

Uploaded CPython 3.9 Windows x86

sql_server_lineage-1.13.0-cp39-cp39-musllinux_1_1_x86_64.whl (1.5 MB view hashes)

Uploaded CPython 3.9 musllinux: musl 1.1+ x86-64

sql_server_lineage-1.13.0-cp39-cp39-musllinux_1_1_i686.whl (1.4 MB view hashes)

Uploaded CPython 3.9 musllinux: musl 1.1+ i686

sql_server_lineage-1.13.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (1.4 MB view hashes)

Uploaded CPython 3.9 manylinux: glibc 2.17+ ARM64

sql_server_lineage-1.13.0-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.5 MB view hashes)

Uploaded CPython 3.9 manylinux: glibc 2.17+ x86-64 manylinux: glibc 2.5+ x86-64

sql_server_lineage-1.13.0-cp39-cp39-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl (1.4 MB view hashes)

Uploaded CPython 3.9 manylinux: glibc 2.17+ i686 manylinux: glibc 2.5+ i686

sql_server_lineage-1.13.0-cp39-cp39-macosx_11_0_arm64.whl (767.2 kB view hashes)

Uploaded CPython 3.9 macOS 11.0+ ARM64

sql_server_lineage-1.13.0-cp39-cp39-macosx_10_9_x86_64.whl (803.0 kB view hashes)

Uploaded CPython 3.9 macOS 10.9+ x86-64

sql_server_lineage-1.13.0-cp38-cp38-win_amd64.whl (1.5 MB view hashes)

Uploaded CPython 3.8 Windows x86-64

sql_server_lineage-1.13.0-cp38-cp38-win32.whl (1.5 MB view hashes)

Uploaded CPython 3.8 Windows x86

sql_server_lineage-1.13.0-cp38-cp38-musllinux_1_1_x86_64.whl (1.5 MB view hashes)

Uploaded CPython 3.8 musllinux: musl 1.1+ x86-64

sql_server_lineage-1.13.0-cp38-cp38-musllinux_1_1_i686.whl (1.4 MB view hashes)

Uploaded CPython 3.8 musllinux: musl 1.1+ i686

sql_server_lineage-1.13.0-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (1.4 MB view hashes)

Uploaded CPython 3.8 manylinux: glibc 2.17+ ARM64

sql_server_lineage-1.13.0-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.5 MB view hashes)

Uploaded CPython 3.8 manylinux: glibc 2.17+ x86-64 manylinux: glibc 2.5+ x86-64

sql_server_lineage-1.13.0-cp38-cp38-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl (1.4 MB view hashes)

Uploaded CPython 3.8 manylinux: glibc 2.17+ i686 manylinux: glibc 2.5+ i686

sql_server_lineage-1.13.0-cp38-cp38-macosx_11_0_arm64.whl (767.2 kB view hashes)

Uploaded CPython 3.8 macOS 11.0+ ARM64

sql_server_lineage-1.13.0-cp38-cp38-macosx_10_9_x86_64.whl (803.0 kB view hashes)

Uploaded CPython 3.8 macOS 10.9+ x86-64

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page