Lineagentic-flow is an agentic AI approach for building data lineage across diverse data processing scripts, including Python, SQL, Java, Airflow, Spark, etc.

Lineagentic-flow

Lineagentic-flow is an agentic AI solution for building end-to-end data lineage across diverse types of data processing scripts on different platforms. It is designed to be modular and customizable, and can be extended to support new script types. In a nutshell, this is what it does:

┌─────────────┐    ┌───────────────────────────────┐    ┌──────────────────┐
│ source-code │───▶│   lineagentic-flow-algorithm  │───▶│  lineage output  │
│             │    │                               │    │                  │
└─────────────┘    └───────────────────────────────┘    └──────────────────┘

Features

  • Plugin-based design pattern, simple to extend and customize.
  • Command line interface for quick analysis.
  • Support for multiple data processing script types (SQL, Python, Airflow, Spark, etc.).
  • Simple demo server that runs locally and in Hugging Face Spaces.

Quick Start

Installation

Install the package from PyPI:

pip install lineagentic-flow

Basic Usage

import asyncio
from lf_algorithm.framework_agent import FrameworkAgent
import logging

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)

async def main():
    # Create an agent for SQL lineage extraction
    agent = FrameworkAgent(
        agent_name="sql-lineage-agent",
        model_name="gpt-4o-mini",
        source_code="SELECT id, name FROM users WHERE active = true"
    )
    
    # Run the agent to extract lineage
    result = await agent.run_agent()
    print(result)

# Run the example
asyncio.run(main())
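
To analyze a script stored on disk, one option is to read the file and choose an agent from its extension before constructing the agent. The following is a minimal sketch, not part of the package: `AGENT_BY_SUFFIX`, `agent_kwargs_for`, and `analyze_file` are hypothetical helpers, and the suffix-to-agent mapping is an assumption based on the agent names used in this README. It relies only on the `FrameworkAgent` arguments and the `run_agent()` call shown above.

```python
from pathlib import Path

# Assumed mapping from file suffix to agent name (hypothetical; adjust it to
# the agents actually available in your installation).
AGENT_BY_SUFFIX = {
    ".sql": "sql-lineage-agent",
    ".py": "python-lineage-agent",
}


def agent_kwargs_for(path: str, model_name: str = "gpt-4o-mini") -> dict:
    """Build the keyword arguments for FrameworkAgent from a script file."""
    script = Path(path)
    suffix = script.suffix.lower()
    if suffix not in AGENT_BY_SUFFIX:
        raise ValueError(f"No agent configured for '{suffix}' files")
    return {
        "agent_name": AGENT_BY_SUFFIX[suffix],
        "model_name": model_name,
        "source_code": script.read_text(encoding="utf-8"),
    }


async def analyze_file(path: str):
    """Run lineage extraction on one file (needs the package and an API key)."""
    # Imported lazily so agent_kwargs_for works without the package installed.
    from lf_algorithm.framework_agent import FrameworkAgent

    agent = FrameworkAgent(**agent_kwargs_for(path))
    return await agent.run_agent()
```

With the package installed and the API key set, `asyncio.run(analyze_file("my_script.sql"))` would run the same flow as the example above on the contents of the file.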

Supported Agents

The following table shows the development status of the agents in the Lineagentic-flow algorithm:

Agent Name Done Under Development In Backlog Comment
python-lineage_agent
airflow_lineage_agent
java_lineage_agent
spark_lineage_agent
sql_lineage_agent
flink_lineage_agent
beam_lineage_agent
shell_lineage_agent
scala_lineage_agent
dbt_lineage_agent

Environment Variables

Set your API keys:

export OPENAI_API_KEY="your-openai-api-key"
export HF_TOKEN="your-huggingface-token"  # Optional
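
A script can check for the keys up front so that a missing variable fails with a clear message rather than deep inside an agent run. This is a minimal sketch: `missing_api_keys` and `require_api_keys` are hypothetical helpers, and treating only `OPENAI_API_KEY` as required is an assumption that matches the OpenAI model used in the Basic Usage example (`HF_TOKEN` is marked optional above).

```python
import os


def missing_api_keys(env=None, required=("OPENAI_API_KEY",)):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


def require_api_keys(env=None):
    """Raise a clear error before any agent is constructed."""
    missing = missing_api_keys(env)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

Calling `require_api_keys()` at the top of `main()` is enough; passing a plain dict as `env` makes the check easy to unit test.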

What are the components of Lineagentic-flow?

  • Algorithm module: The brain of Lineagentic-flow. It contains the agents, which are implemented as plugins and act as a chain-of-thought process to extract lineage from different types of data processing scripts. The module follows a plugin-based design pattern, so you can develop and integrate your own custom agents.

  • CLI module: A command line wrapper around the algorithm API that connects to the unified service layer.

  • Demo module: For teams who want to demo Lineagentic-flow quickly and simply; it can be deployed to Hugging Face Spaces.
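
The plugin-based design mentioned for the algorithm module can be pictured as a registry that maps agent names to factories. The sketch below illustrates the general pattern only; it is not the package's actual plugin mechanism, and `register_agent`, `build_prompt`, and the `my-custom-agent` name are all hypothetical.

```python
from typing import Callable, Dict

# Hypothetical registry: agent name -> function that builds the agent's prompt.
_AGENT_REGISTRY: Dict[str, Callable[[str], str]] = {}


def register_agent(name: str):
    """Decorator that registers a prompt factory under an agent name."""
    def decorator(factory: Callable[[str], str]) -> Callable[[str], str]:
        if name in _AGENT_REGISTRY:
            raise ValueError(f"Agent '{name}' is already registered")
        _AGENT_REGISTRY[name] = factory
        return factory
    return decorator


@register_agent("sql-lineage-agent")
def sql_prompt(source_code: str) -> str:
    return "Extract lineage from this SQL:\n" + source_code


@register_agent("my-custom-agent")  # a custom plugin added without touching the core
def custom_prompt(source_code: str) -> str:
    return "Extract lineage from this custom script:\n" + source_code


def build_prompt(agent_name: str, source_code: str) -> str:
    """Look up the plugin by name and let it build the prompt."""
    if agent_name not in _AGENT_REGISTRY:
        raise ValueError(f"Unknown agent '{agent_name}'")
    return _AGENT_REGISTRY[agent_name](source_code)
```

The point of the pattern is that supporting a new script type only requires registering one more factory; the lookup code stays unchanged.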

Command Line Interface (CLI)

Lineagentic-flow provides a powerful CLI tool for quick analysis:

# Basic SQL query analysis
lineagentic analyze --agent-name sql-lineage-agent --query "SELECT user_id, name FROM users WHERE active = true" --verbose

# Analyze a Python script from a file
lineagentic analyze --agent-name python-lineage-agent --query-file "my_script.py" --verbose

For more details, see the CLI documentation.
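
The CLI can also be driven from Python, for example in a batch job. The sketch below builds the argument list using only the flags shown in the commands above (`--agent-name`, `--query`, `--query-file`, `--verbose`). `build_analyze_command` and `run_analyze` are hypothetical helpers, and the rule that exactly one of `query` or `query_file` is given is an assumption, not documented CLI behavior.

```python
import subprocess
from typing import List, Optional


def build_analyze_command(
    agent_name: str,
    query: Optional[str] = None,
    query_file: Optional[str] = None,
    verbose: bool = False,
) -> List[str]:
    """Build the argv list for `lineagentic analyze`."""
    # Assumption: the CLI expects either inline source or a file, not both.
    if (query is None) == (query_file is None):
        raise ValueError("Provide exactly one of query or query_file")
    cmd = ["lineagentic", "analyze", "--agent-name", agent_name]
    if query is not None:
        cmd += ["--query", query]
    else:
        cmd += ["--query-file", query_file]
    if verbose:
        cmd.append("--verbose")
    return cmd


def run_analyze(**kwargs) -> str:
    """Run the CLI and return its standard output (requires the package)."""
    completed = subprocess.run(
        build_analyze_command(**kwargs),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout
```

Passing the arguments as a list avoids shell quoting problems when the query itself contains quotes or spaces.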

Environment Variables

  • HF_TOKEN (HUGGINGFACE_TOKEN)
  • OPENAI_API_KEY

Architecture

The following figure illustrates the architecture behind Lineagentic-flow: a multi-layer architecture consisting of a backend and an agentic AI algorithm that uses a chain-of-thought process to construct lineage across various script types.

Architecture Diagram

Mathematics behind the algorithm

The following shows the mathematics behind each layer of the algorithm.

Agent framework

The agent framework handles I/O operations, memory management, and prompt engineering according to the script type (T) and its content (C).

$$ P := f(T, C) $$
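
As a rough illustration of the mapping above, `f` can be read as a function that takes the script type and content and returns a prepared prompt. This is a sketch under stated assumptions, not the package's implementation: `PreparedPrompt`, the `_INSTRUCTIONS` table, and the instruction wording are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreparedPrompt:
    """P in the formula: what the framework hands to the downstream agents."""
    script_type: str
    content: str
    text: str


# Hypothetical per-type instructions; the real prompts are not shown in this README.
_INSTRUCTIONS = {
    "sql": "Extract table- and column-level lineage from this SQL script.",
    "python": "Extract dataset-level lineage from this Python script.",
}


def f(script_type: str, content: str) -> PreparedPrompt:
    """P := f(T, C): choose instructions by type T and attach content C."""
    t = script_type.strip().lower()
    if t not in _INSTRUCTIONS:
        raise ValueError(f"Unsupported script type: {script_type!r}")
    c = content.strip()
    if not c:
        raise ValueError("Script content is empty")
    return PreparedPrompt(script_type=t, content=c, text=f"{_INSTRUCTIONS[t]}\n\n{c}")
```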

Runtime orchestration agent

The runtime orchestration agent takes the output of the agent framework (P) and coordinates execution by pairing each agent (A_i) with its corresponding task (T_i). Note that T_i denotes a task here, not the script type T used above.

$$ G = h([\{(A_1, T_1), (A_2, T_2), (A_3, T_3), (A_4, T_4)\}], P) $$

Syntax Analysis Agent

The Syntax Analysis agent analyzes the syntactic structure of the raw script to identify subqueries and nested structures, and decomposes the script into multiple subscripts.

$$ \{sa_1, \cdots, sa_n\} := h([A_1, T_1], P) $$

Field Derivation Agent

The Field Derivation agent processes each subscript produced by the Syntax Analysis agent to derive field-level mapping relationships and processing logic.

$$ \{fd_1, \cdots, fd_n\} := h([A_2, T_2], \{sa_1, \cdots, sa_n\}) $$

Operation Tracing Agent

The Operation Tracing agent analyzes the complex conditions within each subscript identified by the Syntax Analysis agent, including filter, join, grouping, and sorting conditions.

$$ \{ot_1, \cdots, ot_n\} := h([A_3, T_3], \{sa_1, \cdots, sa_n\}) $$

Event Composer Agent

The Event Composer agent consolidates the results from the Syntax Analysis, Field Derivation, and Operation Tracing agents to generate the final lineage result.

$$ \{A\} := h([A_4, T_4], \{sa_1, \cdots, sa_n\}, \{fd_1, \cdots, fd_n\}, \{ot_1, \cdots, ot_n\}) $$
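
The data flow defined by the four formulas can be sketched as a small pipeline. The functions below are deterministic toy stand-ins for the LLM-backed agents: all names are hypothetical and splitting on `;` is only a placeholder for real syntax analysis. What the sketch does reflect from the formulas is the dependency structure: field derivation and operation tracing both consume the subscripts, and the composer consumes all three results.

```python
from typing import Dict, List


def syntax_analysis(prompt: str) -> List[str]:
    """A_1: decompose the script into subscripts sa_1..sa_n (toy: split on ';')."""
    return [part.strip() for part in prompt.split(";") if part.strip()]


def field_derivation(subscripts: List[str]) -> List[str]:
    """A_2: one field-mapping result fd_i per subscript (toy placeholder)."""
    return [f"fields({s})" for s in subscripts]


def operation_tracing(subscripts: List[str]) -> List[str]:
    """A_3: one operation-trace result ot_i per subscript (toy placeholder)."""
    return [f"ops({s})" for s in subscripts]


def event_composer(sa: List[str], fd: List[str], ot: List[str]) -> List[Dict[str, str]]:
    """A_4: merge the three result sets into the final lineage records."""
    return [
        {"subscript": s, "fields": f, "operations": o}
        for s, f, o in zip(sa, fd, ot)
    ]


def orchestrate(prompt: str) -> List[Dict[str, str]]:
    """Run the agents in dependency order, as the orchestration agent would."""
    sa = syntax_analysis(prompt)
    fd = field_derivation(sa)   # depends only on sa
    ot = operation_tracing(sa)  # depends only on sa
    return event_composer(sa, fd, ot)
```

Because the second and third stages depend only on the subscripts, they could in principle run concurrently; the sequential calls here are just the simplest form.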

Activation and Deployment

To simplify the use of Lineagentic-flow, a Makefile manages the activation and deployment tasks. The most common targets are listed below; see the Makefile itself for the full list and details.

1- To start the demo server:

make start-demo-server

2- To run all tests:

make test

3- To build the package:

make build-package

4- To clean the whole stack:

make clean-all-stack

5- To deploy Lineagentic-flow to Hugging Face Spaces (you need a Hugging Face account, and must add your secret keys there if you plan to use paid models):

make gradio-deploy


