Enterprise-grade Python data lineage tracking library with automatic pandas integration, memory optimization, and comprehensive visualization capabilities.

🚀 DataLineagePy 3.0

Enterprise-Grade Python Data Lineage Tracking

Python 3.8+ | License: MIT

Beautiful, Powerful, and Effortless Data Lineage for Python

Track, visualize, and govern your data pipelines with zero friction.


🌟 Why DataLineagePy?

  • Automatic, column-level lineage tracking for all pandas DataFrames
  • Enterprise performance: memory-optimized, scalable, and production-ready
  • Stunning visualizations: interactive dashboards, HTML, PNG, SVG, and more
  • Plug-and-play connectors: MySQL, PostgreSQL, SQLite, and custom sources
  • Security & compliance: RBAC, AES-256 encryption, audit trails
  • Real-time collaboration: WebSocket server/client for team workflows
  • ML/AI pipeline tracking: Full auditability for machine learning steps
  • Cloud-native deployment: Docker, Kubernetes, Helm, Terraform

🚀 Quick Start

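Install the package:

pip install datalineagepy

Then track a small pipeline:

from datalineagepy import LineageTracker, LineageDataFrame
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
tracker = LineageTracker(name="demo")
# Wrap the DataFrame so that operations on it are recorded by the tracker
ldf = LineageDataFrame(df, name="my_df", tracker=tracker)
# ._df is the wrapped pandas DataFrame, used here to build the mask and the new column
ldf2 = ldf.filter(ldf._df['a'] > 1)
ldf3 = ldf2.assign(c=ldf2._df['a'] + ldf2._df['b'])
tracker.visualize()  # Interactive HTML dashboard
tracker.export_lineage("lineage.json")  # Write the recorded lineage to a JSON file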

💾 Installation

  • PyPI: pip install datalineagepy
  • With visualization: pip install datalineagepy[viz]
  • All features: pip install datalineagepy[all]
  • Conda: conda install -c conda-forge datalineagepy (coming soon)
  • Docker: docker pull datalineagepy/datalineagepy:latest

See Installation Guide for advanced and enterprise setup.


📚 Core Features

  • Automatic lineage tracking for pandas DataFrames
  • Data validation: completeness, uniqueness, range, custom rules
  • Profiling & analytics: quality scoring, missing data, correlations
  • Visualization: HTML, PNG, SVG, interactive dashboards
  • Performance monitoring: execution time, memory, alerts
  • Security: RBAC, AES-256 encryption, audit trail
  • Custom connectors: SDK for any data source
  • Versioning: save, diff, rollback lineage graphs
  • Collaboration: real-time editing/viewing
  • ML/AI pipeline tracking: AutoMLTracker for full auditability
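
The "diff" part of the versioning feature can be pictured as a set comparison between two snapshots of a lineage graph. The sketch below is a standard-library illustration only, not DataLineagePy's versioning API; the `diff_graphs` helper and the node names are hypothetical.

```python
# Conceptual sketch only: not DataLineagePy's versioning API.
# A lineage graph snapshot is modeled as a list of (source, target) edges.

def diff_graphs(old_edges, new_edges):
    """Return the edges added and removed between two snapshots."""
    old, new = set(old_edges), set(new_edges)
    return {"added": sorted(new - old), "removed": sorted(old - new)}

# Hypothetical snapshots: v2 inserts a "features" step between clean and report.
v1 = [("raw", "clean"), ("clean", "report")]
v2 = [("raw", "clean"), ("clean", "features"), ("features", "report")]

changes = diff_graphs(v1, v2)
print(changes)
```

A rollback, in this picture, amounts to restoring the older edge list; how the library stores and restores versions is not described here.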

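🔧 Usage Guide

Sections 2 through 5 reuse the tracker and ldf objects created in section 1, so run that example first.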

1. Lineage Tracking

from datalineagepy import LineageTracker, LineageDataFrame
import pandas as pd
tracker = LineageTracker(name="my_pipeline")
df = pd.DataFrame({'x': [1,2,3], 'y': [4,5,6]})
ldf = LineageDataFrame(df, name="input", tracker=tracker)
ldf2 = ldf.assign(z=ldf._df['x'] + ldf._df['y'])
print(tracker.export_graph())
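
To make "column-level lineage" concrete, here is a minimal standard-library sketch of the dependency information that an assign step like the one above implies. It does not use DataLineagePy, and the library's internal representation may differ; `column_parents` and `upstream` are hypothetical names chosen for illustration.

```python
# Conceptual sketch only: not DataLineagePy's API or internal data model.
# Each derived column maps to the set of columns it was computed from.
column_parents = {
    "z": {"x", "y"},  # z = x + y, as in the assign() call above
    "w": {"z"},       # hypothetical later step: w derived from z
}

def upstream(column, parents):
    """Return every column that `column` depends on, directly or indirectly."""
    seen = set()
    stack = [column]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

print(sorted(upstream("w", column_parents)))  # ['x', 'y', 'z']
```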

2. Data Validation

from datalineagepy.core.validation import DataValidator
validator = DataValidator(tracker)
rules = {'completeness': {'threshold': 0.9}, 'uniqueness': {'columns': ['x']}}
results = validator.validate_dataframe(ldf, rules)
print(results)
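
As a rough guide to what completeness and uniqueness rules measure, the sketch below computes both over plain Python rows. It is an assumption about the general idea, not the DataValidator implementation; the helper names and the reading of the 0.9 threshold as a minimum non-null fraction are illustrative guesses.

```python
# Conceptual sketch only: the library's exact rule semantics may differ.
rows = [
    {"x": 1, "y": 4},
    {"x": 2, "y": None},
    {"x": 2, "y": 6},
]

def completeness(rows, column):
    """Fraction of rows where `column` is present and not None."""
    if not rows:
        return 1.0
    filled = sum(1 for row in rows if row.get(column) is not None)
    return filled / len(rows)

def is_unique(rows, column):
    """True when no value of `column` appears more than once."""
    values = [row.get(column) for row in rows]
    return len(values) == len(set(values))

report = {
    "completeness_y": completeness(rows, "y"),
    "completeness_ok": completeness(rows, "y") >= 0.9,  # assumed threshold meaning
    "x_unique": is_unique(rows, "x"),
}
print(report)
```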

3. Profiling & Analytics

from datalineagepy.core.analytics import DataProfiler
profiler = DataProfiler(tracker)
profile = profiler.profile_dataset(ldf, include_correlations=True)
print(profile)

4. Visualization & Reporting

from datalineagepy.visualization.graph_visualizer import GraphVisualizer
visualizer = GraphVisualizer(tracker)
visualizer.generate_html("lineage.html")
visualizer.generate_png("lineage.png")

5. Performance Monitoring

from datalineagepy.core.performance import PerformanceMonitor
monitor = PerformanceMonitor(tracker)
monitor.start_monitoring()
_ = ldf._df.sum()
monitor.stop_monitoring()
print(monitor.get_performance_summary())
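
For a sense of what execution time and memory tracking involve, here is a small standard-library sketch built on time.perf_counter and tracemalloc. It is not how PerformanceMonitor is implemented; the `measure` context manager and the `stats` dictionary are hypothetical.

```python
import time
import tracemalloc
from contextlib import contextmanager

# Conceptual sketch only: not the PerformanceMonitor implementation.
@contextmanager
def measure(stats, label):
    """Record wall-clock seconds and peak traced memory for the enclosed block."""
    tracemalloc.start()
    started = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - started
        _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
        tracemalloc.stop()
        stats[label] = {"seconds": elapsed, "peak_bytes": peak}

stats = {}
with measure(stats, "sum_squares"):
    total = sum(i * i for i in range(100_000))

print(total, stats["sum_squares"])
```

tracemalloc only sees allocations made through Python's memory allocator, so figures for libraries that allocate natively can be understated.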

6. Security & Compliance

from datalineagepy.security.rbac import RBACManager
rbac = RBACManager()
rbac.add_role('admin', ['read', 'write'])
rbac.add_user('alice', ['admin'])
print(rbac.check_access('alice', 'write'))

import os
from datalineagepy.security.encryption.data_encryption import EncryptionManager
# Demo only: in real deployments, load the key from a secrets manager instead of hardcoding it
os.environ['MASTER_ENCRYPTION_KEY'] = 'supersecretkey1234567890123456'
enc_mgr = EncryptionManager()
secret = 'Sensitive Data'
encrypted = enc_mgr.encrypt_sensitive_data(secret)
decrypted = enc_mgr.decrypt_sensitive_data(encrypted)
print(decrypted)
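
The role-based check in the RBAC example above can be understood with a few dictionaries. This sketch mirrors the add_role, add_user, and check_access call names from that example but is a standalone illustration, not the RBACManager implementation; the "viewer" role and the extra users are hypothetical.

```python
# Conceptual sketch only: not the RBACManager implementation.
roles = {}  # role name -> set of permitted actions
users = {}  # user name -> set of role names

def add_role(name, permissions):
    roles[name] = set(permissions)

def add_user(name, role_names):
    users[name] = set(role_names)

def check_access(user, action):
    """Allow the action if any of the user's roles grants it; unknown users are denied."""
    return any(action in roles.get(role, set()) for role in users.get(user, set()))

add_role("admin", ["read", "write"])
add_role("viewer", ["read"])
add_user("alice", ["admin"])
add_user("bob", ["viewer"])

print(check_access("alice", "write"))   # True
print(check_access("bob", "write"))     # False
print(check_access("mallory", "read"))  # False
```

Denying by default for unknown users and unknown roles is the design choice worth keeping in any real access check.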

7. Database Connectors

from datalineagepy.connectors.database.mysql_connector import MySQLConnector
from datalineagepy.core import LineageTracker
# Placeholder credentials: replace with your own and keep real passwords out of source code
db_config = {'host': 'localhost', 'user': 'root', 'password': 'password', 'database': 'test_db'}
tracker = LineageTracker()
conn = MySQLConnector(**db_config, lineage_tracker=tracker)
try:
    conn.execute_query('SELECT * FROM test_table')
finally:
    conn.close()  # Release the connection even if the query fails
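
If no MySQL server is at hand, the idea of recording lineage per query can be tried with Python's built-in sqlite3 module. The sketch below does not use DataLineagePy's connectors; `run_tracked` and `lineage_log` are hypothetical names, and the recorded fields are an assumption about what a connector might capture.

```python
import sqlite3

# Conceptual sketch only: not a DataLineagePy connector.
lineage_log = []  # hypothetical in-memory record of executed queries

def run_tracked(conn, sql, params=()):
    """Execute a query and record its text, result columns, and row count."""
    cursor = conn.execute(sql, params)
    rows = cursor.fetchall()
    # cursor.description is None for statements that return no rows
    columns = [d[0] for d in cursor.description] if cursor.description else []
    lineage_log.append({"sql": sql, "columns": columns, "row_count": len(rows)})
    return rows

conn = sqlite3.connect(":memory:")
try:
    conn.execute("CREATE TABLE test_table (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO test_table VALUES (?, ?)", [(1, "a"), (2, "b")])
    rows = run_tracked(conn, "SELECT id, name FROM test_table WHERE id > ?", (1,))
finally:
    conn.close()

print(rows)         # [(2, 'b')]
print(lineage_log)
```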

8. ML/AI Pipeline Tracking

from datalineagepy import AutoMLTracker
tracker = AutoMLTracker(name='ml_pipeline')
tracker.log_step('fit', model='LogisticRegression', params={'solver': 'lbfgs'})
tracker.log_step('predict', model='LogisticRegression')
print(tracker.export_ai_ready_format())
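
Conceptually, step logging of this kind is an ordered list of records that can be serialized for later audit. The sketch below shows that idea with the standard library only; it is not AutoMLTracker, and the record layout (`order`, `step`, `details`) is a hypothetical choice rather than the format export_ai_ready_format() produces.

```python
import json

# Conceptual sketch only: not AutoMLTracker or its export format.
steps = []  # hypothetical in-memory step log

def log_step(name, **details):
    """Append one pipeline step with a 1-based sequence number."""
    steps.append({"order": len(steps) + 1, "step": name, "details": details})

log_step("fit", model="LogisticRegression", params={"solver": "lbfgs"})
log_step("predict", model="LogisticRegression")

exported = json.dumps(steps, indent=2, sort_keys=True)
print(exported)
```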

📊 Visualization & Reporting

  • Interactive HTML dashboards: tracker.visualize()
  • Export formats: JSON, DOT, PNG, SVG, Excel, CSV
  • Custom visualizations: Use GraphVisualizer for advanced needs
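
DOT, one of the export formats listed above, is plain text that Graphviz can render. As an illustration of what such an export contains, the sketch below turns an edge list into DOT using only the standard library; it is not the library's exporter, and the node names and `to_dot` helper are hypothetical.

```python
# Conceptual sketch only: not DataLineagePy's DOT exporter.
edges = [("input", "filtered"), ("filtered", "with_c")]  # hypothetical node names

def to_dot(edges, graph_name="lineage"):
    """Render (source, target) pairs as Graphviz DOT text.

    Node names are quoted but not escaped, so this sketch assumes
    names that contain no double quotes.
    """
    lines = [f"digraph {graph_name} {{"]
    for source, target in edges:
        lines.append(f'    "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)

dot_text = to_dot(edges)
print(dot_text)
```

Saving the output to a file such as lineage.dot and running the Graphviz command `dot -Tpng lineage.dot -o lineage.png` would produce an image, provided Graphviz is installed.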

🗄️ Database Connectors

  • MySQL, PostgreSQL, SQLite: Full lineage tracking for every query
  • Custom connectors: Build your own with the SDK
  • See Database Connectors Guide

⚡ Performance Monitoring

  • Track execution time, memory, and operation stats
  • Alerting: Slack, Email, custom hooks
  • Production monitoring: Integrate with Prometheus, Grafana, etc.
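
A custom alert hook can be thought of as a callback that fires when a metric crosses a limit. The sketch below illustrates that pattern with the standard library; the function names, metric names, and limits are hypothetical and do not reflect DataLineagePy's alerting configuration.

```python
# Conceptual sketch only: not DataLineagePy's alerting API.
alerts = []  # collected messages; a real hook might post to chat or send email

def record_alert(message):
    alerts.append(message)

def check_thresholds(metrics, limits, on_alert):
    """Call on_alert once for each metric that exceeds its configured limit."""
    for name, limit in limits.items():
        value = metrics.get(name)
        if value is not None and value > limit:
            on_alert(f"{name}={value} exceeds limit {limit}")

metrics = {"seconds": 2.5, "peak_mb": 120}  # hypothetical measurements
limits = {"seconds": 1.0, "peak_mb": 512}   # hypothetical limits

check_thresholds(metrics, limits, record_alert)
print(alerts)  # ['seconds=2.5 exceeds limit 1.0']
```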

🔒 Security & Compliance

  • RBAC: Role-based access control for users and actions
  • AES-256 encryption: At-rest and in-transit data protection
  • Audit trail: Full operation history for compliance

🤖 ML/AI Pipeline Tracking

  • AutoMLTracker: Log, audit, and export every ML pipeline step
  • Explainability: Export pipeline steps for downstream analysis

☁️ Enterprise Deployment

  • Docker, Kubernetes, Helm, Terraform: Cloud-native ready
  • Production scripts: See deploy/ for examples

💡 Use Cases

  • Data science: Reproducibility, experiment tracking, Jupyter integration
  • Enterprise ETL: Production pipelines, data quality, compliance
  • Data governance: Impact analysis, documentation, audit trails
  • ML/AI: Pipeline explainability, model audit, feature tracking
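
Impact analysis, mentioned under data governance, asks which downstream assets are affected when one column changes. A breadth-first walk over lineage edges answers that question; the sketch below uses only the standard library, and the table and column names are invented for illustration rather than taken from DataLineagePy.

```python
from collections import deque

# Conceptual sketch only: hypothetical lineage edges, not library output.
edges = [
    ("orders.amount", "daily_revenue.total"),
    ("daily_revenue.total", "dashboard.revenue_chart"),
    ("orders.customer_id", "customers.order_count"),
]

def downstream(start, edges):
    """Return every node reachable from `start`, in breadth-first order."""
    children = {}
    for source, target in edges:
        children.setdefault(source, []).append(target)
    affected = []
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                affected.append(child)
                queue.append(child)
    return affected

print(downstream("orders.amount", edges))
# ['daily_revenue.total', 'dashboard.revenue_chart']
```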

🤝 Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.


📄 License

MIT License. See LICENSE for details.


DataLineagePy 3.0 — The new standard for Python data lineage
Beautiful. Powerful. Effortless.
