LD-RAG: GPT-driven ontology reasoning and retrieval

LDrag - Ontology-Based Retrieval Augmented Generation

LDrag is a Python library for creating, managing, and querying ontology-based knowledge graphs, and for using them with Large Language Models for Retrieval Augmented Generation (RAG).

Disclaimer

This project is a work in progress and is not yet ready for production use. The codebase is under active development and subject to change. It is a proof of concept, not intended for any specific use case, and is being developed for the Carl-Zeiss Research Project KIDZ.

Overview

This library provides tools to represent domain knowledge as an ontology, with machine learning models, datasets, and other entities as nodes within this knowledge graph. It enables:

  • Converting trained scikit-learn models into ontology structures
  • Adding dataset metadata from pandas DataFrames to the ontology
  • Calculating and incorporating SHAP values for model explainability
  • Converting between JSON and OWL ontology formats
  • Visualizing ontology structures as interactive graphs
  • Querying the ontology using natural language with GPT-powered retrieval

Features

Ontology Management

  • Create and manipulate ontological structures with nodes, relationships, and classes
  • Deserialize from and serialize to JSON
  • Convert between JSON and OWL formats
  • Generate graphical representations (static and interactive)
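
To make the node/relationship/class idea and the JSON round trip concrete, here is a minimal standard-library sketch. The names `Node`, `TinyOntology`, `connect`, and the `trainedWith` relationship are hypothetical and chosen for illustration only; they are not LDrag's actual classes or JSON schema.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Node:
    """Hypothetical node: an ID, a class name, and free-form attributes."""
    node_id: str
    node_class: str
    attributes: dict = field(default_factory=dict)
    # Outgoing edges, stored as {relationship name: [target node IDs]}
    connections: dict = field(default_factory=dict)


class TinyOntology:
    """Toy stand-in for an ontology container; not LDrag's Ontology class."""

    def __init__(self):
        self.nodes = {}

    def add_node(self, node):
        self.nodes[node.node_id] = node

    def connect(self, source_id, relationship, target_id):
        # Both endpoints must exist before an edge is recorded.
        if source_id not in self.nodes or target_id not in self.nodes:
            raise KeyError("both nodes must be added before connecting them")
        self.nodes[source_id].connections.setdefault(relationship, []).append(target_id)

    def to_json(self):
        return json.dumps([asdict(n) for n in self.nodes.values()], indent=2)

    @classmethod
    def from_json(cls, text):
        onto = cls()
        for raw in json.loads(text):
            onto.add_node(Node(**raw))
        return onto


onto = TinyOntology()
onto.add_node(Node("my_dataset", "Dataset", {"domain": "finance"}))
onto.add_node(Node("forest_model_1", "Model", {"algorithm": "RandomForestClassifier"}))
onto.connect("forest_model_1", "trainedWith", "my_dataset")

# Serialize, then rebuild the ontology from the JSON text.
restored = TinyOntology.from_json(onto.to_json())
print(restored.nodes["forest_model_1"].connections)
```

Storing edges on the source node keeps each JSON entry self-contained, which is one simple way to make serialization and deserialization symmetric.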

Model Integration

  • Convert scikit-learn models into ontology nodes
  • Store model performance metrics (accuracy, precision, recall, F1, ROC AUC)
  • Capture model weights and parameters
  • Link models to their training datasets and tasks
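
As a rough illustration of what storing performance metrics on a model node could look like, the sketch below computes accuracy, precision, recall, and F1 for a binary task in plain Python and attaches them to a dictionary. The node layout (`node_id`, `metrics`, `connections`) is an assumption for illustration, not LDrag's format. ROC AUC is left out because it needs predicted scores rather than hard labels.

```python
def binary_classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary task, from label lists."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": correct / len(pairs) if pairs else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = binary_classification_metrics(y_true, y_pred)

# Hypothetical model node linking the metrics to a dataset and a task.
model_node = {
    "node_id": "forest_model_1",
    "node_class": "Model",
    "metrics": metrics,
    "connections": {"trainedWith": ["my_dataset"], "solves": ["classification_task"]},
}
print(model_node["metrics"])
```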

Dataset Handling

  • Store dataset metadata including statistics for numerical attributes
  • Link datasets to their attributes and models
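
The kind of per-attribute statistics mentioned above can be sketched with the standard library alone. The function name `summarize_numeric_columns`, the chosen statistics, and the node layout are assumptions made for this example; LDrag itself works from pandas DataFrames and may record different fields.

```python
import statistics


def summarize_numeric_columns(rows):
    """Return per-column statistics for columns whose values are all numbers."""
    summary = {}
    if not rows:
        return summary
    for column in rows[0]:
        values = [row[column] for row in rows]
        # bool is a subclass of int, so exclude it explicitly.
        if not all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
            continue
        summary[column] = {
            "min": min(values),
            "max": max(values),
            "mean": statistics.fmean(values),
            # Sample standard deviation; undefined for a single value.
            "std": statistics.stdev(values) if len(values) > 1 else 0.0,
        }
    return summary


rows = [
    {"amount": 10.0, "quantity": 1, "currency": "EUR"},
    {"amount": 20.0, "quantity": 3, "currency": "USD"},
    {"amount": 30.0, "quantity": 5, "currency": "EUR"},
]

# Hypothetical dataset node: row count plus statistics per numerical attribute.
dataset_node = {
    "node_id": "my_dataset",
    "node_class": "Dataset",
    "row_count": len(rows),
    "attributes": summarize_numeric_columns(rows),
}
print(dataset_node["attributes"]["amount"])
```

Non-numerical columns such as `currency` are skipped here, which mirrors the statement that statistics are stored for numerical attributes.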

Explainability

  • Calculate and store SHAP values for model explainability
  • Link explanations to models and features
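
For readers unfamiliar with SHAP, the sketch below computes exact Shapley values for a tiny toy model by enumerating every feature subset, with absent features replaced by a single baseline value. This is an illustration of the underlying idea only: the `shap` library uses background data and faster estimators, and the names `shapley_values`, `toy_model`, and the explanation node layout are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values by subset enumeration (feasible only for few features)."""
    n = len(instance)

    def value(subset):
        # Features in `subset` take the instance's value; the rest stay at baseline.
        mixed = [instance[j] if j in subset else baseline[j] for j in range(n)]
        return predict(mixed)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


def toy_model(x):
    # Two linear terms plus one interaction between features 0 and 2.
    return 2.0 * x[0] + 1.0 * x[1] + 0.5 * x[0] * x[2]


instance = [1.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phis = shapley_values(toy_model, instance, baseline)

# Hypothetical explanation node linking the values to a model and its features.
explanation_node = {
    "node_id": "shap_forest_model_1",
    "node_class": "Explanation",
    "values": dict(zip(["f0", "f1", "f2"], phis)),
    "connections": {"explains": ["forest_model_1"]},
}
print(explanation_node["values"])
```

The attributions sum to the difference between the model output for the instance and for the baseline, and the interaction term is split evenly between the two features involved.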

Retrieval Augmented Generation

  • Query the ontology using natural language
  • Traverse the knowledge graph to find relevant information
  • Visualize query exploration paths
  • Use GPT to interpret and reason over ontology data
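
The traverse-then-reason flow can be sketched without any API calls: walk the graph breadth-first from a starting node, record the edges explored, and assemble the facts found into a prompt. Everything here is an assumption for illustration: the graph contents, the relationship names, the helper names `collect_context` and `build_prompt`, and the fixed starting node. How LDrag actually selects nodes and calls GPT is not shown in this README.

```python
from collections import deque

# Hypothetical knowledge graph: {node: {relationship: [targets]}}
GRAPH = {
    "classification_task": {"solvedBy": ["forest_model_1"]},
    "forest_model_1": {
        "trainedWith": ["my_dataset"],
        "explainedBy": ["shap_forest_model_1"],
    },
    "my_dataset": {"hasAttribute": ["amount", "quantity"]},
    "shap_forest_model_1": {},
    "amount": {},
    "quantity": {},
}

# Hypothetical textual facts attached to some nodes.
FACTS = {
    "forest_model_1": "forest_model_1 is a RandomForestClassifier.",
    "my_dataset": "my_dataset has 3 rows and belongs to the finance domain.",
}


def collect_context(graph, start, max_hops=2):
    """Breadth-first walk from `start`; returns visited nodes and traversed edges."""
    visited = [start]
    path = []
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for relationship, targets in graph.get(node, {}).items():
            for target in targets:
                path.append((node, relationship, target))
                if target not in visited:
                    visited.append(target)
                    queue.append((target, hops + 1))
    return visited, path


def build_prompt(question, facts, nodes):
    """Assemble the retrieved facts into a prompt for a language model."""
    context = "\n".join(f"- {facts[n]}" for n in nodes if n in facts)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


visited, path = collect_context(GRAPH, "forest_model_1", max_hops=1)
prompt = build_prompt("How many rows does the dataset have?", FACTS, visited)
print(path)
print(prompt)
```

The recorded `path` is the kind of data a query exploration visualization could be drawn from, and the hop limit bounds how much context reaches the model.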

Installation

# Full installation instructions to be added; release 0.2.0 is published on PyPI as ldrag
pip install ldrag

Dependencies

  • networkx
  • matplotlib
  • pyvis
  • rdflib
  • numpy
  • pandas
  • scikit-learn
  • shap
  • openai

Usage Examples

Loading an Ontology

from ldrag.ontology import Ontology

# Load from JSON file
ontology = Ontology()
ontology.deserialize("ontology_base.json")

Adding a Model to the Ontology

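from ldrag.ontology_io import sklearn_model_to_ontology
from sklearn.ensemble import RandomForestClassifier

# Train a model (X_train, X_test, y_train, y_test are assumed to come from an earlier train/test split)
model = RandomForestClassifier()
model.fit(X_train, y_train)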

# Add to ontology
sklearn_model_to_ontology(
    model=model,
    model_id="forest_model_1",
    dataset_id="my_dataset",
    task_id="classification_task",
    X_train=X_train,
    X_test=X_test,
    y_test=y_test,
    output_file="ontology.json"
)

Adding Dataset Metadata

from ldrag.ontology_io import add_dataset_metadata_from_dataframe
import pandas as pd

# Load dataset
df = pd.read_csv("data.csv")

# Add to ontology
add_dataset_metadata_from_dataframe(
    dataset_id="my_dataset",
    df=df,
    domain="finance",
    location="database",
    date="2023-01-01",
    models=["forest_model_1"],
    output_file="ontology.json"
)

Visualizing the Ontology

from ldrag.ontology import Ontology

ontology = Ontology()
ontology.deserialize("ontology.json")

# Create dynamic HTML visualization
ontology.create_dynamic_instance_graph("my_graph")

Querying with Natural Language

from ldrag.ontology import Ontology
from ldrag.retriever import information_retriever

ontology = Ontology()
ontology.deserialize("ontology.json")

# Query the ontology
result = information_retriever(
    ontology=ontology,
    user_query="How many rows does the dataset from September have?"
)

Project Structure

  • ontology.py - Core ontology classes and data structures
  • ontology_io.py - Input/output operations for the ontology including ML model integration
  • retriever.py - Retrieval Augmented Generation functionality
  • gptconnector.py - OpenAI API connector for GPT model integration
  • config.py - Configuration settings

License

[Add license information]

Contributing

[Add contribution guidelines]
