LD-RAG: GPT-driven ontology reasoning and retrieval
LDrag - Ontology-Based Retrieval Augmented Generation
LDrag is a Python library for creating, managing, and querying knowledge graphs through an ontology-based approach combined with Large Language Models for Retrieval Augmented Generation (RAG).
Disclaimer
This project is a work in progress and is not yet ready for production use. The codebase is under active development and subject to change. The project is a proof of concept and does not target any specific use case. It is developed for the Carl-Zeiss Research Project KIDZ.
Overview
This library provides tools to represent domain knowledge as an ontology, with machine learning models, datasets, and other entities as nodes within this knowledge graph. It enables:
- Converting trained scikit-learn models into ontology structures
- Adding dataset metadata from pandas DataFrames to the ontology
- Calculating and incorporating SHAP values for model explainability
- Converting between JSON and OWL ontology formats
- Visualizing ontology structures as interactive graphs
- Querying the ontology using natural language with GPT-powered retrieval
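As a rough illustration of the JSON-to-OWL direction listed above, the sketch below renders a small JSON document as OWL in Turtle syntax using only the standard library. The JSON layout (`classes`, `nodes`, `relationships`), the `EX` namespace, and the function name `json_to_turtle` are assumptions made for this example. They are not LDrag's actual schema or API, and the library itself lists rdflib as a dependency for this kind of work.

```python
import json

# Hypothetical namespace for the example individuals and properties.
EX = "http://example.org/ldrag#"


def json_to_turtle(doc: dict) -> str:
    """Render an assumed JSON ontology layout as OWL in Turtle syntax."""
    lines = [
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        f"@prefix ex: <{EX}> .",
        "",
    ]
    # Every class name becomes an owl:Class.
    for cls in doc["classes"]:
        lines.append(f"ex:{cls} a owl:Class .")
    # Every distinct relationship type becomes an owl:ObjectProperty.
    for prop in sorted({rel["type"] for rel in doc["relationships"]}):
        lines.append(f"ex:{prop} a owl:ObjectProperty .")
    # Every node becomes a named individual typed by its class.
    for node in doc["nodes"]:
        lines.append(f"ex:{node['id']} a owl:NamedIndividual, ex:{node['class']} .")
    # Every relationship becomes one subject-predicate-object triple.
    for rel in doc["relationships"]:
        lines.append(f"ex:{rel['source']} ex:{rel['type']} ex:{rel['target']} .")
    return "\n".join(lines) + "\n"


# Assumed example input; the identifiers mirror the usage examples in this README.
doc = json.loads("""
{
  "classes": ["Model", "Dataset"],
  "nodes": [
    {"id": "forest_model_1", "class": "Model"},
    {"id": "my_dataset", "class": "Dataset"}
  ],
  "relationships": [
    {"source": "forest_model_1", "type": "trainedOn", "target": "my_dataset"}
  ]
}
""")

turtle = json_to_turtle(doc)
print(turtle)
```

The sketch assumes every identifier is already a valid Turtle local name (letters, digits, underscores). Identifiers containing spaces or other special characters would need escaping or full IRIs.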
Features
Ontology Management
- Create and manipulate ontological structures with nodes, relationships, and classes
- Deserialize from and serialize to JSON
- Convert between JSON and OWL formats
- Generate graphical representations (static and interactive)
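To make the idea of nodes, relationships, and JSON round-tripping concrete, here is a minimal standard-library sketch. The classes `Node`, `Relationship`, and `MiniOntology` and their field names are hypothetical and exist only for this illustration. LDrag's own `Ontology` class may be structured quite differently.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Node:
    node_id: str
    node_class: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    source: str
    target: str
    rel_type: str


@dataclass
class MiniOntology:
    nodes: dict = field(default_factory=dict)          # node_id -> Node
    relationships: list = field(default_factory=list)  # list of Relationship

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def add_relationship(self, source: str, target: str, rel_type: str) -> None:
        # Refuse edges that point at unknown nodes.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before linking them")
        self.relationships.append(Relationship(source, target, rel_type))

    def to_json(self) -> str:
        return json.dumps(
            {
                "nodes": [asdict(n) for n in self.nodes.values()],
                "relationships": [asdict(r) for r in self.relationships],
            },
            indent=2,
        )

    @classmethod
    def from_json(cls, text: str) -> "MiniOntology":
        data = json.loads(text)
        onto = cls()
        for n in data["nodes"]:
            onto.add_node(Node(**n))
        for r in data["relationships"]:
            onto.add_relationship(r["source"], r["target"], r["rel_type"])
        return onto


onto = MiniOntology()
onto.add_node(Node("forest_model_1", "Model", {"algorithm": "RandomForestClassifier"}))
onto.add_node(Node("my_dataset", "Dataset", {"domain": "finance"}))
onto.add_relationship("forest_model_1", "my_dataset", "trainedOn")

# Serialize, then rebuild, and compare the two structures field by field.
restored = MiniOntology.from_json(onto.to_json())
print(restored == onto)
```

Keeping nodes in a dictionary keyed by ID makes the endpoint check in `add_relationship` a constant-time lookup, which is one plausible reason to store the graph this way.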
Model Integration
- Convert sklearn models into ontology nodes
- Store model performance metrics (accuracy, precision, recall, F1, ROC AUC)
- Capture model weights and parameters
- Link models to their training datasets and tasks
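The following sketch shows one way performance metrics and links could be packaged into a model node. It computes accuracy, precision, recall, and F1 for binary labels in plain Python so that it runs without scikit-learn. The node layout and the names `binary_metrics` and `model_node` are assumptions for illustration, not the structure `sklearn_model_to_ontology` actually writes. ROC AUC is left out because it needs predicted scores rather than hard labels.

```python
def binary_metrics(y_true, y_pred):
    """Compute basic classification metrics for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def model_node(model_id, dataset_id, task_id, metrics):
    """Build a hypothetical ontology node that links a model to its dataset and task."""
    return {
        "id": model_id,
        "class": "Model",
        "metrics": metrics,
        "links": [
            {"type": "trainedOn", "target": dataset_id},
            {"type": "solves", "target": task_id},
        ],
    }


# Made-up labels, used only to exercise the functions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

node = model_node(
    "forest_model_1", "my_dataset", "classification_task",
    binary_metrics(y_true, y_pred),
)
print(node)
```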
Dataset Handling
- Store dataset metadata including statistics for numerical attributes
- Link datasets to their attributes and models
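As an illustration of what "statistics for numerical attributes" might look like, the sketch below summarises the numeric columns of a small table and wraps the result in a dataset node. LDrag works with pandas DataFrames; this example uses a list of dictionaries and the standard `statistics` module to stay self-contained. The function name, the chosen statistics, and the node layout are assumptions.

```python
import statistics


def numeric_attribute_stats(rows):
    """Summarise each numeric column of a list-of-dicts table."""
    stats = {}
    columns = list(rows[0].keys()) if rows else []
    for col in columns:
        values = [row[col] for row in rows]
        # Only summarise columns where every value is an int or float (bools excluded).
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )
        if is_numeric:
            stats[col] = {
                "min": min(values),
                "max": max(values),
                "mean": statistics.fmean(values),
                # Sample standard deviation needs at least two values.
                "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
            }
    return stats


# Made-up rows, used only to exercise the function.
rows = [
    {"amount": 10.0, "currency": "EUR"},
    {"amount": 20.0, "currency": "USD"},
    {"amount": 30.0, "currency": "EUR"},
]

dataset_node = {
    "id": "my_dataset",
    "class": "Dataset",
    "rows": len(rows),
    "attributes": numeric_attribute_stats(rows),
    "models": ["forest_model_1"],
}
print(dataset_node)
```

Non-numeric columns such as `currency` are skipped here. A fuller version might record their distinct values or data type instead.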
Explainability
- Calculate and store SHAP values for model explainability
- Link explanations to models and features
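For readers unfamiliar with what a SHAP value represents, the sketch below computes exact Shapley values for a toy function by brute force over all feature orderings, relative to a single baseline point. This is the textbook definition rather than the algorithm the shap library uses (shap relies on faster estimators and typically averages over a background dataset), and the `predict` function and input values are hypothetical.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley values of `predict` at `x` relative to one baseline point."""
    n = len(x)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)       # start with every feature at its baseline value
        prev = predict(current)
        for i in order:
            current[i] = x[i]          # switch feature i to its real value
            new = predict(current)
            totals[i] += new - prev    # marginal contribution in this ordering
            prev = new
    # Average the marginal contributions over all orderings.
    return [t / len(orderings) for t in totals]


def predict(features):
    # Hypothetical toy model with one interaction term between a and c.
    a, b, c = features
    return 2.0 * a + 1.0 * b + 0.5 * a * c


x = [1.0, 3.0, 2.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(predict, x, baseline)

# Efficiency property: the attributions sum to predict(x) - predict(baseline).
print(phi, sum(phi), predict(x) - predict(baseline))
```

The cost grows factorially with the number of features, so this approach is only practical for a handful of inputs. It is meant to clarify the quantity being stored, not to replace a SHAP explainer.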
Retrieval Augmented Generation
- Query the ontology using natural language
- Traverse the knowledge graph to find relevant information
- Visualize query exploration paths
- Use GPT to interpret and reason over ontology data
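The retrieval steps above can be sketched without any API call: pick seed nodes that match the question, walk the graph breadth-first to gather nearby facts, and assemble a prompt for the language model. The graph contents, the keyword-matching heuristic, and the function names below are all assumptions for illustration. The source does not describe how LDrag's `information_retriever` selects or traverses nodes.

```python
from collections import deque

# Hypothetical graph: node id -> class, a fact string, and outgoing links.
graph = {
    "my_dataset": {
        "class": "Dataset",
        "info": "rows=1200, date=2023-09-01",
        "links": ["forest_model_1", "amount"],
    },
    "forest_model_1": {
        "class": "Model",
        "info": "algorithm=RandomForestClassifier",
        "links": ["my_dataset", "classification_task"],
    },
    "amount": {"class": "Attribute", "info": "type=float", "links": []},
    "classification_task": {"class": "Task", "info": "type=classification", "links": []},
}


def find_seeds(graph, query):
    """Pick nodes whose id or class name appears as a word in the query."""
    words = set(query.lower().replace("?", "").split())
    return [
        nid for nid, node in graph.items()
        if nid.lower() in words or node["class"].lower() in words
    ]


def collect_context(graph, seeds, max_depth=1):
    """Breadth-first walk from the seeds, returning nodes in visiting order."""
    visited = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    path = []
    while queue:
        nid, depth = queue.popleft()
        path.append(nid)
        if depth < max_depth:
            for neighbour in graph[nid]["links"]:
                if neighbour not in visited:
                    visited.add(neighbour)
                    queue.append((neighbour, depth + 1))
    return path


def build_prompt(graph, path, query):
    """Turn the visited nodes into a fact list followed by the question."""
    facts = "\n".join(
        f"- {nid} ({graph[nid]['class']}): {graph[nid]['info']}" for nid in path
    )
    return f"Answer using only these facts:\n{facts}\n\nQuestion: {query}"


query = "How many rows does the dataset from September have?"
path = collect_context(graph, find_seeds(graph, query), max_depth=1)
prompt = build_prompt(graph, path, query)
print(prompt)
```

The returned `path` doubles as a record of the exploration, which is the kind of information a query-path visualization could draw on. The depth limit keeps the prompt small at the cost of possibly missing facts further away in the graph.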
Installation
# Installation instructions to be added
Dependencies
- networkx
- matplotlib
- pyvis
- rdflib
- numpy
- pandas
- scikit-learn
- shap
- openai
Usage Examples
Loading an Ontology
from ldrag.ontology import Ontology
# Load from JSON file
ontology = Ontology()
ontology.deserialize("ontology_base.json")
Adding a Model to the Ontology
from sklearn.ensemble import RandomForestClassifier
from ldrag.ontology_io import sklearn_model_to_ontology
# Train a model (X_train, y_train, X_test and y_test are assumed to be prepared beforehand)
model = RandomForestClassifier()
model.fit(X_train, y_train)
# Add to ontology
sklearn_model_to_ontology(
    model=model,
    model_id="forest_model_1",
    dataset_id="my_dataset",
    task_id="classification_task",
    X_train=X_train,
    X_test=X_test,
    y_test=y_test,
    output_file="ontology.json"
)
Adding Dataset Metadata
from ldrag.ontology_io import add_dataset_metadata_from_dataframe
import pandas as pd
# Load dataset
df = pd.read_csv("data.csv")
# Add to ontology
add_dataset_metadata_from_dataframe(
    dataset_id="my_dataset",
    df=df,
    domain="finance",
    location="database",
    date="2023-01-01",
    models=["forest_model_1"],
    output_file="ontology.json"
)
Visualizing the Ontology
from ldrag.ontology import Ontology
ontology = Ontology()
ontology.deserialize("ontology.json")
# Create dynamic HTML visualization
ontology.create_dynamic_instance_graph("my_graph")
Querying with Natural Language
from ldrag.ontology import Ontology
from ldrag.retriever import information_retriever
ontology = Ontology()
ontology.deserialize("ontology.json")
# Query the ontology
result = information_retriever(
    ontology=ontology,
    user_query="How many rows does the dataset from September have?"
)
Project Structure
- ontology.py: Core ontology classes and data structures
- ontology_io.py: Input/output operations for the ontology, including ML model integration
- retriever.py: Retrieval Augmented Generation functionality
- gptconnector.py: OpenAI API connector for GPT model integration
- config.py: Configuration settings
License
[Add license information]
Contributing
[Add contribution guidelines]
File details
Details for the file ldrag-0.2.0.tar.gz.
File metadata
- Download URL: ldrag-0.2.0.tar.gz
- Upload date:
- Size: 20.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4d3e1a658e48bc4ea8da68cfc5e1148dfd89cbeb36921ecabb81aca875dd747c |
| MD5 | ffbac16208f1f4a5758e49e36bd8074f |
| BLAKE2b-256 | aa2f79d622ed1c620762928b114955492406833779ee0bb7e06805b83e7c075f |
File details
Details for the file ldrag-0.2.0-py3-none-any.whl.
File metadata
- Download URL: ldrag-0.2.0-py3-none-any.whl
- Upload date:
- Size: 20.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 75c6313a52c33bd8f0a2c7eee8572857a9af7136fafe957123c517b4dd163c82 |
| MD5 | 7839e6fe318c1f6d2e38c78007ced3e4 |
| BLAKE2b-256 | 9dc519d48f873d741a4271d4e4f3d5829246fb10d7223188d39bce4671fb7ed5 |