
A powerful tool for transforming documents into graph-based structures using Large Language Models (LLMs).


🧠 LLMGraphTransformer

LLMGraphTransformer is a Python library that uses LLMs to extract structured knowledge graphs from unstructured text. You define schemas for nodes and relationships, and the extracted graph is constrained to the types and properties those schemas declare. 🔗📊

🚀 Installation

Install LLMGraphTransformer from PyPI:

pip install LLMGraphTransformer

🛠️ Usage

📥 Importing the Required Modules

from LLMGraphTransformer import LLMGraphTransformer
from LLMGraphTransformer.schema import NodeSchema, RelationshipSchema
from langchain_openai import ChatOpenAI
from langchain_core.documents import Document

import os
from dotenv import load_dotenv

# Load API_KEY, BASE_URL and MODEL_NAME (read below) from a local .env file
load_dotenv(".env")

🏗️ Defining the Schema

🏷️ Node Schemas

Node schemas define the types of entities that can be extracted from the text. Each node schema has:

  • A type (e.g., "Person", "Organization")
  • A list of properties that store additional information (e.g., "name", "birth_year")
  • An optional description of what the node type represents

📌 Example:

node_schemas = [
    NodeSchema("Person", ["name", "birth_year", "death_year", "nationalitie", "profession"], "Represents an individual"),
    NodeSchema("Organization", ["name", "founding_year", "industrie"], "Represents a group, company, or institution"),
    NodeSchema("Location", ["name"], "Represents a geographical area such as a city, country, or region"),
    NodeSchema("Award", ["name", "field"], "Represents an honor, prize, or recognition")
]
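Conceptually, a node schema acts as a whitelist of node types and their property keys. As a rough illustration only (not the library's implementation), the sketch below checks an extracted node's properties against such a whitelist. `NodeSpec` and `invalid_properties` are hypothetical names; `NodeSpec` simply mirrors the three `NodeSchema` arguments used above (type, properties, description).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    """Hypothetical stand-in mirroring NodeSchema(type, properties, description)."""
    type: str
    properties: list[str]
    description: str = ""


def invalid_properties(node_type: str, props: dict, specs: list[NodeSpec]) -> list[str]:
    """Return the property keys not declared for node_type (all keys if the type is unknown)."""
    allowed = {spec.type: set(spec.properties) for spec in specs}
    if node_type not in allowed:
        return sorted(props)
    return sorted(key for key in props if key not in allowed[node_type])


specs = [
    NodeSpec("Person", ["name", "birth_year", "death_year", "nationalitie", "profession"], "Represents an individual"),
    NodeSpec("Award", ["name", "field"], "Represents an honor, prize, or recognition"),
]

print(invalid_properties("Person", {"name": "Marie Curie", "birth_year": "1867"}, specs))   # []
print(invalid_properties("Person", {"name": "Marie Curie", "spouse": "Pierre Curie"}, specs))  # ['spouse']
```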

🔗 Relationship Schemas

Relationship schemas define the allowed connections between entities. Each relationship schema has:

  • A source node type
  • A relationship type
  • A target node type
  • An optional list of properties (e.g., "year")

📌 Example:

relationship_schemas = [
    RelationshipSchema("Person", "SPOUSE_OF", "Person"),
    RelationshipSchema("Person", "MEMBER_OF", "Organization", ["start_year", "end_year", "year"]),
    RelationshipSchema("Person", "AWARDED", "Award", ["year"]),
    RelationshipSchema("Person", "LOCATED_IN", "Location"),
    RelationshipSchema("Organization", "LOCATED_IN", "Location")
]
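Read literally, each relationship schema fixes a direction: `RelationshipSchema("Person", "AWARDED", "Award")` permits an edge from a Person to an Award, not the reverse. A minimal sketch of that rule with plain tuples, assuming the positional order (source type, relationship type, target type) used in the example above and assuming the library treats these triples as directed; `ALLOWED_TRIPLES` and `is_allowed` are illustrative names, not part of the library.

```python
# (source node type, relationship type, target node type), mirroring the schemas above
ALLOWED_TRIPLES: set[tuple[str, str, str]] = {
    ("Person", "SPOUSE_OF", "Person"),
    ("Person", "MEMBER_OF", "Organization"),
    ("Person", "AWARDED", "Award"),
    ("Person", "LOCATED_IN", "Location"),
    ("Organization", "LOCATED_IN", "Location"),
}


def is_allowed(source_type: str, rel_type: str, target_type: str,
               allowed: set[tuple[str, str, str]] = ALLOWED_TRIPLES) -> bool:
    """Return True if the directed (source, type, target) triple is declared."""
    return (source_type, rel_type, target_type) in allowed


print(is_allowed("Person", "AWARDED", "Award"))           # True
print(is_allowed("Award", "AWARDED", "Person"))           # False: wrong direction
print(is_allowed("Organization", "SPOUSE_OF", "Person"))  # False: undeclared pair
```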

⚙️ Defining Additional Instructions

You can specify additional rules for extraction:

additional_instructions = """- all names must be extracted as uppercase"""
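With several rules, one option is to keep them in a list and join them into the same "- rule" bullet format as the single-rule example. This is a convenience sketch under stated assumptions: `build_instructions` is a hypothetical helper, the second rule is invented for illustration, and the library is only assumed to accept any multi-line string for `additional_instructions`.

```python
def build_instructions(rules: list[str]) -> str:
    """Join non-empty rules into '- rule' lines, one rule per line."""
    return "\n".join(f"- {rule.strip()}" for rule in rules if rule.strip())


additional_instructions = build_instructions([
    "all names must be extracted as uppercase",
    "years must be written as four-digit numbers",  # illustrative extra rule
])
print(additional_instructions)
```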

📜 Defining the Input Text

Provide the text from which the knowledge graph should be extracted:

text = """Marie Curie, born in 1867, was a Polish and naturalised-French physicist and chemist who conducted pioneering research on radioactivity.
She was the first woman to win a Nobel Prize, the first person to win a Nobel Prize twice, and the only person to win a Nobel Prize in two scientific fields.
Her husband, Pierre Curie, was a co-winner of her first Nobel Prize, making them the first-ever married couple to win the Nobel Prize and launching the Curie family legacy of five Nobel Prizes.
She was, in 1906, the first woman to become a professor at the University of Paris."""

🤖 Initializing the LLM Model

Use OpenAI's API (or any OpenAI-compatible endpoint, set via BASE_URL) to process the text:

api_key = os.getenv("API_KEY")
base_url = os.getenv("BASE_URL")
model_name = os.getenv("MODEL_NAME")

llm = ChatOpenAI(
    api_key=api_key,
    base_url=base_url,
    model=model_name,
    temperature=0,
)
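`os.getenv` returns `None` for a missing variable, so a typo in `.env` would only surface later, possibly as a less obvious error from the client. A small fail-fast check is sketched below; `load_llm_settings` is a hypothetical helper, and it treats all three variables read above as required, matching the snippet.

```python
import os
from collections.abc import Mapping

REQUIRED_VARS = ("API_KEY", "BASE_URL", "MODEL_NAME")


def load_llm_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return the required settings, raising RuntimeError if any is unset or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: env[name] for name in REQUIRED_VARS}
```

To use it in place of the three `os.getenv` calls, call `settings = load_llm_settings()` after `load_dotenv`, then pass `settings["API_KEY"]`, `settings["BASE_URL"]` and `settings["MODEL_NAME"]` to `ChatOpenAI` as above.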

🔄 Initializing the Transformer

Create an instance of LLMGraphTransformer:

llm_transformer = LLMGraphTransformer(
    llm=llm,
    allowed_nodes=node_schemas,
    allowed_relationships=relationship_schemas,
    additional_instructions=additional_instructions
)

🔍 Converting Text to a Knowledge Graph

Process the text into a structured knowledge graph:

document = Document(page_content=text)
graph_document = llm_transformer.convert_to_graph_document(document)

print(f"Nodes: {graph_document.nodes}")
print(f"Relationships: {graph_document.relationships}")
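For a quick overview of what was extracted, the nodes can be tallied by type. The sketch assumes each node exposes a `type` attribute, as the output format below suggests; `count_by_type` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins so the example runs without an API key. With real output you would call `count_by_type(graph_document.nodes)`.

```python
from collections import Counter
from types import SimpleNamespace


def count_by_type(items) -> Counter:
    """Count nodes (or relationships) by their assumed `type` attribute."""
    return Counter(getattr(item, "type", None) for item in items)


# Stand-in objects shaped like the README's output, not real library objects
nodes = [
    SimpleNamespace(id="Marie Curie", type="Person"),
    SimpleNamespace(id="Pierre Curie", type="Person"),
    SimpleNamespace(id="University of Paris", type="Organization"),
]
print(count_by_type(nodes))  # Counter({'Person': 2, 'Organization': 1})
```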

📊 Output Format

The extracted knowledge graph consists of nodes and relationships. Serialized as JSON, it has the following shape (further entries elided with ...):

{
  "nodes": [
    {
      "id": "Marie Curie",
      "type": "Person",
      "properties": {
        "name": "Marie Curie",
        "birth_year": "1867",
        "nationalitie": ["Polish", "French"],
        "profession": ["physicist", "chemist"]
      }
    },
    ...
  ],
  "relationships": [
    {
      "source": "Marie Curie",
      "target": "Pierre Curie",
      "type": "SPOUSE_OF"
    },
    ...
  ]
}
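Printing `graph_document.nodes` shows Python objects rather than JSON. One way to produce the shape shown here is sketched below. It assumes nodes expose `id`, `type` and `properties`, and relationships expose `source`, `target`, `type` and optionally `properties`, where `source`/`target` may be node objects or plain ids. These attribute names are inferred from the JSON example, not confirmed from the library; `graph_to_dict` is hypothetical and the demo uses stand-in objects.

```python
import json
from types import SimpleNamespace


def _endpoint_id(endpoint):
    """Return endpoint.id for node-like objects, or the value itself if it is already an id."""
    return getattr(endpoint, "id", endpoint)


def graph_to_dict(graph_document) -> dict:
    """Convert a graph document into the nodes/relationships dict shown above (assumed attributes)."""
    nodes = [
        {"id": n.id, "type": n.type, "properties": dict(getattr(n, "properties", None) or {})}
        for n in graph_document.nodes
    ]
    relationships = []
    for r in graph_document.relationships:
        rel = {"source": _endpoint_id(r.source), "target": _endpoint_id(r.target), "type": r.type}
        props = getattr(r, "properties", None)
        if props:  # e.g. a "year" property, when the schema allows one
            rel["properties"] = dict(props)
        relationships.append(rel)
    return {"nodes": nodes, "relationships": relationships}


# Stand-in objects for illustration only
marie = SimpleNamespace(id="Marie Curie", type="Person", properties={"name": "Marie Curie", "birth_year": "1867"})
pierre = SimpleNamespace(id="Pierre Curie", type="Person", properties={"name": "Pierre Curie"})
doc = SimpleNamespace(
    nodes=[marie, pierre],
    relationships=[SimpleNamespace(source=marie, target=pierre, type="SPOUSE_OF")],
)
print(json.dumps(graph_to_dict(doc), indent=2))
```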

📜 License

This project is licensed under the MIT License.

🤝 Contributing

Pull requests and feature suggestions are welcome! Open an issue for bug reports or improvements. 🚀

Download files

Download the file for your platform.

Source Distribution

llmgraphtransformer-0.0.2.tar.gz (12.4 kB)

Uploaded Source

Built Distribution

llmgraphtransformer-0.0.2-py3-none-any.whl (12.7 kB)

Uploaded Python 3

File details

Details for the file llmgraphtransformer-0.0.2.tar.gz.

File metadata

  • Download URL: llmgraphtransformer-0.0.2.tar.gz
  • Size: 12.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.16

File hashes

Hashes for llmgraphtransformer-0.0.2.tar.gz
  • SHA256: c4bbbccd830b9deb3159dc8bbc744fe7018c4a129572b25e7c51dfa9a0353578
  • MD5: 731ba085e145aaee89070fd238699759
  • BLAKE2b-256: 98034e18b9193222c2a619df738a0bc951b9aeded6b102b5cad0c809e845840c
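To check a downloaded file against the SHA256 digest listed above, hash it in chunks and compare hex digests. A minimal sketch using only the standard library; `sha256_of` and `matches_expected` are illustrative helper names.

```python
import hashlib

# SHA256 digest of llmgraphtransformer-0.0.2.tar.gz, as listed above
EXPECTED_SDIST_SHA256 = "c4bbbccd830b9deb3159dc8bbc744fe7018c4a129572b25e7c51dfa9a0353578"


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected: str = EXPECTED_SDIST_SHA256) -> bool:
    """True if the file's SHA-256 digest equals the expected hex digest."""
    return sha256_of(path) == expected.lower()
```

For an untampered download, `matches_expected("llmgraphtransformer-0.0.2.tar.gz")` should return True. pip's hash-checking mode (a `--hash=sha256:...` option on a requirements-file line) performs the same comparison at install time, but it requires every dependency to be pinned with a hash as well.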

File details

Details for the file llmgraphtransformer-0.0.2-py3-none-any.whl.

File hashes

Hashes for llmgraphtransformer-0.0.2-py3-none-any.whl
  • SHA256: 9542cca840dd394b725c66356d704409d4511f8db6d0a830f7b831dec8d58fe8
  • MD5: edc65049298275ee7cbd0ae9e6b1c06a
  • BLAKE2b-256: 14942847ef972aa31132e6626bdd28da434e64016b1db9fdb9addec9816e91fb
