Extract a knowledge graph using LLMs from any text or messages array

kg-gen: Knowledge Graph Generation from Any Text

Welcome! kg-gen helps you generate knowledge graphs from any source text using AI. It can process both small and large text inputs, and it can also handle messages in a conversation format.

Why generate knowledge graphs? kg-gen is great if you want to:

  • Create a graph to assist with RAG (Retrieval-Augmented Generation)
  • Create synthetic graph data for model training and testing
  • Structure any text into a graph
  • Analyze the relationships between concepts in your source text

kg-gen works with every model provider supported by LiteLLM and uses DSPy for structured output generation.

Quick start

Install the module:

pip install kg-gen

Then import and use kg-gen. You can provide your text input in one of two formats:

  1. A single string
  2. A list of Message objects (each with a role and content)

Below are some example snippets:

from kg_gen import KGGen

# Initialize KGGen
kg = KGGen()

# EXAMPLE 1: Single string with model
text_input = "Linda is Josh's mother. Ben is Josh's brother. Andrew is Josh's father. Judy is Andrew's sister. Josh is Judy's nephew. Judy is Josh's aunt."
graph_1 = kg.generate(
  input_data=text_input,
  model="openai/gpt-4o",
  api_key="<OPENAI_API_KEY>"  # Optional if this is set in your environment
)
# Output: 
# entities={'Linda', 'Judy', 'Ben', 'Andrew', 'Josh'} 
# edges={'is sister of', 'is father of', 'is aunt of', 'is brother of', 
# 'is mother of', 'is nephew of'} 
# relations={('Judy', 'is aunt of', 'Josh'), ('Josh', 'is nephew of', 'Judy'), 
# ('Andrew', 'is father of', 'Josh'), ('Ben', 'is brother of', 'Josh'), 
# ('Judy', 'is sister of', 'Andrew'), ('Linda', 'is mother of', 'Josh')}

# EXAMPLE 2: Messages array
messages = [
  {"role": "user", "content": "What is the capital of France?"}, 
  {"role": "assistant", "content": "The capital of France is Paris."}
]
graph_2 = kg.generate(
  input_data=messages,
  model="openai/gpt-4o-mini"
)
# Output: 
# entities={'Paris', 'France'} 
# edges={'has capital'} 
# relations={('France', 'has capital', 'Paris')}
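
The example outputs suggest that a generated graph carries its relations as a set of (subject, predicate, object) triples. As a minimal sketch under that assumption, the triples can be indexed by subject for quick lookups, for instance in a RAG pipeline. The helper `index_by_subject` is hypothetical and not part of kg-gen; the triples are copied from the Example 1 output rather than produced by a live call.

```python
from collections import defaultdict

# Triples copied from the Example 1 output (assumed shape: subject, predicate, object).
relations = {
    ("Judy", "is aunt of", "Josh"),
    ("Josh", "is nephew of", "Judy"),
    ("Andrew", "is father of", "Josh"),
    ("Ben", "is brother of", "Josh"),
    ("Judy", "is sister of", "Andrew"),
    ("Linda", "is mother of", "Josh"),
}


def index_by_subject(triples):
    """Hypothetical helper (not part of kg-gen): map each subject to its sorted (predicate, object) pairs."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[subject].append((predicate, obj))
    # Sort so the result does not depend on set iteration order.
    return {subject: sorted(pairs) for subject, pairs in index.items()}


index = index_by_subject(relations)
print(index["Judy"])
# [('is aunt of', 'Josh'), ('is sister of', 'Andrew')]
```

Sorting each list keeps the lookup results stable across runs, which helps when the triples feed into prompts or tests.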

Message Array Processing

When processing message arrays, kg-gen:

  1. Preserves the role information from each message
  2. Maintains message order and boundaries
  3. Can extract entities and relationships:
    • Between concepts mentioned in messages
    • Between speakers (roles) and concepts
    • Across multiple messages in a conversation

For example, given this conversation:

messages = [
  {"role": "user", "content": "What is the capital of France?"},
  {"role": "assistant", "content": "The capital of France is Paris."}
]

The generated graph might include entities like:

  • "user"
  • "assistant"
  • "France"
  • "Paris"

And relations like:

  • ("user", "asks about", "France")
  • ("assistant", "states", "Paris")
  • ("Paris", "is capital of", "France")
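
One way to picture how roles and message order could be preserved is to flatten the conversation into role-labeled lines before extraction. This is only an illustrative sketch: `flatten_messages` is a hypothetical helper, and kg-gen's internal handling of messages may differ.

```python
def flatten_messages(messages):
    """Hypothetical helper (not part of kg-gen): join messages into one role-labeled string, in order."""
    lines = []
    for i, msg in enumerate(messages):
        # Fail early on malformed input instead of emitting a partial transcript.
        if "role" not in msg or "content" not in msg:
            raise ValueError(f"message {i} needs 'role' and 'content' keys")
        lines.append(f"{msg['role']}: {msg['content']}")
    return "\n".join(lines)


messages = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
]
text = flatten_messages(messages)
print(text)
# user: What is the capital of France?
# assistant: The capital of France is Paris.
```

Keeping one line per message retains both the speaker and the message boundary, which is what would let an extractor relate "user" and "assistant" to the concepts they mention.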

License

The MIT License.
