llmgraph

Create knowledge graphs with LLMs.

llmgraph enables you to create knowledge graphs in GraphML, GEXF, and HTML formats (the HTML is generated via pyvis) from the Wikipedia page of a given source entity. The knowledge graphs are generated by extracting world knowledge from ChatGPT or other large language models (LLMs).

Features

  • Create knowledge graphs, given a source entity.
  • Uses ChatGPT (or another specified LLM) to extract world knowledge.
  • Generate knowledge graphs in HTML, GraphML, and GEXF formats.
  • Many entity types and relationships supported by customised prompts.
  • Cache support to grow a knowledge graph iteratively and efficiently.
  • Reports total tokens used so you can track LLM costs (a default run costs only about 1 cent).
  • Customisable model (default is gpt-3.5-turbo for speed and cost).

Installation

You can install llmgraph using pip:

pip install llmgraph

Example Output

In addition to the GraphML and GEXF formats, a physics-enabled HTML graph (rendered with pyvis) can be viewed:

Artificial Intelligence example

Command to generate the above machine-learning graph:
llmgraph machine-learning "https://en.wikipedia.org/wiki/Artificial_intelligence" --levels 4
View entire graph: machine-learning_artificial-intelligence_v1.0.0_level4_fully_connected.html

llmgraph Usage

Example Usage

The example above was generated with the following command, which requires an entity_type and a quoted entity_wikipedia source URL:

llmgraph machine-learning "https://en.wikipedia.org/wiki/Artificial_intelligence" --levels 3

This example creates a 3-level graph, starting from the given root node, Artificial Intelligence.

Note that you need to set the OPENAI_API_KEY environment variable before running. See the OpenAI docs for more info. The total number of tokens used is printed as the run progresses. For reference, this 3-level example used a total of 7,650 gpt-3.5-turbo tokens, which cost approximately 1.5 cents as of October 2023.
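As a rough check of the cost figure above, the following sketch converts a token count into an approximate cost. The flat price of $0.002 per 1,000 tokens is an assumption (it is not stated in this README and pricing changes over time), and the helper name estimate_cost_usd is hypothetical, not part of llmgraph.

```python
def estimate_cost_usd(total_tokens: int, usd_per_1k_tokens: float = 0.002) -> float:
    """Approximate cost of a run from its total token count.

    The default price is an assumed flat rate for illustration only;
    check current pricing for the model you actually use.
    """
    return total_tokens / 1000 * usd_per_1k_tokens


# Token total reported for the 3-level example above.
cost = estimate_cost_usd(7650)
print(f"{cost * 100:.2f} cents")
```

Under that assumed rate, the result is consistent with the roughly 1.5 cents quoted above.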

The entity_type sets the LLM prompt used to find related entities to include in the graph. The full list can be seen in prompts.yaml and includes the following entity types:

  • automobile
  • book
  • computer-game
  • concepts-general
  • creative-general
  • documentary
  • food
  • machine-learning
  • movie
  • music
  • people-historical
  • podcast
  • software-engineering
  • tv

Required Arguments

  • entity_type (TEXT): Entity type (e.g. movie)
  • entity_wikipedia (TEXT): Full Wikipedia link to the root entity

Optional Arguments

  • --entity-root (TEXT): Optional root entity name override if different from the Wikipedia page title [default: None]
  • --levels (INTEGER): Number of levels deep to construct from the central root entity [default: 2]
  • --max-sum-total-tokens (INTEGER): Maximum sum of tokens for graph generation [default: 200000]
  • --output-folder (TEXT): Folder location to write outputs [default: ./_output/]
  • --llm-model (TEXT): The model name [default: gpt-3.5-turbo]
  • --llm-temp (FLOAT): LLM temperature value [default: 0.0]
  • --llm-use-localhost (INTEGER): LLM use localhost:8081 instead of OpenAI [default: 0]
  • --help: Show this message and exit.
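If you want to drive the CLI from a script, the arguments above can be assembled as in the sketch below. It only uses the flags documented in this section; the helper names build_llmgraph_command and run_llmgraph are hypothetical, and the check for OPENAI_API_KEY is an assumption about what you would want before starting a run against OpenAI.

```python
import os
import shlex
import subprocess


def build_llmgraph_command(entity_type, entity_wikipedia, levels=2,
                           output_folder="./_output/", llm_model="gpt-3.5-turbo"):
    """Build the argument list for one llmgraph run (defaults mirror the list above)."""
    return [
        "llmgraph", entity_type, entity_wikipedia,
        "--levels", str(levels),
        "--output-folder", output_folder,
        "--llm-model", llm_model,
    ]


def run_llmgraph(cmd):
    """Run the command, refusing to start if the API key is missing."""
    if not os.environ.get("OPENAI_API_KEY"):
        raise RuntimeError("Set OPENAI_API_KEY before running llmgraph")
    subprocess.run(cmd, check=True)


cmd = build_llmgraph_command("movie", "https://en.wikipedia.org/wiki/Inception", levels=3)
# Show the equivalent shell command without executing it.
print(shlex.join(cmd))
```

Calling run_llmgraph(cmd) would then execute the run, provided llmgraph is installed and on your PATH.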

More Examples of HTML Output

Here are some more examples of the HTML graph output for different entity types and root entities (with commands to generate and links to view full interactive graphs).

Install llmgraph to create your own knowledge graphs! Feel free to share interesting results in the issue section above with a documentation label :)

Knowledge graph concept example

Command to generate the above concepts-general graph:
llmgraph concepts-general "https://en.wikipedia.org/wiki/Knowledge_graph" --levels 4
View entire graph: concepts-general_knowledge-graph_v1.0.0_level4_fully_connected.html

Inception movie example

Command to generate the above movie graph:
llmgraph movie "https://en.wikipedia.org/wiki/Inception" --levels 4
View entire graph: movie_inception_v1.0.0_level4_fully_connected.html

OpenAI company example

Command to generate the above company graph:
llmgraph company "https://en.wikipedia.org/wiki/OpenAI" --levels 4
View entire graph: company_openai_v1.0.0_level4_fully_connected.html

John von Neumann people example

Command to generate the above people-historical graph:
llmgraph people-historical "https://en.wikipedia.org/wiki/John_von_Neumann" --levels 4
View entire graph: people-historical_john-von-neumann_v1.0.0_level4_fully_connected.html

Example of Prompt Used to Generate Graph

Here is an example of the prompt template, with placeholders, used to generate related entities from a given source entity. This is applied recursively to create a knowledge graph, merging duplicated nodes as required.

You are knowledgeable about {knowledgeable_about}.
List, in json array format, the top {top_n} {entities} most like '{{entity_root}}'
with Wikipedia link, reasons for similarity, similarity on scale of 0 to 1.
Format your response in json array format as an array with column names: 'name', 'wikipedia_link', 'reason_for_similarity', and 'similarity'.
Example response: {{{{"name": "Example {entity}","wikipedia_link": "https://en.wikipedia.org/wiki/Example_{entity_underscored}","reason_for_similarity": "Reason for similarity","similarity": 0.5}}}}

It works well on the primary tested LLM, OpenAI gpt-3.5-turbo. Results with Llama2 are OK, but not as good. The prompt source of truth and additional details can be seen in prompts.yaml.
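The recursive expansion described above can be sketched as a breadth-first traversal. This is a minimal illustration, not llmgraph's actual implementation: fake_llm is a hypothetical stand-in that returns canned JSON in the response format requested by the prompt, the entity names and example.org links are placeholders, and treating --levels as a maximum depth from the root is an assumption.

```python
import json
from collections import deque

# Placeholder relationships standing in for real LLM knowledge.
RELATED = {"A": [("B", 0.9), ("C", 0.7)], "B": [("A", 0.9), ("D", 0.6)], "C": [("D", 0.5)]}


def fake_llm(entity_name: str) -> str:
    """Stand-in for the LLM call: returns a JSON array in the prompt's format."""
    return json.dumps([
        {"name": name, "wikipedia_link": f"https://example.org/wiki/{name}",
         "reason_for_similarity": "placeholder reason", "similarity": similarity}
        for name, similarity in RELATED.get(entity_name, [])
    ])


def build_graph(root: str, levels: int):
    """Expand the graph level by level, merging entities that are already nodes."""
    nodes = {root}
    edges = {}  # (source, target) -> similarity
    queue = deque([(root, 0)])
    while queue:
        name, depth = queue.popleft()
        if depth >= levels:
            continue  # nodes at the deepest level are kept but not expanded
        for item in json.loads(fake_llm(name)):
            target = item["name"]
            edges[(name, target)] = item["similarity"]
            if target not in nodes:  # duplicate entities merge into one node
                nodes.add(target)
                queue.append((target, depth + 1))
    return nodes, edges


nodes, edges = build_graph("A", levels=2)
print(sorted(nodes), len(edges))
```

Because D is reached from both B and C, it appears once as a node with two incoming edges, which is the merging behaviour the paragraph above refers to.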

Each entity type has custom placeholders, for example concepts-general and documentary:

concepts-general:
    system: You are a highly knowledgeable ontologist and creator of knowledge graphs.
    knowledgeable_about: many concepts and ontologies.
    entities: concepts
    entity: concept name
    top_n: 5

documentary:
    system: You are knowledgeable about documentaries of all types, and genres.
    knowledgeable_about: documentaries of all types, and genres
    entities: Documentaries
    entity: Documentary
    top_n: 5
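The doubled braces in the template suggest that it is filled in two passes: once with the per-entity-type placeholders, and once with the root entity. The sketch below shows that two-pass idea with Python's str.format, using the concepts-general values above. How llmgraph actually performs the substitution is an assumption here, as is deriving entity_underscored from entity by replacing spaces with underscores.

```python
TEMPLATE = (
    "You are knowledgeable about {knowledgeable_about}.\n"
    "List, in json array format, the top {top_n} {entities} most like '{{entity_root}}'\n"
    "with Wikipedia link, reasons for similarity, similarity on scale of 0 to 1.\n"
    "Format your response in json array format as an array with column names: "
    "'name', 'wikipedia_link', 'reason_for_similarity', and 'similarity'.\n"
    'Example response: {{{{"name": "Example {entity}",'
    '"wikipedia_link": "https://en.wikipedia.org/wiki/Example_{entity_underscored}",'
    '"reason_for_similarity": "Reason for similarity","similarity": 0.5}}}}'
)

# Values taken from the concepts-general entry above.
placeholders = {
    "knowledgeable_about": "many concepts and ontologies.",
    "entities": "concepts",
    "entity": "concept name",
    "top_n": 5,
}
# Assumption: the underscored form is derived from the entity placeholder.
placeholders["entity_underscored"] = placeholders["entity"].replace(" ", "_")

# Pass 1: fill the per-entity-type placeholders; '{{entity_root}}' becomes '{entity_root}'.
per_type_prompt = TEMPLATE.format(**placeholders)
# Pass 2: fill the root entity; the remaining doubled braces become literal JSON braces.
prompt = per_type_prompt.format(entity_root="Knowledge graph")
print(prompt)
```

After the second pass, the example response in the prompt is plain JSON with single braces, which is what the model is asked to imitate.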

Cached LLM API calls

Each call to the LLM API (and Wikipedia) is cached locally in a .joblib_cache folder. This allows an interrupted run to be resumed without duplicating identical calls. It also allows a re-run with a higher --levels value to reuse results from the lower-level run (assuming the same entity type and source).
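To illustrate why caching makes resumed and deeper runs cheap, here is a minimal standard-library sketch of an on-disk call cache. It is not llmgraph's code: the folder name above suggests joblib is used, whereas this example swaps in a hand-rolled JSON-file cache to stay dependency-free. DiskCache and fake_llm_call are hypothetical names.

```python
import hashlib
import json
import tempfile
from pathlib import Path


class DiskCache:
    """Store each call's result in a JSON file keyed by function name and arguments."""

    def __init__(self, folder):
        self.folder = Path(folder)
        self.folder.mkdir(parents=True, exist_ok=True)

    def call(self, func, *args):
        key_source = json.dumps([func.__name__, args], sort_keys=True)
        key = hashlib.sha256(key_source.encode("utf-8")).hexdigest()
        path = self.folder / f"{key}.json"
        if path.exists():  # identical call seen before: reuse the stored result
            return json.loads(path.read_text(encoding="utf-8"))
        result = func(*args)
        path.write_text(json.dumps(result), encoding="utf-8")
        return result


calls = []


def fake_llm_call(prompt):
    """Stand-in for a paid API call; records each real invocation."""
    calls.append(prompt)
    return {"prompt": prompt, "answer": "placeholder"}


cache = DiskCache(tempfile.mkdtemp())
first = cache.call(fake_llm_call, "related to A")
second = cache.call(fake_llm_call, "related to A")  # served from disk
print(len(calls))
```

The second call returns the stored result without invoking the function again, which is the property that lets a higher --levels run reuse the work of a lower one.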

Future Improvements

  • Improve support for locally running LLM server (e.g. via ollama)
  • Contrast graph output from different LLM models (e.g. Llama2 vs Mistral vs ChatGPT-4)
  • Investigate the hypothesis that this approach provides insight into how an LLM views the world.
  • Include more examples in this documentation and make examples available for easy browsing.
  • Instructions for running locally and adding a custom entity_type prompt.
  • Better pyvis HTML output, in particular showing the reasons for each entity relationship in the UI and adding arguments for pixel size etc.
  • Parallelise API calls and result processing.
  • Remove dependency on Wikipedia entities as a source.
  • Contrast results from llmgraph with other non-LLM graph construction methods, e.g. using Wikipedia page links or direct article embeddings.

Contributing

Contributions to llmgraph are welcome. Please follow these steps:

  1. Fork the repository.
  2. Create a new branch for your feature or bug fix.
  3. Make your changes and commit them.
  4. Create a pull request with a description of your changes.
