Implementation of GraphRAG (https://arxiv.org/pdf/2404.16130)

GraphRAG - Powered by LangChain

This is an implementation of GraphRAG as described in

https://arxiv.org/pdf/2404.16130

From Local to Global: A Graph RAG Approach to Query-Focused Summarization

Official implementation by the authors of the paper is available at:

https://github.com/microsoft/graphrag/

Guides

Why re-implementation 🤔?

Personal Preference

I generally prefer using and refining existing implementations, since re-implementation is rarely optimal. After running into several challenges with the official version, however, I decided to take a different approach.

Issues with the Official Implementation

  • Lacks integration with popular frameworks like LangChain, LlamaIndex, etc.
  • Limited to OpenAI and AzureOpenAI models, with no support for other providers.

Why rely on established frameworks like LangChain?

Using an established foundation like LangChain offers numerous benefits. It abstracts various providers, whether related to LLMs, embeddings, vector stores, etc., allowing for easy component swapping without altering core logic or adding complex support. More importantly, a solid foundation like this lets you focus on the problem's core logic rather than reinventing the wheel.

LangChain also supports advanced features like batching and streaming, provided your components align with the framework's guidelines. For instance, using chains (LCEL) allows you to take full advantage of these capabilities.

Modularity & Extensibility focused design

The APIs are designed to be modular and extensible. You can replace any component with your own implementation as long as it implements the required interface.

Given the nature of the domain, this is important for conducting experiments by swapping out various components.
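As a rough illustration of what swapping a component behind an interface looks like, here is a minimal sketch in plain Python. The names `CountsTokens`, `WordCounter`, `CharCounter` and `fits_budget` are hypothetical and are not part of the langchain-graphrag API; the point is only that the calling code depends on the interface rather than on a concrete class.

```python
from typing import Protocol


class CountsTokens(Protocol):
    """Hypothetical interface: anything that can count tokens in a string."""

    def count_tokens(self, text: str) -> int: ...


class WordCounter:
    """Counts whitespace-separated words as tokens."""

    def count_tokens(self, text: str) -> int:
        return len(text.split())


class CharCounter:
    """Counts every character as one token."""

    def count_tokens(self, text: str) -> int:
        return len(text)


def fits_budget(text: str, counter: CountsTokens, max_tokens: int) -> bool:
    """Core logic depends only on the interface, not on a concrete class."""
    return counter.count_tokens(text) <= max_tokens


sentence = "graphs help summarize large corpora"
print(fits_budget(sentence, WordCounter(), max_tokens=10))  # True: 5 words
print(fits_budget(sentence, CharCounter(), max_tokens=10))  # False: 35 characters
```

Either counter can be passed to `fits_budget` without changing it, which is the kind of substitution that makes experiments cheap.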

Install

pip install langchain-graphrag

Projects

There are 2 projects in the repo:

langchain_graphrag

This is the core library that implements the GraphRAG paper. It is built on top of LangChain.

Example code for local search using the API

Below is a snippet taken from the example-app to show the style of API and extensibility offered by the library.

Almost all the components (classes/functions) can be replaced by your own implementations. The library is designed to be modular and extensible.

# Reload the vector Store that stores
# the entity name & description embeddings
entities_vector_store = ChromaVectorStore(
    collection_name="entity_name_description",
    persist_directory=str(vector_store_dir),
    embedding_function=make_embedding_instance(
        embedding_type=embedding_type,
        model=embedding_model,
        cache_dir=cache_dir,
    ),
)

# Build the Context Selector from the default components.
# You can supply your own components instead to get
# as much extensibility as you need.
context_selector = ContextSelector.build_default(
    entities_vector_store=entities_vector_store,
    entities_top_k=10,
    community_level=cast(CommunityLevel, level),
)

# Context Builder is responsible for taking the
# result of Context Selector & building the
# actual context to be inserted into the prompt
# Keeping these two separate further increases
# extensibility & maintainability
context_builder = ContextBuilder.build_default(
    token_counter=TiktokenCounter(),
)

# load the artifacts
artifacts = load_artifacts(artifacts_dir)

# Make a langchain retriever that relies on
# context selection & builder
retriever = LocalSearchRetriever(
    context_selector=context_selector,
    context_builder=context_builder,
    artifacts=artifacts,
)

# Build the LocalSearch object
local_search = LocalSearch(
    prompt_builder=LocalSearchPromptBuilder(),
    llm=make_llm_instance(llm_type, llm_model, cache_dir),
    retriever=retriever,
)

# it's a callable that returns the chain
search_chain = local_search()

# you could invoke
# print(search_chain.invoke(query))

# or, you could stream
for chunk in search_chain.stream(query):
    print(chunk, end="", flush=True)
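To make the split between selecting context and building context more concrete, here is a small self-contained sketch. It is not the library's implementation: `Entity`, `select_entities` and `build_context` are hypothetical names, the embeddings are toy two-dimensional vectors, and tokens are approximated by whitespace-separated words.

```python
import math
from dataclasses import dataclass


@dataclass
class Entity:
    name: str
    description: str
    embedding: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_entities(query_vec, entities, top_k):
    """Selection step: rank entities by similarity to the query, keep top_k."""
    ranked = sorted(entities, key=lambda e: cosine(query_vec, e.embedding), reverse=True)
    return ranked[:top_k]


def build_context(selected, max_tokens, count_tokens=lambda s: len(s.split())):
    """Building step: add entity lines in rank order until the budget is spent."""
    lines, used = [], 0
    for entity in selected:
        line = f"{entity.name}: {entity.description}"
        cost = count_tokens(line)
        if used + cost > max_tokens:
            break
        lines.append(line)
        used += cost
    return "\n".join(lines)


# Toy data, made up for illustration only.
entities = [
    Entity("Scrooge", "a miserly businessman", [1.0, 0.0]),
    Entity("Marley", "his late partner", [0.8, 0.6]),
    Entity("London", "the city setting", [0.0, 1.0]),
]
selected = select_entities([1.0, 0.1], entities, top_k=2)
context = build_context(selected, max_tokens=6)
print([e.name for e in selected])  # ['Scrooge', 'Marley']
print(context)                     # Scrooge: a miserly businessman
```

Because selection and building are separate functions, the ranking strategy or the token budget policy can be changed independently of each other.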

Clone the repo

git clone https://github.com/ksachdeva/langchain-graphrag.git

Open in VSCode devcontainer (Recommended)

The devcontainer installs all the dependencies.

If not using devcontainer

Make sure you have rye installed. See https://rye.astral.sh/

# sync all the dependencies
rye sync

examples/simple-app

This is a simple Typer-based CLI app.

Its configuration is limited to the command-line options it exposes.

That said, the core library is written so that you can easily replace any component with your own implementation, for example your choice of LLM or embedding model. You can even replace some of the classes, as long as the replacement implements the required interface.

Note:

If you are using OpenAI or AzureOpenAI, make sure to rename .env.example to .env and fill in the necessary environment variables.

Indexing

rye run simple-app-indexer --llm-type azure_openai --llm-model gpt-4o --embedding-type azure_openai --embedding-model text-embedding-3-small
# To see more options
$ rye run simple-app-indexer --help
Usage: main.py indexer index [OPTIONS]

╭─ Options ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ *  --input-file                                     FILE                          [default: None] [required]                    │
│ *  --output-dir                                     DIRECTORY                     [default: None] [required]                    │
│ *  --cache-dir                                      DIRECTORY                     [default: None] [required]                    │
│ *  --llm-type                                       [openai|azure_openai|ollama]  [default: None] [required]                    │
│ *  --llm-model                                      TEXT                          [default: None] [required]                    │
│ *  --embedding-type                                 [openai|azure_openai|ollama]  [default: None] [required]                    │
│ *  --embedding-model                                TEXT                          [default: None] [required]                    │
│    --chunk-size                                     INTEGER                       Chunk size for text splitting [default: 1200] │
│    --chunk-overlap                                  INTEGER                       Chunk overlap for text splitting              │
│                                                                                   [default: 100]                                │
│    --ollama-num-context                             INTEGER                       Context window size for ollama model          │
│                                                                                   [default: None]                               │
│    --enable-langsmith      --no-enable-langsmith                                  Enable Langsmith                              │
│                                                                                   [default: no-enable-langsmith]                │
│    --help                                                                         Show this message and exit.                   │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
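To show how the --chunk-size and --chunk-overlap options interact, here is a minimal sliding-window sketch. It is an illustration only: `split_with_overlap` is a hypothetical helper that works on an already tokenized list, and how the indexer's own text splitter measures length is not shown here.

```python
def split_with_overlap(
    tokens: list[str], chunk_size: int, chunk_overlap: int
) -> list[list[str]]:
    """Split tokens into windows of chunk_size that share chunk_overlap tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the input
    return chunks


tokens = [f"t{i}" for i in range(10)]
chunks = split_with_overlap(tokens, chunk_size=4, chunk_overlap=1)
for chunk in chunks:
    print(chunk)
# ['t0', 't1', 't2', 't3']
# ['t3', 't4', 't5', 't6']
# ['t6', 't7', 't8', 't9']
```

With the CLI defaults of 1200 and 100, each window would advance by 1100 tokens under this scheme, so consecutive chunks share 100 tokens.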

Global Search

rye run simple-app-global-search --llm-type azure_openai --llm-model gpt-4o --query "What are the top themes in this story?"
$ rye run simple-app-global-search --help
Usage: main.py query global-search [OPTIONS]

╭─ Options ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ *  --output-dir                                     DIRECTORY                     [default: None] [required]                              │
│ *  --cache-dir                                      DIRECTORY                     [default: None] [required]                              │
│ *  --llm-type                                       [openai|azure_openai|ollama]  [default: None] [required]                              │
│ *  --llm-model                                      TEXT                          [default: None] [required]                              │
│ *  --query                                          TEXT                          [default: None] [required]                              │
│    --level                                          INTEGER                       Community level to search [default: 2]                  │
│    --ollama-num-context                             INTEGER                       Context window size for ollama model [default: None]    │
│    --enable-langsmith      --no-enable-langsmith                                  Enable Langsmith [default: no-enable-langsmith]         │
│    --help                                                                         Show this message and exit.                             │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
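For readers unfamiliar with how a global search can work, the GraphRAG paper describes a map-reduce scheme over community summaries: each summary yields a partial answer with a helpfulness score, unhelpful answers are dropped, and the best ones are packed into a final context. The sketch below follows that idea under loud assumptions: `PartialAnswer`, `keyword_answer`, `map_step` and `reduce_step` are hypothetical names, the keyword scorer is a stand-in for an LLM call, and none of this is taken from the library's code.

```python
from dataclasses import dataclass


@dataclass
class PartialAnswer:
    text: str
    score: int  # helpfulness score in the range 0-100


def keyword_answer(report: str, query: str) -> PartialAnswer:
    """Stand-in for an LLM call: scores a report by words shared with the query."""
    shared = set(report.lower().split()) & set(query.lower().split())
    return PartialAnswer(text=report, score=min(100, 50 * len(shared)))


def map_step(reports, query, answer_fn):
    """Produce one partial answer per community report; drop zero-score ones."""
    partials = [answer_fn(report, query) for report in reports]
    return [p for p in partials if p.score > 0]


def reduce_step(partials, max_tokens, count_tokens=lambda s: len(s.split())):
    """Keep the highest-scored partial answers that fit in the token budget."""
    kept, used = [], 0
    for partial in sorted(partials, key=lambda p: p.score, reverse=True):
        cost = count_tokens(partial.text)
        if used + cost > max_tokens:
            break
        kept.append(partial.text)
        used += cost
    return kept


# Toy community reports, made up for illustration only.
reports = [
    "greed and redemption shape the story",
    "weather in london is cold",
    "redemption through generosity",
]
partials = map_step(reports, "themes of greed and redemption", keyword_answer)
print(reduce_step(partials, max_tokens=20))
# ['greed and redemption shape the story', 'redemption through generosity']
```

In a real pipeline the kept partial answers would be passed to an LLM to write the final response; the sketch stops at the packed context.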

Local Search

rye run simple-app-local-search --llm-type azure_openai --llm-model gpt-4o --query "Who is Scrooge, and what are his main relationships?" --embedding-type azure_openai --embedding-model text-embedding-3-small
$ rye run simple-app-local-search --help
Usage: main.py query local-search [OPTIONS]

╭─ Options ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ *  --output-dir                                     DIRECTORY                     [default: None] [required]                              │
│ *  --cache-dir                                      DIRECTORY                     [default: None] [required]                              │
│ *  --llm-type                                       [openai|azure_openai|ollama]  [default: None] [required]                              │
│ *  --llm-model                                      TEXT                          [default: None] [required]                              │
│ *  --query                                          TEXT                          [default: None] [required]                              │
│    --level                                          INTEGER                       Community level to search [default: 2]                  │
│ *  --embedding-type                                 [openai|azure_openai|ollama]  [default: None] [required]                              │
│ *  --embedding-model                                TEXT                          [default: None] [required]                              │
│    --ollama-num-context                             INTEGER                       Context window size for ollama model [default: None]    │
│    --enable-langsmith      --no-enable-langsmith                                  Enable Langsmith [default: no-enable-langsmith]         │
│    --help                                                                         Show this message and exit.                             │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

See examples/simple-app/README.md for more details.

Roadmap / Things to do

The state of the library is far from complete.

Here are some of the things that need to be done to make it more useful:

  • Add more guides
  • Document the APIs
  • Add more tests
