
Unified workflow for working with graph RAG systems


GraphContainer

GraphContainer workflow

GraphContainer provides a unified workflow for working with graph RAG systems. It is designed to load graphs produced by different methods, convert them into a shared internal representation, run retrieval pipelines on top of that representation, visualize retrieval traces in a browser, and execute experiments through a consistent interface.

YouTube Demo

Overview

The main idea behind GraphContainer is simple: different graph RAG methods store graph data in different formats, but once those graphs are converted into a common structure, they can be searched, visualized, and compared in a much more consistent way. In this repository, that common structure is implemented through the Unified Graph State, which stores nodes, edges, adjacency information, and vector indexes in a form that downstream components can access without caring about the original source format.
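The shape of that shared structure can be illustrated with a minimal sketch. This is not the library's actual API; it is a toy stand-in (all names are illustrative) showing the core idea of a Unified Graph State: nodes, edges, and adjacency held independently of whatever format the graph originally came from. The real implementation also carries vector indexes.

```python
from dataclasses import dataclass, field

# Toy sketch of a unified graph state. Names are illustrative, not
# GraphContainer's real API; the real state also stores vector indexes.

@dataclass
class UnifiedGraphState:
    nodes: dict = field(default_factory=dict)      # node_id -> attributes
    edges: list = field(default_factory=list)      # (src, dst, attributes)
    adjacency: dict = field(default_factory=dict)  # node_id -> neighbor ids

    def add_node(self, node_id, **attrs):
        self.nodes[node_id] = attrs
        self.adjacency.setdefault(node_id, set())

    def add_edge(self, src, dst, **attrs):
        # Record the edge and keep adjacency symmetric for traversal.
        self.edges.append((src, dst, attrs))
        self.adjacency.setdefault(src, set()).add(dst)
        self.adjacency.setdefault(dst, set()).add(src)

state = UnifiedGraphState()
state.add_node("a", text="node A")
state.add_node("b", text="node B")
state.add_edge("a", "b", relation="related_to")
print(sorted(state.adjacency["a"]))  # ['b']
```

Once every source format is reduced to something like this, retrieval and visualization only ever need to understand one structure.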

At the core of the implementation are SimpleGraphContainer and SearchableGraphContainer. SimpleGraphContainer is responsible for holding the in-memory graph itself, while SearchableGraphContainer extends that base structure with pluggable vector indexes such as node_vector. On top of this container layer, the repository provides adapters for different upstream graph formats, including import_graph_from_component_graph (Component Graph), import_graph_from_attribute_bundle_graph (Attribute Bundle Graph), import_graph_from_topology_semantic_graph (Topology-Semantic Graph), and import_graph_from_subgraph_union_graph (Subgraph Union Graph). These adapters are the entry points that translate method-specific graph storage into the unified internal graph state used by the rest of the system.
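The adapter pattern behind those `import_graph_from_*` functions can be sketched as follows. The function name and input layout below are made up for illustration; only the pattern itself, reading a method-specific layout and emitting one shared representation, mirrors what the repository's adapters do.

```python
# Hypothetical adapter: translate a list of {'id', 'text', 'links'} dicts
# (a made-up source layout) into a shared (nodes, adjacency) representation,
# analogous in spirit to import_graph_from_component_graph and its siblings.

def import_graph_from_toy_format(records):
    nodes = {r["id"]: {"text": r["text"]} for r in records}
    adjacency = {r["id"]: set(r.get("links", [])) for r in records}
    return nodes, adjacency

records = [
    {"id": "n1", "text": "first node", "links": ["n2"]},
    {"id": "n2", "text": "second node", "links": []},
]
nodes, adjacency = import_graph_from_toy_format(records)
print(sorted(nodes))  # ['n1', 'n2']
```

Each real adapter differs only in how it parses its source format; what comes out the other side is always the same unified graph state.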

Once a graph has been loaded, retrieval is handled by the RAG modules under src/graphcontainer/rag. The embedding path is managed through src/graphcontainer/rag/embeddings.py, and the retrieval logic lives in src/graphcontainer/rag/retrievers.py. The repository currently includes two retrieval strategies: OneHopRetriever, which starts from vector-retrieved seed nodes and expands to their immediate neighbors, and FastInsightRetriever, which applies a multi-stage retrieval process with seed selection, deeper exploration, and final filtering. In the current experiment setup, the initial retrieval size is set to 10, and FastInsight keeps the final 5 nodes before answer generation.
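The OneHopRetriever idea, seed selection followed by neighbor expansion, can be sketched with toy vectors. The repository scores seeds through its vector indexes; the dot-product scoring and all names below are illustrative stand-ins, not the actual retriever implementation.

```python
# Toy sketch of one-hop retrieval: pick top_k seeds by similarity to the
# query, then expand to each seed's immediate neighbors. Scoring here is a
# plain dot product on hand-made vectors, a stand-in for a real vector index.

def one_hop_retrieve(query_vec, node_vecs, adjacency, top_k=2):
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    # Seed selection: the top_k most similar nodes to the query.
    seeds = sorted(node_vecs,
                   key=lambda n: dot(query_vec, node_vecs[n]),
                   reverse=True)[:top_k]

    # One-hop expansion: union of the seeds and their direct neighbors.
    retrieved = set(seeds)
    for s in seeds:
        retrieved |= adjacency.get(s, set())
    return seeds, retrieved

node_vecs = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
adjacency = {"a": {"c"}, "b": set(), "c": {"a"}}
seeds, retrieved = one_hop_retrieve([1.0, 0.0], node_vecs, adjacency, top_k=2)
print(seeds)              # ['a', 'b']
print(sorted(retrieved))  # ['a', 'b', 'c']
```

FastInsightRetriever layers further stages on the same primitive: after expansion it explores deeper and then filters the candidate set down to the final nodes kept for answer generation.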

The end-to-end experiment pipeline is implemented in test/rag_experiment.py. This script loads the available graphs, applies the retrievers, builds prompts from the retrieved content, sends the prompts to the generator model, and writes the outputs as JSONL files. In other words, the implementation path is: load a graph from a method-specific source, convert it into the unified graph container, run retrieval on top of the shared representation, assemble the retrieved evidence into a prompt, generate an answer, and finally save the result for evaluation.
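That load-retrieve-prompt-generate-save loop can be sketched in a few lines. The `retrieve`, `build_prompt`, and `generate` callables below are hypothetical stand-ins for the repository's components; only the JSONL record shape matches the description of the actual outputs.

```python
import json

# Sketch of the experiment loop: for each query, retrieve evidence, build a
# prompt, generate an answer, and append one JSONL record. The callables
# passed in are stand-ins for the repository's real retriever and generator.

def run_queries(queries, retrieve, build_prompt, generate, out_path):
    records = []
    with open(out_path, "w", encoding="utf-8") as f:
        for query in queries:
            evidence = retrieve(query)              # retrieval on the graph
            prompt = build_prompt(query, evidence)  # assemble the evidence
            answer = generate(prompt)               # call the generator model
            record = {"query": query, "output": answer}
            records.append(record)
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return records

records = run_queries(
    ["what is graph RAG?"],
    retrieve=lambda q: ["evidence chunk"],
    build_prompt=lambda q, ev: q + "\n" + "\n".join(ev),
    generate=lambda prompt: "stub answer",
    out_path="demo_results.jsonl",
)
print(records[0])  # {'query': 'what is graph RAG?', 'output': 'stub answer'}
```

Because each line is an independent record, failed or partial runs still leave usable output behind.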

Installation

Before running the project, make sure uv itself is installed. On macOS and Linux, you can install it with the official standalone installer:

curl -LsSf https://astral.sh/uv/install.sh | sh

On Windows PowerShell, you can install it with:

powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

If you prefer another installation method, such as Homebrew, WinGet, Scoop, or pipx, refer to the official uv installation guide.

If you want to install the published package from PyPI, use:

pip install graphcontainer

If you are developing locally from this repository, install the project dependencies with:

uv sync

After installation, restart your shell and activate the virtual environment with the command below.

source .venv/bin/activate

You can also download example graphs from the Google Drive link below. Place the downloaded data in the project root directory.

Google Drive Graph Data

Web-based Visualizer

The web interface is powered by the live visualizer. You can launch it directly from the command line by pointing it to a graph source:

python serve.py --graph component_graph:./data/rag_storage/fastinsight/scifact-openai \
  --host 127.0.0.1 \
  --port 8765 \
  --hops 2

After the server starts, open http://127.0.0.1:8765 in your browser. The page renders the graph or subgraph associated with the current retrieval session and lets you inspect how the retriever moved through the graph. Nodes and edges selected during retrieval can be highlighted, and the visualizer keeps track of session progress so that a query can be inspected step by step instead of only as a final result.

If you already have a graph object in memory, you can launch the same interface from Python by using serve_graph:

from graphcontainer import serve_graph

visualizer = serve_graph(
    graph,
    host="127.0.0.1",
    port=8765,
    default_hops=2,
)

print(visualizer.url)

If your graph is stored in Component Graph format, you can also serve it directly from storage:

from graphcontainer import serve_component_graph

visualizer = serve_component_graph(
    "data/rag_storage/fastinsight/scifact-openai",
    host="127.0.0.1",
    port=8765,
    default_hops=2,
)

In practice, the web page is useful for understanding what happened during retrieval rather than only checking the final answer. A typical flow is to start the visualizer, open the browser page, submit a query or connect to an existing retrieval session, and then inspect the highlighted nodes, edges, and progress updates. This makes it easier to see which evidence was selected, how graph traversal expanded from the initial seeds, and how the retrieved subgraph contributed to the final answer.

Run Experiments

The default experiment path in this repository is provided through scripts/run_batch_experiment.sh. This script is intentionally fixed to the current experimental setup and can be run with:

uv run bash scripts/run_batch_experiment.sh

By default, this runs the experiment on the bsard dataset with query_limit=-1, top_k=10, index_name=node_vector, ollama_url=http://localhost:11434/v1, ollama_model=gemma3:12b, and max_context_chunks=10. The current setup uses text-embedding-3-small for embeddings, and the experiment script iterates over the available graph imports while applying both retrieval methods to each graph.

If you want to run the experiment entry point directly rather than going through the batch script, you can execute:

uv run python test/rag_experiment.py \
  --dataset bsard \
  --query_limit -1 \
  --top_k 10 \
  --index_name node_vector \
  --output_dir ./output/bsard \
  --ollama_url http://localhost:11434/v1 \
  --ollama_model gemma3:12b \
  --max_context_chunks 10

The outputs are saved as JSONL files under ./output/bsard/, typically in files named like <graph_name>_<retriever>.jsonl. Each line contains a single query-output pair in the form {"query": "question text", "output": "generated answer"}. This makes the results easy to evaluate later with a separate judging or comparison pipeline.
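Loading one of those files back for evaluation is straightforward. The concrete file name below is a made-up instance of the `<graph_name>_<retriever>.jsonl` pattern, and the sample records are fabricated for illustration; only the per-line record shape follows the description above.

```python
import json

# Write a couple of sample records in the documented {"query", "output"}
# shape, then read them back the way an evaluation step would. The file name
# is a hypothetical instance of the <graph_name>_<retriever>.jsonl pattern.

sample = [
    {"query": "question one", "output": "answer one"},
    {"query": "question two", "output": "answer two"},
]
with open("component_graph_onehop.jsonl", "w", encoding="utf-8") as f:
    for record in sample:
        f.write(json.dumps(record) + "\n")

def load_results(path):
    # One JSON object per line; skip blank lines defensively.
    with open(path, "r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

results = load_results("component_graph_onehop.jsonl")
print(len(results))  # 2
```

A judging or comparison pipeline can then iterate over `results` and score each `output` against its `query` independently.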

