
LlamaIndex PropertyGraphStore backed by GrafeoDB embedded graph database


grafeo-llamaindex

LlamaIndex PropertyGraphStore backed by GrafeoDB, an embedded graph database with native vector search.

Build knowledge graphs from documents, query them with GQL, and run vector similarity search, all in a single .db file. No servers, no infrastructure.

Install

uv add grafeo-llamaindex

Quickstart

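from llama_index.core import PropertyGraphIndex, SimpleDirectoryReader
from grafeo_llamaindex import GrafeoPropertyGraphStore

# Load every file in ./data as LlamaIndex Documents
documents = SimpleDirectoryReader("./data").load_data()

# Persistent store; omit db_path to keep the graph in memory.
# embedding_dimensions (default 1536) should match your embedding model's output size.
graph_store = GrafeoPropertyGraphStore(db_path="./knowledge_graph.db")

# Extracts entities and relations with the LLM configured in LlamaIndex's Settings
# (OpenAI by default) and embeds graph nodes so vector search can use them
index = PropertyGraphIndex.from_documents(
    documents,
    property_graph_store=graph_store,
    embed_kg_nodes=True,
)

# include_text=True also returns the source text chunks linked to matched nodes
retriever = index.as_retriever(include_text=True)
nodes = retriever.retrieve("What are the key relationships?")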

Features

  • Full PropertyGraphStore: all 8 abstract methods implemented (get, get_triplets, get_rel_map, upsert_nodes, upsert_relations, delete, structured_query, vector_query)
  • Structured + vector queries: supports_structured_queries = True and supports_vector_queries = True in a single store
  • Embedded database: no Docker, no cloud, no external services. Just uv add grafeo
  • Single-file persistence: the entire knowledge graph lives in one .db file
  • Native HNSW vector search: embeddings stored alongside graph nodes, no separate vector DB needed
  • Multi-language queries: the engine supports GQL, Cypher, Gremlin, GraphQL, SPARQL and SQL/PGQ; structured_query accepts GQL/Cypher, or Gremlin with a g. prefix
  • Built-in graph algorithms: PageRank, Louvain, shortest paths, centrality and 30+ more via graph_store.client.algorithms

API Reference

GrafeoPropertyGraphStore

from grafeo_llamaindex import GrafeoPropertyGraphStore

store = GrafeoPropertyGraphStore(
    db_path=None,                # str | None - path for persistent storage, None for in-memory
    embedding_dimensions=1536,   # int - vector dimensions for HNSW index
    embedding_metric="cosine",   # str - "cosine", "euclidean", "dot_product", or "manhattan"
    dedup_threshold=None,        # float | None - cosine similarity threshold for entity dedup
)
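embedding_metric selects how the HNSW index compares vectors. The plain-Python sketch below spells out the four metric names and an exact, brute-force top-k search, which is the result an approximate HNSW index tries to reproduce. The function names are hypothetical, and the exact formulas GrafeoDB uses internally (for example, whether cosine is reported as a similarity or as 1 - similarity) are assumptions.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity: higher means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def dot_product(a: Vector, b: Vector) -> float:
    """Raw inner product: higher means more similar."""
    return sum(x * y for x, y in zip(a, b))


def euclidean(a: Vector, b: Vector) -> float:
    """L2 distance: lower means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Vector, b: Vector) -> float:
    """L1 distance: lower means more similar."""
    return sum(abs(x - y) for x, y in zip(a, b))


# metric name -> (scoring function, True if higher scores rank first)
METRICS: Dict[str, Tuple[Callable[[Vector, Vector], float], bool]] = {
    "cosine": (cosine, True),
    "dot_product": (dot_product, True),
    "euclidean": (euclidean, False),
    "manhattan": (manhattan, False),
}


def exact_top_k(
    query: Vector, vectors: Dict[str, Vector], k: int, metric: str = "cosine"
) -> List[Tuple[str, float]]:
    """Brute-force nearest neighbours: the exact answer HNSW approximates."""
    score, higher_is_better = METRICS[metric]
    scored = [(node_id, score(query, vec)) for node_id, vec in vectors.items()]
    scored.sort(key=lambda item: item[1], reverse=higher_is_better)
    return scored[:k]


vectors = {"alice": [1.0, 0.0], "bob": [0.9, 0.1], "acme": [0.0, 1.0]}
print(exact_top_k([1.0, 0.0], vectors, k=2))  # alice first, then bob
```

Brute force costs one comparison per stored vector, which is why a graph-based index such as HNSW is used once the node count grows; the trade-off is that HNSW results are approximate.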

Properties:

  • store.client: access the underlying grafeo.GrafeoDB instance for direct queries and algorithms
  • store.supports_structured_queries: True
  • store.supports_vector_queries: True

Methods (PropertyGraphStore interface):

| Method | Description |
|---|---|
| upsert_nodes(nodes) | Insert or update EntityNode / ChunkNode objects |
| upsert_relations(relations) | Insert edges between existing nodes |
| get(properties, ids) | Retrieve nodes by ID or property filter |
| get_triplets(entity_names, relation_names, ids) | Get (source, relation, target) triplets |
| get_rel_map(graph_nodes, depth, ignore_rels) | BFS traversal from seed nodes |
| delete(entity_names, relation_names, ids) | Remove nodes and/or edges |
| structured_query(query) | Execute raw GQL/Cypher (or Gremlin with a g. prefix) |
| vector_query(query) | HNSW similarity search over node embeddings |
| get_schema() / get_schema_str() | Inspect graph labels, edge types, and properties |
| persist(path) | Save an in-memory database to disk |
| close() | Close the database connection |
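get_rel_map is described as a BFS traversal from seed nodes. The sketch below illustrates what a depth-limited BFS with ignore_rels means, using plain (source, relation, target) tuples instead of LlamaIndex node objects. It is not the store's implementation: rel_map_bfs is a hypothetical name, and following only outgoing edges is an assumption, since the README does not say whether traversal is directed.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set, Tuple

Triplet = Tuple[str, str, str]  # (source, relation, target)


def rel_map_bfs(
    triplets: Iterable[Triplet],
    seeds: Iterable[str],
    depth: int = 2,
    ignore_rels: Optional[Set[str]] = None,
) -> List[Triplet]:
    """Collect triplets reachable from the seed nodes within `depth` hops."""
    ignore = ignore_rels or set()
    # Adjacency list of outgoing edges, skipping ignored relation labels
    # (outgoing-only traversal is an assumption of this sketch).
    outgoing: Dict[str, List[Triplet]] = {}
    for src, rel, dst in triplets:
        if rel in ignore:
            continue
        outgoing.setdefault(src, []).append((src, rel, dst))

    seen_nodes: Set[str] = set(seeds)
    queue = deque((node, 0) for node in seen_nodes)
    result: List[Triplet] = []
    while queue:
        node, hops = queue.popleft()
        if hops >= depth:
            continue  # do not expand past the depth limit
        for triplet in outgoing.get(node, []):
            result.append(triplet)
            target = triplet[2]
            if target not in seen_nodes:  # visit each node at most once
                seen_nodes.add(target)
                queue.append((target, hops + 1))
    return result


graph = [
    ("Alice", "WORKS_AT", "Acme"),
    ("Acme", "LOCATED_IN", "Berlin"),
    ("Berlin", "IN", "Germany"),
    ("Alice", "MENTIONS", "Bob"),
]
print(rel_map_bfs(graph, ["Alice"], depth=1))
# [('Alice', 'WORKS_AT', 'Acme'), ('Alice', 'MENTIONS', 'Bob')]
```

Raising depth widens the neighbourhood returned to the retriever, and ignore_rels prunes noisy edge types before they are expanded.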

Persistence

The entire knowledge graph lives in a single .db file. Pass db_path to store data on disk, or omit it for in-memory use.

from grafeo_llamaindex import GrafeoPropertyGraphStore

# Create and populate
store = GrafeoPropertyGraphStore(db_path="./my_graph.db")
# ... upsert nodes and relations ...
store.close()

# Reopen later with the same path
store = GrafeoPropertyGraphStore(db_path="./my_graph.db")
print(store.node_count, store.edge_count)  # data is still there

You can also save an in-memory store to disk:

store = GrafeoPropertyGraphStore()  # in-memory
# ... populate ...
store.persist("./snapshot.db")

Deduplication

When dedup_threshold is set, upsert_nodes checks whether an incoming EntityNode's embedding is similar enough to that of an existing node with the same label to merge the two instead of creating a duplicate.

store = GrafeoPropertyGraphStore(
    dedup_threshold=0.95,  # cosine similarity threshold
    embedding_dimensions=1536,
)

Key behavior:

  • Threshold semantics: if cosine_similarity(new, existing) >= dedup_threshold, the new node merges into the existing one (the existing node's properties are overwritten with the incoming ones; its original created_at timestamp is preserved).
  • Label-scoped: dedup only compares nodes with the same label. A "Person" and a "Company" with identical embeddings are never merged.
  • ChunkNode excluded: ChunkNode objects are never deduplicated, only EntityNode.
  • Requires embedding: nodes without an embedding are never deduplicated.
  • Runtime toggle: you can set store.dedup_threshold = 0.9 at any time and it takes effect on the next upsert_nodes call.
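The merge decision described above can be sketched in plain Python. Candidate and find_merge_target are hypothetical names standing in for EntityNode/ChunkNode and the store's internal check, and picking the most similar node when several pass the threshold is an assumption; the README only states the threshold, label, ChunkNode and embedding rules.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """Hypothetical stand-in for a node; not LlamaIndex's EntityNode/ChunkNode."""
    node_id: str
    label: str
    embedding: Optional[List[float]] = None
    is_chunk: bool = False  # True models a ChunkNode, which is never deduplicated


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_merge_target(
    new: Candidate, existing: List[Candidate], threshold: Optional[float]
) -> Optional[str]:
    """Return the id of the existing node `new` would merge into, or None."""
    if threshold is None or new.is_chunk or new.embedding is None:
        return None  # dedup disabled, chunk node, or no embedding
    best_id: Optional[str] = None
    best_sim = threshold
    for node in existing:
        if node.label != new.label or node.is_chunk or node.embedding is None:
            continue  # label-scoped; only entities with embeddings are compared
        sim = cosine_similarity(new.embedding, node.embedding)
        if sim >= best_sim:  # assumption: keep the most similar qualifying node
            best_id, best_sim = node.node_id, sim
    return best_id


existing = [Candidate("alice", "Person", [1.0, 0.0]), Candidate("acme", "Company", [1.0, 0.0])]
print(find_merge_target(Candidate("alice-2", "Person", [0.99, 0.01]), existing, 0.95))  # alice
```

Note that "acme" is never considered for the Person candidate even though its embedding is identical to "alice", which is the label-scoped rule in action.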

Relation Upsert Behavior

upsert_relations skips, without raising an exception, any relation whose source_id or target_id does not match an existing node (by name or LlamaIndex ID). Instead, a UserWarning is emitted for each skipped relation, so you can capture or escalate these with Python's warnings module if needed.
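Because each skipped relation produces a UserWarning rather than an exception, a thin wrapper can collect the skips. upsert_relations_collecting_skips is a hypothetical helper, not part of grafeo-llamaindex; it relies only on the store's upsert_relations method and the standard warnings module, and the message text is whatever the store emits.

```python
import warnings
from typing import Any, List, Sequence


def upsert_relations_collecting_skips(store: Any, relations: Sequence[Any]) -> List[str]:
    """Call store.upsert_relations and return the messages of any UserWarnings it emits."""
    with warnings.catch_warnings(record=True) as caught:
        # "always" stops the default filter from hiding repeated warnings
        warnings.simplefilter("always", UserWarning)
        store.upsert_relations(list(relations))
    return [str(w.message) for w in caught if issubclass(w.category, UserWarning)]


# Usage with a real store (relations are llama_index Relation objects):
# skipped = upsert_relations_collecting_skips(store, relations)
# if skipped:
#     print(f"{len(skipped)} relation(s) referenced unknown nodes")
```

To fail fast instead, call upsert_relations inside warnings.catch_warnings() after warnings.simplefilter("error", UserWarning); the first skipped relation then raises, so relations earlier in the same batch may already have been written.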

Comparison

| | Neo4j | FalkorDB | Grafeo |
|---|---|---|---|
| Requires server | Yes | Yes | No (embedded) |
| Vector search | Vector index (5.11+) | Limited | Native HNSW |
| Graph algorithms | GDS plugin ($) | Built-in | Built-in (30+) |
| Query languages | Cypher | Cypher | GQL, Cypher, Gremlin, GraphQL, SPARQL, SQL/PGQ |
| Deployment | Docker/Cloud | Docker/Cloud | uv add grafeo |
| Persistence | Server-managed | Server-managed | Single .db file |

Examples

See the examples/ directory in the GrafeoDB/grafeo-llamaindex repository.

Development

uv sync                  # install deps
uv run pytest -v         # run tests
uv run ruff check .      # lint
uv run ruff format .     # format
uv run ty check          # type check

License

Apache-2.0

Download files


Source Distribution

grafeo_llamaindex-0.2.0.tar.gz (160.0 kB)

Uploaded Source

Built Distribution


grafeo_llamaindex-0.2.0-py3-none-any.whl (16.6 kB)

Uploaded Python 3

File details

Details for the file grafeo_llamaindex-0.2.0.tar.gz.

File metadata

  • Download URL: grafeo_llamaindex-0.2.0.tar.gz
  • Size: 160.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for grafeo_llamaindex-0.2.0.tar.gz
Algorithm Hash digest
SHA256 4539180e2d114d787ec2a1cc1a343a7ef7e18125dee4cb5754a30857a22c4155
MD5 34164bf9ab6117b161e5a931f86ff2df
BLAKE2b-256 1873edaa2f1a48679b608dbc9121d4f3de34cec507a33e87759de6ee3e4ec0a2


Provenance

The following attestation bundles were made for grafeo_llamaindex-0.2.0.tar.gz:

Publisher: pypi.yml on GrafeoDB/grafeo-llamaindex

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file grafeo_llamaindex-0.2.0-py3-none-any.whl.

File hashes

Hashes for grafeo_llamaindex-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 ae7e685f6f9462264fc2b1d4210ce26fe1bae43d3844a3ecf1713289899c9cde
MD5 bccf3101dd09f11ebab9e676db8448b3
BLAKE2b-256 b0d629eb59ac1c2248aa2a43e243c43aa501ff78ed8108a6bbee07d2e0857517


Provenance

The following attestation bundles were made for grafeo_llamaindex-0.2.0-py3-none-any.whl:

Publisher: pypi.yml on GrafeoDB/grafeo-llamaindex

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
