LlamaIndex PropertyGraphStore backed by GrafeoDB embedded graph database
grafeo-llamaindex
LlamaIndex PropertyGraphStore backed by GrafeoDB, an embedded graph database with native vector search.
Build knowledge graphs from documents, query them with GQL, and run vector similarity search, all in a single .db file. No servers, no infrastructure.
Install
uv add grafeo-llamaindex
Quickstart
from llama_index.core import PropertyGraphIndex, SimpleDirectoryReader
from grafeo_llamaindex import GrafeoPropertyGraphStore

# Load source documents and open (or create) the on-disk graph store
documents = SimpleDirectoryReader("./data").load_data()
graph_store = GrafeoPropertyGraphStore(db_path="./knowledge_graph.db")

# Extract a knowledge graph from the documents and embed its nodes
index = PropertyGraphIndex.from_documents(
    documents,
    property_graph_store=graph_store,
    embed_kg_nodes=True,
)

# Retrieve graph context together with the source text
retriever = index.as_retriever(include_text=True)
nodes = retriever.retrieve("What are the key relationships?")
Features
- Full PropertyGraphStore: all 8 abstract methods implemented (get, get_triplets, get_rel_map, upsert_nodes, upsert_relations, delete, structured_query, vector_query)
- Structured + vector queries: supports_structured_queries = True and supports_vector_queries = True in a single store
- Embedded database: no Docker, no cloud, no external services. Just uv add grafeo
- Single-file persistence: the entire knowledge graph lives in one .db file
- Native HNSW vector search: embeddings stored alongside graph nodes, no separate vector DB needed
- Multi-language queries: GQL, Cypher, Gremlin, GraphQL, SPARQL and SQL/PGQ all supported
- Built-in graph algorithms: PageRank, Louvain, shortest paths, centrality and 30+ more via graph_store.client.algorithms
API Reference
GrafeoPropertyGraphStore
from grafeo_llamaindex import GrafeoPropertyGraphStore
store = GrafeoPropertyGraphStore(
    db_path=None,                # str | None - path for persistent storage, None for in-memory
    embedding_dimensions=1536,   # int - vector dimensions for HNSW index
    embedding_metric="cosine",   # str - "cosine", "euclidean", "dot_product", or "manhattan"
    dedup_threshold=None,        # float | None - cosine similarity threshold for entity dedup
)
Properties:
- store.client: access the underlying grafeo.GrafeoDB instance for direct queries and algorithms
- store.supports_structured_queries: True
- store.supports_vector_queries: True
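For readers choosing an embedding_metric value, the following pure-Python sketch shows what the four named measures compute on two small vectors. It only illustrates the standard textbook definitions; how Grafeo's HNSW index computes or orients these scores internally is not described here, and the function names are illustrative, not part of the package.

```python
import math


def dot_product(a, b):
    """Sum of element-wise products; larger means more aligned."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Dot product of the two vectors after scaling each to unit length."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return dot_product(a, b) / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance; smaller means closer."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a, b):
    """Sum of absolute coordinate differences; smaller means closer."""
    return sum(abs(x - y) for x, y in zip(a, b))


a = [1.0, 0.0]
b = [0.0, 2.0]

print(dot_product(a, b))
print(cosine_similarity(a, b))
print(euclidean_distance(a, b))
print(manhattan_distance(a, b))
```

Cosine ignores vector length, which is why it is a common choice for text embeddings, while Euclidean and Manhattan are sensitive to magnitude.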
Methods (PropertyGraphStore interface):
| Method | Description |
|---|---|
| upsert_nodes(nodes) | Insert or update EntityNode / ChunkNode objects |
| upsert_relations(relations) | Insert edges between existing nodes |
| get(properties, ids) | Retrieve nodes by ID or property filter |
| get_triplets(entity_names, relation_names, ids) | Get (source, relation, target) triplets |
| get_rel_map(graph_nodes, depth, ignore_rels) | BFS traversal from seed nodes |
| delete(entity_names, relation_names, ids) | Remove nodes and/or edges |
| structured_query(query) | Execute raw GQL/Cypher (or Gremlin with g. prefix) |
| vector_query(query) | HNSW similarity search over node embeddings |
| get_schema() / get_schema_str() | Inspect graph labels, edge types, and properties |
| persist(path) | Save in-memory database to disk |
| close() | Close the database connection |
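To make the get_rel_map row concrete, here is a minimal, self-contained sketch of a depth-limited BFS over (source, relation, target) triplets. It is not the package's implementation: the bfs_rel_map helper, the in-memory edge list, and the choice to follow only outgoing edges are all assumptions made for illustration.

```python
from collections import deque


def bfs_rel_map(edges, seeds, depth=2, ignore_rels=None):
    """Collect triplets reachable within `depth` hops of the seed nodes."""
    ignore = set(ignore_rels or [])

    # Build an outgoing-edge adjacency map, dropping ignored relation types.
    adjacency = {}
    for source, relation, target in edges:
        if relation in ignore:
            continue
        adjacency.setdefault(source, []).append((relation, target))

    visited = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    triplets = []
    while queue:
        node, hops = queue.popleft()
        if hops == depth:
            continue  # hop budget exhausted, do not expand further
        for relation, target in adjacency.get(node, []):
            triplets.append((node, relation, target))
            if target not in visited:
                visited.add(target)
                queue.append((target, hops + 1))
    return triplets


edges = [
    ("Alice", "WORKS_AT", "Acme"),
    ("Acme", "LOCATED_IN", "Berlin"),
    ("Berlin", "PART_OF", "Germany"),
    ("Alice", "MENTIONS", "chunk-1"),
]
result = bfs_rel_map(edges, ["Alice"], depth=2, ignore_rels=["MENTIONS"])
for triplet in result:
    print(triplet)
```

With depth=2 the traversal stops before expanding Berlin, so the Berlin to Germany edge is not returned, and the MENTIONS edge is filtered out by ignore_rels.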
Persistence
The entire knowledge graph lives in a single .db file. Pass db_path to store data on disk, or omit it for in-memory use.
from grafeo_llamaindex import GrafeoPropertyGraphStore
# Create and populate
store = GrafeoPropertyGraphStore(db_path="./my_graph.db")
# ... upsert nodes and relations ...
store.close()
# Reopen later with the same path
store = GrafeoPropertyGraphStore(db_path="./my_graph.db")
print(store.node_count, store.edge_count) # data is still there
You can also save an in-memory store to disk:
store = GrafeoPropertyGraphStore() # in-memory
# ... populate ...
store.persist("./snapshot.db")
Deduplication
When dedup_threshold is set, upsert_nodes checks whether an incoming EntityNode's embedding is similar enough to an existing node (same label) to merge them instead of creating a duplicate.
store = GrafeoPropertyGraphStore(
    dedup_threshold=0.95,        # cosine similarity threshold
    embedding_dimensions=1536,
)
Key behavior:
- Threshold semantics: if cosine_similarity(new, existing) >= dedup_threshold, the new node merges into the existing one (properties are overwritten, the original created_at timestamp is preserved).
- Label-scoped: dedup only compares nodes with the same label. A "Person" and a "Company" with identical embeddings are never merged.
- ChunkNode excluded: ChunkNode objects are never deduplicated, only EntityNode.
- Requires embedding: nodes without an embedding are never deduplicated.
- Runtime toggle: you can set store.dedup_threshold = 0.9 at any time and it takes effect on the next upsert_nodes call.
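The rules above can be sketched in plain Python. This is an illustration of the documented semantics, not the package's code: the Node dataclass and upsert_with_dedup helper are hypothetical, "properties are overwritten" is modeled as a dict update where incoming values win, and the first matching node is chosen when several qualify. ChunkNode exclusion is not modeled.

```python
import math
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    # Hypothetical stand-in for an entity node; not the LlamaIndex class.
    name: str
    label: str
    embedding: Optional[list] = None
    properties: dict = field(default_factory=dict)
    created_at: int = 0


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def upsert_with_dedup(existing, incoming, threshold):
    """Merge `incoming` into a similar same-label node, else append it."""
    if threshold is not None and incoming.embedding is not None:
        for node in existing:
            # Label-scoped, and nodes without embeddings never match.
            if node.label != incoming.label or node.embedding is None:
                continue
            if cosine_similarity(incoming.embedding, node.embedding) >= threshold:
                node.properties.update(incoming.properties)  # incoming values win
                return node  # created_at of the existing node is left untouched
    existing.append(incoming)
    return incoming


nodes = [Node("Ada Lovelace", "Person", [1.0, 0.0], {"role": "author"}, created_at=100)]

near_duplicate = Node("A. Lovelace", "Person", [0.99, 0.05], {"role": "mathematician"}, created_at=200)
merged = upsert_with_dedup(nodes, near_duplicate, threshold=0.95)

same_vector_other_label = Node("Ada Inc", "Company", [1.0, 0.0], created_at=300)
added = upsert_with_dedup(nodes, same_vector_other_label, threshold=0.95)

print(len(nodes), merged.properties, merged.created_at)
```

The near-duplicate Person merges into the existing node and keeps its original timestamp, while the Company with an identical embedding is stored as a separate node.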
Relation Upsert Behavior
upsert_relations skips, rather than raising an error for, any relation whose source_id or target_id does not match an existing node (by name or LlamaIndex ID). A UserWarning is emitted for each skipped relation, so you can catch these with Python's warnings module if needed.
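A short sketch of capturing those warnings with the standard library follows. Because the real store is not exercised here, upsert_relations_sketch is a hypothetical stand-in that mimics the documented behavior (skip and warn); the warning message text is also invented for the example. The warnings.catch_warnings pattern itself applies unchanged to a real upsert_relations call.

```python
import warnings


def upsert_relations_sketch(known_node_ids, relations):
    """Hypothetical stand-in: keep relations whose endpoints exist, warn on the rest."""
    kept = []
    for source_id, label, target_id in relations:
        if source_id not in known_node_ids or target_id not in known_node_ids:
            warnings.warn(
                f"Skipping relation {source_id}-[{label}]->{target_id}: missing node",
                UserWarning,
                stacklevel=2,
            )
            continue
        kept.append((source_id, label, target_id))
    return kept


relations = [("alice", "KNOWS", "bob"), ("alice", "KNOWS", "carol")]

# Record warnings instead of printing them, so skipped relations can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    kept = upsert_relations_sketch({"alice", "bob"}, relations)

skipped = [w for w in caught if issubclass(w.category, UserWarning)]
for w in skipped:
    print(w.message)
```

To fail fast instead of collecting warnings, warnings.simplefilter("error", UserWarning) turns each skipped relation into a raised exception.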
Comparison
| | Neo4j | FalkorDB | Grafeo |
|---|---|---|---|
| Requires server | Yes | Yes | No (embedded) |
| Vector search | Plugin (5.x+) | Limited | Native HNSW |
| Graph algorithms | GDS plugin ($) | Built-in | Built-in (30+) |
| Query languages | Cypher | Cypher | GQL, Cypher, Gremlin, GraphQL, SPARQL, SQL/PGQ |
| Deployment | Docker/Cloud | Docker/Cloud | uv add grafeo |
| Persistence | Server-managed | Server-managed | Single .db file |
Examples
See the examples/ directory:
- mock_embedding_demo.py: full demo with hand-crafted embeddings, no API key required
- basic_graph_rag.py: build a Property Graph Index from documents and query it (requires OpenAI API key)
- hybrid_retrieval.py: structured queries + vector search + PageRank, all in one script
Development
uv sync # install deps
uv run pytest -v # run tests
uv run ruff check . # lint
uv run ruff format . # format
uv run ty check # type check
License
Apache-2.0