grafeo-langchain
LangChain graph store and vector store backed by GrafeoDB: an embedded graph database with native vector search.
No servers, no Docker, no configuration. Just uv add and go.
Install
uv add grafeo-langchain
# Optional: langchain-graph-retriever integration (requires >=0.8)
uv add "grafeo-langchain[retriever]"
Quick Start
Knowledge Graph (GraphStore)
Store LLM-extracted triples and query them with GQL/Cypher:
from langchain_openai import ChatOpenAI
from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain_core.documents import Document
from grafeo_langchain import GrafeoGraphStore
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
transformer = LLMGraphTransformer(llm=llm)
documents = [
    Document(page_content="Alice works at Microsoft. Bob works at Google. Alice knows Bob."),
]
graph_documents = transformer.convert_to_graph_documents(documents)
store = GrafeoGraphStore(db_path="./knowledge.db")
store.add_graph_documents(graph_documents, include_source=True)
results = store.query("MATCH (p:Person)-[:WORKS_AT]->(c) RETURN p.node_id, c.node_id")
print(store.get_schema)
Vector + Graph Retrieval (GraphVectorStore)
Combine vector similarity search with graph traversal for Graph RAG:
from langchain_openai import OpenAIEmbeddings
from grafeo_langchain import GrafeoGraphVectorStore
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
store = GrafeoGraphVectorStore(
    embedding=embeddings,
    db_path="./doc_graph.db",
    # embedding_dimensions auto-detected from the model
)
store.add_texts(
    texts=["Python is a programming language...", "Guido van Rossum...", "ABC influenced..."],
    metadatas=[
        {"id": "python", "__graph_links__": [{"target_id": "abc", "type": "INFLUENCED_BY"}]},
        {"id": "guido"},
        {"id": "abc", "__graph_links__": [{"target_id": "python", "type": "INFLUENCED"}]},
    ],
    ids=["python", "guido", "abc"],
)
# Standard vector search
docs = store.similarity_search("What programming languages exist?", k=2)
# Vector search + graph traversal
docs = store.traversal_search("What programming languages exist?", k=4, depth=2)
# MMR-diversified graph traversal
docs = store.mmr_traversal_search("programming history", k=4, depth=2, lambda_mult=0.7)
# Filtered search (only documents with matching metadata)
docs = store.similarity_search("languages", k=4, filter={"category": "systems"})
# Delete documents
store.delete(["python", "abc"])
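To build intuition for what depth means in traversal_search, here is a conceptual sketch in plain Python. It is not Grafeo's implementation: the traverse helper and the links dict are hypothetical, links are treated as directed, and the seed list stands in for whatever the vector search would return first.

```python
from collections import deque


def traverse(seeds, links, depth, k):
    """Breadth-first expansion from seed ids, following links up to `depth` hops.

    Hypothetical helper: `seeds` models the ids returned by the vector search,
    `links` maps a document id to the ids it points at.
    """
    seeds = list(dict.fromkeys(seeds))  # de-duplicate while keeping order
    visited = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    order = []
    while queue and len(order) < k:
        node, dist = queue.popleft()
        order.append(node)
        if dist == depth:
            continue  # do not expand beyond the requested depth
        for nxt in links.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, dist + 1))
    return order


# Same link structure as the add_texts example above.
links = {"python": ["abc"], "abc": ["python"], "guido": []}
result = traverse(["python"], links, depth=2, k=4)
print(result)
```

With depth=0 the sketch returns only the seeds, which corresponds to a plain similarity search.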
Persistence
All data is stored in a single .db file when you pass db_path. Close the store, reopen it later, and your documents, embeddings, and graph links are all still there:
from langchain_openai import OpenAIEmbeddings
from grafeo_langchain import GrafeoGraphVectorStore
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
# Write phase
store = GrafeoGraphVectorStore(embedding=embeddings, db_path="./my_store.db")
store.add_texts(["Python is great", "Rust is fast"], ids=["py", "rs"])
store.close()
# Later: reopen and query
store = GrafeoGraphVectorStore(embedding=embeddings, db_path="./my_store.db")
docs = store.similarity_search("programming languages", k=2)
store.close()
Omit db_path (or pass None) for a purely in-memory store that is discarded when the process exits.
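Since the store exposes close(), one way to make sure it runs even when an exception is raised is the standard library's contextlib.closing. The sketch below uses a hypothetical FakeStore stand-in so it runs without a database; in real code you would pass a GrafeoGraphVectorStore instance instead. Whether the store supports the with statement directly is not stated here, so the sketch does not assume it.

```python
from contextlib import closing


class FakeStore:
    """Hypothetical stand-in for a store; it only models the close() call."""

    def __init__(self, db_path=None):
        self.db_path = db_path
        self.closed = False

    def close(self):
        self.closed = True


# Normal path: close() is called when the block exits.
with closing(FakeStore(db_path="./my_store.db")) as store:
    pass  # add_texts / similarity_search calls would go here

# Failure path: close() is still called before the exception propagates.
try:
    with closing(FakeStore()) as failing_store:
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass
```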
Graph Retriever Integration
Note: The [retriever] extra is required for this feature. Install with uv add "grafeo-langchain[retriever]" (requires langchain-graph-retriever>=0.8).
Use GrafeoAdapter with langchain-graph-retriever for advanced traversal strategies (Eager, BFS, MMR) via metadata edges:
from langchain_openai import OpenAIEmbeddings
from grafeo_langchain import GrafeoGraphVectorStore
from grafeo_langchain.adapter import GrafeoAdapter
from langchain_graph_retriever import GraphRetriever
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
# No db_path: in-memory store
store = GrafeoGraphVectorStore(embedding=embeddings)
store.add_texts(
    texts=["Python is a language", "Rust is a language"],
    metadatas=[{"topic": "python"}, {"topic": "rust"}],
    ids=["py", "rs"],
)
adapter = GrafeoAdapter(vector_store=store)
retriever = GraphRetriever(store=adapter, edges=[("topic", "topic")])
docs = retriever.invoke("programming")
Filters
All filter parameters use exact-match equality. Pass a dict where each key is a metadata field name and the value is the expected value. Only documents whose metadata matches every key-value pair are returned:
docs = store.similarity_search("query", k=4, filter={"category": "science", "year": 2024})
Supported value types: str, int, float, bool. Compound types (lists, dicts) are not supported as filter values.
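The matching rule can be pictured with a small plain-Python sketch. The matches helper is hypothetical and is not the library's code. It assumes that a document missing a filtered key does not match, and that an unsupported value type raises an error; neither behavior is confirmed by this README.

```python
SUPPORTED_TYPES = (str, int, float, bool)


def matches(metadata, flt):
    """Return True only if every key-value pair in `flt` equals the metadata."""
    for key, expected in flt.items():
        if not isinstance(expected, SUPPORTED_TYPES):
            # Lists and dicts are not supported as filter values.
            raise TypeError(f"unsupported filter value for {key!r}: {type(expected).__name__}")
        if key not in metadata or metadata[key] != expected:
            return False
    return True


doc_meta = {"category": "science", "year": 2024, "lang": "en"}
print(matches(doc_meta, {"category": "science", "year": 2024}))
print(matches(doc_meta, {"category": "science", "year": 2023}))
```

Extra metadata keys that the filter does not mention (lang in this sketch) do not affect the result.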
Graph Links Format
Graph links between documents are specified via the __graph_links__ metadata key. Each link is a dict with the following fields:
| Field | Type | Required | Description |
|---|---|---|---|
| target_id | str | Yes | The id of the target document |
| type | str | No | Edge label (defaults to LINKS_TO) |
| properties | dict | No | Additional properties stored on the edge |
Example:
store.add_texts(
    texts=["Source document", "Target document"],
    metadatas=[
        {
            "__graph_links__": [
                {"target_id": "target", "type": "CITES"},
                {"target_id": "other", "type": "RELATES_TO", "properties": {"weight": 0.9}},
            ]
        },
        {},
    ],
    ids=["source", "target"],
)
The __graph_links__ key is consumed during ingestion and is not stored as document metadata.
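If you want to validate link dicts before handing them to add_texts, a pre-check along these lines may help. The split_links helper is hypothetical and only mirrors the field table above (required target_id, type defaulting to LINKS_TO, optional properties); it is not how the library processes links internally.

```python
DEFAULT_LINK_TYPE = "LINKS_TO"


def split_links(metadata):
    """Separate `__graph_links__` from the rest of the metadata.

    Returns (clean_metadata, normalized_links), with defaults filled in.
    """
    clean = {k: v for k, v in metadata.items() if k != "__graph_links__"}
    links = []
    for raw in metadata.get("__graph_links__", []):
        if not isinstance(raw.get("target_id"), str):
            raise ValueError("each link needs a string 'target_id'")
        links.append(
            {
                "target_id": raw["target_id"],
                "type": raw.get("type", DEFAULT_LINK_TYPE),
                "properties": dict(raw.get("properties", {})),
            }
        )
    return clean, links


meta = {
    "topic": "papers",
    "__graph_links__": [
        {"target_id": "target"},
        {"target_id": "other", "type": "RELATES_TO", "properties": {"weight": 0.9}},
    ],
}
clean, links = split_links(meta)
print(clean)
print(links)
```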
Why Grafeo?
| Feature | Neo4j | Grafeo |
|---|---|---|
| Requires server | Yes (Docker/Cloud) | No (embedded, pip install) |
| GraphStore | Yes | Yes |
| GraphVectorStore | Community package | Built-in (native HNSW) |
| Query language | Cypher | GQL + Cypher + Gremlin |
| Graph algorithms | GDS plugin ($$$) | Built-in (PageRank, Louvain, ...) |
| Deployment | Docker container | Single .db file |
| Offline/edge | No | Yes |
API Reference
GrafeoGraphStore
- GrafeoGraphStore(db_path=None): in-memory or persistent graph store
- .add_graph_documents(docs, include_source=False): ingest LLM-extracted graph documents
- .query(query, params=None): execute GQL/Cypher queries
- .get_schema / .get_structured_schema: inspect the graph schema
- .refresh_schema(): refresh the cached schema
- .client: access the underlying GrafeoDB instance
GrafeoGraphVectorStore
- GrafeoGraphVectorStore(embedding, db_path=None, embedding_dimensions=None): vector store with graph links (dimensions auto-detected from the model)
- .add_texts(texts, metadatas=None, ids=None): add documents with embeddings and optional graph links
- .similarity_search(query, k=4, filter=None): standard vector similarity search
- .similarity_search_by_vector(embedding, k=4, filter=None): search by pre-computed vector
- .traversal_search(query, k=4, depth=1, filter=None): vector search + graph traversal
- .mmr_traversal_search(query, k=4, depth=2, fetch_k=100, lambda_mult=0.5, filter=None): MMR-diversified traversal
- .delete(ids): remove documents by ID
- .from_texts(...) / .from_documents(...): factory methods
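The lambda_mult parameter of mmr_traversal_search is easier to reason about with a toy version of maximal marginal relevance. The sketch below is a generic MMR loop in plain Python, not Grafeo's code. It assumes the usual LangChain convention that lambda_mult=1 favors relevance only and lower values favor diversity. The cosine and mmr helpers and the two-dimensional vectors are made up for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr(query, candidates, k, lambda_mult=0.5):
    """Greedy MMR: trade off relevance to the query against redundancy.

    `candidates` maps a document id to its embedding vector.
    """
    selected = []
    remaining = dict(candidates)
    while remaining and len(selected) < k:

        def score(doc_id):
            relevance = cosine(query, remaining[doc_id])
            redundancy = max(
                (cosine(remaining[doc_id], candidates[s]) for s in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        del remaining[best]
    return selected


query = [1.0, 0.0]
candidates = {"a": [1.0, 0.0], "b": [0.99, 0.1], "c": [0.6, 0.8]}
relevance_only = mmr(query, candidates, k=2, lambda_mult=1.0)
diversified = mmr(query, candidates, k=2, lambda_mult=0.3)
print(relevance_only, diversified)
```

In this toy data, "b" is almost a duplicate of "a", so the lower lambda_mult run skips it in favor of the less similar "c".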
GrafeoAdapter
Requires uv add grafeo-langchain[retriever].
- GrafeoAdapter(vector_store): adapter for langchain-graph-retriever
- Works with GraphRetriever(store=adapter, edges=[...]) for Eager/BFS strategies
Requirements
- Python 3.12+
License
Apache-2.0