
LangChain graph store and vector store backed by GrafeoDB embedded graph database



grafeo-langchain

LangChain graph store and vector store backed by GrafeoDB: an embedded graph database with native vector search.

No servers, no Docker, no configuration. Just uv add and go.

Install

uv add grafeo-langchain

# Optional: langchain-graph-retriever integration (requires >=0.8)
uv add "grafeo-langchain[retriever]"

Quick Start

Knowledge Graph (GraphStore)

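Store LLM-extracted triples and query them with GQL/Cypher. This example also needs langchain-openai and langchain-experimental installed and an OpenAI API key configured: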

from langchain_openai import ChatOpenAI
from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain_core.documents import Document
from grafeo_langchain import GrafeoGraphStore

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
transformer = LLMGraphTransformer(llm=llm)

documents = [
    Document(page_content="Alice works at Microsoft. Bob works at Google. Alice knows Bob."),
]
graph_documents = transformer.convert_to_graph_documents(documents)

store = GrafeoGraphStore(db_path="./knowledge.db")
store.add_graph_documents(graph_documents, include_source=True)

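# Node labels and relationship types depend on what the LLM extracted
results = store.query("MATCH (p:Person)-[:WORKS_AT]->(c) RETURN p.node_id, c.node_id")
print(results)
print(store.get_schema)  # get_schema is a property, not a method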

Vector + Graph Retrieval (GraphVectorStore)

Combine vector similarity search with graph traversal for Graph RAG:

from langchain_openai import OpenAIEmbeddings
from grafeo_langchain import GrafeoGraphVectorStore

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
store = GrafeoGraphVectorStore(
    embedding=embeddings,
    db_path="./doc_graph.db",
    # embedding_dimensions auto-detected from the model
)

store.add_texts(
    texts=["Python is a programming language...", "Guido van Rossum...", "ABC influenced..."],
    metadatas=[
        {"id": "python", "__graph_links__": [{"target_id": "abc", "type": "INFLUENCED_BY"}]},
        {"id": "guido"},
        {"id": "abc", "__graph_links__": [{"target_id": "python", "type": "INFLUENCED"}]},
    ],
    ids=["python", "guido", "abc"],
)

# Standard vector search
docs = store.similarity_search("What programming languages exist?", k=2)

# Vector search + graph traversal
docs = store.traversal_search("What programming languages exist?", k=4, depth=2)

# MMR-diversified graph traversal
docs = store.mmr_traversal_search("programming history", k=4, depth=2, lambda_mult=0.7)

# Filtered search (only documents with matching metadata)
docs = store.similarity_search("languages", k=4, filter={"category": "systems"})

# Delete documents
store.delete(["python", "abc"])
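The difference between similarity_search and traversal_search can be pictured as "find seed documents by vector similarity, then walk graph links outward". The sketch below models that idea in plain Python so it runs without GrafeoDB or an embedding model. traversal_sketch, its seeds parameter, and the toy two-dimensional vectors are hypothetical, and this is not a description of how GrafeoGraphVectorStore is implemented internally.

```python
import math
from collections import deque


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def traversal_sketch(
    query_vec: list[float],
    vectors: dict[str, list[float]],
    links: dict[str, list[str]],
    k: int = 4,
    depth: int = 1,
    seeds: int = 1,  # hypothetical: how many vector hits start the walk
) -> list[str]:
    """Seed with the nearest documents, then expand breadth-first along links."""
    # Step 1: plain vector search ranks every document against the query.
    ranked = sorted(vectors, key=lambda doc_id: cosine(query_vec, vectors[doc_id]), reverse=True)
    result: list[str] = []
    seen: set[str] = set()
    queue: deque[tuple[str, int]] = deque()
    for doc_id in ranked[:seeds]:
        seen.add(doc_id)
        queue.append((doc_id, 0))
    # Step 2: breadth-first walk, stopping at `depth` hops or once `k` documents are collected.
    while queue and len(result) < k:
        doc_id, hops = queue.popleft()
        result.append(doc_id)
        if hops == depth:
            continue
        for neighbor in links.get(doc_id, []):
            if neighbor not in seen and neighbor in vectors:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    return result


# Toy data mirroring the example above: "python" and "abc" link to each other, "guido" has no links.
vectors = {"python": [1.0, 0.0], "guido": [0.0, 1.0], "abc": [0.6, 0.8]}
links = {"python": ["abc"], "abc": ["python"]}
print(traversal_sketch([1.0, 0.1], vectors, links, k=4, depth=2))  # ['python', 'abc']
```

Here "abc" is reached through the link from "python", while "guido" is only reachable through vector similarity.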

Persistence

All data is stored in a single .db file when you pass db_path. Close the store, reopen it later, and your documents, embeddings, and graph links are all still there:

from langchain_openai import OpenAIEmbeddings
from grafeo_langchain import GrafeoGraphVectorStore

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Write phase
store = GrafeoGraphVectorStore(embedding=embeddings, db_path="./my_store.db")
store.add_texts(["Python is great", "Rust is fast"], ids=["py", "rs"])
store.close()

# Later: reopen and query
store = GrafeoGraphVectorStore(embedding=embeddings, db_path="./my_store.db")
docs = store.similarity_search("programming languages", k=2)
store.close()

Omit db_path (or pass None) for a purely in-memory store that is discarded when the process exits.

Graph Retriever Integration

Note: The [retriever] extra is required for this feature. Install with uv add "grafeo-langchain[retriever]" (requires langchain-graph-retriever>=0.8).

Use GrafeoAdapter with langchain-graph-retriever for advanced traversal strategies (Eager, BFS, MMR) via metadata edges:

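from langchain_openai import OpenAIEmbeddings
from grafeo_langchain import GrafeoGraphVectorStore
from grafeo_langchain.adapter import GrafeoAdapter
from langchain_graph_retriever import GraphRetriever

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
store = GrafeoGraphVectorStore(embedding=embeddings)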
store.add_texts(
    texts=["Python is a language", "Rust is a language"],
    metadatas=[{"topic": "python"}, {"topic": "rust"}],
    ids=["py", "rs"],
)

adapter = GrafeoAdapter(vector_store=store)
retriever = GraphRetriever(store=adapter, edges=[("topic", "topic")])
docs = retriever.invoke("programming")
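An edge spec such as ("topic", "topic") links documents through their metadata rather than through stored graph edges. The plain-Python sketch below models the simplest reading of that idea: document A links to document B when A's source field equals B's target field. This rule is an assumption. metadata_edge_neighbors is a hypothetical helper, not part of GrafeoAdapter or langchain-graph-retriever, and it ignores list-valued metadata and traversal strategies. Under this model, the two documents in the example above have different topic values, so the edge connects nothing and traversal would add no documents beyond the initial similarity search.

```python
def metadata_edge_neighbors(
    docs: dict[str, dict[str, object]],
    edge: tuple[str, str],
    start_id: str,
) -> list[str]:
    """Return ids reachable in one hop from start_id via a (source_field, target_field) edge.

    Simplified model (an assumption, not library code): A -> B when
    A.metadata[source_field] == B.metadata[target_field].
    """
    source_field, target_field = edge
    value = docs[start_id].get(source_field)
    if value is None:
        return []  # no value on the source side, so no outgoing edges
    return [
        doc_id
        for doc_id, metadata in docs.items()
        if doc_id != start_id and metadata.get(target_field) == value
    ]


# "py2" is a hypothetical extra document that shares a topic with "py".
docs = {"py": {"topic": "python"}, "rs": {"topic": "rust"}, "py2": {"topic": "python"}}
print(metadata_edge_neighbors(docs, ("topic", "topic"), "py"))  # ['py2']
print(metadata_edge_neighbors(docs, ("topic", "topic"), "rs"))  # []
```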

Filters

All filter parameters use exact-match equality. Pass a dict where each key is a metadata field name and the value is the expected value. Only documents whose metadata matches every key-value pair are returned:

docs = store.similarity_search("query", k=4, filter={"category": "science", "year": 2024})

Supported value types: str, int, float, bool. Compound types (lists, dicts) are not supported as filter values.

Graph Links Format

Graph links between documents are specified via the __graph_links__ metadata key. Each link is a dict with the following fields:

  • target_id (str, required): the id of the target document
  • type (str, optional): edge label; defaults to LINKS_TO
  • properties (dict, optional): additional properties stored on the edge

Example:

store.add_texts(
    texts=["Source document", "Target document"],
    metadatas=[
        {
            "__graph_links__": [
                {"target_id": "target", "type": "CITES"},
                {"target_id": "other", "type": "RELATES_TO", "properties": {"weight": 0.9}},
            ]
        },
        {},
    ],
    ids=["source", "target"],
)

The __graph_links__ key is consumed during ingestion and is not stored as document metadata.
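The sketch below makes these ingestion rules concrete. It separates __graph_links__ from the rest of the metadata and normalizes each link into a (source_id, target_id, edge_type, properties) tuple, applying the LINKS_TO default. split_graph_links and the tuple shape are hypothetical. The sketch does not check that the target id exists, and this README does not say how Grafeo handles a link to an id that was never added (such as "other" in the example above).

```python
from typing import Any

GRAPH_LINKS_KEY = "__graph_links__"
DEFAULT_EDGE_TYPE = "LINKS_TO"

Edge = tuple[str, str, str, dict[str, Any]]


def split_graph_links(source_id: str, metadata: dict[str, Any]) -> tuple[dict[str, Any], list[Edge]]:
    """Return (stored_metadata, edges); the input dict is left unmodified."""
    # __graph_links__ is consumed here and not kept as document metadata.
    stored = {key: value for key, value in metadata.items() if key != GRAPH_LINKS_KEY}
    edges: list[Edge] = []
    for link in metadata.get(GRAPH_LINKS_KEY, []):
        target_id = link["target_id"]  # required field
        edge_type = link.get("type", DEFAULT_EDGE_TYPE)  # optional, defaults to LINKS_TO
        properties = dict(link.get("properties", {}))  # optional edge properties
        edges.append((source_id, target_id, edge_type, properties))
    return stored, edges
```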

Why Grafeo?

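Feature comparison, Neo4j vs Grafeo:

  • Requires server: Neo4j yes (Docker/Cloud); Grafeo no (embedded, pip install)
  • GraphStore: Neo4j yes; Grafeo yes
  • GraphVectorStore: Neo4j via a community package; Grafeo built-in (native HNSW)
  • Query language: Neo4j Cypher; Grafeo GQL + Cypher + Gremlin
  • Graph algorithms: Neo4j via the GDS plugin (Enterprise edition is commercially licensed); Grafeo built-in (PageRank, Louvain, ...)
  • Deployment: Neo4j Docker container; Grafeo single .db file
  • Offline/edge use: Neo4j no; Grafeo yes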

API Reference

GrafeoGraphStore

  • GrafeoGraphStore(db_path=None): in-memory or persistent graph store
  • .add_graph_documents(docs, include_source=False): ingest LLM-extracted graph documents
  • .query(query, params=None): execute GQL/Cypher queries
  • .get_schema / .get_structured_schema: inspect the graph schema
  • .refresh_schema(): refresh the cached schema
  • .client: access the underlying GrafeoDB instance

GrafeoGraphVectorStore

  • GrafeoGraphVectorStore(embedding, db_path=None, embedding_dimensions=None): vector store with graph links (dimensions auto-detected from the model)
  • .add_texts(texts, metadatas=None, ids=None): add documents with embeddings and optional graph links
  • .similarity_search(query, k=4, filter=None): standard vector similarity search
  • .similarity_search_by_vector(embedding, k=4, filter=None): search by pre-computed vector
  • .traversal_search(query, k=4, depth=1, filter=None): vector search + graph traversal
  • .mmr_traversal_search(query, k=4, depth=2, fetch_k=100, lambda_mult=0.5, filter=None): MMR-diversified traversal
  • .delete(ids): remove documents by ID
  • .from_texts(...) / .from_documents(...): factory methods
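The lambda_mult parameter of mmr_traversal_search trades relevance against diversity. The sketch below shows only the greedy selection step. It uses the standard MMR score, lambda * sim(query, d) - (1 - lambda) * max sim(d, s) over the already-selected documents s, which is also the formula used by LangChain's maximal_marginal_relevance helper. mmr_select is hypothetical. It is an assumption that Grafeo scores candidates exactly this way, and how the library combines this step with fetch_k and graph depth is left out.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(
    query_vec: list[float],
    candidates: dict[str, list[float]],
    k: int = 4,
    lambda_mult: float = 0.5,
) -> list[str]:
    """Greedy maximal marginal relevance over a candidate pool."""
    selected: list[str] = []
    remaining = dict(candidates)
    while remaining and len(selected) < k:
        best_id, best_score = None, -math.inf
        for doc_id, vec in remaining.items():
            relevance = cosine(query_vec, vec)
            # Redundancy: similarity to the closest document already picked (0 for the first pick).
            redundancy = max((cosine(vec, candidates[s]) for s in selected), default=0.0)
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_id, best_score = doc_id, score
        selected.append(best_id)
        del remaining[best_id]
    return selected


# "a2" is a near-duplicate of "a"; "b" is less relevant but more distinct.
candidates = {"a": [1.0, 0.2], "a2": [1.0, 0.25], "b": [1.0, -0.6]}
print(mmr_select([1.0, 0.0], candidates, k=2, lambda_mult=0.5))  # ['a', 'b']
print(mmr_select([1.0, 0.0], candidates, k=2, lambda_mult=1.0))  # ['a', 'a2']
```

With lambda_mult=1.0 the redundancy term drops out and selection becomes a plain relevance ranking. Lower values push near-duplicates out in favour of more distinct documents.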

GrafeoAdapter

Requires uv add "grafeo-langchain[retriever]" (quote the extra so shells such as zsh do not expand the brackets).

  • GrafeoAdapter(vector_store): adapter for langchain-graph-retriever
  • Works with GraphRetriever(store=adapter, edges=[...]) for Eager/BFS strategies

Requirements

  • Python 3.12+

License

Apache-2.0



Download files


Source Distribution

grafeo_langchain-0.2.0.tar.gz (87.6 kB)

Uploaded Source

Built Distribution


grafeo_langchain-0.2.0-py3-none-any.whl (16.5 kB)

Uploaded Python 3

File details

Details for the file grafeo_langchain-0.2.0.tar.gz.

File metadata

  • Download URL: grafeo_langchain-0.2.0.tar.gz
  • Upload date:
  • Size: 87.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for grafeo_langchain-0.2.0.tar.gz
Algorithm Hash digest
SHA256 0e52abb0e9190379baac93fe30e43de08d8590a204b6b8d881c3a4f56576c679
MD5 e4207c927ce9ab709cff592810f7b7e3
BLAKE2b-256 da149cd92effc6bf9570e14c299e8429d7e05856a16e6205da0c0792694621fe


Provenance

The following attestation bundles were made for grafeo_langchain-0.2.0.tar.gz:

Publisher: pypi.yml on GrafeoDB/grafeo-langchain

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file grafeo_langchain-0.2.0-py3-none-any.whl.


File hashes

Hashes for grafeo_langchain-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 5167f9dca3a6cd6f46b088cc68399629ba6186623512a2375946a13e4d6104d4
MD5 ccc33606cea1f6c7d673336abde0ffb0
BLAKE2b-256 24a5b5bd1a9ae814d1fbda78d39469ad4c760a1b7c8d2c5633fbc2a915031c22


Provenance

The following attestation bundles were made for grafeo_langchain-0.2.0-py3-none-any.whl:

Publisher: pypi.yml on GrafeoDB/grafeo-langchain

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
