openGauss vector store integration for LangChain

openGauss Vector Store for LangChain

An openGauss integration for LangChain that provides scalable vector storage and similarity search, along with a graph store built on the openGauss AGE extension.

Features

🚀 Multi-Index Support - HNSW and IVFFLAT vector indexing algorithms
📐 Multiple Distance Metrics - EUCLIDEAN/COSINE/MANHATTAN/NEGATIVE_INNER_PRODUCT
🔧 Auto-Schema Management - Automatic table creation and validation
🧮 Dimension Validation - Type-safe dimension constraints for different vector types
🛡️ ACID Compliance - Transaction-safe operations with connection pooling
🔀 Hybrid Search - Combine vector similarity with metadata filtering
😀 openGauss AGE Graph Support - Graph store implementation built on the openGauss AGE extension

Installation

pip install langchain-opengauss

Prerequisites:

openGauss >= 7.0.0
Python 3.8+
psycopg2-binary

Quick Start

1. Start openGauss Container

docker run --name opengauss \
  --privileged=true \
  -d \
  -e GS_PASSWORD=MyStrongPass@123 \
  -p 8888:5432 \
  opengauss/opengauss-server:latest

2. Basic Usage

from langchain_opengauss import OpenGauss, OpenGaussSettings
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings

# Configuration with validation
config = OpenGaussSettings(
    table_name="research_papers",
    embedding_dimension=1536,  # Must match the embedding model's output size
    index_type="HNSW",
    distance_strategy="COSINE",
    password="MyStrongPass@123",  # GS_PASSWORD from step 1
)

# Initialize with OpenAI embeddings (text-embedding-3-small returns 1536-dimensional vectors)
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vector_store = OpenGauss(embedding=embeddings, config=config)

# Insert documents
docs = [
    Document(page_content="Quantum computing basics", metadata={"field": "physics"}),
    Document(page_content="Neural network architectures", metadata={"field": "ai"})
]
vector_store.add_documents(docs)

# Semantic search
results = vector_store.similarity_search("deep learning models", k=1)
print(f"Found {len(results)} relevant documents")

Configuration Guide

Connection Settings

Parameter	Default	Description
`host`	localhost	Database server address
`port`	8888	Database connection port
`user`	gaussdb	Database username
`password`	-	Database password (must meet openGauss complexity requirements)
`database`	postgres	Database name
`min_connections`	1	Connection pool minimum size
`max_connections`	5	Connection pool maximum size
`table_name`	langchain_docs	Name of the table that stores vectors and metadata
`index_type`	IndexType.HNSW	Vector index algorithm. Options: HNSW or IVFFLAT.
`vector_type`	VectorType.vector	Vector representation type. Currently only `vector` (float vectors) is supported.
`distance_strategy`	DistanceStrategy.COSINE	Similarity metric used for retrieval. Options: euclidean (L2 distance), cosine (angular distance, well suited to text embeddings), manhattan (L1 distance, for sparse data), negative_inner_product (negated dot product, for normalized vectors).
`embedding_dimension`	1536	Dimensionality of the vector embeddings (at most 2000 for `vector`).
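The metrics behind `distance_strategy` can be illustrated in a few lines of plain Python. This is a minimal sketch for intuition only: the function names are hypothetical, the database computes these metrics itself, and cosine is assumed to be reported as a distance (1 minus cosine similarity, so smaller means closer), as with pgvector-style operators.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """L2 distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    """L1 distance: sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity: 0 for identical directions, 2 for opposite ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def negative_inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Negated dot product, so that smaller values mean more similar."""
    return -sum(x * y for x, y in zip(a, b))


query = [1.0, 0.0]
doc = [0.6, 0.8]  # unit-length vector
metrics = [
    ("euclidean", euclidean),
    ("manhattan", manhattan),
    ("cosine", cosine_distance),
    ("negative_inner_product", negative_inner_product),
]
for name, fn in metrics:
    print(f"{name}: {fn(query, doc):.4f}")
```

Because both vectors here are unit-length, the cosine distance equals 1 plus the negative inner product, which is why the cheaper inner-product metric is the natural choice for normalized embeddings.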

Vector Configuration

class OpenGaussSettings(BaseModel):
    index_type: IndexType = IndexType.HNSW  # HNSW or IVFFLAT
    vector_type: VectorType = VectorType.vector  # Currently supports float vectors
    distance_strategy: DistanceStrategy = DistanceStrategy.COSINE
    embedding_dimension: int = 1536  # Max 2000 for vector type

Supported Combinations

Vector Type	Dimensions	Index Types	Supported Distance Strategies
vector	≤2000	HNSW/IVFFLAT	COSINE/EUCLIDEAN/MANHATTAN/NEGATIVE_INNER_PRODUCT
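The constraints in this table can be expressed as a small validation step. The sketch below is hypothetical: `SettingsSketch` and its enums only mirror the option names documented above. It is not the library's `OpenGaussSettings` model, whose exact validation rules may differ.

```python
from dataclasses import dataclass
from enum import Enum


class IndexType(str, Enum):
    HNSW = "HNSW"
    IVFFLAT = "IVFFLAT"


class DistanceStrategy(str, Enum):
    EUCLIDEAN = "EUCLIDEAN"
    COSINE = "COSINE"
    MANHATTAN = "MANHATTAN"
    NEGATIVE_INNER_PRODUCT = "NEGATIVE_INNER_PRODUCT"


# Maximum dimension per vector type, taken from the Supported Combinations table
MAX_DIMENSIONS = {"vector": 2000}


@dataclass
class SettingsSketch:
    index_type: IndexType = IndexType.HNSW
    vector_type: str = "vector"
    distance_strategy: DistanceStrategy = DistanceStrategy.COSINE
    embedding_dimension: int = 1536

    def __post_init__(self) -> None:
        # Accept plain strings such as "HNSW"; unknown values raise ValueError
        self.index_type = IndexType(self.index_type)
        self.distance_strategy = DistanceStrategy(self.distance_strategy)
        if self.vector_type not in MAX_DIMENSIONS:
            raise ValueError(f"unsupported vector_type: {self.vector_type!r}")
        limit = MAX_DIMENSIONS[self.vector_type]
        if not 1 <= self.embedding_dimension <= limit:
            raise ValueError(
                f"embedding_dimension must be between 1 and {limit} "
                f"for {self.vector_type!r}, got {self.embedding_dimension}"
            )


ok = SettingsSketch(index_type="IVFFLAT", distance_strategy="EUCLIDEAN", embedding_dimension=768)
print(ok.index_type.value, ok.distance_strategy.value)

try:
    SettingsSketch(embedding_dimension=3072)  # a 3072-dimensional model exceeds the limit
except ValueError as err:
    print("rejected:", err)
```

Failing at configuration time like this surfaces a dimension mismatch before any table is created, rather than on the first insert.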

Advanced Usage

Hybrid Search with Metadata

# Filter by metadata with vector search
results = vector_store.similarity_search(
    query="machine learning",
    k=3,
    filter={"publish_year": 2023, "category": "research"},
)
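Conceptually, a hybrid query keeps only the documents whose metadata matches the filter and ranks the survivors by vector similarity. The in-memory sketch below assumes equality matching on every filter key (combined with AND) and cosine similarity; `hybrid_search` is a hypothetical helper that only models this behaviour, not how `langchain-opengauss` builds its SQL.

```python
import math
from typing import Any, Dict, List, Tuple

Doc = Tuple[str, List[float], Dict[str, Any]]  # (text, embedding, metadata)


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def hybrid_search(
    query_vec: List[float], docs: List[Doc], k: int, metadata_filter: Dict[str, Any]
) -> List[str]:
    # 1. Metadata filter: every filter key must be present with an equal value
    candidates = [
        d for d in docs
        if all(d[2].get(key) == value for key, value in metadata_filter.items())
    ]
    # 2. Vector ranking: most similar first, keep the top k
    candidates.sort(key=lambda d: cosine_similarity(query_vec, d[1]), reverse=True)
    return [text for text, _, _ in candidates[:k]]


docs = [
    ("Transformer survey", [0.9, 0.1], {"publish_year": 2023, "category": "research"}),
    ("CNN benchmark", [0.7, 0.3], {"publish_year": 2023, "category": "research"}),
    ("ML news roundup", [0.95, 0.05], {"publish_year": 2023, "category": "news"}),
    ("Old RNN paper", [0.8, 0.2], {"publish_year": 2019, "category": "research"}),
]
print(hybrid_search([1.0, 0.0], docs, k=3,
                    metadata_filter={"publish_year": 2023, "category": "research"}))
```

The news item is the closest vector but fails the category check, and the 2019 paper fails the year check, so only two results come back even though `k=3`.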

Index Management

# Create an HNSW index with explicit build and query parameters
vector_store.create_hnsw_index(
    m=24,  # Max bi-directional links per node in each layer
    ef_construction=128,  # Candidate list size while building the index
    ef=64,  # Candidate list size at query time
)

API Reference

Core Methods

Method	Description
`add_documents`	Insert documents with automatic embedding
`similarity_search`	Basic vector similarity search
`similarity_search_with_score`	Return (document, similarity_score) tuples
`delete`	Remove documents by ID list
`drop_table`	Drop the entire table, including all stored vectors and metadata

Performance Tips

1. Index Tuning

HNSW Index Optimization

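m (max connections per layer)
- Default: 16
- Range: 2-100
- Tradeoff: Higher values improve recall but increase index build time and memory usage
ef_construction (candidate list size during index construction)
- Default: 64
- Range: 4-1000 (must be ≥ 2*m)
- Tradeoff: Higher values build a higher-quality graph (better recall) but slow down index creation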

# Example HNSW configuration
vector_store.create_hnsw_index(
    m=16,  # Balance between recall and performance
    ef_construction=64,  # Must be >= 2*m (32 here)
)
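Invalid combinations are easy to produce while tuning, so it can help to check the documented ranges before calling `create_hnsw_index`. The helper below is a hypothetical sketch that encodes only the limits listed above (m in 2-100, ef_construction in 4-1000 and at least 2*m); the server remains the authority on what it accepts.

```python
def check_hnsw_params(m: int = 16, ef_construction: int = 64) -> None:
    """Raise ValueError if the HNSW parameters fall outside the documented limits."""
    if not 2 <= m <= 100:
        raise ValueError(f"m must be in [2, 100], got {m}")
    if not 4 <= ef_construction <= 1000:
        raise ValueError(f"ef_construction must be in [4, 1000], got {ef_construction}")
    if ef_construction < 2 * m:
        raise ValueError(f"ef_construction ({ef_construction}) must be >= 2*m ({2 * m})")


check_hnsw_params(m=16, ef_construction=64)   # defaults: 64 >= 32
check_hnsw_params(m=24, ef_construction=128)  # Index Management example: 128 >= 48

try:
    check_hnsw_params(m=32, ef_construction=48)
except ValueError as err:
    print(err)  # ef_construction (48) must be >= 2*m (64)
```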

IVFFLAT Index Optimization

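lists (number of inverted lists, i.e. clusters)

Calculation:

import math

total_rows = 1_000_000  # Number of vectors stored in the table

# Recommended formula
lists = min(
    int(math.sqrt(total_rows)) if total_rows > 1_000_000 else total_rows // 1000,
    2000,  # openGauss maximum
)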

Adjustment Guide:
- Start with 1000 lists for 1M vectors
- 2000 lists for 10M+ vectors
- Monitor recall rate and adjust
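Wrapping the formula in a function makes it easy to check against the adjustment guide. This is a sketch: `recommended_lists` is a hypothetical helper, and the floor of 1 for tables under 1,000 rows is an added assumption (the raw formula would yield 0 lists there).

```python
import math


def recommended_lists(total_rows: int, max_lists: int = 2000) -> int:
    """IVFFLAT `lists` per the formula above, capped at max_lists."""
    if total_rows > 1_000_000:
        lists = int(math.sqrt(total_rows))
    else:
        lists = total_rows // 1000
    # Assumption: at least one list is required, so tiny tables get 1
    return max(1, min(lists, max_lists))


for rows in (500, 100_000, 1_000_000, 4_000_000, 10_000_000):
    print(f"{rows:>10,} rows -> lists = {recommended_lists(rows)}")
```

The two branches meet at 1M rows (both give 1000), and the 2000 cap takes over from 4M rows (where the square root reaches 2000), which matches the guide above.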

2. Connection Pooling

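from langchain_opengauss import OpenGaussSettings

config = OpenGaussSettings(
    min_connections=3,   # Connections kept open in the pool
    max_connections=20,  # Upper bound on concurrent connections
)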

Limitations

The `bit` and `sparsevec` vector types are not yet supported (currently under development).

3. Getting Started with openGaussAGEGraph

3.1. Create the AGE Extension in openGauss

# Enter the Docker container
docker exec -it opengauss bash

# Switch to the omm user
su omm

# Connect to the database (gsql uses the omm database by default)
gsql -r

# Inside gsql: create the age extension in the omm database
create extension age;

# Exit gsql
\q

3.2. Basic Usage

from langchain_core.documents import Document
from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain_opengauss import openGaussAGEGraph, OpenGaussSettings
from langchain_community.llms import Tongyi
from langchain_core.prompts import PromptTemplate
from langchain.chains import GraphCypherQAChain
from langchain_core.output_parsers import StrOutputParser
import os

# Set the DashScope API key
os.environ["DASHSCOPE_API_KEY"] = "sk-**"
graph_llm = Tongyi(model="qwen-plus", temperature=0, base_url="https://dashscope.aliyuncs.com/compatible-mode/v1")

llm_transformer = LLMGraphTransformer(
    llm=graph_llm,
    allowed_nodes=["Person", "Organization", "Location", "Award", "ResearchField"],
    allowed_relationships = ["SPOUSE", "AWARD", "FIELD_OF_RESEARCH", "WORKS_AT", "IN_LOCATION"],
)

text = """
Marie Curie, 7 November 1867 – 4 July 1934, was a Polish and naturalised-French physicist and chemist who conducted pioneering research on radioactivity.
She was the first woman to win a Nobel Prize, the first person to win a Nobel Prize twice, and the only person to win a Nobel Prize in two scientific fields.
Her husband, Pierre Curie, was a co-winner of her first Nobel Prize, making them the first-ever married couple to win the Nobel Prize and launching the Curie family legacy of five Nobel Prizes.
She was, in 1906, the first woman to become a professor at the University of Paris.
"""

documents = [Document(page_content=text)]
graph_documents = llm_transformer.convert_to_graph_documents(documents)

conf = OpenGaussSettings(
    database="omm",           # Database where the age extension was created
    user="gaussdb",           # Database username
    password="YourPassword",  # Password meeting openGauss complexity requirements
    host="Your IP",           # Database server address
    port=8888,                # Database server port
)
graph = openGaussAGEGraph(graph_name="graphtest", conf=conf, create=True)
graph.add_graph_documents(graph_documents)
graph.refresh_schema()

cypher_prompt = PromptTemplate(
    template="""You are an expert in generating AGE Cypher queries. Use the following schema to generate a Cypher query that answers the given question. Do not include name, properties, or cypher.
    Schema:{schema}
    Question: {question}
    Cypher Query:""",
    input_variables=["schema", "question"],
)

chain = GraphCypherQAChain.from_llm(
    graph_llm,
    graph=graph,
    verbose=True,
    allow_dangerous_requests=True,
    cypher_validation=True,
    return_intermediate_steps=True,
    cypher_prompt=cypher_prompt,
)

question = "Who won a Nobel Prize?"
result = chain.invoke({"query": question})

prompt = PromptTemplate(
    template="""You are an assistant for question-answering tasks. 
    Use the following pieces of retrieved context from a graph database to answer the question. If you don't know the answer, just say that you don't know. 
    Use two sentences maximum and keep the answer concise:
    Question: {question} 
    Graph Context: {graph_context}
    Answer: 
    """,
    input_variables=["question", "graph_context"],
)

composite_chain = prompt | graph_llm | StrOutputParser()

answer = composite_chain.invoke(
    {"question": question, "graph_context": result}
)
print(answer)

3.3. API Reference

Core Methods

Method	Description
`__init__(graph_name, conf, create)`	Create an openGaussAGEGraph instance for `graph_name` using the connection settings in `conf`; `create` controls whether the graph is created
`_wrap_query(query: str, graph_name: str)`	Convert a Cypher query into an SQL query compatible with openGauss AGE
`add_graph_documents(graph_documents, include_source)`	Insert a list of graph documents into the graph
`refresh_schema()`	Refresh the graph schema information by updating the available labels, relationships, and properties
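AGE executes Cypher through the SQL function `cypher()`, which takes the graph name and a dollar-quoted Cypher string and requires an explicit column list typed as `agtype`. `_wrap_query` produces SQL of this kind for you. The simplified sketch below only shows the shape of the result, assuming openGauss AGE follows Apache AGE's `cypher()` calling convention; `wrap_cypher` is hypothetical, takes caller-supplied column names instead of parsing the RETURN clause, and does no escaping of `graph_name`.

```python
from typing import List


def wrap_cypher(query: str, graph_name: str, columns: List[str]) -> str:
    """Embed a Cypher query in the SQL form expected by AGE's cypher() function."""
    if "$$" in query:
        # The query is dollar-quoted, so it must not contain the $$ delimiter itself
        raise ValueError("Cypher query must not contain '$$'")
    column_list = ", ".join(f"{name} agtype" for name in columns)
    return (
        f"SELECT * FROM ag_catalog.cypher('{graph_name}', $$ {query} $$) "
        f"AS ({column_list});"
    )


sql = wrap_cypher(
    "MATCH (p:Person)-[:AWARD]->(a:Award) RETURN p, a",
    graph_name="graphtest",
    columns=["person", "award"],
)
print(sql)
```

One column is declared per returned value, which is why the real method has to inspect the query's RETURN clause.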

Release history

0.1.5 (this version): Nov 21, 2025
0.1.4: Jun 19, 2025
0.1.3: Apr 10, 2025
0.1.2: Apr 9, 2025
0.1.1: Apr 8, 2025
0.1.0: Apr 1, 2025

Download files

Source Distribution

langchain_opengauss-0.1.5.tar.gz (18.5 kB)

Uploaded Nov 21, 2025 Source

Built Distribution

langchain_opengauss-0.1.5-py3-none-any.whl (16.9 kB)

Uploaded Nov 21, 2025 Python 3

File details

Details for the file langchain_opengauss-0.1.5.tar.gz.

File metadata

Download URL: langchain_opengauss-0.1.5.tar.gz
Upload date: Nov 21, 2025
Size: 18.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.7.4

File hashes

Hashes for langchain_opengauss-0.1.5.tar.gz
Algorithm	Hash digest
SHA256	`d90cc615fe7543d61c6530a2af52d752f7cd8f5708549c4d0a1132b2b050839c`
MD5	`d30f3588d0dcabb798033258d051d213`
BLAKE2b-256	`c6f86b0e3a46ca34cb063613162dd6145b8d3edcec6de075e205b9888929369d`

File details

Details for the file langchain_opengauss-0.1.5-py3-none-any.whl.

File metadata

Download URL: langchain_opengauss-0.1.5-py3-none-any.whl
Upload date: Nov 21, 2025
Size: 16.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.7.4

File hashes

Hashes for langchain_opengauss-0.1.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f06ae3cd542601abb9622d54c091ccc7542081666c36935bcda274c0d6a6482f`
MD5	`fad15c5ee407a3e043f523f47bbf70f3`
BLAKE2b-256	`5affe7ef0b6d4d40de809cd3223b5e610f80e845a3511c03ee4999696e5f124e`
