An integration package connecting OceanBase and LangChain
langchain-oceanbase
This package contains the LangChain integration with OceanBase. Current version: 0.4.0
OceanBase Database is a distributed relational database developed entirely by Ant Group. It runs on clusters of common servers and, based on the Paxos protocol and its distributed architecture, provides high availability and linear scalability.
OceanBase can also store vectors. Users can perform the following operations with SQL:
- Create a table containing vector type fields;
- Create a vector index table based on the HNSW algorithm;
- Perform vector approximate nearest neighbor queries;
- ...
What's New in 0.4.0
- LangGraph checkpointing is now a primary workflow: OceanBaseCheckpointSaver is the recommended way to persist graph state, resume threads, and support time-travel in LangGraph applications.
- Storage support is explicit by backend: OceanBase, SeekDB, embedded SeekDB, and MySQL now have clearer capability boundaries in CI and documentation.
- SeekDB coverage is broader: server-backed SeekDB and embedded SeekDB are both covered for the supported vector and history scenarios.
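To make the checkpointing vocabulary concrete, here is a toy in-memory sketch of what persisting, resuming, and time-travel mean for per-thread graph state. It is purely conceptual: ToyCheckpointStore and its methods are hypothetical names for illustration and are not the OceanBaseCheckpointSaver API, which keeps this state in a database instead.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ToyCheckpointStore:
    """Hypothetical in-memory stand-in for a checkpoint saver (illustration only)."""

    # thread_id -> ordered list of state snapshots (oldest first)
    threads: Dict[str, List[Dict[str, Any]]] = field(default_factory=dict)

    def put(self, thread_id: str, state: Dict[str, Any]) -> int:
        """Persist a snapshot and return its index within the thread."""
        history = self.threads.setdefault(thread_id, [])
        history.append(copy.deepcopy(state))
        return len(history) - 1

    def latest(self, thread_id: str) -> Optional[Dict[str, Any]]:
        """Resume: return the most recent snapshot for a thread, if any."""
        history = self.threads.get(thread_id, [])
        return copy.deepcopy(history[-1]) if history else None

    def at(self, thread_id: str, index: int) -> Dict[str, Any]:
        """Time-travel: return an earlier snapshot by its index."""
        return copy.deepcopy(self.threads[thread_id][index])


store = ToyCheckpointStore()
store.put("thread-1", {"messages": ["hi"]})
store.put("thread-1", {"messages": ["hi", "hello!"]})
store.put("thread-2", {"messages": ["other conversation"]})

resumed = store.latest("thread-1")  # state to continue from
rewound = store.at("thread-1", 0)   # state as of the first step
print(resumed, rewound)
```

Each thread keeps its own history, which is why a saver can resume one conversation without touching another.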
LangChain Integration
OceanbaseVectorStore is the official LangChain integration for OceanBase.
Support for ChatMessageHistory is provided as an additional integration and is not part of the official VectorStore API.
Official documentation: https://python.langchain.com/docs/integrations/vectorstores/oceanbase/
0.4.0 Support Matrix
| Backend | LangGraph checkpoint | Vector store | Chat message history | Hybrid search | Notes |
|---|---|---|---|---|---|
| OceanBase | Yes | Yes | Yes | Yes | Best fit when you want the full SQL + vector database workflow. |
| SeekDB (server) | Yes | Yes | Yes | Yes | Full-featured SeekDB deployment, including the current AI function test coverage in CI. |
| Embedded SeekDB | Yes | Yes | Yes | Yes | Local path-based runtime through pyseekdb / pylibseekdb; no server deployment required. |
| MySQL | Yes | No | No | No | Checkpoint-focused backend only; vector and search features are out of scope. |
Recommended by Use Case
- LangGraph state persistence: use OceanBase, SeekDB, embedded SeekDB, or MySQL depending on your operational requirements.
- Vector store and retrieval workflows: use OceanBase, SeekDB server, or embedded SeekDB.
- Hybrid retrieval with dense + sparse + full-text search: use OceanBase, SeekDB server, or embedded SeekDB.
- Simple checkpoint-only deployments: MySQL remains supported for checkpoint storage, but not vector features.
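The support matrix and recommendations above can be encoded as data, which may be handy for failing fast in configuration code. The sketch below only transcribes the table; SUPPORT_MATRIX and backends_supporting are hypothetical names and not part of langchain-oceanbase.

```python
from typing import Dict, List, Set

# Capability matrix transcribed from the 0.4.0 support table above.
SUPPORT_MATRIX: Dict[str, Set[str]] = {
    "OceanBase": {"checkpoint", "vector_store", "chat_history", "hybrid_search"},
    "SeekDB (server)": {"checkpoint", "vector_store", "chat_history", "hybrid_search"},
    "Embedded SeekDB": {"checkpoint", "vector_store", "chat_history", "hybrid_search"},
    "MySQL": {"checkpoint"},
}


def backends_supporting(*features: str) -> List[str]:
    """Return the backends whose capabilities include every requested feature."""
    required = set(features)
    return [name for name, caps in SUPPORT_MATRIX.items() if required <= caps]


print(backends_supporting("checkpoint"))
print(backends_supporting("checkpoint", "hybrid_search"))
```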
Features
- LangGraph Checkpointing: Persist LangGraph conversation checkpoints with OceanBaseCheckpointSaver, including resume, replay, and time-travel workflows for multi-thread graph state. See Migration Guide and examples/langgraph_agent.py.
- Vector Storage: Store embeddings from LangChain models in OceanBase, SeekDB, or embedded SeekDB with automatic table creation and index management.
- Built-in Embedding: Built-in embedding function using the all-MiniLM-L6-v2 model (384 dimensions) with no API keys required. Perfect for quick prototyping and local development.
  - No API Keys Required: Uses local ONNX models, no external API calls needed
  - Quick Start: Perfect for rapid prototyping and testing
  - LangChain Compatible: Fully compatible with LangChain's Embeddings interface
  - Batch Processing: Supports efficient batch embedding generation
  - Automatic Integration: Can be automatically used in OceanbaseVectorStore by setting embedding_function=None
  - Technical Specs: Model all-MiniLM-L6-v2, 384 dimensions, ONNX Runtime inference
- Embedded SeekDB (optional): Run local embedded SeekDB through pyobvector (path= or pyseekdb_client= on OceanbaseVectorStore) without OceanBase; requires pyobvector[pyseekdb] or a recent pyseekdb that installs pylibseekdb. See docs/vectorstores.md#embedded-seekdb-optional and examples/embedded_seekdb_vectorstore.py.
- Similarity Search: Perform efficient similarity searches on vector data with multiple distance metrics (L2, cosine, inner product).
- Hybrid Search: Combine vector search with sparse vector search and full-text search for improved results with configurable weights.
- Maximal Marginal Relevance: Filter for diversity in search results to avoid redundant information.
- Multiple Index Types: Support for HNSW, IVF, FLAT and other vector index types with automatic parameter optimization.
- Sparse Embeddings: Native support for sparse vector embeddings with BM25-like functionality.
- Advanced Filtering: Built-in support for metadata filtering and complex query conditions.
- Async Support: Full support for async operations and high-concurrency scenarios.
- Custom Exceptions: OceanBaseError, OceanBaseConnectionError, OceanBaseVectorDimensionError, OceanBaseIndexError, OceanBaseVersionError, OceanBaseConfigurationError with troubleshooting links in messages.
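For readers comparing the three distance metrics named above, the following sketch gives their textbook definitions in plain Python. It is for intuition only: how OceanBase scales, signs, or orders these values internally is not stated here, and the function names are illustrative.

```python
import math
from typing import Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product: larger means more similar (sensitive to vector length)."""
    return sum(x * y for x, y in zip(a, b))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity: depends on direction only, not on length."""
    norm = math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    if norm == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - inner_product(a, b) / norm


print(l2_distance([0.0, 0.0], [3.0, 4.0]))
print(inner_product([1.0, 2.0], [3.0, 4.0]))
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))
```

For unit-length embeddings the three metrics rank neighbors identically, so the choice matters most when vectors are not normalized.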
Installation
pip install -U langchain-oceanbase
Requirements
- Python >=3.11
- langchain-core >=1.0.0
- pyobvector >=0.2.0 (required for database client)
- pyseekdb >=0.1.0 (required dependency; use >=1.2 on supported platforms for embedded SeekDB and the pylibseekdb runtime)
Tip: The current version (0.4.0) supports langchain-core >=1.0.0. See CHANGELOG.md for version history.
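To confirm that an environment meets the minimums listed above, a small standard-library check such as the following may help. It is a simplified sketch: the version parsing only reads leading numeric components and is not a full PEP 440 implementation, and MINIMUMS simply restates the requirements from this section.

```python
from importlib import metadata
from typing import Dict, Tuple

# Minimum versions restated from the Requirements list above.
MINIMUMS = {"langchain-core": "1.0.0", "pyobvector": "0.2.0", "pyseekdb": "0.1.0"}


def version_tuple(version: str) -> Tuple[int, ...]:
    """Parse leading numeric components, e.g. '1.2.3rc1' -> (1, 2, 3)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare versions after padding both to the same length with zeros."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirements() -> Dict[str, str]:
    """Return a short status string for each required package."""
    report = {}
    for package, minimum in MINIMUMS.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            report[package] = "missing"
            continue
        status = "ok" if meets_minimum(installed, minimum) else "needs >=" + minimum
        report[package] = installed + " (" + status + ")"
    return report


for package, status in check_requirements().items():
    print(package, status)
```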
Platform Support
- ✅ Linux: Full support (x86_64 and ARM64)
- ✅ macOS/Windows: Supported - pyobvector works on all platforms
Built-in Embedding Dependencies
For built-in embedding functionality (no API keys required), pyseekdb is installed automatically as a dependency (see Requirements). It provides:
- Local ONNX-based embedding inference
- Default embedding model: all-MiniLM-L6-v2 (384 dimensions)
- No external API calls needed
We recommend using Docker to deploy OceanBase:
docker run --name=oceanbase -e MODE=mini -e OB_SERVER_IP=127.0.0.1 -p 2881:2881 -d oceanbase/oceanbase-ce:latest
For AI Functions support, use OceanBase 4.4.1 or later:
docker run --name=oceanbase -e MODE=mini -e OB_SERVER_IP=127.0.0.1 -p 2881:2881 -d oceanbase/oceanbase-ce:4.4.1.0-100000032025101610
More methods to deploy OceanBase cluster
Usage
Documentation Formats
Choose your preferred format:
- Jupyter Notebook - Interactive notebook with executable code cells
- Markdown - Static documentation for easy reading (includes embedded SeekDB)
- Embedded SeekDB example - Runnable script using local SeekDB without Docker
Additional Resources
- Built-in Embedding Guide - Interactive notebook for built-in embedding functionality
- Built-in Embedding Guide (Markdown) - Static documentation for built-in embeddings
- Hybrid Search Guide - Interactive notebook for hybrid search features
- Hybrid Search Guide (Markdown) - Static documentation for hybrid search
- AI Functions Guide - Documentation for AI Functions (AI_EMBED, AI_COMPLETE, AI_RERANK)
- AI Functions Guide (Notebook) - Interactive notebook for AI Functions
- Migration Guide - Migrating to LangGraph Checkpointer and schema changes
Built-in Embedding Sections:
- Installation - Install required packages
- Direct Use - Use DefaultEmbeddingFunction directly
- LangChain Compatible - Use DefaultEmbeddingFunctionAdapter
- Vector Store Integration - Use in OceanbaseVectorStore
- Text Similarity - Compute similarity between texts
- Performance - Batch vs single processing comparison
Hybrid Search Sections:
- Setup - Deploy OceanBase and install packages
- Vector Search - Semantic similarity matching
- Sparse Vector Search - Keyword-based exact matching
- Full-text Search - Content-based text search
- Multi-modal Search - Combined search strategies
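As a rough illustration of how results from vector, sparse, and full-text search can be combined with configurable weights, the sketch below min-max normalizes each modality's scores and takes a weighted sum. This is one common fusion scheme and an assumption on our part; it is not necessarily how langchain-oceanbase fuses results internally, and the names normalize and fuse are hypothetical.

```python
from typing import Dict, List, Tuple


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Min-max scale scores to [0, 1]; if all scores are equal, map them to 1.0."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - low) / (high - low) for doc_id, s in scores.items()}


def fuse(
    results: Dict[str, Dict[str, float]], weights: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Weighted sum of normalized scores; higher input scores must mean 'better'."""
    fused: Dict[str, float] = {}
    for modality, scores in results.items():
        weight = weights.get(modality, 0.0)
        for doc_id, score in normalize(scores).items():
            fused[doc_id] = fused.get(doc_id, 0.0) + weight * score
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Made-up scores per modality, keyed by document id.
results = {
    "vector": {"a": 0.9, "b": 0.5, "c": 0.1},
    "sparse": {"b": 3.0, "c": 1.0},
    "fulltext": {"c": 7.0, "a": 2.0},
}
weights = {"vector": 0.5, "sparse": 0.3, "fulltext": 0.2}
ranking = fuse(results, weights)
print(ranking)
```

Document b ranks first here because it scores reasonably in two modalities, even though a has the best vector score. Distances (where smaller is better) would need to be inverted before fusing.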
AI Functions Sections:
- Setup - Deploy OceanBase and configure AI models
- Initialization - Configure and create AI functions client
- AI_EMBED - Convert text to vector embeddings
- AI_COMPLETE - Generate text completions
- AI_RERANK - Rerank search results
- Model Configuration API - Setup AI models and endpoints
Quick Start
Using Built-in Embedding (No API Keys Required)
The simplest way to get started is using the built-in embedding function, which requires no API keys. Prerequisite: OceanBase must be running (e.g. docker run --name=oceanbase -e MODE=mini -e OB_SERVER_IP=127.0.0.1 -p 2881:2881 -d oceanbase/oceanbase-ce:latest).
from langchain_oceanbase.vectorstores import OceanbaseVectorStore
from langchain_core.documents import Document

# Connection configuration
connection_args = {
    "host": "127.0.0.1",
    "port": "2881",
    "user": "root@test",
    "password": "",
    "db_name": "test",
}

# Use default embedding (set embedding_function=None)
vector_store = OceanbaseVectorStore(
    embedding_function=None,  # Automatically uses DefaultEmbeddingFunction
    table_name="langchain_vector",
    connection_args=connection_args,
    vidx_metric_type="l2",
    drop_old=True,
    embedding_dim=384,  # all-MiniLM-L6-v2 dimension
)

# Add documents
documents = [
    Document(page_content="Machine learning is a subset of artificial intelligence"),
    Document(page_content="Python is a popular programming language"),
    Document(page_content="OceanBase is a distributed relational database"),
]
ids = vector_store.add_documents(documents)

# Perform similarity search
results = vector_store.similarity_search("artificial intelligence", k=2)
for doc in results:
    print(f"* {doc.page_content}")
You can verify this example without OceanBase (imports and constructor only) by running: poetry run python tests/run_readme_quickstart.py.
Key Benefits of Built-in Embedding:
- ✅ No API keys or external services required
- ✅ Works offline with local ONNX models
- ✅ Fast batch processing
- ✅ Perfect for prototyping and testing
- ✅ Model files (~80MB) downloaded automatically on first use
Additional Quick Start Guides
- Setup - Deploy OceanBase and install dependencies
- Initialization - Configure and create vector store
- Manage vector store - Add, update, and delete vectors
- Query vector store - Search and retrieve vectors
- Build RAG (Retrieval Augmented Generation) - Build powerful RAG applications
- Full-text Search - Implement full-text search capabilities
- Hybrid Search - Combine vector and text search for better results
- Advanced Filtering - Metadata filtering and complex query conditions
- Maximal Marginal Relevance - Filter for diversity in search results
- Multiple Index Types - Different vector index types (HNSW, IVF, FLAT)
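For intuition about the Maximal Marginal Relevance guide listed above, here is a minimal pure-Python sketch of the standard MMR selection loop. It is the textbook algorithm, not the library's implementation; the lambda_mult name mirrors LangChain's MMR parameter, while the vectors are made up for illustration.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(
    query: Sequence[float],
    candidates: List[Sequence[float]],
    k: int,
    lambda_mult: float = 0.5,
) -> List[int]:
    """Pick up to k candidate indices, trading relevance against redundancy."""
    selected: List[int] = []
    remaining = list(range(len(candidates)))

    def score(i: int) -> float:
        relevance = cosine(query, candidates[i])
        # Redundancy is the highest similarity to anything already selected.
        redundancy = max(
            (cosine(candidates[i], candidates[j]) for j in selected), default=0.0
        )
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.0]
candidates = [[1.0, 0.0], [0.98, 0.2], [0.6, 0.8]]  # index 1 nearly duplicates index 0
diverse = mmr(query, candidates, k=2, lambda_mult=0.3)
relevant_only = mmr(query, candidates, k=2, lambda_mult=1.0)
print(diverse, relevant_only)
```

With lambda_mult=1.0 the loop reduces to plain relevance ranking and returns the near-duplicate; lowering it penalizes the duplicate and picks the more distinct document instead.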
Troubleshooting
Connection Refused
Error: Can't connect to MySQL server on 'localhost' or ConnectionRefusedError
Cause: OceanBase is not running or not accessible on the specified host/port.
Solution:
- Check if OceanBase is running:
docker ps | grep oceanbase
- Start OceanBase if not running:
docker start oceanbase
- Verify the port is correct (default: 2881 for local, 3306 for cloud)
- Check firewall settings if connecting to remote server
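Before digging into database settings, it can help to confirm that the port accepts TCP connections at all. The sketch below uses only the standard library; can_connect is a hypothetical helper name, and the host and port are the local defaults used elsewhere in this document.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False


print("OceanBase reachable:", can_connect("127.0.0.1", 2881))
```

A False result points at the container, port mapping, or firewall rather than at credentials or the database name.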
Vector Dimension Mismatch
Error: Vector dimension mismatch or OceanBaseVectorDimensionError
Cause: The embedding model's output dimension doesn't match the table's vector dimension.
Solution:
- Check your embedding model's output dimension (e.g., all-MiniLM-L6-v2 outputs 384 dimensions)
- Set the correct embedding_dim parameter when initializing OceanbaseVectorStore
- If the embedding model changed, recreate the table with drop_old=True:
vector_store = OceanbaseVectorStore(
    embedding_function=new_embedding,
    embedding_dim=new_dim,
    drop_old=True,  # Recreate table with new dimension
    ...
)
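One way to avoid guessing the dimension is to embed a short sample and measure the result. The sketch below relies only on the embed_query method from LangChain's Embeddings interface; probe_embedding_dim and FakeEmbedding are hypothetical names, and FakeEmbedding stands in for a real model so the example runs without downloads.

```python
from typing import List, Protocol


class SupportsEmbedQuery(Protocol):
    """Anything exposing LangChain-style embed_query(text) -> list of floats."""

    def embed_query(self, text: str) -> List[float]: ...


def probe_embedding_dim(embedding: SupportsEmbedQuery, sample: str = "dimension probe") -> int:
    """Embed one sample string and return the length of the resulting vector."""
    return len(embedding.embed_query(sample))


class FakeEmbedding:
    """Hypothetical stand-in returning fixed-size vectors (demonstration only)."""

    def __init__(self, dim: int) -> None:
        self.dim = dim

    def embed_query(self, text: str) -> List[float]:
        return [0.0] * self.dim


new_embedding = FakeEmbedding(dim=384)
new_dim = probe_embedding_dim(new_embedding)
print(new_dim)  # value to pass as embedding_dim
```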
Index Creation Failed
Error: Failed to create index or OceanBaseIndexError
Cause: Insufficient memory, incompatible OceanBase version, or invalid index parameters.
Solution:
- Check available memory on your OceanBase server
- Verify OceanBase version supports the index type:
  - HNSW: OceanBase 4.3.0+
  - IVF variants: OceanBase 4.3.0+
- Try a simpler index type for small datasets:
vector_store = OceanbaseVectorStore(
    index_type="FLAT",  # No index, exact search
    ...
)
- For HNSW, reduce the M parameter if memory is limited:
vector_store = OceanbaseVectorStore(
    index_type="HNSW",
    vidx_algo_params={"M": 8, "efConstruction": 100},
    ...
)
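To judge how much lowering M might help, a back-of-the-envelope estimate can be useful. The sketch below uses a common rule of thumb for HNSW-style indexes (float32 vector data plus roughly 2 * M four-byte neighbor links per vector); it is an assumption for rough sizing only, and OceanBase's actual memory usage may differ.

```python
def estimate_hnsw_bytes(num_vectors: int, dim: int, m: int) -> int:
    """Rough HNSW memory estimate: float32 vectors plus ~2*M 4-byte links each."""
    bytes_per_vector = dim * 4 + m * 2 * 4
    return num_vectors * bytes_per_vector


for m in (16, 8):
    total = estimate_hnsw_bytes(num_vectors=1_000_000, dim=384, m=m)
    print(f"M={m}: ~{total / 1e9:.2f} GB")
```

Under this estimate the vector data dominates at 384 dimensions, so reducing M saves relatively little; fewer or lower-dimensional vectors have a larger effect.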
AI Functions Not Supported
Error: AI functions are not supported or OceanBaseVersionError
Cause: OceanBase version is older than 4.4.1, which is required for AI functions.
Solution:
- Upgrade to OceanBase 4.4.1 or later:
docker run --name=oceanbase -e MODE=mini -e OB_SERVER_IP=127.0.0.1 \
  -p 2881:2881 -d oceanbase/oceanbase-ce:4.4.1.0-100000032025101610
- Alternatively, use SeekDB which also supports AI functions
- Check current version:
SELECT version();
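Once you have the string returned by SELECT version(), a small helper can compare it with the 4.4.1 minimum. The sketch assumes the string either embeds the OceanBase version after a "v" (for example "5.7.25-OceanBase_CE-v4.4.1.0") or is a plain dotted version; that format, the sample strings, and the function names are assumptions, so adjust the pattern to what your server actually returns.

```python
import re
from typing import Tuple


def parse_oceanbase_version(version_string: str) -> Tuple[int, ...]:
    """Extract a dotted version, preferring a 'v<major>.<minor>...' suffix."""
    match = re.search(r"v(\d+(?:\.\d+)+)", version_string)
    if match is None:
        # Fall back to the first dotted number in the string.
        match = re.search(r"(\d+(?:\.\d+)+)", version_string)
    if match is None:
        raise ValueError("no version number found in: " + repr(version_string))
    return tuple(int(part) for part in match.group(1).split("."))


def supports_ai_functions(version_string: str) -> bool:
    """True if the parsed version is at least 4.4.1."""
    return parse_oceanbase_version(version_string)[:3] >= (4, 4, 1)


print(supports_ai_functions("5.7.25-OceanBase_CE-v4.4.1.0"))
print(supports_ai_functions("5.7.25-OceanBase_CE-v4.3.5.0"))
```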
Slow Queries
Cause: Missing vector index, wrong index type, or suboptimal search parameters.
Solution:
- Ensure a vector index is created (check with SHOW INDEX FROM table_name)
- Use appropriate index type:
  - HNSW: Best for large datasets with high recall requirements
  - IVF_FLAT: Good balance of speed and accuracy
  - FLAT: Best accuracy but slowest (no index)
- Tune search parameters for HNSW:
# Higher efSearch = better accuracy but slower
vector_store.hnsw_ef_search = 128  # Default is 64
- For IVF indexes, adjust the nprobe parameter
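When tuning efSearch or nprobe, it helps to measure accuracy instead of guessing. The sketch below computes exact neighbors by brute force (what a FLAT search does conceptually) and the recall of an approximate result against them. The vectors and the "approximate" result are made up, and both function names are hypothetical.

```python
import math
from typing import List, Sequence


def exact_top_k(query: Sequence[float], vectors: List[Sequence[float]], k: int) -> List[int]:
    """Brute-force search: compare the query with every vector using L2 distance."""
    distances = [(math.dist(query, vec), idx) for idx, vec in enumerate(vectors)]
    return [idx for _, idx in sorted(distances)[:k]]


def recall_at_k(approx_ids: List[int], exact_ids: List[int]) -> float:
    """Fraction of the true nearest neighbors that the approximate search returned."""
    if not exact_ids:
        return 1.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]]
query = [0.1, 0.0]

exact_ids = exact_top_k(query, vectors, k=2)
approx_ids = [0, 2]  # pretend an index search returned these ids
print(exact_ids, recall_at_k(approx_ids, exact_ids))
```

Running this comparison on a sample of real queries at several parameter settings shows where extra search effort stops improving recall.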
Sparse Vector / Full-text Search Not Working
Error: Sparse vector support not enabled or Full-text search support not enabled
Cause: The vector store was not initialized with sparse/fulltext support.
Solution:
# Enable sparse vector support
vector_store = OceanbaseVectorStore(
    include_sparse=True,
    ...
)

# Enable both sparse and full-text search
vector_store = OceanbaseVectorStore(
    include_sparse=True,
    include_fulltext=True,
    ...
)
Note: Full-text search requires include_sparse=True to be set as well.
Import Errors
Error: ModuleNotFoundError: No module named 'pyobvector'
Cause: Required dependencies are not installed.
Solution:
pip install -U langchain-oceanbase pyobvector
For LangGraph checkpointing support:
pip install -U langchain-oceanbase pyobvector langgraph-checkpoint
Quickstart
A short quickstart to run the local dev environment and example scripts.
Prerequisites:
- Git
- Docker & Docker Compose
- Python 3.11+ (matches the package requirement above)
- (Optional) OpenAI API key for embeddings / LLM examples
- Clone the repo
git clone https://github.com/oceanbase/langchain-oceanbase.git
cd langchain-oceanbase
- Start the local database
# start OceanBase
make docker-up
# or start SeekDB (lightweight alternative)
make docker-up-seek
- Set environment variables (create a .env file or export them)
OB_HOST=127.0.0.1
OB_PORT=3306
OB_USER=root
OB_PASSWORD=changeme
OB_DB=langchain_ob_demo
OPENAI_API_KEY=sk-...
- Install example dependencies (examples use these packages)
pip install openai mysql-connector-python numpy
- Run an example
python examples/quickstart.py
python examples/rag_demo.py
python examples/hybrid_search_demo.py
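If you want your own scripts to pick up the OB_* variables from the steps above, a helper along these lines maps them onto the connection_args dictionary used in the Quick Start. Whether the bundled examples read the variables in exactly this way is an assumption; connection_args_from_env is a hypothetical name, and the fallback values are the Quick Start defaults.

```python
import os
from typing import Dict, Mapping, Optional


def connection_args_from_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build connection_args from OB_* variables, falling back to Quick Start defaults."""
    source = os.environ if env is None else env
    return {
        "host": source.get("OB_HOST", "127.0.0.1"),
        "port": source.get("OB_PORT", "2881"),
        "user": source.get("OB_USER", "root@test"),
        "password": source.get("OB_PASSWORD", ""),
        "db_name": source.get("OB_DB", "test"),
    }


connection_args = connection_args_from_env()
# Avoid printing the password when logging the configuration.
print({key: value for key, value in connection_args.items() if key != "password"})
```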
Files of interest
- docker-compose.yml: OceanBase CE service for local development
- docker-compose.seekdb.yml: SeekDB lightweight alternative
- Makefile: convenience targets (make docker-up, make docker-down, make docker-logs), plus format/lint/typecheck/test helpers
- CONTRIBUTING.md: developer setup, running tests, code style, PR process
- examples/: quickstart.py, rag_demo.py, hybrid_search_demo.py, and examples/README.md
Running tests and linters
- Unit tests (no database required):
make test
# or: poetry run pytest tests/unit_tests/
- Integration tests (requires OceanBase/SeekDB, e.g. make docker-up):
make docker-up
make integration_tests
# or: poetry run pytest tests/integration_tests/
- Lint / formatting:
make format # code formatting (ruff format + import sort)
make lint # ruff check + mypy
Contributing
See CONTRIBUTING.md for detailed developer setup and the PR process. When submitting a PR, please:
- Target develop for regular work (feature/*, bugfix/*, chore/*, docs/*, refactor/*, test/*)
- Use release/* or hotfix/* as the normal PR sources into main
- Dependabot version updates now target develop
- Dependabot security updates still follow the GitHub default branch until a repo admin switches the default branch from main to develop
- Reference the issue (e.g., Closes #43) in the PR body
- Run linters and tests locally
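The branch rules above are simple enough to check mechanically, for instance in a local pre-push script. The sketch below only restates the prefixes from the list; expected_base_branch is a hypothetical helper and is not part of the repository's tooling.

```python
from typing import Optional

# Prefixes restated from the contribution rules above.
DEVELOP_PREFIXES = ("feature/", "bugfix/", "chore/", "docs/", "refactor/", "test/")
MAIN_PREFIXES = ("release/", "hotfix/")


def expected_base_branch(source_branch: str) -> Optional[str]:
    """Return the branch a PR from source_branch should target, or None if unknown."""
    if source_branch.startswith(DEVELOP_PREFIXES):
        return "develop"
    if source_branch.startswith(MAIN_PREFIXES):
        return "main"
    return None


print(expected_base_branch("feature/hybrid-search-docs"))
print(expected_base_branch("hotfix/0.4.1"))
```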