
An integration package connecting OceanBase and LangChain


langchain-oceanbase

This package contains the LangChain integration with OceanBase.

OceanBase Database is a distributed relational database developed entirely by Ant Group. It runs on clusters of ordinary servers and, building on the Paxos protocol and its distributed architecture, provides high availability and linear scalability.

OceanBase now supports vector storage. With SQL, users can:

  • Create a table containing vector type fields;
  • Create a vector index table based on the HNSW algorithm;
  • Perform vector approximate nearest neighbor queries;
  • ...

Features

  • Built-in Embedding: A built-in embedding function based on the all-MiniLM-L6-v2 model (384 dimensions) that requires no API keys, suited to quick prototyping and local development.
    • No API Keys Required: Uses local ONNX models, no external API calls needed
    • Quick Start: Perfect for rapid prototyping and testing
    • LangChain Compatible: Fully compatible with LangChain's Embeddings interface
    • Batch Processing: Supports efficient batch embedding generation
    • Automatic Integration: Used automatically by OceanbaseVectorStore when embedding_function=None
    • Technical Specs: Model all-MiniLM-L6-v2, 384 dimensions, ONNX Runtime inference
  • Vector Storage: Store embeddings from any LangChain embedding model in OceanBase with automatic table creation and index management.
  • Similarity Search: Perform efficient similarity searches on vector data with multiple distance metrics (L2, cosine, inner product).
  • Hybrid Search: Combine vector search with sparse vector search and full-text search for improved results with configurable weights.
  • Maximal Marginal Relevance: Filter for diversity in search results to avoid redundant information.
  • Multiple Index Types: Support for HNSW, IVF, FLAT and other vector index types with automatic parameter optimization.
  • Sparse Embeddings: Native support for sparse vector embeddings with BM25-like functionality.
  • Advanced Filtering: Built-in support for metadata filtering and complex query conditions.
  • Async Support: Full support for async operations and high-concurrency scenarios.
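
To make the three distance metrics named in the Similarity Search feature concrete, here is a minimal pure-Python sketch of L2 distance, cosine distance, and inner product. The function names are illustrative and not part of langchain-oceanbase; how OceanBase computes or signs these scores internally is not stated here, so treat this only as the textbook definitions.

```python
import math
from typing import Sequence


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product: larger values mean more similar vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance: smaller values mean closer vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity: 0 for same direction, 1 for orthogonal vectors."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - inner_product(a, b) / (norm_a * norm_b)


query = [1.0, 0.0]
same_direction = [2.0, 0.0]
orthogonal = [0.0, 1.0]

for name, doc in (("same_direction", same_direction), ("orthogonal", orthogonal)):
    print(
        name,
        l2_distance(query, doc),
        cosine_distance(query, doc),
        inner_product(query, doc),
    )
```

Cosine distance ignores vector length, so `same_direction` counts as identical to the query even though its L2 distance is non-zero. That difference is the usual reason to pick one metric over another for a given embedding model.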

Installation

pip install -U langchain-oceanbase

Requirements

  • Python >=3.11
  • langchain-core >=1.0.0
  • pyobvector >=0.2.0 (required for database client)
  • pyseekdb >=0.1.0 (optional, for built-in embedding functionality)

Tip: The current version supports langchain-core >=1.0.0
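
As a quick sanity check of the requirements above, the following sketch reports the running Python version and which of the listed packages are installed. It uses only the standard library; the helper names are illustrative. It reports versions but does not compare them against the minimums, which would need a version parser such as the `packaging` library.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

REQUIRED = ("langchain-core", "pyobvector")
OPTIONAL = ("pyseekdb",)  # only needed for the built-in embedding


def python_is_supported(version_info=sys.version_info) -> bool:
    """True when the interpreter is Python 3.11 or newer."""
    return tuple(version_info[:2]) >= (3, 11)


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


print("Python >= 3.11:", python_is_supported())
for name, found in installed_versions(REQUIRED + OPTIONAL).items():
    print(f"{name}: {found or 'not installed'}")
```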

Platform Support

  • Linux: Full support (x86_64 and ARM64)
  • macOS/Windows: Supported - pyobvector works on all platforms

Built-in Embedding Dependencies

For built-in embedding functionality (no API keys required), pyseekdb is automatically installed as an optional dependency. It provides:

  • Local ONNX-based embedding inference
  • Default embedding model: all-MiniLM-L6-v2 (384 dimensions)
  • No external API calls needed
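
To show the shape of the interface the built-in embedding is described as compatible with, here is a toy stand-in exposing the two methods of LangChain's Embeddings interface, `embed_documents` and `embed_query`. `ToyHashEmbeddings` is hypothetical: it hashes tokens into a 384-dimensional vector and has none of the semantic quality of all-MiniLM-L6-v2. It only illustrates what an embedding function receives and returns.

```python
import hashlib
import math
from typing import List


class ToyHashEmbeddings:
    """Hypothetical stand-in: hashes tokens into a fixed-size, unit-length vector."""

    def __init__(self, dim: int = 384) -> None:
        self.dim = dim

    def _embed(self, text: str) -> List[float]:
        vec = [0.0] * self.dim
        for token in text.lower().split():
            # sha256 keeps the mapping stable across runs (built-in hash() is salted).
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            vec[int.from_bytes(digest[:4], "big") % self.dim] += 1.0
        norm = math.sqrt(sum(v * v for v in vec))
        return [v / norm for v in vec] if norm else vec

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        """Embed a batch of texts, one vector per text."""
        return [self._embed(text) for text in texts]

    def embed_query(self, text: str) -> List[float]:
        """Embed a single query string."""
        return self._embed(text)


embedder = ToyHashEmbeddings()
vectors = embedder.embed_documents(["OceanBase stores vectors", "Python is popular"])
print(len(vectors), len(vectors[0]))
```

To pass a class like this to a LangChain vector store, it would normally subclass `langchain_core.embeddings.Embeddings`. The sketch leaves that out so it runs without any dependencies.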

We recommend using Docker to deploy OceanBase:

docker run --name=oceanbase -e MODE=mini -e OB_SERVER_IP=127.0.0.1 -p 2881:2881 -d oceanbase/oceanbase-ce:latest

For AI Functions support, use OceanBase 4.4.1 or later:

docker run --name=oceanbase -e MODE=mini -e OB_SERVER_IP=127.0.0.1 -p 2881:2881 -d oceanbase/oceanbase-ce:4.4.1.0-100000032025101610

More methods to deploy OceanBase cluster
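
After starting the container, the SQL port may take a while to open. The sketch below polls the port published by the `docker run` commands above (2881) using only the standard library. `wait_for_port` is an illustrative helper, not part of any OceanBase tooling, and an open port does not by itself guarantee that the database has finished bootstrapping.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Return True once a TCP connection succeeds, or False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Each attempt is bounded by `interval` so the loop stays responsive.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Example call (left commented out so importing this sketch never blocks):
# ready = wait_for_port("127.0.0.1", 2881, timeout=120.0)
```

If the call returns False, checking the container output with `docker logs oceanbase` is a reasonable next step.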

Usage

Quick Start

Using Built-in Embedding (No API Keys Required)

The simplest way to get started is using the built-in embedding function, which requires no API keys:

from langchain_oceanbase.vectorstores import OceanbaseVectorStore
from langchain_core.documents import Document

# Connection configuration
connection_args = {
    "host": "127.0.0.1",
    "port": "2881",
    "user": "root@test",
    "password": "",
    "db_name": "test",
}

# Use default embedding (set embedding_function=None)
vector_store = OceanbaseVectorStore(
    embedding_function=None,  # Automatically uses DefaultEmbeddingFunction
    table_name="langchain_vector",
    connection_args=connection_args,
    vidx_metric_type="l2",
    drop_old=True,
    embedding_dim=384,  # all-MiniLM-L6-v2 dimension
)

# Add documents
documents = [
    Document(page_content="Machine learning is a subset of artificial intelligence"),
    Document(page_content="Python is a popular programming language"),
    Document(page_content="OceanBase is a distributed relational database"),
]
ids = vector_store.add_documents(documents)

# Perform similarity search
results = vector_store.similarity_search("artificial intelligence", k=2)
for doc in results:
    print(f"* {doc.page_content}")

Key Benefits of Built-in Embedding:

  • ✅ No API keys or external services required
  • ✅ Works offline with local ONNX models
  • ✅ Fast batch processing
  • ✅ Perfect for prototyping and testing
  • ✅ Model files (~80MB) downloaded automatically on first use
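
The quick start hardcodes `connection_args`. Outside local testing, it can be convenient to read them from the environment instead. The sketch below assumes hypothetical variable names (`OB_HOST`, `OB_PORT`, `OB_USER`, `OB_PASSWORD`, `OB_DB_NAME`) that neither the package nor the Docker image defines. It falls back to the quick-start values when a variable is unset.

```python
import os
from typing import Dict, Mapping

# Defaults mirror the quick-start connection_args above.
DEFAULTS: Dict[str, str] = {
    "host": "127.0.0.1",
    "port": "2881",
    "user": "root@test",
    "password": "",
    "db_name": "test",
}


def connection_args_from_env(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Build connection_args from OB_* variables (hypothetical names)."""
    return {
        key: env.get(f"OB_{key.upper()}", default)
        for key, default in DEFAULTS.items()
    }


connection_args = connection_args_from_env()
print({k: ("***" if k == "password" and v else v) for k, v in connection_args.items()})
```

The resulting dictionary has the same keys as the quick-start example, so it can be passed as `connection_args=connection_args` when constructing `OceanbaseVectorStore`.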
