Haystack 2.x integration for OceanBase vector search (via pyobvector).

oceanbase-haystack

Haystack 2.x integration for OceanBase vector search. It mirrors the API surface of milvus-haystack while using pyobvector (ObVecClient) for SQLAlchemy-style access to OceanBase VECTOR indexes.

Requirements

  • Python 3.9+
  • A running OceanBase instance with vector support (see OceanBase documentation for version and vector features).
  • Dependencies: haystack-ai, pyobvector, sqlalchemy (declared in pyproject.toml).

Installation

pip install oceanbase-haystack

Or from a checkout (in this monorepo, use the oceanbase-haystack directory):

cd oceanbase-haystack
pip install -e .

In the ecology-plugins monorepo

This project lives under ecology-plugins as the directory oceanbase-haystack/. The top-level README lists all bundled plugins and links back here.

| Topic | Where |
| --- | --- |
| CI (lint, mocked tests, build) | .github/workflows/workflow.yml, job "Test and Build OceanBase Haystack" |
| Extended CI (Python matrix, optional OceanBase CE smoke) | .github/workflows/oceanbase-haystack-ci.yml, runs when oceanbase-haystack/** changes |

Quick start

from haystack import Document
from oceanbase_haystack import OceanBaseDocumentStore, OceanBaseEmbeddingRetriever

store = OceanBaseDocumentStore(
    collection_name="HaystackCollection",  # table name in OceanBase
    connection_args={
        "host": "127.0.0.1",
        "port": "2881",
        "user": "root@test",
        "password": "",
        "db_name": "test",
    },
    index_params={"metric_type": "L2", "index_type": "HNSW", "params": {}},
    drop_old=True,
)

store.write_documents(
    [
        Document(content="hello", embedding=[0.1] * 128, meta={"source": "demo"}),
    ]
)

retriever = OceanBaseEmbeddingRetriever(document_store=store, top_k=5)
result = retriever.run(query_embedding=[0.1] * 128)
print(result["documents"])

Connection arguments

Unlike Milvus URI-style connection_args, this integration expects a flat dict:

| Key | Description |
| --- | --- |
| host | OceanBase host |
| port | Port (often 2881) |
| user | Username (e.g. root@tenant) |
| password | Password |
| db_name | Database name |

Use Haystack Secret for passwords in production and serialize with to_dict / from_dict as usual.
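
As an illustration of keeping credentials out of source code, the flat dict can be assembled from environment variables. The sketch below is not part of the package: `connection_args_from_env` is a hypothetical helper, the `OB_*` names are borrowed from the test instructions further down, and the defaults simply mirror the quick-start values. Wrapping the password as described above is left out to keep the snippet dependency-free.

```python
import os


def connection_args_from_env(env=None):
    """Build the flat connection_args dict from OB_* environment variables.

    Hypothetical helper (not shipped with oceanbase-haystack); the fallback
    values mirror the quick-start example.
    """
    env = os.environ if env is None else env
    return {
        "host": env.get("OB_HOST", "127.0.0.1"),
        "port": env.get("OB_PORT", "2881"),
        "user": env.get("OB_USER", "root@test"),
        "password": env.get("OB_PASSWORD", ""),
        "db_name": env.get("OB_DB", "test"),
    }


# A plain dict stands in for os.environ so the example is reproducible.
args = connection_args_from_env({"OB_HOST": "db.internal", "OB_PASSWORD": "s3cret"})
print(sorted(args))
```

The resulting dict has exactly the five keys listed in the table, so it can be passed as connection_args unchanged.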

Document layout

  • Table name: collection_name maps to a MySQL/OceanBase table name (same parameter name as milvus-haystack for familiarity).
  • Columns: Primary key (id), dense vector (vector), text (text), and JSON metadata (meta). Optional sparse_vector_field adds a sparse vector column when enabled.
  • Filters: Haystack nested filters are translated to SQL predicates on the JSON meta column (e.g. meta.type → JSON_EXTRACT(meta, '$.type')).
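
To make the filter translation concrete, here is a minimal sketch of how a Haystack 2.x filter dict could be turned into a parameterized WHERE clause. It is an illustration only, not the code this package ships: the function names, the `%s` placeholder style, and the handling of value types are all assumptions.

```python
import re

# Haystack comparison operators mapped to SQL operators (illustrative subset).
_COMPARISONS = {"==": "=", "!=": "!=", ">": ">", ">=": ">=", "<": "<", "<=": "<="}
# Field names are interpolated into SQL, so only plain dotted identifiers are allowed.
_SAFE_FIELD = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*")


def _column(field):
    """Map 'meta.type' to JSON_EXTRACT(meta, '$.type'); other fields pass through."""
    if not _SAFE_FIELD.fullmatch(field):
        raise ValueError(f"unsafe field name: {field!r}")
    if field.startswith("meta."):
        return f"JSON_EXTRACT(meta, '$.{field[len('meta.'):]}')"
    return field


def to_sql(node, params):
    """Recursively render a filter node; values are collected in params."""
    if "conditions" in node:  # logical node: AND / OR / NOT
        op = node["operator"].upper()
        parts = [to_sql(child, params) for child in node["conditions"]]
        if op == "NOT":
            return "NOT (" + " AND ".join(parts) + ")"
        return "(" + f" {op} ".join(parts) + ")"
    params.append(node["value"])  # comparison node
    return f"{_column(node['field'])} {_COMPARISONS[node['operator']]} %s"


filters = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.type", "operator": "==", "value": "article"},
        {"field": "meta.year", "operator": ">=", "value": 2020},
    ],
}
params = []
where = to_sql(filters, params)
print(where, params)
```

Keeping values in a separate parameter list, rather than formatting them into the string, leaves quoting to the database driver.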

Components

| Class | Role |
| --- | --- |
| OceanBaseDocumentStore | Implements Haystack document store operations: write, delete, count, filter, dense/sparse/hybrid retrieval. |
| OceanBaseEmbeddingRetriever | Dense vector retrieval (same idea as MilvusEmbeddingRetriever). |
| OceanBaseSparseEmbeddingRetriever | Sparse vector ANN search when sparse_vector_field is configured. |
| OceanBaseHybridRetriever | Combines dense and sparse results with Reciprocal Rank Fusion (RRF) (not pymilvus RRFRanker). |

Differences from milvus-haystack

  • No Milvus built-in BM25: builtin_function is not supported; use model-produced SparseEmbedding for sparse search.
  • Hybrid fusion: Uses RRF in the client layer instead of Milvus hybrid + RRFRanker.
  • Metadata: Stored in a single JSON column rather than Milvus dynamic scalar fields.
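
For readers unfamiliar with RRF, the following sketch shows the general technique on two ranked lists of document ids. It is a generic illustration, not the hybrid retriever's actual code: the function name is hypothetical, and the constant k=60 is the value commonly used in the RRF literature, not necessarily what this package uses.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, top_k=None):
    """Fuse ranked id lists: each id scores sum(1 / (k + rank)) over the lists."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by id to keep the output deterministic.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused[:top_k] if top_k is not None else fused


dense = ["a", "b", "c"]   # ids ranked by dense similarity
sparse = ["b", "d", "a"]  # ids ranked by sparse similarity
fused = reciprocal_rank_fusion([dense, sparse], top_k=3)
print([doc_id for doc_id, _ in fused])
```

Because only ranks are used, the dense and sparse scores never need to be on a comparable scale.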

Configuration tips

  • embedding_dim: If set at construction time and the table does not exist, the store can create the table before the first write.
  • index_params: Milvus-style keys (metric_type, index_type, params) are mapped to OceanBase vector index settings where possible.
  • search_params: e.g. {"efSearch": 64} for HNSW search-time behavior (passed through to pyobvector where applicable).
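
The three tips can be combined in one set of constructor arguments, sketched below as a plain dict so that it runs without a database. The values come from the quick start and the tips above. That search_params is accepted as a constructor keyword is an assumption here, so check the class signature before relying on it.

```python
# Keyword arguments one might pass to OceanBaseDocumentStore.
# The class is deliberately not imported so the snippet runs standalone.
store_kwargs = {
    "collection_name": "HaystackCollection",
    "embedding_dim": 128,  # lets the store create the table before the first write
    "index_params": {"metric_type": "L2", "index_type": "HNSW", "params": {}},
    "search_params": {"efSearch": 64},  # assumed constructor keyword, see note above
}

# With the package installed and OceanBase reachable, usage would look like:
# store = OceanBaseDocumentStore(connection_args={...}, **store_kwargs)
print(sorted(store_kwargs))
```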

Development

From the oceanbase-haystack directory:

pip install -e ".[dev]"

Lint and format (matches CI):

python -m ruff check src tests
python -m ruff format --check src tests

Tests:

  • Default / CI (no live OceanBase): runs unit and mocked tests only:

    python -m pytest tests -v -m "not oceanbase"
    
  • Full suite (includes tests marked oceanbase; requires a reachable OceanBase instance and env vars as in tests/test_oceanbase_integration.py):

    export OCEANBASE_CI=1   # and set OB_HOST, OB_PORT, OB_USER, OB_PASSWORD, OB_DB, etc.
    python -m pytest tests -v
    

Build source and wheel:

python -m build

You can also use make from this directory: make install, make check, make test-ci, make build (see Makefile).

Publishing to PyPI (maintainers)

Workflow publish-oceanbase-haystack-pypi.yml follows the same pattern as publish-pyobsql-pypi.yml (twine + PYPI_API_TOKEN / TEST_PYPI_API_TOKEN).

  • Manual run: Actions → Publish OceanBase Haystack to PyPI → optional version (updates __about__.py), Test PyPI toggle.
  • Tag push: push release_oceanbase_haystack_* after setting the desired __version__ in __about__.py.

See the monorepo README for secret setup.

License

Apache-2.0 (see pyproject.toml).
