Haystack 2.x integration for OceanBase vector search (via pyobvector).
oceanbase-haystack
Haystack 2.x integration for OceanBase vector search. It mirrors the API surface of milvus-haystack while using pyobvector (ObVecClient) for SQLAlchemy-style access to OceanBase VECTOR indexes.
Requirements
- Python 3.9+
- A running OceanBase instance with vector support (see OceanBase documentation for version and vector features).
- Dependencies: haystack-ai, pyobvector, sqlalchemy (declared in pyproject.toml).
Installation
pip install oceanbase-haystack
Or from a checkout (in this monorepo, use the oceanbase-haystack directory):
cd oceanbase-haystack
pip install -e .
In the ecology-plugins monorepo
This project lives under ecology-plugins as the directory oceanbase-haystack/. The top-level README lists all bundled plugins and links back here.
| Topic | Where |
|---|---|
| CI (lint, mocked tests, build) | .github/workflows/workflow.yml — job Test and Build OceanBase Haystack |
| Extended CI (Python matrix, optional OceanBase CE smoke) | .github/workflows/oceanbase-haystack-ci.yml — runs when oceanbase-haystack/** changes |
Quick start
from haystack import Document
from oceanbase_haystack import OceanBaseDocumentStore, OceanBaseEmbeddingRetriever
store = OceanBaseDocumentStore(
    collection_name="HaystackCollection",  # table name in OceanBase
    connection_args={
        "host": "127.0.0.1",
        "port": "2881",
        "user": "root@test",
        "password": "",
        "db_name": "test",
    },
    index_params={"metric_type": "L2", "index_type": "HNSW", "params": {}},
    drop_old=True,
)
store.write_documents(
    [
        Document(content="hello", embedding=[0.1] * 128, meta={"source": "demo"}),
    ]
)
retriever = OceanBaseEmbeddingRetriever(document_store=store, top_k=5)
result = retriever.run(query_embedding=[0.1] * 128)
print(result["documents"])
Connection arguments
Unlike Milvus URI-style connection_args, this integration expects a flat dict:
| Key | Description |
|---|---|
| host | OceanBase host |
| port | Port (often 2881) |
| user | Username (e.g. root@tenant) |
| password | Password |
| db_name | Database name |
Use Haystack Secret for passwords in production and serialize with to_dict / from_dict as usual.
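As a minimal sketch of keeping credentials out of source code, the flat dict can be assembled from environment variables. The helper name connection_args_from_env is hypothetical and not part of oceanbase-haystack; the variable names (OB_HOST, OB_PORT, OB_USER, OB_PASSWORD, OB_DB) are borrowed from the integration-test setup described under Development, and reusing them in an application is an assumption. Defaults mirror the quick-start values.

```python
import os
from typing import Dict, Mapping, Optional


def connection_args_from_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build the flat connection_args dict from environment-style variables.

    Hypothetical helper for illustration only. Falls back to the
    quick-start defaults when a variable is not set.
    """
    env = os.environ if env is None else env
    return {
        "host": env.get("OB_HOST", "127.0.0.1"),
        "port": env.get("OB_PORT", "2881"),
        "user": env.get("OB_USER", "root@test"),
        "password": env.get("OB_PASSWORD", ""),
        "db_name": env.get("OB_DB", "test"),
    }


# Passing a plain mapping instead of os.environ keeps the example deterministic.
args = connection_args_from_env({"OB_HOST": "db.internal", "OB_PASSWORD": "s3cret"})
print(sorted(args))
```

The resulting dict has exactly the five keys from the table above and can be passed as connection_args. This sketch covers only the plain-dict case; it does not show Secret handling.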
Document layout
- Table name: collection_name maps to a MySQL/OceanBase table name (same parameter name as milvus-haystack for familiarity).
- Columns: Primary key (id), dense vector (vector), text (text), and JSON metadata (meta). Optional sparse_vector_field adds a sparse vector column when enabled.
- Filters: Haystack nested filters are translated to SQL predicates on the JSON meta column (e.g. meta.type → JSON_EXTRACT(meta, '$.type')).
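To illustrate the filter translation described above, here is a simplified sketch of how a Haystack 2.x nested filter could be turned into a SQL predicate on the meta column. It is not the package's actual implementation: the function name filter_to_sql, the supported operator subset, the %s placeholder style, and the restriction to meta.* fields are all assumptions made for the example.

```python
import re
from typing import Any, Dict, List, Tuple

# Subset of Haystack comparison operators mapped to SQL (illustrative only).
_SQL_OPS = {"==": "=", "!=": "!=", ">": ">", ">=": ">=", "<": "<", "<=": "<="}
# Metadata keys are interpolated into the JSON path, so restrict them strictly.
_SAFE_KEY = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*")


def filter_to_sql(node: Dict[str, Any]) -> Tuple[str, List[Any]]:
    """Translate a Haystack-style filter into (predicate, bind parameters)."""
    op = node["operator"]
    if "conditions" in node:  # logical node: AND / OR / NOT
        if op not in ("AND", "OR", "NOT"):
            raise ValueError(f"unsupported logical operator: {op}")
        parts: List[str] = []
        params: List[Any] = []
        for child in node["conditions"]:
            sql, child_params = filter_to_sql(child)
            parts.append(sql)
            params.extend(child_params)
        if op == "NOT":  # Haystack treats NOT as the negation of AND
            return "NOT (" + " AND ".join(parts) + ")", params
        return "(" + f" {op} ".join(parts) + ")", params
    # Comparison node: "meta.type" becomes the JSON path "$.type".
    field = node["field"]
    if not field.startswith("meta."):
        raise ValueError("this sketch only handles meta.* fields")
    key = field[len("meta."):]
    if not _SAFE_KEY.fullmatch(key) or op not in _SQL_OPS:
        raise ValueError(f"unsupported field or operator: {field} {op}")
    return f"JSON_EXTRACT(meta, '$.{key}') {_SQL_OPS[op]} %s", [node["value"]]


filters = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.type", "operator": "==", "value": "article"},
        {"field": "meta.year", "operator": ">=", "value": 2020},
    ],
}
predicate, params = filter_to_sql(filters)
print(predicate, params)
```

Values travel as bind parameters rather than being spliced into the SQL string; only the validated metadata key is interpolated. A real translator may also need to unquote or cast the extracted JSON value before comparing it.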
Components
| Class | Role |
|---|---|
| OceanBaseDocumentStore | Implements Haystack document store operations: write, delete, count, filter, dense/sparse/hybrid retrieval. |
| OceanBaseEmbeddingRetriever | Dense vector retrieval (same idea as MilvusEmbeddingRetriever). |
| OceanBaseSparseEmbeddingRetriever | Sparse vector ANN search when sparse_vector_field is configured. |
| OceanBaseHybridRetriever | Combines dense and sparse results with Reciprocal Rank Fusion (RRF) (not pymilvus RRFRanker). |
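For readers unfamiliar with Reciprocal Rank Fusion, the sketch below shows the general technique on two ranked lists of document ids. It is a generic illustration, not the code used by OceanBaseHybridRetriever: the function name rrf_fuse is hypothetical, and the constant k=60 is the value commonly used in the RRF literature, assumed here rather than taken from the package.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60, top_k: int = 10) -> List[str]:
    """Fuse several ranked id lists; each hit contributes 1 / (k + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_k]


dense_hits = ["a", "b", "c"]   # ids ordered by dense similarity
sparse_hits = ["b", "d", "a"]  # ids ordered by sparse similarity
fused = rrf_fuse([dense_hits, sparse_hits])
print(fused)  # ['b', 'a', 'd', 'c']
```

Documents that appear in both lists ("a" and "b") rise to the top because their reciprocal-rank contributions add up, while RRF never needs to compare the raw dense and sparse scores, which live on different scales.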
Differences from milvus-haystack
- No Milvus built-in BM25: builtin_function is not supported; use model-produced SparseEmbedding for sparse search.
- Hybrid fusion: Uses RRF in the client layer instead of Milvus hybrid + RRFRanker.
- Metadata: Stored in a single JSON column rather than Milvus dynamic scalar fields.
Configuration tips
- embedding_dim: If set at construction time and the table does not exist, the store can create the table before the first write.
- index_params: Milvus-style keys (metric_type, index_type, params) are mapped to OceanBase vector index settings where possible.
- search_params: e.g. {"efSearch": 64} for HNSW search-time behavior (passed through to pyobvector where applicable).
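The three settings above might be combined as in the sketch below. It uses only parameter names mentioned in this README with illustrative values. That search_params is accepted by the constructor (as it is in milvus-haystack) is an assumption, so the constructor call is left as a comment.

```python
# Keyword arguments for OceanBaseDocumentStore; values are illustrative.
store_kwargs = {
    "collection_name": "HaystackCollection",
    # Known dimension lets the store create the table before the first write.
    "embedding_dim": 128,
    # Milvus-style index description, mapped to OceanBase settings where possible.
    "index_params": {"metric_type": "L2", "index_type": "HNSW", "params": {}},
    # HNSW search-time setting, passed through to pyobvector where applicable.
    "search_params": {"efSearch": 64},
}

# With the package installed and OceanBase reachable (assumed call shape):
# from oceanbase_haystack import OceanBaseDocumentStore
# store = OceanBaseDocumentStore(connection_args=connection_args, **store_kwargs)
print(sorted(store_kwargs))
```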
Development
From the oceanbase-haystack directory:
pip install -e ".[dev]"
Lint and format (matches CI):
python -m ruff check src tests
python -m ruff format --check src tests
Tests:
- Default / CI (no live OceanBase): runs unit and mocked tests only:
python -m pytest tests -v -m "not oceanbase"
- Full suite (includes tests marked oceanbase; requires a reachable OceanBase instance and env vars as in tests/test_oceanbase_integration.py):
export OCEANBASE_CI=1 # and set OB_HOST, OB_PORT, OB_USER, OB_PASSWORD, OB_DB, etc.
python -m pytest tests -v
Build source and wheel:
python -m build
You can also use make from this directory: make install, make check, make test-ci, make build (see Makefile).
Publishing to PyPI (maintainers)
Workflow publish-oceanbase-haystack-pypi.yml follows the same pattern as publish-pyobsql-pypi.yml (twine + PYPI_API_TOKEN / TEST_PYPI_API_TOKEN).
- Manual run: Actions → Publish OceanBase Haystack to PyPI → optional version (updates __about__.py), Test PyPI toggle.
- Tag push: push release_oceanbase_haystack_* after setting the desired __version__ in __about__.py.
See the monorepo README for secret setup.
License
Apache-2.0 (see pyproject.toml).