OpenSearch Client
OpenSearch client with hybrid search support for Korean text.
Features
- Text Search: Multi-match queries with Korean (Nori) analyzer
- Semantic Search: Vector embeddings with k-NN search
- Hybrid Search: Combined text + vector search with Search Pipeline (OpenSearch 2.10+)
- VectorStore: Simple high-level API for vector storage and retrieval
- Async Support: Full async/await support with AsyncOpenSearchClient
Prerequisites
This is a client library for OpenSearch. You need a running OpenSearch server to use this package.
Architecture
┌─────────────────────────────────────────────────────────────┐
│                      Your Application                       │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ from opensearch_client import OpenSearchClient          │ │
│ │ client = OpenSearchClient(host="...", port=9200)        │ │
│ │ client.search(...)                                      │ │
│ └─────────────────────────────────────────────────────────┘ │
│                              │                               │
│              opensearch-client (this package)                │
└──────────────────────────────┼───────────────────────────────┘
                               │ HTTP/HTTPS
                               ▼
┌─────────────────────────────────────────────────────────────┐
│            OpenSearch Server (separate process)              │
│   - Docker container (local development)                     │
│   - AWS OpenSearch Service (production)                       │
│   - Self-hosted cluster                                       │
└─────────────────────────────────────────────────────────────┘
Running OpenSearch Locally
# Using Docker (recommended for development with Korean support)
docker compose -f docker-compose.dev.yml up -d
# Or simple Docker run (no Nori plugin)
docker run -d -p 9200:9200 \
-e "discovery.type=single-node" \
-e "plugins.security.disabled=true" \
opensearchproject/opensearch:latest
Cloud Options
- AWS OpenSearch Service: Managed OpenSearch in AWS
- Self-hosted cluster: Deploy on your own infrastructure
For detailed setup instructions, including production deployment and environment management, see the Server Setup Guide.
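Whichever option you choose, only the client's connection settings change. A minimal sketch for reaching a managed or self-hosted cluster over HTTPS, assuming basic-auth credentials; the endpoint and environment variable names below are placeholders:
import os

from opensearch_client import OpenSearchClient

# Connect over HTTPS with certificate verification (placeholder endpoint and credentials)
client = OpenSearchClient(
    host="search-my-domain.us-east-1.es.amazonaws.com",
    port=443,
    user=os.environ["OPENSEARCH_USER"],
    password=os.environ["OPENSEARCH_PASSWORD"],
    use_ssl=True,
    verify_certs=True,
)
print(client.ping())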
Installation
# Basic installation
uv add opensearch-client
# With OpenAI embeddings
uv add opensearch-client[openai]
# With local embeddings (FastEmbed)
uv add opensearch-client[local]
# With async support
uv add opensearch-client[async]
# All features
uv add opensearch-client[all]
Quick Start
from opensearch_client import OpenSearchClient
# Initialize client
client = OpenSearchClient(
host="localhost",
port=9200,
user="admin",
password="admin"
)
# Check connection
print(client.ping())
Usage Examples
1. Text Search
from opensearch_client import OpenSearchClient, TextQueryBuilder, IndexManager
client = OpenSearchClient(host="localhost", port=9200, use_ssl=False)
# Create text index with Korean analyzer
body = IndexManager.create_text_index_body(
text_field="content",
use_korean_analyzer=True
)
client.create_index("my-docs", body)
# Index documents
client.bulk_index("my-docs", [
{"title": "OpenSearch", "content": "OpenSearch는 검색 엔진입니다."},
{"title": "Python", "content": "Python은 프로그래밍 언어입니다."},
])
client.refresh("my-docs")
# Multi-match search
query = TextQueryBuilder.multi_match(
query="검색 엔진",
fields=["title", "content"],
boost_map={"title": 2.0, "content": 1.0}
)
body = TextQueryBuilder.build_search_body(query, size=10)
results = client.search("my-docs", body)
2. Semantic Search (k-NN)
from opensearch_client import OpenSearchClient, IndexManager
from opensearch_client.semantic_search.knn_search import KNNSearch
from opensearch_client.semantic_search.embeddings import FastEmbedEmbedding
# Initialize client and embedder
client = OpenSearchClient(host="localhost", port=9200, use_ssl=False)
embedder = FastEmbedEmbedding(model_name="BAAI/bge-small-en-v1.5")
# Create vector index
body = IndexManager.create_vector_index_body(
vector_field="embedding",
vector_dimension=embedder.dimension
)
client.create_index("semantic-docs", body)
# Index with embeddings
text = "OpenSearch is a search engine"
client.index_document("semantic-docs", {
"text": text,
"embedding": embedder.embed(text)
})
client.refresh("semantic-docs")
# k-NN search
query_vector = embedder.embed("search engine")
query = KNNSearch.knn_query(
field="embedding",
vector=query_vector,
k=10
)
body = KNNSearch.build_search_body(query, size=10)
results = client.search("semantic-docs", body)
3. Hybrid Search (Recommended)
from opensearch_client import OpenSearchClient, IndexManager, HybridQueryBuilder
from opensearch_client.semantic_search.embeddings import OpenAIEmbedding
# Initialize
client = OpenSearchClient(host="localhost", port=9200, use_ssl=False)
embedder = OpenAIEmbedding() # Uses OPENAI_API_KEY env var
# Create hybrid index (text + vector)
body = IndexManager.create_hybrid_index_body(
text_field="content",
vector_field="embedding",
vector_dimension=embedder.dimension,
use_korean_analyzer=True
)
client.create_index("hybrid-docs", body)
# Setup Search Pipeline (required for hybrid search)
client.setup_hybrid_pipeline(
pipeline_id="my-pipeline",
text_weight=0.3, # 30% text score
vector_weight=0.7 # 70% vector score
)
# Index documents
text = "OpenSearch는 텍스트와 벡터 검색을 지원합니다."
client.index_document("hybrid-docs", {
"content": text,
"embedding": embedder.embed(text)
})
client.refresh("hybrid-docs")
# Hybrid search
search_text = "벡터 검색"
results = client.hybrid_search(
index_name="hybrid-docs",
query=search_text,
query_vector=embedder.embed(search_text),
pipeline="my-pipeline",
text_fields=["content"],
vector_field="embedding",
k=10
)
4. VectorStore (Simplified API)
from opensearch_client import OpenSearchClient, VectorStore
from opensearch_client.semantic_search.embeddings import FastEmbedEmbedding
# Initialize
client = OpenSearchClient(host="localhost", port=9200, use_ssl=False)
embedder = FastEmbedEmbedding() # or OpenAIEmbedding()
# Create store (auto-creates index and pipeline)
store = VectorStore("my-store", embedder, client)
# Add documents (auto-embeds text)
store.add([
"OpenSearch는 검색 엔진입니다.",
"Python은 프로그래밍 언어입니다.",
"벡터 검색은 유사도 기반 검색입니다.",
])
# Add with metadata
store.add(
["FastEmbed는 빠른 임베딩 라이브러리입니다."],
metadata=[{"category": "tech", "source": "docs"}]
)
# Search
results = store.search("검색 엔진이 뭐야?", k=3)
for r in results:
print(f"{r.score:.3f}: {r.text}")
# Other operations
store.count() # Get document count
store.delete(["doc-id"]) # Delete by ID
store.clear() # Delete all documents
5. Async Client
import asyncio
from opensearch_client import AsyncOpenSearchClient
async def main():
    # Initialize async client
    async with AsyncOpenSearchClient(
        host="localhost",
        port=9200,
        use_ssl=False
    ) as client:
        # Check connection
        print(await client.ping())

        # Create index (text field plus a vector field for the hybrid search below)
        await client.create_index("async-docs", {
            "settings": {"index": {"knn": True}},
            "mappings": {"properties": {
                "text": {"type": "text"},
                "embedding": {"type": "knn_vector", "dimension": 384},
            }},
        })

        # Index documents
        await client.bulk_index("async-docs", [
            {"text": "First document"},
            {"text": "Second document"},
        ])
        await client.refresh("async-docs")

        # Search
        results = await client.search("async-docs", {
            "query": {"match": {"text": "document"}}
        })
        print(results["hits"]["hits"])

        # Hybrid search (requires pipeline setup)
        await client.setup_hybrid_pipeline(
            pipeline_id="async-pipeline",
            text_weight=0.3,
            vector_weight=0.7
        )
        results = await client.hybrid_search(
            index_name="async-docs",
            query="document",
            query_vector=[0.1] * 384,  # Your embedding here
            pipeline="async-pipeline",
            text_fields=["text"],
            vector_field="embedding"
        )

# Run
asyncio.run(main())
Note: Async support requires the async extra: uv add opensearch-client[async]
Development
Setup
# Clone repository
git clone https://github.com/namyoungkim/opensearch-client.git
cd opensearch-client
# Install dependencies (requires uv)
uv sync --all-extras
# Setup pre-commit hooks
uv run pre-commit install
Code Quality
# Lint check
uv run ruff check .
# Lint with auto-fix
uv run ruff check --fix .
# Format code
uv run ruff format .
# Type check
uv run ty check
# Run all checks (via pre-commit)
uv run pre-commit run --all-files
Testing
# Run unit tests
uv run pytest tests/unit -v
# Run integration tests (requires OpenSearch on port 9201)
docker compose -f docker-compose.test.yml up -d
uv run pytest tests/integration -v
# Run all tests with coverage (requires 70% minimum)
uv run pytest --cov=opensearch_client --cov-report=html
Note: Integration tests use port 9201 to avoid conflicts with production OpenSearch (default 9200).
Troubleshooting
Connection Issues
Port conflicts:
# Integration tests use port 9201, not 9200
# Override with environment variable if needed
OPENSEARCH_TEST_PORT=9201 uv run pytest tests/integration -v
SSL/TLS errors:
# Development only (not recommended for production)
client = OpenSearchClient(use_ssl=False, verify_certs=False)
# Production (recommended)
client = OpenSearchClient(
use_ssl=True,
verify_certs=True,
ca_certs="/path/to/ca.pem"
)
Docker Issues
Container not starting:
# Check logs
docker compose -f docker-compose.test.yml logs
# Reset and restart
docker compose -f docker-compose.test.yml down -v
docker compose -f docker-compose.test.yml up -d
Memory errors:
# Increase Docker memory limit (recommended: 4GB+)
# Or adjust in docker-compose.test.yml:
# environment:
# - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
Performance Tuning
Vector Search (k-NN)
| Parameter | Default | Description |
|---|---|---|
| ef_search | 100 | Higher = better accuracy, slower search |
| ef_construction | 128 | Higher = better index quality, slower build |
| m | 16 | Number of connections per node |
# High accuracy configuration
body = IndexManager.create_vector_index_body(
vector_dimension=384,
ef_construction=256,
m=32
)
client.create_index("high-accuracy-index", body)
Hybrid Search Weights
| Use Case | Text Weight | Vector Weight |
|---|---|---|
| Keyword-focused | 0.7 | 0.3 |
| Semantic-focused | 0.3 | 0.7 |
| Balanced | 0.5 | 0.5 |
client.setup_hybrid_pipeline(
pipeline_id="balanced-pipeline",
text_weight=0.5,
vector_weight=0.5
)
Batch Operations
# Efficient bulk embedding and indexing
embeddings = embedder.embed_batch(texts) # Batch embedding
client.bulk_index("my-index", documents) # Bulk indexing
Tech Stack
| Category | Choice | Version |
|---|---|---|
| Package Manager | uv | latest |
| Linter/Formatter | ruff | 0.14+ |
| Type Checker | ty | 0.0.7+ |
| OpenSearch | OpenSearch | 3.1.0 |
| Korean Analyzer | Nori | 3.3.0 |
| Python Client | opensearch-py | 3.1.0 |
| Embeddings (Local) | FastEmbed | 0.4+ |
| Embeddings (API) | OpenAI | 1.0+ |
| Search Method | Hybrid Search | - |
License
MIT