
OpenSearch Client


OpenSearch client with hybrid search support for Korean text.

Features

  • Text Search: Multi-match queries with Korean (Nori) analyzer
  • Semantic Search: Vector embeddings with k-NN search
  • Hybrid Search: Combined text + vector search with Search Pipeline (OpenSearch 2.10+)
  • VectorStore: Simple high-level API for vector storage and retrieval
  • Async Support: Full async/await support with AsyncOpenSearchClient

Prerequisites

This is a client library for OpenSearch. You need a running OpenSearch server to use this package.

Architecture

┌─────────────────────────────────────────────────────────────┐
│  Your Application                                           │
│  ┌───────────────────────────────────────────────────────┐  │
│  │  from opensearch_client import OpenSearchClient       │  │
│  │  client = OpenSearchClient(host="...", port=9200)     │  │
│  │  client.search(...)                                   │  │
│  └───────────────────────────────────────────────────────┘  │
│                            │                                │
│                   opensearch-client (this package)          │
└────────────────────────────┼────────────────────────────────┘
                             │ HTTP/HTTPS
                             ▼
┌─────────────────────────────────────────────────────────────┐
│  OpenSearch Server (separate process)                       │
│  - Docker container (local development)                     │
│  - AWS OpenSearch Service (production)                      │
│  - Self-hosted cluster                                      │
└─────────────────────────────────────────────────────────────┘

Running OpenSearch Locally

# Using Docker (recommended for development with Korean support)
docker compose -f docker-compose.dev.yml up -d

# Or simple Docker run (no Nori plugin)
docker run -d -p 9200:9200 \
  -e "discovery.type=single-node" \
  -e "plugins.security.disabled=true" \
  opensearchproject/opensearch:latest

Cloud Options

  • AWS OpenSearch Service: Managed OpenSearch in AWS
  • Self-hosted cluster: Deploy on your own infrastructure

For detailed setup instructions, including production deployment and environment management, see the Server Setup Guide.

Installation

# Basic installation
uv add opensearch-client

# With OpenAI embeddings
uv add opensearch-client[openai]

# With local embeddings (FastEmbed)
uv add opensearch-client[local]

# With async support
uv add opensearch-client[async]

# All features
uv add opensearch-client[all]

Quick Start

from opensearch_client import OpenSearchClient

# Initialize client
client = OpenSearchClient(
    host="localhost",
    port=9200,
    user="admin",
    password="admin"
)

# Check connection
print(client.ping())
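On a cold start (for example right after `docker compose up`), the server can take a few seconds before it accepts connections. A minimal wait-until-ready sketch — the retry logic below is ours, only the ping callable (e.g. `client.ping`) comes from the package:

```python
import time

def wait_until_ready(ping, timeout=30.0, interval=1.0):
    """Poll a ping callable until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if ping():
                return True
        except Exception:
            pass  # server not accepting connections yet
        time.sleep(interval)
    return False

# Usage: wait_until_ready(client.ping)
```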

Usage Examples

1. Text Search

from opensearch_client import OpenSearchClient, TextQueryBuilder, IndexManager

client = OpenSearchClient(host="localhost", port=9200, use_ssl=False)

# Create text index with Korean analyzer
body = IndexManager.create_text_index_body(
    text_field="content",
    use_korean_analyzer=True
)
client.create_index("my-docs", body)

# Index documents
client.bulk_index("my-docs", [
    {"title": "OpenSearch", "content": "OpenSearch는 검색 엔진입니다."},
    {"title": "Python", "content": "Python은 프로그래밍 언어입니다."},
])
client.refresh("my-docs")

# Multi-match search
query = TextQueryBuilder.multi_match(
    query="검색 엔진",
    fields=["title", "content"],
    boost_map={"title": 2.0, "content": 1.0}
)
body = TextQueryBuilder.build_search_body(query, size=10)
results = client.search("my-docs", body)
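`search` returns the raw OpenSearch response dict, so hits live under `hits.hits` in the standard response shape. A small helper for pulling out scored sources (the helper name is ours; the response shape is standard OpenSearch):

```python
def iter_hits(response):
    """Yield (score, source) pairs from a standard OpenSearch response."""
    for hit in response.get("hits", {}).get("hits", []):
        yield hit["_score"], hit["_source"]

# for score, doc in iter_hits(results):
#     print(f"{score:.2f}  {doc['title']}")
```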

2. Semantic Search (k-NN)

from opensearch_client import OpenSearchClient, IndexManager
from opensearch_client.semantic_search.knn_search import KNNSearch
from opensearch_client.semantic_search.embeddings import FastEmbedEmbedding

# Initialize client and embedder
client = OpenSearchClient(host="localhost", port=9200, use_ssl=False)
embedder = FastEmbedEmbedding(model_name="BAAI/bge-small-en-v1.5")

# Create vector index
body = IndexManager.create_vector_index_body(
    vector_field="embedding",
    vector_dimension=embedder.dimension
)
client.create_index("semantic-docs", body)

# Index with embeddings
text = "OpenSearch is a search engine"
client.index_document("semantic-docs", {
    "text": text,
    "embedding": embedder.embed(text)
})
client.refresh("semantic-docs")

# k-NN search
query_vector = embedder.embed("search engine")
query = KNNSearch.knn_query(
    field="embedding",
    vector=query_vector,
    k=10
)
body = KNNSearch.build_search_body(query, size=10)
results = client.search("semantic-docs", body)
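k-NN ranks documents by vector similarity rather than keyword overlap. For intuition, here is plain cosine similarity, the measure most embedding models are tuned for (pure Python, independent of OpenSearch — the actual distance function depends on your index's space and engine settings):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Vectors pointing the same way score near 1.0; orthogonal ones near 0.0.
```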

3. Hybrid Search (Recommended)

from opensearch_client import OpenSearchClient, IndexManager, HybridQueryBuilder
from opensearch_client.semantic_search.embeddings import OpenAIEmbedding

# Initialize
client = OpenSearchClient(host="localhost", port=9200, use_ssl=False)
embedder = OpenAIEmbedding()  # Uses OPENAI_API_KEY env var

# Create hybrid index (text + vector)
body = IndexManager.create_hybrid_index_body(
    text_field="content",
    vector_field="embedding",
    vector_dimension=embedder.dimension,
    use_korean_analyzer=True
)
client.create_index("hybrid-docs", body)

# Setup Search Pipeline (required for hybrid search)
client.setup_hybrid_pipeline(
    pipeline_id="my-pipeline",
    text_weight=0.3,   # 30% text score
    vector_weight=0.7  # 70% vector score
)

# Index documents
text = "OpenSearch는 텍스트와 벡터 검색을 지원합니다."
client.index_document("hybrid-docs", {
    "content": text,
    "embedding": embedder.embed(text)
})
client.refresh("hybrid-docs")

# Hybrid search
search_text = "벡터 검색"
results = client.hybrid_search(
    index_name="hybrid-docs",
    query=search_text,
    query_vector=embedder.embed(search_text),
    pipeline="my-pipeline",
    text_fields=["content"],
    vector_field="embedding",
    k=10
)
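Under the hood, OpenSearch's hybrid search pipeline normalizes each sub-query's scores (min-max by default) and combines them as a weighted arithmetic mean using the weights passed to `setup_hybrid_pipeline`. A pure-Python sketch of that combination, for intuition only — the server performs this step, not the client:

```python
def min_max(scores):
    """Min-max normalize scores into [0, 1]; constant lists map to 1.0 here."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def combine(text_scores, vector_scores, text_weight=0.3, vector_weight=0.7):
    """Weighted arithmetic mean of normalized text and vector scores."""
    t, v = min_max(text_scores), min_max(vector_scores)
    return [text_weight * ts + vector_weight * vs for ts, vs in zip(t, v)]
```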

4. VectorStore (Simplified API)

from opensearch_client import OpenSearchClient, VectorStore
from opensearch_client.semantic_search.embeddings import FastEmbedEmbedding

# Initialize
client = OpenSearchClient(host="localhost", port=9200, use_ssl=False)
embedder = FastEmbedEmbedding()  # or OpenAIEmbedding()

# Create store (auto-creates index and pipeline)
store = VectorStore("my-store", embedder, client)

# Add documents (auto-embeds text)
store.add([
    "OpenSearch는 검색 엔진입니다.",
    "Python은 프로그래밍 언어입니다.",
    "벡터 검색은 유사도 기반 검색입니다.",
])

# Add with metadata
store.add(
    ["FastEmbed는 빠른 임베딩 라이브러리입니다."],
    metadata=[{"category": "tech", "source": "docs"}]
)

# Search
results = store.search("검색 엔진이 뭐야?", k=3)
for r in results:
    print(f"{r.score:.3f}: {r.text}")

# Other operations
store.count()              # Get document count
store.delete(["doc-id"])   # Delete by ID
store.clear()              # Delete all documents
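Results expose `.score` and `.text` (as in the loop above), so filtering by a relevance cutoff is a one-liner. A hypothetical helper — the threshold value is something you would tune per embedder, not a library default:

```python
def above_threshold(results, min_score=0.5):
    """Keep only search results whose score meets the cutoff."""
    return [r for r in results if r.score >= min_score]
```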

5. Async Client

import asyncio
from opensearch_client import AsyncOpenSearchClient

async def main():
    # Initialize async client
    async with AsyncOpenSearchClient(
        host="localhost",
        port=9200,
        use_ssl=False
    ) as client:
        # Check connection
        print(await client.ping())

        # Create index (text field plus a 384-dim vector field for hybrid search)
        await client.create_index("async-docs", {
            "settings": {"index": {"knn": True}},
            "mappings": {"properties": {
                "text": {"type": "text"},
                "embedding": {"type": "knn_vector", "dimension": 384}
            }}
        })

        # Index documents
        await client.bulk_index("async-docs", [
            {"text": "First document"},
            {"text": "Second document"},
        ])
        await client.refresh("async-docs")

        # Search
        results = await client.search("async-docs", {
            "query": {"match": {"text": "document"}}
        })
        print(results["hits"]["hits"])

        # Hybrid search (requires pipeline setup)
        await client.setup_hybrid_pipeline(
            pipeline_id="async-pipeline",
            text_weight=0.3,
            vector_weight=0.7
        )

        results = await client.hybrid_search(
            index_name="async-docs",
            query="document",
            query_vector=[0.1] * 384,  # Your embedding here
            pipeline="async-pipeline",
            text_fields=["text"],
            vector_field="embedding"
        )

# Run
asyncio.run(main())

Note: Async support requires the async extra: uv add opensearch-client[async]
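The main payoff of the async client is overlapping independent requests with asyncio.gather instead of awaiting them one by one. A minimal sketch of the fan-out pattern with a stand-in coroutine — substitute real `await client.search(...)` calls; the stub below only simulates network latency:

```python
import asyncio

async def fake_search(index, query):
    """Stand-in for `await client.search(index, query)`."""
    await asyncio.sleep(0.01)  # simulated network round-trip
    return {"index": index, "query": query}

async def fan_out():
    # Fire several searches concurrently; gather preserves argument order.
    return await asyncio.gather(
        fake_search("docs-a", "벡터"),
        fake_search("docs-b", "검색"),
    )

results = asyncio.run(fan_out())
```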

Development

Setup

# Clone repository
git clone https://github.com/namyoungkim/opensearch-client.git
cd opensearch-client

# Install dependencies (requires uv)
uv sync --all-extras

# Setup pre-commit hooks
uv run pre-commit install

Code Quality

# Lint check
uv run ruff check .

# Lint with auto-fix
uv run ruff check --fix .

# Format code
uv run ruff format .

# Type check
uv run ty check

# Run all checks (via pre-commit)
uv run pre-commit run --all-files

Testing

# Run unit tests
uv run pytest tests/unit -v

# Run integration tests (requires OpenSearch on port 9201)
docker compose -f docker-compose.test.yml up -d
uv run pytest tests/integration -v

# Run all tests with coverage (requires 70% minimum)
uv run pytest --cov=opensearch_client --cov-report=html

Note: Integration tests use port 9201 to avoid conflicts with production OpenSearch (default 9200).

Troubleshooting

Connection Issues

Port conflicts:

# Integration tests use port 9201, not 9200
# Override with environment variable if needed
OPENSEARCH_TEST_PORT=9201 uv run pytest tests/integration -v

SSL/TLS errors:

# Development only (not recommended for production)
client = OpenSearchClient(use_ssl=False, verify_certs=False)

# Production (recommended)
client = OpenSearchClient(
    use_ssl=True,
    verify_certs=True,
    ca_certs="/path/to/ca.pem"
)

Docker Issues

Container not starting:

# Check logs
docker compose -f docker-compose.test.yml logs

# Reset and restart
docker compose -f docker-compose.test.yml down -v
docker compose -f docker-compose.test.yml up -d

Memory errors:

# Increase Docker memory limit (recommended: 4GB+)
# Or adjust in docker-compose.test.yml:
# environment:
#   - "ES_JAVA_OPTS=-Xms512m -Xmx512m"

Performance Tuning

Vector Search (k-NN)

Parameter        Default  Description
ef_search        100      Higher = better accuracy, slower search
ef_construction  128      Higher = better index quality, slower build
m                16       Number of connections per node

# High accuracy configuration
body = IndexManager.create_vector_index_body(
    vector_dimension=384,
    ef_construction=256,
    m=32
)
client.create_index("high-accuracy-index", body)

Hybrid Search Weights

Use Case          Text Weight  Vector Weight
Keyword-focused   0.7          0.3
Semantic-focused  0.3          0.7
Balanced          0.5          0.5

client.setup_hybrid_pipeline(
    pipeline_id="balanced-pipeline",
    text_weight=0.5,
    vector_weight=0.5
)

Batch Operations

# Efficient bulk embedding and indexing
embeddings = embedder.embed_batch(texts)  # Batch embedding
client.bulk_index("my-index", documents)   # Bulk indexing
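For large corpora, it helps to chunk documents so no single bulk request grows too big. A hypothetical chunking helper — the 500-document batch size is an arbitrary starting point, not a library default:

```python
def chunked(items, size=500):
    """Yield successive fixed-size slices of a list."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# for batch in chunked(documents):
#     client.bulk_index("my-index", batch)
```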

Tech Stack

Category            Choice         Version
Package Manager     uv             latest
Linter/Formatter    ruff           0.14+
Type Checker        ty             0.0.7+
OpenSearch          OpenSearch     3.1.0
Korean Analyzer     Nori           3.3.0
Python Client       opensearch-py  3.1.0
Embeddings (Local)  FastEmbed      0.4+
Embeddings (API)    OpenAI         1.0+
Search Method       Hybrid Search  -

License

MIT
