LangChain Seahorse VectorStore

A LangChain VectorStore integration for the Seahorse API Gateway, a high-performance vector database for semantic search and RAG applications.

Features

  • LangChain Compatible: Implements the LangChain VectorStore interface, except max_marginal_relevance_search() (see Not Supported)
  • Schema-Aware Column Resolution: Dense and sparse vector columns are auto-resolved from GET /v2/data/schema
  • Hybrid Search: Dense, Sparse, and Hybrid (RRF) search modes
  • Dual Embedding Support: Use Seahorse's built-in embeddings or bring your own (OpenAI, Cohere, etc.)
  • Metadata Filtering: Filter search results by metadata
  • Batch Processing: Efficient handling of large datasets (auto-batched; max 50 rows per request, max 32 KB of text per row)
  • Indexing & Health Monitoring: get_indexed_row_count() returns a typed IndexedRowCount model for tracking index build progress; health() provides a drop-in liveness probe
  • Type-Safe: Complete type hints for Python 3.8+
  • Well-Tested: Comprehensive unit and integration tests
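
The batching limits listed above (at most 50 rows per request and 32 KB of text per row) can be pictured with a small sketch. The SDK batches automatically, so this is not its internal code and is not needed for normal use. The helper names are hypothetical, and measuring the 32 KB limit in UTF-8 bytes is an assumption.

```python
from typing import Iterator, List

MAX_ROWS_PER_REQUEST = 50            # limit stated in the feature list
MAX_TEXT_BYTES_PER_ROW = 32 * 1024   # assumption: 32 KB counted as UTF-8 bytes


def iter_batches(
    texts: List[str], batch_size: int = MAX_ROWS_PER_REQUEST
) -> Iterator[List[str]]:
    """Yield consecutive slices holding at most ``batch_size`` texts."""
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]


def oversized_rows(
    texts: List[str], limit: int = MAX_TEXT_BYTES_PER_ROW
) -> List[int]:
    """Return the indices of texts whose UTF-8 encoding exceeds ``limit`` bytes."""
    return [i for i, text in enumerate(texts) if len(text.encode("utf-8")) > limit]
```

Running oversized_rows() before add_texts() is one way to find documents that may need to be split into smaller chunks first.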

Installation

# Using pip
pip install langchain-seahorse

# Using uv (recommended)
uv add langchain-seahorse

Quick Start

Basic Usage with Built-in Embeddings

from seahorse_vector_store import SeahorseVectorStore

# Initialize vectorstore
vectorstore = SeahorseVectorStore(
    api_key="your-seahorse-api-key",
    base_url="https://your-table-uuid.api.seahorse.dnotitia.ai",
)

# Add documents
ids = vectorstore.add_texts(
    texts=[
        "Machine learning is a subset of AI.",
        "Deep learning uses neural networks.",
    ],
    metadatas=[
        {"source": "doc1.pdf", "page": 1},
        {"source": "doc2.pdf", "page": 5},
    ]
)

# Search
docs = vectorstore.similarity_search(
    query="What is machine learning?",
    k=2
)

for doc in docs:
    print(doc.page_content)
    print(doc.metadata)

Using External Embeddings

from seahorse_vector_store import SeahorseVectorStore
from langchain_openai import OpenAIEmbeddings

vectorstore = SeahorseVectorStore(
    api_key="your-seahorse-api-key",
    base_url="https://your-table-uuid.api.seahorse.dnotitia.ai",
    embedding=OpenAIEmbeddings(api_key="your-openai-key"),
    use_builtin_embedding=False,
)

# Use as normal...

Hybrid Search (Dense + Sparse)

from seahorse_vector_store import SeahorseVectorStore, SearchMode

vectorstore = SeahorseVectorStore(
    api_key="your-api-key",
    base_url="https://your-table-uuid.api.seahorse.dnotitia.ai",
)

# Default: Hybrid search (Dense + Sparse with RRF fusion)
docs = vectorstore.similarity_search("machine learning", k=5)

# Pure Dense search
docs = vectorstore.similarity_search(
    "machine learning", k=5, retrieval_mode=SearchMode.DENSE
)

# Pure Sparse search (BM25-based)
docs = vectorstore.similarity_search(
    "machine learning", k=5, retrieval_mode=SearchMode.SPARSE
)

Metadata Filtering

# Search with metadata filter
docs = vectorstore.similarity_search(
    query="neural networks",
    k=5,
    filter={"source": "doc1.pdf", "page": 1}
)
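
The filter above reads most naturally as "every listed key must equal the given value". The sketch below expresses that reading as a plain Python predicate. It is only an illustration: the exact-match, AND-combined semantics are an assumption, matches_filter is a hypothetical helper rather than part of the SDK, and any richer operators the backend may support are not covered.

```python
from typing import Any, Mapping


def matches_filter(metadata: Mapping[str, Any], flt: Mapping[str, Any]) -> bool:
    """True when every key in ``flt`` is present in ``metadata`` with an equal value."""
    return all(
        key in metadata and metadata[key] == value
        for key, value in flt.items()
    )
```

Under this reading, a document with metadata {"source": "doc1.pdf", "page": 2} would not match the filter in the example above, because the page differs.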

Indexing Status & Health Check

# Per-index indexing progress (typed model). Top-level counts come from the
# writer; ``stats.readable`` adds a reader-node view (segments deduplicated,
# ``row_count - deleted_row_count`` as a saturating subtraction).
stats = vectorstore.get_indexed_row_count()
print(stats.total_row_count)
for idx in stats.indexed_counts:
    print(f"{idx.index_name} ({idx.index_type}): {idx.indexed_row_count}")

# Skip the reader-node ``readable`` view when only writer counts are needed
stats = vectorstore.get_indexed_row_count(readable=False)

# Lightweight liveness probe — True on 200 OK, False on any SeahorseAPIError
if not vectorstore.health():
    raise RuntimeError("Seahorse backend is unreachable")
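
To wait for an index build to finish, get_indexed_row_count() can be polled in a loop. The following is a minimal sketch, not part of the SDK: wait_until_indexed is a hypothetical helper, and treating an index as complete once indexed_row_count reaches total_row_count is an assumption about how the two fields relate.

```python
import time
from typing import Any, Callable


def wait_until_indexed(
    store: Any,
    timeout: float = 300.0,
    poll_interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until every reported index covers all rows; return False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        # Writer counts are enough here, so skip the reader-node view.
        stats = store.get_indexed_row_count(readable=False)
        total = stats.total_row_count
        # Note: if no indexes are reported, all() is True and we return at once.
        if all(idx.indexed_row_count >= total for idx in stats.indexed_counts):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_interval)
```

The sleep parameter exists only so the loop can be exercised in tests without real delays.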

🔧 Configuration

Environment Variables

You can set API credentials via environment variables:

export SEAHORSE_API_KEY="your-api-key"
export SEAHORSE_BASE_URL="https://your-table-uuid.api.seahorse.dnotitia.ai"

Then use them in your code:

import os
from seahorse_vector_store import SeahorseVectorStore

vectorstore = SeahorseVectorStore(
    api_key=os.environ["SEAHORSE_API_KEY"],
    base_url=os.environ["SEAHORSE_BASE_URL"],
)
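
Indexing os.environ directly raises a bare KeyError when a variable is unset. A small wrapper can report every missing variable at once. This is a sketch only; seahorse_settings is a hypothetical helper, not something the package provides.

```python
import os
from typing import Dict, Mapping, Optional


def seahorse_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return constructor kwargs read from the environment, or raise if any is unset."""
    env = os.environ if env is None else env
    required = {"api_key": "SEAHORSE_API_KEY", "base_url": "SEAHORSE_BASE_URL"}
    # Treat empty strings the same as missing variables.
    missing = [name for name in required.values() if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {kwarg: env[name] for kwarg, name in required.items()}
```

The result can then be unpacked into the constructor, as in SeahorseVectorStore(**seahorse_settings()).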

Advanced Options

vectorstore = SeahorseVectorStore(
    api_key="your-api-key",
    base_url="https://your-table-uuid.api.seahorse.dnotitia.ai",
    use_builtin_embedding=True,  # Use Seahorse embeddings
    # dense_column / sparse_column are optional explicit overrides.
    # If omitted, the SDK resolves them from GET /v2/data/schema.
)

Primary Key Behavior

Seahorse uses mandatory content-hash primary keys.

  • add_texts() and from_texts() always return IDs generated from Seahorse PK rules.
  • Caller-provided custom IDs, including LangChain Document.id, are not persisted as the stored row ID.
  • Use the returned ids from insert operations as the source of truth for later delete workflows.
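
Because caller-provided IDs are not stored, an application that has its own document keys needs to remember which returned ID belongs to which key. One possible shape for that bookkeeping is sketched below. IdRegistry is a hypothetical helper, and the sketch assumes add_texts() returns IDs in the same order as the input texts.

```python
from typing import Any, Dict, List, Sequence


class IdRegistry:
    """Maps caller-side keys to the row IDs returned by add_texts()."""

    def __init__(self, store: Any) -> None:
        self._store = store
        self._ids: Dict[str, str] = {}

    def add(self, keys: Sequence[str], texts: Sequence[str]) -> List[str]:
        """Insert texts and remember the returned ID for each caller key."""
        if len(keys) != len(texts):
            raise ValueError("keys and texts must have the same length")
        returned = self._store.add_texts(texts=list(texts))
        # Assumption: returned IDs line up with the input order.
        self._ids.update(zip(keys, returned))
        return returned

    def delete(self, keys: Sequence[str]) -> None:
        """Delete the rows for known keys; unknown keys are ignored."""
        known = [k for k in keys if k in self._ids]
        if not known:
            return
        self._store.delete(ids=[self._ids[k] for k in known])
        # Forget the mapping only after the delete call has succeeded.
        for k in known:
            del self._ids[k]
```

In a real application the mapping would usually live in a database rather than in memory, so that it survives restarts.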

📖 API Reference

SeahorseVectorStore

Main class for interacting with Seahorse as a vector store.

Synchronous Methods

  • add_texts(texts, metadatas=None, **kwargs) - Add texts to the vector store
  • similarity_search(query, k=4, filter=None, **kwargs) - Search for similar documents
  • similarity_search_with_score(query, k=4, filter=None, **kwargs) - Search with distance scores
  • similarity_search_by_vector(embedding, k=4, filter=None, **kwargs) - Search by vector
  • similarity_search_by_vector_with_score(embedding, k=4, filter=None, **kwargs) - Search by vector with scores
  • delete(ids=None, **kwargs) - Delete documents by IDs
  • from_texts(texts, embedding=None, metadatas=None, **kwargs) - Create vectorstore from texts
  • get_indexed_row_count(readable=True) - Per-index indexed row counts as IndexedRowCount
  • health() - Lightweight liveness probe (returns bool)

Async Methods

  • aadd_texts(texts, metadatas=None, **kwargs) - Add texts asynchronously
  • asimilarity_search(query, k=4, filter=None, **kwargs) - Search asynchronously
  • asimilarity_search_with_score(query, k=4, filter=None, **kwargs) - Search with scores asynchronously
  • asimilarity_search_by_vector(embedding, k=4, filter=None, **kwargs) - Search by vector asynchronously
  • asimilarity_search_by_vector_with_score(embedding, k=4, filter=None, **kwargs) - Search by vector with scores asynchronously
  • adelete(ids=None, **kwargs) - Delete documents asynchronously
  • aget_indexed_row_count(readable=True) - Per-index indexed row counts (async)
  • ahealth() - Async liveness probe
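
The async methods are most useful when several requests can be in flight at once. As a minimal sketch, the helper below issues one asimilarity_search() call per query and awaits them together with asyncio.gather. search_many is a hypothetical name, and the sketch says nothing about how many concurrent requests the backend is happy to accept.

```python
import asyncio
from typing import Any, List, Sequence


async def search_many(store: Any, queries: Sequence[str], k: int = 4) -> List[Any]:
    """Run one asimilarity_search per query concurrently; results keep query order."""
    tasks = [store.asimilarity_search(query, k=k) for query in queries]
    # gather() returns results in the order the awaitables were passed in.
    return await asyncio.gather(*tasks)
```

From synchronous code this can be driven with asyncio.run(search_many(vectorstore, ["query one", "query two"])).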

Search Modes

  • SearchMode.HYBRID (default) - Dense + Sparse with RRF fusion
  • SearchMode.DENSE - Pure dense vector search
  • SearchMode.SPARSE - Pure sparse (BM25) search
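
For readers unfamiliar with RRF (Reciprocal Rank Fusion), the idea behind the hybrid mode can be sketched in a few lines: each document scores the sum of 1 / (k + rank) over the ranked lists it appears in. This is the textbook formulation, not Seahorse's implementation. The function name is hypothetical, and the constant k=60 is the value commonly used in the literature; the source does not say which constant Seahorse uses.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse ranked ID lists with score(d) = sum over lists of 1 / (k + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_ranking = ["a", "b", "c"]
sparse_ranking = ["b", "d", "a"]
print(rrf_fuse([dense_ranking, sparse_ranking]))  # ['b', 'a', 'd', 'c']
```

Documents that rank well in both lists ("b" and "a" here) rise above documents that appear in only one, which is why hybrid mode can help when dense and sparse retrieval each miss different relevant results.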

Not Supported

  • max_marginal_relevance_search() - ⚠️ MMR search is not supported by the Seahorse API

Testing

Setup for Integration Tests

Create a .env file in the project root with your Seahorse credentials:

# Copy the example file
cp .env.example .env

# Edit .env and add your credentials
SEAHORSE_API_KEY=your-api-key
SEAHORSE_BASE_URL=https://your-table-uuid.api.seahorse.dnotitia.ai

Running Tests

# Run unit tests
uv run pytest tests/unit/

# Run basic integration tests (requires .env file with API credentials)
uv run pytest tests/integration/ \
  --ignore=tests/integration/test_ollama_embeddings.py \
  --ignore=tests/integration/test_rag_pipeline.py

# Run all tests with coverage
uv run pytest --cov=seahorse_vector_store --cov-report=term-missing

# Skip integration tests
uv run pytest -m "not integration"

Running Ollama Integration Tests (Optional)

For advanced tests using Ollama LLM and embeddings:

# 1. Install Ollama dependencies (Python 3.9+ required)
uv pip install langchain langchain-ollama

# 2. Start Ollama server
ollama serve

# 3. Download models
ollama pull qwen3-embedding:8b  # For embeddings
ollama pull qwen3:8b             # For RAG

# 4. Run Ollama tests
uv run pytest tests/integration/test_ollama_embeddings.py -v
uv run pytest tests/integration/test_rag_pipeline.py -v

# 5. Run all integration tests (including Ollama)
uv run pytest tests/integration/ -v

Note: Ollama tests will automatically skip if Ollama is not available or required models are not installed.

Examples

See the examples/ directory for complete examples:

  • basic_usage.py - Basic vectorstore operations
  • async_usage.py - Async/await operations for better performance
  • rag_pipeline.py - Building a RAG (Retrieval-Augmented Generation) pipeline
  • metadata_filtering.py - Advanced metadata filtering techniques
  • external_embeddings.py - Using external embeddings (OpenAI, Cohere, etc.)

Requirements

  • Python 3.8+
  • langchain-core >= 0.2.0
  • httpx >= 0.27.0
  • pydantic >= 2.0.0

License

MIT License - see LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Development Status

This package is in beta. APIs are still stabilizing.

Current version: 0.4.1


Made by the Seahorse Team

