LangChain VectorStore integration for Seahorse API Gateway

LangChain Seahorse VectorStore

LangChain VectorStore integration for Seahorse API Gateway, a high-performance vector database for semantic search and RAG applications.

Features

  • LangChain Compatible: Full implementation of LangChain VectorStore interface
  • Schema-Aware Column Resolution: Dense and sparse vector columns are auto-resolved from GET /v2/data/schema
  • Hybrid Search: Dense, Sparse, and Hybrid (RRF) search modes
  • Dual Embedding Support: Use Seahorse's built-in embeddings or bring your own (OpenAI, Cohere, etc.)
  • Metadata Filtering: Filter search results by metadata
  • Batch Processing: Efficient handling of large datasets (auto-batched; max 50 rows/request, max 32KB text/row)
  • Type-Safe: Complete type hints for Python 3.8+
  • Well-Tested: Comprehensive unit and integration tests
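The SDK already auto-batches inserts, so you never need to do this yourself. The sketch below only illustrates what the batching limits mean on the client side. It assumes that "32KB" means 32 × 1024 bytes of UTF-8 encoded text per row and that an oversized row is rejected rather than truncated. The SDK's real handling may differ, and `batch_texts` is a hypothetical helper that is not part of the package.

```python
from typing import Iterator, List

# Limits quoted in the feature list; reading "32KB" as 32 * 1024 UTF-8 bytes is an assumption.
MAX_ROWS_PER_REQUEST = 50
MAX_TEXT_BYTES = 32 * 1024


def batch_texts(texts: List[str], batch_size: int = MAX_ROWS_PER_REQUEST) -> Iterator[List[str]]:
    """Hypothetical helper: check every row's size, then yield request-sized batches."""
    for i, text in enumerate(texts):
        size = len(text.encode("utf-8"))
        if size > MAX_TEXT_BYTES:
            raise ValueError(f"text at index {i} is {size} bytes; limit is {MAX_TEXT_BYTES}")
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]


sizes = [len(batch) for batch in batch_texts([f"doc {n}" for n in range(120)])]
print(sizes)  # [50, 50, 20]
```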

Installation

# Using pip
pip install langchain-seahorse

# Using uv (recommended)
uv add langchain-seahorse

Quick Start

Basic Usage with Built-in Embeddings

from seahorse_vector_store import SeahorseVectorStore

# Initialize vectorstore
vectorstore = SeahorseVectorStore(
    api_key="your-seahorse-api-key",
    base_url="https://your-table-uuid.api.seahorse.dnotitia.ai",
)

# Add documents
ids = vectorstore.add_texts(
    texts=[
        "Machine learning is a subset of AI.",
        "Deep learning uses neural networks.",
    ],
    metadatas=[
        {"source": "doc1.pdf", "page": 1},
        {"source": "doc2.pdf", "page": 5},
    ]
)

# Search
docs = vectorstore.similarity_search(
    query="What is machine learning?",
    k=2
)

for doc in docs:
    print(doc.page_content)
    print(doc.metadata)

Using External Embeddings

from seahorse_vector_store import SeahorseVectorStore
from langchain_openai import OpenAIEmbeddings

vectorstore = SeahorseVectorStore(
    api_key="your-seahorse-api-key",
    base_url="https://your-table-uuid.api.seahorse.dnotitia.ai",
    embedding=OpenAIEmbeddings(api_key="your-openai-key"),
    use_builtin_embedding=False,
)

# Use as normal...

Hybrid Search (Dense + Sparse)

from seahorse_vector_store import SeahorseVectorStore, SearchMode

vectorstore = SeahorseVectorStore(
    api_key="your-api-key",
    base_url="https://your-table-uuid.api.seahorse.dnotitia.ai",
)

# Default: Hybrid search (Dense + Sparse with RRF fusion)
docs = vectorstore.similarity_search("machine learning", k=5)

# Pure Dense search
docs = vectorstore.similarity_search(
    "machine learning", k=5, retrieval_mode=SearchMode.DENSE
)

# Pure Sparse search (BM25-based)
docs = vectorstore.similarity_search(
    "machine learning", k=5, retrieval_mode=SearchMode.SPARSE
)

Metadata Filtering

# Search with metadata filter
docs = vectorstore.similarity_search(
    query="neural networks",
    k=5,
    filter={"source": "doc1.pdf", "page": 1}
)
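This README does not say how a multi-key filter is combined. A common convention, assumed here, is exact-match equality on every key, combined with logical AND. The hypothetical `matches_filter` below only illustrates that assumed behaviour on the client side and is not the SDK's implementation. Check examples/metadata_filtering.py for the operators Seahorse actually supports.

```python
from typing import Any, Dict, Optional


def matches_filter(metadata: Dict[str, Any], filter_dict: Optional[Dict[str, Any]]) -> bool:
    """Hypothetical: True if every filter key is present with an equal value (AND semantics)."""
    if not filter_dict:
        return True  # no filter: every document matches
    return all(key in metadata and metadata[key] == value for key, value in filter_dict.items())


docs_metadata = [
    {"source": "doc1.pdf", "page": 1},
    {"source": "doc1.pdf", "page": 2},
    {"source": "doc2.pdf", "page": 5},
]
wanted = {"source": "doc1.pdf", "page": 1}
print([m for m in docs_metadata if matches_filter(m, wanted)])
# [{'source': 'doc1.pdf', 'page': 1}]
```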

🔧 Configuration

Environment Variables

You can set API credentials via environment variables:

export SEAHORSE_API_KEY="your-api-key"
export SEAHORSE_BASE_URL="https://your-table-uuid.api.seahorse.dnotitia.ai"

Then use them in your code:

import os
from seahorse_vector_store import SeahorseVectorStore

vectorstore = SeahorseVectorStore(
    api_key=os.environ["SEAHORSE_API_KEY"],
    base_url=os.environ["SEAHORSE_BASE_URL"],
)
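Indexing `os.environ` directly raises a bare `KeyError` when a variable is unset. The sketch below shows a slightly friendlier pattern. It uses a hypothetical `load_seahorse_settings` helper, which is not part of the SDK, to collect both values and report every missing variable at once.

```python
import os
from typing import Dict, Mapping, Optional

# Constructor keyword -> environment variable, as used in the example above.
REQUIRED_VARS = {"api_key": "SEAHORSE_API_KEY", "base_url": "SEAHORSE_BASE_URL"}


def load_seahorse_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Hypothetical helper: map environment variables to SeahorseVectorStore kwargs."""
    env = os.environ if env is None else env
    missing = [var for var in REQUIRED_VARS.values() if not env.get(var)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {kwarg: env[var] for kwarg, var in REQUIRED_VARS.items()}
```

The resulting dict can then be unpacked into the constructor, for example `SeahorseVectorStore(**load_seahorse_settings())`.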

Advanced Options

vectorstore = SeahorseVectorStore(
    api_key="your-api-key",
    base_url="https://your-table-uuid.api.seahorse.dnotitia.ai",
    use_builtin_embedding=True,  # Use Seahorse embeddings
    # dense_column / sparse_column are optional explicit overrides.
    # If omitted, the SDK resolves them from GET /v2/data/schema.
)

Primary Key Behavior

Seahorse uses mandatory content-hash primary keys.

  • add_texts() and from_texts() always return IDs generated from Seahorse PK rules.
  • Caller-provided custom IDs, including LangChain Document.id, are not persisted as the stored row ID.
  • Use the returned ids from insert operations as the source of truth for later delete workflows.
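A small model of a content-hash key shows why caller-supplied IDs are discarded. The sketch below derives an ID from a SHA-256 digest of the text. This is an assumption made for illustration only, because the exact Seahorse PK rule is not documented here (for example, which hash is used and whether metadata is included). The helpers `content_hash_id` and `assign_ids` are hypothetical.

```python
import hashlib
from typing import List, Optional


def content_hash_id(text: str) -> str:
    """Illustrative stand-in for a content-hash primary key (not Seahorse's actual rule)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def assign_ids(texts: List[str], requested_ids: Optional[List[str]] = None) -> List[str]:
    """Mimic the documented contract: requested IDs are ignored; keys come from content."""
    del requested_ids  # caller-provided IDs are not persisted as the row ID
    return [content_hash_id(t) for t in texts]


ids = assign_ids(["Machine learning is a subset of AI."], requested_ids=["my-id"])
assert ids[0] != "my-id"
# Under this model, the same text always yields the same key.
assert ids == assign_ids(["Machine learning is a subset of AI."])
```

The practical takeaway is the same as in the list above. With the real store, keep the result of `ids = vectorstore.add_texts(...)` and pass it later to `vectorstore.delete(ids=ids)`.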

📖 API Reference

SeahorseVectorStore

Main class for interacting with Seahorse as a vector store.

Synchronous Methods

  • add_texts(texts, metadatas=None, **kwargs) - Add texts to the vector store
  • similarity_search(query, k=4, filter=None, **kwargs) - Search for similar documents
  • similarity_search_with_score(query, k=4, filter=None, **kwargs) - Search with distance scores
  • similarity_search_by_vector(embedding, k=4, filter=None, **kwargs) - Search by vector
  • similarity_search_by_vector_with_score(embedding, k=4, filter=None, **kwargs) - Search by vector with scores
  • delete(ids=None, **kwargs) - Delete documents by IDs
  • from_texts(texts, embedding=None, metadatas=None, **kwargs) - Create vectorstore from texts

Async Methods

  • aadd_texts(texts, metadatas=None, **kwargs) - Add texts asynchronously
  • asimilarity_search(query, k=4, filter=None, **kwargs) - Search asynchronously
  • asimilarity_search_with_score(query, k=4, filter=None, **kwargs) - Search with scores asynchronously
  • asimilarity_search_by_vector(embedding, k=4, filter=None, **kwargs) - Search by vector asynchronously
  • asimilarity_search_by_vector_with_score(embedding, k=4, filter=None, **kwargs) - Search by vector with scores asynchronously
  • adelete(ids=None, **kwargs) - Delete documents asynchronously
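The async methods are coroutines, so you can run several queries concurrently with `asyncio.gather`. The sketch uses a hypothetical `FakeStore` so that it runs without credentials. It relies only on the `asimilarity_search(query, k=...)` signature listed above. A real `SeahorseVectorStore` instance should therefore work in its place, but that is an assumption to verify against examples/async_usage.py.

```python
import asyncio
from typing import Any, List, Sequence


class FakeStore:
    """Hypothetical stand-in exposing the same coroutine signature as the real store."""

    async def asimilarity_search(self, query: str, k: int = 4, **kwargs: Any) -> List[str]:
        await asyncio.sleep(0.01)  # simulate network latency
        return [f"{query}-hit{i}" for i in range(k)]


async def search_many(store: Any, queries: Sequence[str], k: int = 4) -> List[Any]:
    """Run one search per query concurrently; gather keeps results in input order."""
    return await asyncio.gather(*(store.asimilarity_search(q, k=k) for q in queries))


results = asyncio.run(search_many(FakeStore(), ["machine learning", "neural networks"], k=2))
print(results)
# [['machine learning-hit0', 'machine learning-hit1'], ['neural networks-hit0', 'neural networks-hit1']]
```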

Search Modes

  • SearchMode.HYBRID (default) - Dense + Sparse with RRF fusion
  • SearchMode.DENSE - Pure dense vector search
  • SearchMode.SPARSE - Pure sparse (BM25) search
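Hybrid mode merges the dense and sparse result lists with Reciprocal Rank Fusion (RRF). In its textbook form, RRF scores each document as the sum over lists of 1 / (k + rank), where ranks start at 1 and k = 60 is the commonly used constant. The sketch below implements that textbook form over hypothetical document IDs. This README does not say whether Seahorse uses k = 60, weights the lists, or deviates in other ways, so treat the constant as an assumption.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def rrf_fuse(ranked_lists: Sequence[Sequence[str]], k: int = 60) -> List[Tuple[str, float]]:
    """Reciprocal Rank Fusion: score(d) = sum of 1 / (k + rank), ranks starting at 1."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


dense = ["doc_a", "doc_b", "doc_c"]   # hypothetical dense ranking
sparse = ["doc_c", "doc_a", "doc_d"]  # hypothetical sparse (BM25) ranking
fused = rrf_fuse([dense, sparse])
print([doc_id for doc_id, _ in fused])  # ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Documents that rank near the top of both lists, such as `doc_a`, rise above documents that appear high in only one list.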

Not Supported

  • max_marginal_relevance_search() - ⚠️ MMR search is not supported by the Seahorse API

Testing

Setup for Integration Tests

Create a .env file in the project root with your Seahorse credentials:

# Copy the example file
cp .env.example .env

# Edit .env and add your credentials
SEAHORSE_API_KEY=your-api-key
SEAHORSE_BASE_URL=https://your-table-uuid.api.seahorse.dnotitia.ai

Running Tests

# Run unit tests
uv run pytest tests/unit/

# Run basic integration tests (requires .env file with API credentials)
uv run pytest tests/integration/ \
  --ignore=tests/integration/test_ollama_embeddings.py \
  --ignore=tests/integration/test_rag_pipeline.py

# Run all tests with coverage
uv run pytest --cov=seahorse_vector_store --cov-report=term-missing

# Skip integration tests
uv run pytest -m "not integration"

Running Ollama Integration Tests (Optional)

For advanced tests that use Ollama for both the LLM and the embeddings:

# 1. Install Ollama dependencies (Python 3.9+ required)
uv pip install langchain langchain-ollama

# 2. Start Ollama server
ollama serve

# 3. Download models
ollama pull qwen3-embedding:8b  # For embeddings
ollama pull qwen3:8b             # For RAG

# 4. Run Ollama tests
uv run pytest tests/integration/test_ollama_embeddings.py -v
uv run pytest tests/integration/test_rag_pipeline.py -v

# 5. Run all integration tests (including Ollama)
uv run pytest tests/integration/ -v

Note: Ollama tests will automatically skip if Ollama is not available or required models are not installed.
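The auto-skip behaviour can be approximated with a readiness check like the one below. This is a hypothetical sketch, not the project's actual test code. It relies on Ollama's default address (`http://localhost:11434`) and on its `GET /api/tags` endpoint, which lists locally pulled models. The helper name `ollama_ready` is illustrative.

```python
import json
import urllib.request
from typing import Iterable

OLLAMA_URL = "http://localhost:11434"  # Ollama's default listen address


def ollama_ready(models: Iterable[str], base_url: str = OLLAMA_URL, timeout: float = 2.0) -> bool:
    """Hypothetical check: True if the server answers and every named model is pulled."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        return False  # server down or unreachable (URLError is an OSError), or non-JSON reply
    installed = {m.get("name") for m in payload.get("models", [])}
    return all(name in installed for name in models)
```

In a test module, you could wire the helper to pytest with `pytest.mark.skipif(not ollama_ready(["qwen3:8b"]), reason="Ollama not available")`.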

Examples

See the examples/ directory for complete examples:

  • basic_usage.py - Basic vectorstore operations
  • async_usage.py - Async/await operations for better performance
  • rag_pipeline.py - Building a RAG (Retrieval-Augmented Generation) pipeline
  • metadata_filtering.py - Advanced metadata filtering techniques
  • external_embeddings.py - Using external embeddings (OpenAI, Cohere, etc.)

Requirements

  • Python 3.8+
  • langchain-core >= 0.2.0
  • httpx >= 0.27.0
  • pydantic >= 2.0.0

License

MIT License - see LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Development Status

This package is in the Beta stage, and its APIs are stabilizing.

Current version: 0.3.0


Made by the Seahorse Team

Download files

Source Distribution

langchain_seahorse-0.3.0.tar.gz (327.5 kB)

Built Distribution

langchain_seahorse-0.3.0-py3-none-any.whl (25.0 kB)

File details

Details for the file langchain_seahorse-0.3.0.tar.gz.

File metadata

  • Download URL: langchain_seahorse-0.3.0.tar.gz
  • Size: 327.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.6 (uv publish, run in CI on Ubuntu 22.04)

File hashes

Hashes for langchain_seahorse-0.3.0.tar.gz:

  • SHA256: e616a2bea52548a8a3e4cc8190a27dcc0d46451eeead229062ee02cfc2c1aee7
  • MD5: 255a7d2f8ec1ca158be1482e15cd9459
  • BLAKE2b-256: 29b9024ba68dfe2abdd57e479285e387963ff5695f7363ce1d1437b96e01c66c

File details

Details for the file langchain_seahorse-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: langchain_seahorse-0.3.0-py3-none-any.whl
  • Size: 25.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.6 (uv publish, run in CI on Ubuntu 22.04)

File hashes

Hashes for langchain_seahorse-0.3.0-py3-none-any.whl:

  • SHA256: 78091a2dfbeba330f9671176c5de1a576e3dc51afd9cc49136e3033de5cf4622
  • MD5: 758061e3b3280d8af4dda09f3a6b944d
  • BLAKE2b-256: 91f130c67032c0d6f697d9786d75ba90e15a5a0a564a347ddb86742059168a91
