LangChain Seahorse VectorStore
LangChain VectorStore integration for Seahorse API Gateway - A high-performance vector database for semantic search and RAG applications.
Features
- LangChain Compatible: Full implementation of LangChain VectorStore interface
- Schema-Aware Column Resolution: Dense and sparse vector columns are auto-resolved from GET /v2/data/schema
- Hybrid Search: Dense, Sparse, and Hybrid (RRF) search modes
- Dual Embedding Support: Use Seahorse's built-in embeddings or bring your own (OpenAI, Cohere, etc.)
- Metadata Filtering: Filter search results by metadata
- Batch Processing: Efficient handling of large datasets (auto-batched; max 50 rows/request, max 32KB text/row)
- Indexing & Health Monitoring: get_indexed_row_count() returns a typed IndexedRowCount model for tracking index build progress; health() provides a drop-in liveness probe
- Type-Safe: Complete type hints for Python 3.8+
- Well-Tested: Comprehensive unit and integration tests
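The batch limits listed above (at most 50 rows per request and 32KB of text per row) are handled by the SDK's auto-batching. As a rough illustration of what that involves, the sketch below splits a list of texts into request-sized batches and rejects oversized rows. The helper name `iter_batches`, both constants, and the reading of "32KB" as 32 * 1024 UTF-8 bytes are assumptions made for this example, not part of the SDK.

```python
from typing import Iterator, List, Sequence

MAX_ROWS_PER_REQUEST = 50    # limit stated in the feature list
MAX_TEXT_BYTES = 32 * 1024   # assumption: "32KB" means 32 * 1024 UTF-8 bytes


def iter_batches(
    texts: Sequence[str], batch_size: int = MAX_ROWS_PER_REQUEST
) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `batch_size` texts."""
    # Validate every row first so no batch is emitted for an invalid input.
    for i, text in enumerate(texts):
        if len(text.encode("utf-8")) > MAX_TEXT_BYTES:
            raise ValueError(f"text at index {i} exceeds {MAX_TEXT_BYTES} bytes")
    for start in range(0, len(texts), batch_size):
        yield list(texts[start:start + batch_size])


batches = list(iter_batches([f"doc {n}" for n in range(120)]))
print([len(b) for b in batches])  # [50, 50, 20]
```

In normal use there is no need to call anything like this yourself; it only shows why a single add_texts() call with many rows may translate into several requests.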
Installation
# Using pip
pip install langchain-seahorse
# Using uv (recommended)
uv add langchain-seahorse
Quick Start
Basic Usage with Built-in Embeddings
from seahorse_vector_store import SeahorseVectorStore

# Initialize vectorstore
vectorstore = SeahorseVectorStore(
    api_key="your-seahorse-api-key",
    base_url="https://your-table-uuid.api.seahorse.dnotitia.ai",
)

# Add documents
ids = vectorstore.add_texts(
    texts=[
        "Machine learning is a subset of AI.",
        "Deep learning uses neural networks.",
    ],
    metadatas=[
        {"source": "doc1.pdf", "page": 1},
        {"source": "doc2.pdf", "page": 5},
    ],
)

# Search
docs = vectorstore.similarity_search(
    query="What is machine learning?",
    k=2,
)
for doc in docs:
    print(doc.page_content)
    print(doc.metadata)
Using External Embeddings
from seahorse_vector_store import SeahorseVectorStore
from langchain_openai import OpenAIEmbeddings

vectorstore = SeahorseVectorStore(
    api_key="your-seahorse-api-key",
    base_url="https://your-table-uuid.api.seahorse.dnotitia.ai",
    embedding=OpenAIEmbeddings(api_key="your-openai-key"),
    use_builtin_embedding=False,
)
# Use as normal...
Hybrid Search (Dense + Sparse)
from seahorse_vector_store import SeahorseVectorStore, SearchMode

vectorstore = SeahorseVectorStore(
    api_key="your-api-key",
    base_url="https://your-table-uuid.api.seahorse.dnotitia.ai",
)

# Default: Hybrid search (Dense + Sparse with RRF fusion)
docs = vectorstore.similarity_search("machine learning", k=5)

# Pure Dense search
docs = vectorstore.similarity_search(
    "machine learning", k=5, retrieval_mode=SearchMode.DENSE
)

# Pure Sparse search (BM25-based)
docs = vectorstore.similarity_search(
    "machine learning", k=5, retrieval_mode=SearchMode.SPARSE
)
Metadata Filtering
# Search with metadata filter
docs = vectorstore.similarity_search(
    query="neural networks",
    k=5,
    filter={"source": "doc1.pdf", "page": 1},
)
Indexing Status & Health Check
# Per-index indexing progress (typed model). Top-level counts are
# writer-based; ``stats.readable`` adds a reader-node view (segment dedup
# + ``row_count - deleted_row_count`` saturating).
stats = vectorstore.get_indexed_row_count()
print(stats.total_row_count)
for idx in stats.indexed_counts:
    print(f"{idx.index_name} ({idx.index_type}): {idx.indexed_row_count}")

# Skip the reader-node ``readable`` view when only writer counts are needed
stats = vectorstore.get_indexed_row_count(readable=False)

# Lightweight liveness probe: True on 200 OK, False on any SeahorseAPIError
if not vectorstore.health():
    raise RuntimeError("Seahorse backend is unreachable")
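The fields shown above (total_row_count, indexed_counts, indexed_row_count) can drive a simple "wait until indexed" loop after a bulk insert. The following is a minimal sketch, not an SDK feature: `wait_until_indexed` is a hypothetical helper, and the completion rule (every index has reached total_row_count) is an assumption about how to interpret the counts. A stub stands in for vectorstore.get_indexed_row_count so the example runs without credentials; the index name and type in the stub are made up.

```python
import time
from types import SimpleNamespace
from typing import Any, Callable


def wait_until_indexed(
    get_counts: Callable[[], Any],
    timeout: float = 60.0,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until every index has caught up with total_row_count, or time out."""
    deadline = time.monotonic() + timeout
    while True:
        stats = get_counts()
        # Assumption: an index is done once indexed_row_count reaches total_row_count.
        if all(
            idx.indexed_row_count >= stats.total_row_count
            for idx in stats.indexed_counts
        ):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)


# Stub that reports 0, then 5, then 10 indexed rows out of 10.
_progress = iter([0, 5, 10])


def fake_counts() -> SimpleNamespace:
    index = SimpleNamespace(
        index_name="dense_idx", index_type="HNSW", indexed_row_count=next(_progress)
    )
    return SimpleNamespace(total_row_count=10, indexed_counts=[index])


print(wait_until_indexed(fake_counts, timeout=5.0, sleep=lambda _: None))  # True
```

With a real store, pass `vectorstore.get_indexed_row_count` as `get_counts` and keep the default `sleep`.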
🔧 Configuration
Environment Variables
You can set API credentials via environment variables:
export SEAHORSE_API_KEY="your-api-key"
export SEAHORSE_BASE_URL="https://your-table-uuid.api.seahorse.dnotitia.ai"
Then use them in your code:
import os
from seahorse_vector_store import SeahorseVectorStore

vectorstore = SeahorseVectorStore(
    api_key=os.environ["SEAHORSE_API_KEY"],
    base_url=os.environ["SEAHORSE_BASE_URL"],
)
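Reading os.environ[...] directly raises a bare KeyError when a variable is unset. If a clearer startup error is preferred, a small wrapper along the following lines may help. This is a sketch only: `seahorse_settings` is a hypothetical helper, not something the package provides, and it assumes that empty values should be treated as missing.

```python
import os
from typing import Dict, Mapping


def seahorse_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return constructor kwargs from the environment, with a readable error."""
    required = {"api_key": "SEAHORSE_API_KEY", "base_url": "SEAHORSE_BASE_URL"}
    # Treat unset and empty variables the same way.
    missing = [name for name in required.values() if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {kwarg: env[name] for kwarg, name in required.items()}


# Example with an explicit mapping; call with no arguments to read os.environ.
settings = seahorse_settings({
    "SEAHORSE_API_KEY": "your-api-key",
    "SEAHORSE_BASE_URL": "https://your-table-uuid.api.seahorse.dnotitia.ai",
})
print(sorted(settings))  # ['api_key', 'base_url']
# vectorstore = SeahorseVectorStore(**settings)
```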
Advanced Options
vectorstore = SeahorseVectorStore(
    api_key="your-api-key",
    base_url="https://your-table-uuid.api.seahorse.dnotitia.ai",
    use_builtin_embedding=True,  # Use Seahorse embeddings
    # dense_column / sparse_column are optional explicit overrides.
    # If omitted, the SDK resolves them from GET /v2/data/schema.
)
Primary Key Behavior
Seahorse uses mandatory content-hash primary keys.
- add_texts() and from_texts() always return IDs generated from Seahorse PK rules.
- Caller-provided custom IDs, including LangChain Document.id, are not persisted as the stored row ID.
- Use the returned ids from insert operations as the source of truth for later delete workflows.
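To make the idea of a content-hash key concrete, the sketch below derives an ID from the text alone using SHA-256. It illustrates the general technique only: the actual Seahorse PK rules (hash function, which fields are hashed, encoding) are not described here, so `content_hash_id` is hypothetical and will not reproduce real Seahorse IDs.

```python
import hashlib


def content_hash_id(text: str) -> str:
    """Illustrative only: derive a deterministic ID from the text content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


a = content_hash_id("Deep learning uses neural networks.")
b = content_hash_id("Deep learning uses neural networks.")
c = content_hash_id("Deep learning uses neural networks!")

print(a == b)  # True: identical content always maps to the same ID
print(a == c)  # False: any change in content yields a different ID
```

Because the key is a function of the content, a caller-chosen ID has nowhere to live as the row ID, which is why the IDs returned by add_texts() should be stored for later delete() calls.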
📖 API Reference
SeahorseVectorStore
Main class for interacting with Seahorse as a vector store.
Synchronous Methods
- add_texts(texts, metadatas=None, **kwargs) - Add texts to the vector store
- similarity_search(query, k=4, filter=None, **kwargs) - Search for similar documents
- similarity_search_with_score(query, k=4, filter=None, **kwargs) - Search with distance scores
- similarity_search_by_vector(embedding, k=4, filter=None, **kwargs) - Search by vector
- similarity_search_by_vector_with_score(embedding, k=4, filter=None, **kwargs) - Search by vector with scores
- delete(ids=None, **kwargs) - Delete documents by IDs
- from_texts(texts, embedding=None, metadatas=None, **kwargs) - Create vectorstore from texts
- get_indexed_row_count(readable=True) - Per-index indexed row counts as IndexedRowCount
- health() - Lightweight liveness probe (returns bool)
Async Methods
- aadd_texts(texts, metadatas=None, **kwargs) - Add texts asynchronously
- asimilarity_search(query, k=4, filter=None, **kwargs) - Search asynchronously
- asimilarity_search_with_score(query, k=4, filter=None, **kwargs) - Search with scores asynchronously
- asimilarity_search_by_vector(embedding, k=4, filter=None, **kwargs) - Search by vector asynchronously
- asimilarity_search_by_vector_with_score(embedding, k=4, filter=None, **kwargs) - Search by vector with scores asynchronously
- adelete(ids=None, **kwargs) - Delete documents asynchronously
- aget_indexed_row_count(readable=True) - Per-index indexed row counts (async)
- ahealth() - Async liveness probe
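One common reason to use the async methods is to run several searches concurrently. The sketch below does this with asyncio.gather. `search_many` is a hypothetical helper and `FakeStore` is a stand-in that mimics only the asimilarity_search signature listed above, so the example runs without a Seahorse account; with a real store, pass the SeahorseVectorStore instance instead.

```python
import asyncio
from typing import Any, List, Sequence


async def search_many(store: Any, queries: Sequence[str], k: int = 4) -> List[list]:
    """Run one asimilarity_search per query concurrently; results keep query order."""
    tasks = [store.asimilarity_search(query, k=k) for query in queries]
    return await asyncio.gather(*tasks)


class FakeStore:
    """Stand-in for SeahorseVectorStore so the sketch runs without credentials."""

    async def asimilarity_search(self, query: str, k: int = 4, **kwargs: Any) -> List[str]:
        await asyncio.sleep(0)  # yield control, as a real network call would
        return [f"{query} #{i}" for i in range(k)]


results = asyncio.run(search_many(FakeStore(), ["ml", "rag"], k=2))
print(results)  # [['ml #0', 'ml #1'], ['rag #0', 'rag #1']]
```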
Search Modes
- SearchMode.HYBRID (default) - Dense + Sparse with RRF fusion
- SearchMode.DENSE - Pure dense vector search
- SearchMode.SPARSE - Pure sparse (BM25) search
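For readers unfamiliar with RRF (Reciprocal Rank Fusion), the sketch below shows the standard formula in isolation: each document scores the sum of 1 / (k + rank) over the rankings it appears in. Fusion in Seahorse happens server-side; this code is not the SDK's implementation. The function name `rrf_fuse`, the constant k=60 (a commonly used default), and the tie-breaking rule are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse ranked ID lists with score(d) = sum of 1 / (k + rank), rank from 1."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a deterministic order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense = ["a", "b", "c"]    # hypothetical dense ranking
sparse = ["b", "d", "a"]   # hypothetical sparse (BM25) ranking
print(rrf_fuse([dense, sparse]))  # ['b', 'a', 'd', 'c']
```

Documents ranked well by both retrievers ("b" and "a") rise to the top, while documents seen by only one retriever keep a lower fused score.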
Not Supported
- max_marginal_relevance_search() - ⚠️ MMR search is not supported by the Seahorse API
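If result diversity matters, one possible workaround is to over-fetch candidates and rerank them client-side. The sketch below implements the usual MMR selection rule on plain Python lists. It is not part of the package: `mmr_select` and `cosine` are hypothetical helpers, and the approach assumes you can obtain embeddings for the query and for each candidate yourself (for example from an external embedding model).

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def mmr_select(
    query: Sequence[float],
    candidates: Sequence[Sequence[float]],
    k: int = 4,
    lambda_mult: float = 0.5,
) -> List[int]:
    """Return indices of up to k candidates chosen by maximal marginal relevance."""
    selected: List[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def mmr_score(i: int) -> float:
            relevance = cosine(query, candidates[i])
            # Similarity to the closest already-selected candidate (0 if none yet).
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1.0 - lambda_mult) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


query_vec = [1.0, 0.0]
candidate_vecs = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
print(mmr_select(query_vec, candidate_vecs, k=2, lambda_mult=0.3))  # [0, 2]
```

With lambda_mult=0.3 the near-duplicate second candidate is skipped in favor of the dissimilar third one; with lambda_mult=1.0 the selection reduces to plain relevance ordering.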
Testing
Setup for Integration Tests
Create a .env file in the project root with your Seahorse credentials:
# Copy the example file
cp .env.example .env
# Edit .env and add your credentials
SEAHORSE_API_KEY=your-api-key
SEAHORSE_BASE_URL=https://your-table-uuid.api.seahorse.dnotitia.ai
Running Tests
# Run unit tests
uv run pytest tests/unit/
# Run basic integration tests (requires .env file with API credentials)
uv run pytest tests/integration/ \
--ignore=tests/integration/test_ollama_embeddings.py \
--ignore=tests/integration/test_rag_pipeline.py
# Run all tests with coverage
uv run pytest --cov=seahorse_vector_store --cov-report=term-missing
# Skip integration tests
uv run pytest -m "not integration"
Running Ollama Integration Tests (Optional)
For advanced tests using Ollama LLM and embeddings:
# 1. Install Ollama dependencies (Python 3.9+ required)
uv pip install langchain langchain-ollama
# 2. Start Ollama server
ollama serve
# 3. Download models
ollama pull qwen3-embedding:8b # For embeddings
ollama pull qwen3:8b # For RAG
# 4. Run Ollama tests
uv run pytest tests/integration/test_ollama_embeddings.py -v
uv run pytest tests/integration/test_rag_pipeline.py -v
# 5. Run all integration tests (including Ollama)
uv run pytest tests/integration/ -v
Note: Ollama tests are skipped automatically if Ollama is not available or the required models are not installed.
Examples
See the examples/ directory for complete examples:
- basic_usage.py - Basic vectorstore operations
- async_usage.py - Async/await operations for better performance
- rag_pipeline.py - Building a RAG (Retrieval-Augmented Generation) pipeline
- metadata_filtering.py - Advanced metadata filtering techniques
- external_embeddings.py - Using external embeddings (OpenAI, Cohere, etc.)
Documentation
- API Reference - Complete API documentation
- Tutorial
Requirements
- Python 3.8+
- langchain-core >= 0.2.0
- httpx >= 0.27.0
- pydantic >= 2.0.0
License
MIT License - see LICENSE file for details.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Support
- Console: Seahorse Console
Development Status
This package is in Beta stage. APIs are stabilizing.
Current version: 0.4.0
Made by the Seahorse Team