Python client library for SeahorseDB via Coral API server
seahorse-coral
gRPC-native Python client library for SeahorseDB via Coral.
The low-level Python surface is intentionally:
- Arrow-first for tabular reads (scan, search, hybrid)
- typed-model-first for metadata/admin results (nodes, segment status/retry, table schema)
- explicit about post-processing (to_pyarrow(), to_pandas(), to_polars(), to_json())
Additional docs:
- Build from source: docs/build.md
- Advanced usage: docs/advanced.md
- Compatibility policy: docs/compatibility.md
Quickstart (Recommended)
This project intentionally supports multiple ways to define a schema (preset / components / builder). To keep onboarding simple, we recommend starting with the preset schema and only moving to Advanced when you need customization.
1) Create a Coral client
import seahorse_coral as sc
coral = sc.Coral("http://localhost:8080")
Coral and AsyncCoral use the same gRPC-native contract as the Rust client.
2) Create a table (preset: id + vector + metadata)
import seahorse_coral as sc
schema = sc.default_vector_table_schema(
    dim=384,
    # Optional:
    # id_type=sc.ScalarType.STRING,
)
table = coral.create_table("documents", schema=schema)
Metadata/admin APIs also return typed Python models:
nodes = coral.nodes()
first = nodes[0]
print(first.node_id)
print(first.node_address)
Table admin/mutation helpers are also available:
counts = table.indexed_row_count() # readable counts included by default
print(counts.total_row_count)
print(counts.readable.total_row_count if counts.readable else None)
table.update_rows("metadata = '{\"source\":\"updated\"}'", where="id = 1")
table.delete_rows(where="id = 2")
plan = table.rebalance_plan(writer_nodes=["writer-2"])
status = table.rebalance_status()
commit = table.rebalance_commit(plan.commit_template)
print(commit.status)
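The update_rows() call above embeds a JSON document inside a quoted string literal, which is easy to get wrong by hand. The following is a minimal sketch of building that assignment string with the standard library. `metadata_assignment` is a hypothetical helper, not part of seahorse-coral, and escaping single quotes by doubling them is an assumption borrowed from standard SQL; check how Coral parses these expressions before relying on it.

```python
import json


def metadata_assignment(metadata: dict, column: str = "metadata") -> str:
    """Return a `column = '<json>'` assignment string for update_rows()."""
    # Compact separators reproduce the style of the literal used above.
    encoded = json.dumps(metadata, separators=(",", ":"))
    # Assumption: single quotes inside the literal are escaped by doubling,
    # as in standard SQL. Coral's expression parser may differ.
    escaped = encoded.replace("'", "''")
    return f"{column} = '{escaped}'"


expr = metadata_assignment({"source": "updated"})
print(expr)  # metadata = '{"source":"updated"}'
```

The resulting string could then be passed as the first argument of table.update_rows(), in place of the hand-written literal.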
The preset creates:
- id: INT64 primary key (or STRING if id_type=ScalarType.STRING)
- vector: dense vector column
- metadata: STRING (nullable; store JSON-encoded strings if you want structured metadata)
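Because the preset stores metadata as a nullable STRING, structured metadata has to be serialized before insert and parsed after reads. The sketch below shows one way to do that with the standard library only. `make_row` and `read_metadata` are hypothetical helper names, not part of seahorse-coral, and the length check assumes the server expects each vector to have exactly `dim` values.

```python
import json
from typing import Optional, Sequence

DIM = 384  # matches dim=384 in the preset example


def make_row(row_id: int, vector: Sequence[float],
             metadata: Optional[dict] = None, dim: int = DIM) -> dict:
    """Build a row dict shaped like the preset schema (id, vector, metadata)."""
    # Assumption: the vector length must equal the schema's dim.
    if len(vector) != dim:
        raise ValueError(f"expected {dim} dimensions, got {len(vector)}")
    return {
        "id": row_id,
        "vector": [float(x) for x in vector],
        # metadata is a nullable STRING column, so None stays None.
        "metadata": None if metadata is None else json.dumps(metadata),
    }


def read_metadata(value: Optional[str]) -> Optional[dict]:
    """Decode a metadata string read back from a result row."""
    return None if value is None else json.loads(value)


row = make_row(1, [0.0] * DIM, {"source": "a"})
print(row["metadata"])                 # {"source": "a"}
print(read_metadata(row["metadata"]))  # {'source': 'a'}
```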
(Optional) Schema building (SchemaBuilder)
If you need customization (more columns, segmentation, multiple indexes, etc.), build a schema explicitly.
A) Create table with components (no SchemaBuilder object)
import seahorse_coral as sc
table = coral.create_table(
    "documents",
    columns=[
        sc.int64_column("id", nullable=False),
        sc.vector_column("vector", dim=384),
        sc.metadata_column("metadata"),
    ],
    primary_key=["id"],
    indexes=[sc.hnsw_index("vector")],  # List[IndexDefinition]
)
B) SchemaBuilder (constructor style)
import seahorse_coral as sc
schema = sc.SchemaBuilder(
    columns=[
        sc.int64_column("id", nullable=False),
        sc.vector_column("vector", dim=384),
        sc.metadata_column("metadata"),
    ],
    primary_key=["id"],
    indexes=[sc.hnsw_index("vector", space=sc.IndexSpace.COSINE)],
)
table = coral.create_table("documents", schema=schema)
C) SchemaBuilder (fluent / chain style)
import seahorse_coral as sc
schema = (
    sc.SchemaBuilder()
    .int64("id", nullable=False)
    .vector("vector", dim=384)
    .metadata()
    .with_primary_key("id")
    .hnsw("vector", space=sc.IndexSpace.COSINE)
)
table = coral.create_table("documents", schema=schema)
3) Insert rows
import json
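table.insert_rows(
    [
        # Vectors are shortened to 3 values for readability; presumably each real
        # vector needs as many values as the schema's dim (384 in the preset).
        {"id": 1, "vector": [0.1, 0.2, 0.3], "metadata": json.dumps({"source": "a"})},
        {"id": 2, "vector": [0.2, 0.1, 0.0], "metadata": json.dumps({"source": "b"})},
    ]
)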
(Optional) More insert options
Each write mode has its own method: insert_rows(), insert_jsonl(), insert_parquet(), import_files(), and insert_arrow().
import seahorse_coral as sc
# 1) JSONL string (each line is a JSON object)
jsonl = (
    '{"id": 4, "vector": [0.4, 0.4, 0.4], "metadata": "{}"}\n'
    '{"id": 5, "vector": [0.5, 0.5, 0.5], "metadata": "{}"}\n'
)
table.insert_jsonl(jsonl)
# 2) Local Parquet file
# - client converts Parquet -> Arrow IPC stream -> gRPC upload stream
table.insert_parquet("./data/documents.parquet", batch_size=8192)
# 3) Single remote Parquet file
# - server reads the object directly
table.insert_parquet(
    sc.s3_file(
        "path/to/documents.parquet",
        bucket="my-bucket",
        access_key="YOUR_ACCESS_KEY",
        secret_key="YOUR_SECRET_KEY",
        region="ap-northeast-2",
    ),
    options=sc.ImportOptions(reader_batch_size=8192),
)
# 4) Multi-file import from S3
request = sc.s3_file(
    ["path/to/a.parquet", "path/to/b.parquet"],
    bucket="my-bucket",
    access_key="YOUR_ACCESS_KEY",
    secret_key="YOUR_SECRET_KEY",
    region="ap-northeast-2",
)
options = sc.ImportOptions(
    format=sc.FileFormat.PARQUET,
    reader_batch_size=8192,  # reader record batch size
    max_concurrent_files=4,  # optional
)
result = table.import_files(request, options=options)
print(result.total_inserted_row_count)
# `import_files()` is strict by default.
# If any file fails, sc.PartialImportError or sc.ImportFilesError is raised
# and the exception carries the same ImportFilesResult via `.result`.
# 5) Arrow IPC stream bytes (advanced)
# - bytes, pyarrow.Table, pyarrow.RecordBatch, and list[RecordBatch] are supported
table.insert_arrow(arrow_ipc_bytes)
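The JSONL payload for insert_jsonl() does not need to be written by hand. As a minimal sketch, it can be generated from Python dicts with the standard library; `to_jsonl` is a hypothetical helper name, not part of seahorse-coral.

```python
import json
from typing import Iterable, Mapping


def to_jsonl(rows: Iterable[Mapping]) -> str:
    """Serialize rows to JSONL: one JSON object per line, newline-terminated."""
    # json.dumps escapes newlines inside strings, so each row stays on one line.
    return "".join(json.dumps(dict(row)) + "\n" for row in rows)


rows = [
    {"id": 4, "vector": [0.4, 0.4, 0.4], "metadata": json.dumps({})},
    {"id": 5, "vector": [0.5, 0.5, 0.5], "metadata": json.dumps({})},
]
jsonl = to_jsonl(rows)
print(jsonl, end="")
```

The output has the same shape as the hand-written JSONL string in the insert options above, including the JSON-encoded metadata string.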
4) Search (dense)
# Dense vector search
#
# Note:
# - `index` is the index name, typically the same as the vector column name.
vec = table.index("vector")
result = vec.search([0.1, 0.2, 0.3], top_k=10)
result = vec.search(
    [0.1, 0.2, 0.3],
    top_k=10,
    ef_search=128,
    select="id, metadata, distance",
    where="id > 0",
)
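HNSW is an approximate index, so it can be useful to compare search results against an exact ranking on a small sample. The sketch below is a brute-force reference in plain Python, not the library's implementation. It assumes the index uses cosine space (as in the IndexSpace.COSINE examples) and that the reported distance is 1 minus cosine similarity; both are assumptions to verify against your deployment. `cosine_distance` and `exact_top_k` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity (assumed distance definition)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def exact_top_k(query: Sequence[float],
                vectors: Dict[int, Sequence[float]],
                top_k: int) -> List[Tuple[int, float]]:
    """Score every vector and return the top_k closest as (id, distance)."""
    scored = [(row_id, cosine_distance(query, v)) for row_id, v in vectors.items()]
    scored.sort(key=lambda pair: (pair[1], pair[0]))  # nearest first, id breaks ties
    return scored[:top_k]


vectors = {
    1: [0.1, 0.2, 0.3],
    2: [0.2, 0.1, 0.0],
    3: [-0.1, -0.2, -0.3],
}
for row_id, dist in exact_top_k([0.1, 0.2, 0.3], vectors, top_k=2):
    print(row_id, round(dist, 4))
```

If the ids returned by the index diverge noticeably from this exact ranking, raising ef_search is the usual HNSW knob for trading latency for recall.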
5) Consume Arrow-first tabular results
scan() and search() return ResultSet.
result = table.scan(select="id, metadata", limit=100)
# Low-level Arrow-native access
batches = result.to_record_batches()
arrow_table = result.to_pyarrow()
# Explicit convenience conversions
rows = result.to_json()
df = result.to_pandas()
Batch vector search with search_batch() returns ResultSets, which yields one ResultSet per query vector when iterated.
results = vec.search_batch([[0.1, 0.2, 0.3], [0.3, 0.2, 0.1]], top_k=10)
for result in results:
    print(result.to_pyarrow())
Large results have separate streaming methods, scan_stream() and search_batch_stream(), which yield results incrementally.
for batch in table.scan_stream(select="id, metadata"):
    process(batch)
for result in vec.search_batch_stream([[0.1, 0.2, 0.3], [0.3, 0.2, 0.1]], top_k=10):
    process(result.to_pyarrow())
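The `process` function in the streaming loops is left undefined. As a rough sketch of a consumer, the function below drains a stream while holding only one batch at a time. It assumes the yielded batches behave like pyarrow.RecordBatch, which exposes a `num_rows` attribute; `consume` and `FakeBatch` are hypothetical names, and `FakeBatch` only exists so the sketch runs without a server or pyarrow.

```python
from collections import namedtuple
from typing import Any, Iterable


def consume(batches: Iterable[Any], log_every: int = 10) -> int:
    """Drain a batch stream and return the total row count."""
    total = 0
    for i, batch in enumerate(batches, start=1):
        total += batch.num_rows  # attribute provided by pyarrow.RecordBatch
        if i % log_every == 0:
            print(f"{i} batches, {total} rows so far")
    return total


# Stand-in for the batches a stream would yield (assumption, see above).
FakeBatch = namedtuple("FakeBatch", "num_rows")
stream = (FakeBatch(n) for n in (8192, 8192, 100))
print(consume(stream))  # 16484
```

With a real table, the generator expression would be replaced by the iterator returned from table.scan_stream(...).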
6) Bootstrap schema from a parquet file
Use schema_from_parquet() when you want to start from an existing parquet layout and then
adjust the schema before table creation. A plain string path is read from the client machine, and
remote/object-store sources should be passed as FileSource.
import seahorse_coral as sc
schema = coral.schema_from_parquet("./data/documents.parquet")
schema = schema.with_primary_key("id")  # keep the returned builder
table = coral.create_table("documents_from_parquet", schema=schema)
7) Export or download parquet
export_parquet() writes files on the Coral server side. download_parquet() and
download_parquet_stream() bring the result back to the client process, and local disk writes stay
explicit via write_to() or download_parquet_to().
import seahorse_coral as sc
result = table.export_parquet(
    sc.local_directory("/var/lib/coral/exports/documents"),
    where="id > 100",
    mode="single_file",
)
print(result.files)
downloaded = table.download_parquet(limit=1000)
print(downloaded.filename)
downloaded.write_to("./documents-sample.parquet")
table.download_parquet_to("./documents-full.parquet", where="id > 100")
(Optional) Sparse & hybrid search
Sparse and hybrid search require a table with a sparse vector column and an inverted index on that column.
import seahorse_coral as sc
schema = sc.SchemaBuilder(
    columns=[
        sc.int64_column("id", nullable=False),
        sc.vector_column("vector", dim=384),
        sc.sparse_vector_column("sparse_emb"),
        sc.metadata_column("metadata"),
    ],
    primary_key=["id"],
    indexes=[
        sc.hnsw_index("vector"),
        sc.inverted_index("sparse_emb"),
    ],
)
table = coral.create_table("documents_hybrid", schema=schema)
# Sparse vector search (BM25 / inverted index)
sparse_query = "1:0.8 5:0.6 12:0.4"
result = table.index("sparse_emb").search_sparse(
    sparse_query,
    top_k=10,
    bm25_k=1.2,
    bm25_b=0.75,
)
# Hybrid search (dense + sparse + fusion)
# - requires dense_column + sparse_column
result = table.hybrid_search(
    dense_column="vector",
    dense_query=[0.1, 0.2, 0.3],
    sparse_column="sparse_emb",
    sparse_query=sparse_query,
    top_k=10,
    options=sc.HybridSearchOptions(
        fusion="rrf",
        rrf_k=60,
        alpha=0.7,
    ),
)
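Two parts of the hybrid example can be illustrated offline. The sketch below first renders a sparse query string in the "index:weight" form used above (the format is inferred from that one example), then shows textbook reciprocal rank fusion (RRF), where each document scores the sum of 1 / (k + rank) over the input rankings. This is the general technique that `fusion="rrf"` and `rrf_k` are named after; the server's actual fusion may differ in details. `format_sparse_query` and `rrf_fuse` are hypothetical names, not part of seahorse-coral.

```python
from typing import Dict, Hashable, List, Sequence


def format_sparse_query(weights: Dict[int, float]) -> str:
    """Render {term_index: weight} as space-separated 'index:weight' pairs."""
    # Assumption: the sparse query format is 'index:weight' joined by spaces.
    return " ".join(f"{idx}:{w:g}" for idx, w in sorted(weights.items()))


def rrf_fuse(rankings: Sequence[Sequence[Hashable]], k: int = 60) -> List[Hashable]:
    """Reciprocal rank fusion: score(d) = sum of 1 / (k + rank) over rankings."""
    scores: Dict[Hashable, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks start at 1
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; the string form of the id breaks ties.
    return sorted(scores, key=lambda d: (-scores[d], str(d)))


print(format_sparse_query({1: 0.8, 5: 0.6, 12: 0.4}))  # 1:0.8 5:0.6 12:0.4

dense_hits = [1, 2, 3]   # ids ranked by a dense search
sparse_hits = [3, 1, 4]  # ids ranked by a sparse search
print(rrf_fuse([dense_hits, sparse_hits]))  # [1, 3, 2, 4]
```

Documents that appear near the top of both rankings (ids 1 and 3 here) outrank documents that appear in only one, which is the main reason RRF is a common default for combining dense and sparse results.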
Next steps
- Advanced schema options (segmentation / placement / tuning): docs/advanced.md
- Build from source / development: docs/build.md