Python client library for SeahorseDB via Coral API server

seahorse-coral

gRPC-native Python client library for SeahorseDB via Coral.

The low-level Python surface is intentionally:

  • Arrow-first for tabular reads (scan, search, hybrid)
  • typed-model-first for metadata/admin results (nodes, segment status/retry, table schema)
  • explicit about post-processing (to_pyarrow(), to_pandas(), to_polars(), to_json())

Additional docs:

  • Build from source: docs/build.md
  • Advanced usage: docs/advanced.md
  • Compatibility policy: docs/compatibility.md

Quickstart (Recommended)

This project intentionally supports multiple ways to define a schema (preset / components / builder). To keep onboarding simple, we recommend starting with the preset schema and only moving to Advanced when you need customization.

1) Create a Coral client

import seahorse_coral as sc

coral = sc.Coral("http://localhost:8080")

Coral and AsyncCoral use the same gRPC-native contract as the Rust client.

2) Create a table (preset: id + vector + metadata)

import seahorse_coral as sc

schema = sc.default_vector_table_schema(
    dim=384,
    # Optional:
    # id_type=sc.ScalarType.STRING,
)

table = coral.create_table("documents", schema=schema)

Metadata/admin APIs also return typed Python models:

nodes = coral.nodes()
first = nodes[0]
print(first.node_id)
print(first.node_address)

Table admin/mutation helpers are also available:

counts = table.indexed_row_count()  # readable counts included by default
print(counts.total_row_count)
print(counts.readable.total_row_count if counts.readable else None)

table.update_rows("metadata = '{\"source\":\"updated\"}'", where="id = 1")
table.delete_rows(where="id = 2")

plan = table.rebalance_plan(writer_nodes=["writer-2"])
status = table.rebalance_status()
commit = table.rebalance_commit(plan.commit_template)
print(commit.status)

The preset schema from default_vector_table_schema() creates:

  • id: INT64 primary key (or STRING if id_type=ScalarType.STRING)
  • vector: dense vector column
  • metadata: STRING (nullable; store JSON-encoded strings if you want structured metadata)
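Because the metadata column is a plain STRING, structured values have to be serialized by the caller. The following is a minimal sketch using only the standard library; the helper names encode_metadata and decode_metadata are illustrative and are not part of seahorse-coral.

```python
import json
from typing import Optional


def encode_metadata(value: Optional[dict]) -> Optional[str]:
    """Serialize a dict to a compact JSON string; None stays None (the column is nullable)."""
    if value is None:
        return None
    # sort_keys gives a stable string for equal dicts, which helps when comparing rows.
    return json.dumps(value, separators=(",", ":"), sort_keys=True)


def decode_metadata(raw: Optional[str]) -> Optional[dict]:
    """Parse a JSON string read back from the metadata column."""
    if raw is None:
        return None
    return json.loads(raw)


row = {"id": 1, "metadata": encode_metadata({"source": "a", "tags": ["x", "y"]})}
print(row["metadata"])  # {"source":"a","tags":["x","y"]}
```

Keeping the encoding in one place makes it easier to change the serialization later without touching every insert call.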

(Optional) Schema building (SchemaBuilder)

If you need customization (more columns, segmentation, multiple indexes, etc.), build a schema explicitly.

A) Create table with components (no SchemaBuilder object)

import seahorse_coral as sc

table = coral.create_table(
    "documents",
    columns=[
        sc.int64_column("id", nullable=False),
        sc.vector_column("vector", dim=384),
        sc.metadata_column("metadata"),
    ],
    primary_key=["id"],
    indexes=[sc.hnsw_index("vector")],  # List[IndexDefinition]
)

B) SchemaBuilder (constructor style)

import seahorse_coral as sc

schema = sc.SchemaBuilder(
    columns=[
        sc.int64_column("id", nullable=False),
        sc.vector_column("vector", dim=384),
        sc.metadata_column("metadata"),
    ],
    primary_key=["id"],
    indexes=[sc.hnsw_index("vector", space=sc.IndexSpace.COSINE)],
)

table = coral.create_table("documents", schema=schema)

C) SchemaBuilder (fluent / chain style)

import seahorse_coral as sc

schema = (
    sc.SchemaBuilder()
    .int64("id", nullable=False)
    .vector("vector", dim=384)
    .metadata()
    .with_primary_key("id")
    .hnsw("vector", space=sc.IndexSpace.COSINE)
)

table = coral.create_table("documents", schema=schema)

3) Insert rows

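import json

# Vectors are shortened to 3 values for readability; real rows are expected
# to match the dimension declared in the schema (dim=384 in the examples above).
table.insert_rows(
    [
        {"id": 1, "vector": [0.1, 0.2, 0.3], "metadata": json.dumps({"source": "a"})},
        {"id": 2, "vector": [0.2, 0.1, 0.0], "metadata": json.dumps({"source": "b"})},
    ]
)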
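The example vectors have 3 values while the schemas above declare dim=384, so it can be useful to check lengths on the client before sending rows. This is a hypothetical helper (find_dim_mismatches is not a seahorse-coral function), and how the server reacts to a mismatched length is not covered here.

```python
from typing import Any, List, Mapping, Sequence, Tuple


def find_dim_mismatches(
    rows: Sequence[Mapping[str, Any]],
    column: str = "vector",
    dim: int = 384,
) -> List[Tuple[int, int]]:
    """Return (row position, actual length) for rows whose vector length differs from dim."""
    return [
        (pos, len(row[column]))
        for pos, row in enumerate(rows)
        if len(row[column]) != dim
    ]


rows = [
    {"id": 1, "vector": [0.0] * 384},
    {"id": 2, "vector": [0.2, 0.1, 0.0]},  # too short for dim=384
]
print(find_dim_mismatches(rows))  # [(1, 3)]
```

An empty list means every row matches the expected dimension.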

(Optional) More insert options

Write APIs are explicit by mode.

import seahorse_coral as sc

# 1) JSONL string (each line is a JSON object)
jsonl = (
    '{"id": 4, "vector": [0.4, 0.4, 0.4], "metadata": "{}"}\n'
    '{"id": 5, "vector": [0.5, 0.5, 0.5], "metadata": "{}"}\n'
)
table.insert_jsonl(jsonl)

# 2) Local Parquet file
# - client converts Parquet -> Arrow IPC stream -> gRPC upload stream
table.insert_parquet("./data/documents.parquet", batch_size=8192)

# 3) Single remote Parquet file
# - server reads the object directly
table.insert_parquet(
    sc.s3_file(
        "path/to/documents.parquet",
        bucket="my-bucket",
        access_key="YOUR_ACCESS_KEY",
        secret_key="YOUR_SECRET_KEY",
        region="ap-northeast-2",
    ),
    options=sc.ImportOptions(reader_batch_size=8192),
)

# 4) Multi-file import from S3
request = sc.s3_file(
    ["path/to/a.parquet", "path/to/b.parquet"],
    bucket="my-bucket",
    access_key="YOUR_ACCESS_KEY",
    secret_key="YOUR_SECRET_KEY",
    region="ap-northeast-2",
)
options = sc.ImportOptions(
    format=sc.FileFormat.PARQUET,
    reader_batch_size=8192,  # reader record batch size
    max_concurrent_files=4,  # optional
)
result = table.import_files(request, options=options)
print(result.total_inserted_row_count)

# `import_files()` is strict by default.
# If any file fails, sc.PartialImportError or sc.ImportFilesError is raised
# and the exception carries the same ImportFilesResult via `.result`.

# 5) Arrow data (advanced)
# - Arrow IPC stream bytes, pyarrow.Table, pyarrow.RecordBatch, and list[RecordBatch] are supported
# - arrow_ipc_bytes is a placeholder for an Arrow IPC stream you have already serialized
table.insert_arrow(arrow_ipc_bytes)
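The JSONL payload in option 1 is written by hand above. A small sketch of building the same kind of string from Python dicts with the standard library follows; rows_to_jsonl is an illustrative name, not a library function.

```python
import json
from typing import Any, Iterable, Mapping


def rows_to_jsonl(rows: Iterable[Mapping[str, Any]]) -> str:
    """Serialize each row as one JSON object per line, newline-terminated."""
    return "".join(json.dumps(row) + "\n" for row in rows)


rows = [
    # metadata is itself a JSON-encoded string, matching the STRING column type.
    {"id": 4, "vector": [0.4, 0.4, 0.4], "metadata": json.dumps({})},
    {"id": 5, "vector": [0.5, 0.5, 0.5], "metadata": json.dumps({"source": "b"})},
]
jsonl = rows_to_jsonl(rows)
print(jsonl, end="")
```

The resulting string has the same one-object-per-line shape as the value passed to table.insert_jsonl() above.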

4) Search (dense)

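# Dense vector search
#
# Note:
# - table.index() takes the index name, typically the same as the vector column name.
# - Query vectors are shortened for readability; real queries are expected
#   to match the dimension of the indexed column.
vec = table.index("vector")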

result = vec.search([0.1, 0.2, 0.3], top_k=10)

result = vec.search(
    [0.1, 0.2, 0.3],
    top_k=10,
    ef_search=128,
    select="id, metadata, distance",
    where="id > 0",
)
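HNSW search is approximate, so a brute-force scan over a small sample can serve as a reference when tuning parameters such as ef_search. The sketch below is plain Python and independent of seahorse-coral; it assumes that cosine distance is defined as 1 minus cosine similarity, which the source does not confirm for IndexSpace.COSINE.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; returns 1.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def brute_force_top_k(
    query: Sequence[float],
    vectors: Dict[int, Sequence[float]],
    top_k: int,
) -> List[Tuple[int, float]]:
    """Exact nearest neighbours: score every vector, sort by distance, keep top_k."""
    scored = sorted((cosine_distance(query, vec), row_id) for row_id, vec in vectors.items())
    return [(row_id, dist) for dist, row_id in scored[:top_k]]


vectors = {1: [0.1, 0.2, 0.3], 2: [0.2, 0.1, 0.0]}
for row_id, dist in brute_force_top_k([0.1, 0.2, 0.3], vectors, top_k=2):
    print(row_id, round(dist, 3))
```

Comparing these ids with the ids returned by an index search on the same sample gives a rough recall estimate.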

5) Consume Arrow-first tabular results

scan() and search() return a ResultSet.

result = table.scan(select="id, metadata", limit=100)

# Low-level Arrow-native access
batches = result.to_record_batches()
arrow_table = result.to_pyarrow()

# Explicit convenience conversions
rows = result.to_json()
df = result.to_pandas()

Batch vector search returns ResultSets, which can be iterated to get the individual results.

results = vec.search_batch([[0.1, 0.2, 0.3], [0.3, 0.2, 0.1]], top_k=10)

for result in results:
    print(result.to_pyarrow())

Large results are exposed through separate streaming APIs.

# process() is a placeholder for your own handler.
for batch in table.scan_stream(select="id, metadata"):
    process(batch)

for result in vec.search_batch_stream([[0.1, 0.2, 0.3], [0.3, 0.2, 0.1]], top_k=10):
    process(result.to_pyarrow())

6) Bootstrap schema from a Parquet file

Use schema_from_parquet() when you want to start from an existing Parquet layout and then adjust the schema before creating the table. A plain string path is read from the client machine. Remote or object-store sources should be passed as a FileSource.

import seahorse_coral as sc

schema = coral.schema_from_parquet("./data/documents.parquet")
schema.with_primary_key("id")

table = coral.create_table("documents_from_parquet", schema=schema)

7) Export or download Parquet

export_parquet() writes files on the Coral server side. download_parquet() and download_parquet_stream() bring the result back to the client process. Writes to local disk stay explicit via write_to() or download_parquet_to().

import seahorse_coral as sc

result = table.export_parquet(
    sc.local_directory("/var/lib/coral/exports/documents"),
    where="id > 100",
    mode="single_file",
)
print(result.files)

downloaded = table.download_parquet(limit=1000)
print(downloaded.filename)
downloaded.write_to("./documents-sample.parquet")

table.download_parquet_to("./documents-full.parquet", where="id > 100")

(Optional) Sparse & hybrid search

Sparse and hybrid search require a table that has a sparse vector column and an inverted index on it.

import seahorse_coral as sc

schema = sc.SchemaBuilder(
    columns=[
        sc.int64_column("id", nullable=False),
        sc.vector_column("vector", dim=384),
        sc.sparse_vector_column("sparse_emb"),
        sc.metadata_column("metadata"),
    ],
    primary_key=["id"],
    indexes=[
        sc.hnsw_index("vector"),
        sc.inverted_index("sparse_emb"),
    ],
)

table = coral.create_table("documents_hybrid", schema=schema)

# Sparse vector search (BM25 / inverted index)
sparse_query = "1:0.8 5:0.6 12:0.4"
result = table.index("sparse_emb").search_sparse(
    sparse_query,
    top_k=10,
    bm25_k=1.2,
    bm25_b=0.75,
)

# Hybrid search (dense + sparse + fusion)
# - requires dense_column + sparse_column
result = table.hybrid_search(
    dense_column="vector",
    dense_query=[0.1, 0.2, 0.3],
    sparse_column="sparse_emb",
    sparse_query=sparse_query,
    top_k=10,
    options=sc.HybridSearchOptions(
        fusion="rrf",
        rrf_k=60,
        alpha=0.7,
    ),
)
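The fusion="rrf" option above refers to reciprocal rank fusion. As a rough illustration of the general technique, the sketch below merges two ranked id lists with the standard formula score(d) = sum over lists of 1 / (k + rank). It is not the server's implementation, it does not model the alpha option, and rrf_fuse is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def rrf_fuse(
    rankings: Sequence[Sequence[int]],
    k: int = 60,
    top_k: int = 10,
) -> List[Tuple[int, float]]:
    """Reciprocal rank fusion over several ranked lists of document ids."""
    scores: Dict[int, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based; ids missing from a list contribute nothing for it.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for determinism.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ordered[:top_k]


dense_ranking = [1, 2, 3]   # ids ordered by dense similarity
sparse_ranking = [2, 4, 1]  # ids ordered by sparse (keyword) score
fused = rrf_fuse([dense_ranking, sparse_ranking], k=60)
print([doc_id for doc_id, _ in fused])  # [2, 1, 4, 3]
```

Ids that rank well in both lists rise to the top, which is why id 2 ends up ahead of id 1 here even though id 1 is first in the dense ranking.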

Download files

Source distribution:

  • seahorse_coral-0.2.0.tar.gz (150.5 kB)

Built distributions:

  • seahorse_coral-0.2.0-cp39-abi3-win_amd64.whl (6.1 MB): CPython 3.9+, Windows x86-64
  • seahorse_coral-0.2.0-cp39-abi3-manylinux_2_34_x86_64.whl (7.0 MB): CPython 3.9+, manylinux (glibc 2.34+), x86-64
  • seahorse_coral-0.2.0-cp39-abi3-macosx_11_0_arm64.whl (6.3 MB): CPython 3.9+, macOS 11.0+, ARM64

File hashes

seahorse_coral-0.2.0.tar.gz (source; uploaded via twine/6.2.0 CPython/3.11.15; Trusted Publishing: No)

  • SHA256: 1aa4d282b04fc92d15bfdf4ac6e0e1afc32cbfa98262a1b2a380f0544373bb31
  • MD5: dea9d8b9f69b8c67fa6278afb4367c67
  • BLAKE2b-256: 04e8f8bb2c9dab91924b604fb860d19c76e877c43b79b00bc4065f17d8d364e1

seahorse_coral-0.2.0-cp39-abi3-win_amd64.whl

  • SHA256: 450dba55d1cb098d0de32ab4bd7a93ac5d3459b76efac2f13cd2f7f1fc24c2d3
  • MD5: d24a4cfae9db5c85e160a2a317a75d47
  • BLAKE2b-256: 30e96b072eb178b52cf33e0448125c3e5ae25b3d720b2fb213541e7f2aa855ae

seahorse_coral-0.2.0-cp39-abi3-manylinux_2_34_x86_64.whl

  • SHA256: cdcff000a7a5eb3a7b4dfd2d0860a55d65c7450ca8b32eaa6555122eca5d81bf
  • MD5: ac4311059c23afe6e3424b48766a689d
  • BLAKE2b-256: 68f7527dc7a5029ef7bb709c55d5372c046126d15b9af5b42b71c5ceb2a01e63

seahorse_coral-0.2.0-cp39-abi3-macosx_11_0_arm64.whl

  • SHA256: c76c7198e31fac3f43cca57bcb423414fda18da588f590fea45d293a5a94e6c0
  • MD5: 6e6043093df5cb893e73e38c0e48dbe3
  • BLAKE2b-256: 3619c3911b65cfecfb70912e2ce22161be6193dedbad981b014bdfe0a5e60efb
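
The SHA256 digests listed above can be checked against a downloaded file before installing it. A minimal sketch using hashlib from the standard library follows; the function names are illustrative and the file path in the usage note is an assumption about where the download was saved.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with an expected value, ignoring case and whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, verify_sha256("./seahorse_coral-0.2.0.tar.gz", "1aa4d282b04fc92d15bfdf4ac6e0e1afc32cbfa98262a1b2a380f0544373bb31") should return True for an intact copy of the source distribution.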
