Python SDK for the Dasein managed vector index service

Dasein

Multi-hop retrieval in ~1 second. Higher recall. 68% fewer tokens to your LLM. Agentic Search · Dynamic Top-K · Dynamic Hybrid · Hybrid dense + BM25 · Managed embedding

The managed vector index that does the hard parts of retrieval for you. Agentic decomposition, per-query α fusion, and Dynamic Top-K — all on a single index.query() call. Built for teams tired of paying their LLM provider to read ten mediocre chunks when two would have done it.

Higher recall. Dynamic Hybrid picks the dense/BM25 α per query instead of one static α tuned to your average query. Up to +23pt R@10 over fixed-α hybrid. Across any encoder, no retraining.

~68% fewer tokens to your LLM. Dynamic Top-K trims the result set per query — 1–3 results on easy queries, the full budget on hard ones, recall held flat or higher. Your top_k stays a hard ceiling, Dasein only ever clips down. Pairs with Agentic Search for compounding savings on every hop.

In ~1 second. Agentic Search runs 3–5 retrieval hops with intermediate reasoning server-side, then returns the final-hop ranking — in roughly the time most RAG stacks take for a single-hop top-10.

Smaller, faster index. 12× smaller than fp32, 10× faster queries. The compression is the speedup: a smaller footprint keeps more of your index hot.

Install

pip install dasein-ai  # not "dasein" — the package name is dasein-ai

Quick Start

from dasein import Client

client = Client(api_key="dsk_...")  # get a free key at https://api.daseinai.ai/auth/github

# Create a hybrid index (semantic + keyword search)
index = client.create_index("my-docs", index_type="hybrid", model="bge-large-en-v1.5")

# Upsert documents — metadata values can be strings, ints, or floats
index.upsert([
    {"id": "doc1", "text": "SpaceX launched Starship on its 5th test flight",
     "metadata": {"source": "reuters", "category": "space", "year": 2025, "priority": 1}},
    {"id": "doc2", "text": "GPT-5 achieves superhuman reasoning on ARC-AGI",
     "metadata": {"source": "arxiv", "category": "ai", "year": 2025, "priority": 2}},
    {"id": "doc3", "text": "Fed holds rates steady amid cooling inflation",
     "metadata": {"source": "bloomberg", "category": "finance", "year": 2025, "priority": 3}},
    {"id": "doc4", "text": "Python 3.13 ships with a JIT compiler",
     "metadata": {"source": "pep", "category": "code", "year": 2024, "priority": 1}},
])

# Hybrid search — semantic similarity + BM25 keyword matching
results = index.query("what is machine learning?", top_k=5, mode="hybrid")

# Filter by metadata — all operators are true pre-filters (no recall penalty)
results = index.query("recent breakthroughs", top_k=5, filter={"year": {"$gte": 2025}}, include_metadata=True)
results = index.query("top stories", top_k=5, filter={"source": {"$in": ["reuters", "bloomberg"]}, "priority": 1}, include_metadata=True)
results = index.query("tech news", top_k=5, filter={"$or": [{"category": "ai"}, {"category": "code"}]}, include_metadata=True)

for r in results:
    print(f"{r.id}: {r.score:.4f} — {r.metadata}")

Choosing an Index Type

You choose the index type at creation time. This determines what search modes are available.

`index_type`	What it builds	Query modes available
`"hybrid"`	Dense vectors + BM25 inverted index	`mode="hybrid"` and `mode="dense"`
`"dense"`	Dense vectors only	`mode="dense"` only

Use "hybrid" unless you have a reason not to. Hybrid indexes support both dense and hybrid queries — you choose per query. Dense indexes are smaller in RAM but cannot use keyword search.

# Hybrid index — supports both query modes
index = client.create_index("my-docs", index_type="hybrid", model="bge-large-en-v1.5")

# Dense-only index — only supports mode="dense"
index = client.create_index("my-docs", index_type="dense", model="bge-large-en-v1.5")

Agentic Search

Some queries can't be served by a single similarity match. "What 2010 dream-heist movie was directed by the filmmaker who made the space wormhole movie starring the actor who played the 'Alright, alright, alright' guy in Dazed and Confused?" needs a chain of retrievals — first narrow to the actor, then to the wormhole movie, then to its director, then to their 2010 dream-heist movie.

Set agentic_search=True on index.query() and Dasein runs the whole chain server-side — sub-question decomposition, 3–5 hops of retrieval against your own index, per-hop refinement — and returns the final-hop ranking. Same response shape as a normal query: a ranked list of documents.

response = index.query(
    "What 2010 dream-heist movie was directed by the filmmaker who made the "
    "space wormhole movie starring the actor who played the 'Alright, alright, "
    "alright' guy in Dazed and Confused?",
    top_k=10,
    agentic_search=True,
)

# Same shape as index.query() — iterate it like any other ranking.
for r in response.results:
    print(r.id, r.score, r.metadata)

This is a retrieval system, not a QA / RAG chatbot. The deliverable is the ranking, the same shape index.query() always returns. The chain reasoning happens internally to produce a better final ranking; you get a list of documents.

If you also want the reader's one-line answer alongside the ranking, opt in with include_answer=True:

response = index.query("...", agentic_search=True, include_answer=True)
print(response.final_answer)    # e.g. "Inception" (off by default)
for r in response.results:
    print(r.id, r.score)

Pass return_hops=True if you want the per-hop trace for debugging or UI:

response = index.query("...", agentic_search=True, return_hops=True)
print(response.chain)           # the sub-question templates the system ran
print(response.n_hops)          # number of hops actually executed
for h in response.hops:         # per-hop sub-q text, fused hits, timings
    print(h["hop"], h["sub_query_text"], len(h["fused_ids"]))

All your retrieval settings still apply, on every hop. Each hop runs the retrieval mode you configured at the call site:

# multi-hop with hybrid α=0.7 on every hop (dense-leaning; alpha=1.0 is pure dense)
index.query("...", agentic_search=True, mode="hybrid", alpha=0.7)

# multi-hop with managed dynamic hybrid on every hop
index.query("...", agentic_search=True, mode="hybrid", dynamic_hybrid=True)

# multi-hop with managed Dynamic Top-K on every hop (per-hop K cutoff)
index.query("...", agentic_search=True, mode="hybrid",
            dynamic_hybrid=True, dynamic_top_k=True)

# multi-hop with metadata pre-filter on every hop
index.query("...", agentic_search=True, filter={"year": {"$gte": 2010}})

# multi-hop with BM25 phrase + fuzzy modifiers on every hop
index.query("...", agentic_search=True, mode="hybrid", phrase=True, fuzzy=True)

Agentic Search requires text=...; vector-only queries are not supported because sub-question decomposition needs the natural-language question. End-to-end latency on the live demo is typically ~1 s for a warm 3–4 hop run.

Hybrid Search

Hybrid indexes support per-query toggling between dense-only and hybrid retrieval — no reindexing, no separate BM25 pipeline.

# Dense: pure semantic similarity
results = index.query("financial derivatives risk models", top_k=10, mode="dense")

# Hybrid: semantic + BM25 keyword matching, fused and re-ranked
results = index.query("AAPL earnings Q3 2025", top_k=10, mode="hybrid")

# Exact keyword matching — only docs that contain all your terms
results = index.query("AAPL earnings Q3 2025", top_k=10, mode="hybrid", exact=True)

# Phrase matching — only docs containing "machine learning" as an exact phrase
results = index.query("machine learning", top_k=10, mode="hybrid", phrase=True)

# Fuzzy matching — handles typos (edit distance 1)
results = index.query("machin lerning", top_k=10, mode="hybrid", fuzzy=True)

# Tune the dense vs BM25 balance (1.0 = all dense, 0.0 = all BM25, default 0.5)
# Pinecone / Weaviate convention: alpha is the dense weight.
results = index.query("AAPL earnings", top_k=10, mode="hybrid", alpha=0.3)  # lean keyword-heavy

Hybrid mode is strongest on queries with specific keywords, entity names, or codes where pure semantic search loses signal. Dense mode is better for abstract, conceptual queries. You choose per query. The keyword features (exact, phrase, fuzzy) refine hybrid results — use them when you need precise keyword control. The alpha parameter lets you tune the balance between dense and BM25 ranking in the fusion step.

Dynamic Hybrid

Tuning alpha per query is tedious and fragile. On hybrid indexes you can hand the decision off to Dasein:

results = index.query("AAPL earnings Q3 2025",
                      top_k=10, mode="hybrid", dynamic_hybrid=True)

With dynamic_hybrid=True, Dasein picks the dense/BM25 balance for each query individually and returns the final ranking directly — alpha is ignored. Only available on hybrid indexes. top_k must be <= 100.

No extra setup. No retraining on your data. Works across encoders.

If you run your own dense + BM25 pipeline, the Dasein-native α plus per-query K cutoffs are available as a single managed call — client.predict_dynamic — described below.

Dynamic Top-K

A fixed top_k=10 is a worst-case budget. Easy queries don't need 10 results — the gold lands at rank 1. Hard queries occasionally do. Sending the same 10 to a downstream LLM, cross-encoder, or UI on every query overpays on the easy ones and underdelivers on the hard ones.

Dynamic Top-K predicts the smallest top-K of the alpha-fused ranking that still retains the gold, per query. Easy queries return 1–3 results; harder queries get up to the full budget (currently 10). Your top_k stays a hard ceiling — Dasein only ever clips down.

There are two surfaces, picked by where retrieval happens.

On Dasein indexes — managed toggle

Pair dynamic_top_k=True with dynamic_hybrid=True on any hybrid index. The cutoff is applied server-side; you get the trimmed ranking back:

# Single-hop. On easy queries the predicted cutoff returns fewer than the
# 10 results you asked for; on hard ones you still get up to 10.
results = index.query(
    "AAPL earnings Q3 2025",
    top_k=10, mode="hybrid",
    dynamic_hybrid=True, dynamic_top_k=True,
)

# Agentic. Cutoff applies to *every hop* — slashes per-hop reader/LLM
# token spend on easy sub-questions while preserving recall on hard
# ones. Same shape as any other agentic query.
response = index.query(
    "What 2010 dream-heist movie was directed by ...",
    top_k=10, agentic_search=True,
    dynamic_hybrid=True, dynamic_top_k=True,
)

dynamic_top_k=True requires dynamic_hybrid=True — the cutoff was trained on the alpha-fused ranking and only makes sense applied to it.

BYO retrieval stack — `client.predict_dynamic`

If you run your own dense + BM25 pipeline, hit Dasein for the per-query α + K_dense + K_hybrid in one call. One HTTP round-trip, one GPU forward, three scalars — same model that powers the managed dynamic_hybrid and dynamic_top_k toggles. Pass the same dense query vector you're about to retrieve with so the prediction matches your encoder's geometry:

# embed the query in YOUR encoder's space (whatever you already use)
qvec = my_encoder.encode("who founded apple?")

p = client.predict_dynamic("who founded apple?", query_vector=qvec)

# hybrid stack — fuse with α, clip to K_hybrid
fused = rrf_fuse(my_dense_hits, my_bm25_hits, alpha=p.alpha)
kept  = fused[: p.top_k_hybrid]

# pure-dense stack — ignore α, clip to K_dense
kept_dense = my_dense_hits[: p.top_k_dense]

Returns a DynamicPrediction(alpha, top_k_dense, top_k_hybrid):

alpha is a float in [0.0, 1.0] — 1.0 = all dense, 0.0 = all BM25 (Pinecone / Weaviate convention). Drop into your RRF / convex combination as the dense weight.
top_k_dense and top_k_hybrid are integers in [1, 10] — upper-bound suggestions you clip to (min(your_top_k, p.top_k_*)).

Works across encoders. query_vector is strongly recommended — if you omit it, Dasein embeds text with its default model and the prediction is tied to that model's geometry, which won't line up with your own retriever.
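
The rrf_fuse helper in the snippet above is not part of the SDK; it stands for whatever fusion your stack already uses. As a minimal sketch, assuming each hit list is a best-first list of document IDs, a weighted reciprocal-rank fusion with alpha as the dense weight could look like the following. The function names, the weighting scheme, and the constant k=60 are illustrative assumptions, not Dasein's server-side implementation.

```python
from collections import defaultdict


def rrf_fuse(dense_hits, bm25_hits, alpha, k=60):
    """Weighted reciprocal-rank fusion (illustrative, not Dasein's internal code).

    dense_hits / bm25_hits: lists of document IDs, best first.
    alpha: dense weight in [0.0, 1.0]; 1.0 = all dense, 0.0 = all BM25.
    k: RRF smoothing constant (60 is a common default; an assumption here).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0.0, 1.0]")
    scores = defaultdict(float)
    for rank, doc_id in enumerate(dense_hits, start=1):
        scores[doc_id] += alpha / (k + rank)
    for rank, doc_id in enumerate(bm25_hits, start=1):
        scores[doc_id] += (1.0 - alpha) / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def clip(hits, your_top_k, predicted_k):
    """Apply the predicted cutoff without ever exceeding the caller's top_k."""
    return hits[: min(your_top_k, predicted_k)]


dense = ["a", "b", "c"]
bm25 = ["c", "d", "a"]
fused = rrf_fuse(dense, bm25, alpha=0.7)
kept = clip(fused, your_top_k=10, predicted_k=2)
```

In a real pipeline, alpha and predicted_k would come from p.alpha and p.top_k_hybrid. Documents that appear in both lists accumulate score from each, which is why "a" and "c" outrank "b" in this toy input.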

Free plans: 1,000 calls per month. Paid hybrid plans: unlimited.

Metadata

Attach key-value metadata to documents for filtering at query time. Values can be strings, integers, or floats.

index.upsert([
    {
        "id": "doc1",
        "text": "SpaceX launched Starship",
        "metadata": {
            "source": "reuters",
            "category": "space",
            "year": 2025,
            "priority": 1,
            "rating": 9.2,
        },
    },
])

# Simple equality
results = index.query("rocket launch", top_k=10, filter={"source": "reuters"})
results = index.query("rocket launch", top_k=10, filter={"category": "space", "year": 2025})

Filtering

Filters are true pre-filters — candidates that don't match are never touched. No recall penalty.

# Equality (default — bare values are $eq)
filter={"genre": "sci-fi"}
filter={"genre": {"$eq": "sci-fi"}}  # equivalent explicit form

# Not equal
filter={"status": {"$ne": "archived"}}

# In set
filter={"category": {"$in": ["ai", "finance", "health"]}}

# Not in set
filter={"source": {"$nin": ["spam", "test"]}}

# Exists / not exists
filter={"author": {"$exists": True}}

# Numeric range
filter={"year": {"$gte": 2020, "$lte": 2025}}
filter={"rating": {"$gt": 7.5}}

# OR across keys
filter={"$or": [{"category": "ai"}, {"priority": 1}]}

# Combine (AND by default)
filter={"source": "reuters", "year": {"$gte": 2024}, "category": {"$in": ["tech", "science"]}}

All filter operators work with both dense and hybrid queries. Pass include_metadata=True to return metadata with results.
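
To sanity-check a filter dict before sending it, it can help to evaluate it locally against sample metadata. The sketch below mirrors the operator semantics listed above in plain Python. It is a client-side illustration only: the matches function is hypothetical, and the handling of missing keys is an assumption rather than documented server behavior.

```python
_MISSING = object()

_COMPARATORS = {
    "$eq": lambda v, arg: v == arg,
    "$ne": lambda v, arg: v != arg,
    "$in": lambda v, arg: v in arg,
    "$nin": lambda v, arg: v not in arg,
    "$gt": lambda v, arg: v > arg,
    "$gte": lambda v, arg: v >= arg,
    "$lt": lambda v, arg: v < arg,
    "$lte": lambda v, arg: v <= arg,
}


def matches(metadata, flt):
    """Return True if `metadata` satisfies `flt` (top-level keys are ANDed)."""
    for key, cond in flt.items():
        if key == "$or":
            if not any(matches(metadata, sub) for sub in cond):
                return False
            continue
        value = metadata.get(key, _MISSING)
        if not isinstance(cond, dict):
            cond = {"$eq": cond}  # bare values are shorthand for $eq
        for op, arg in cond.items():
            if op == "$exists":
                if (value is not _MISSING) != bool(arg):
                    return False
            elif value is _MISSING:
                # Assumption: a missing key fails every comparison except $ne/$nin.
                if op not in ("$ne", "$nin"):
                    return False
            elif not _COMPARATORS[op](value, arg):
                return False
    return True


doc = {"source": "reuters", "category": "space", "year": 2025, "priority": 1, "rating": 9.2}
ok = matches(doc, {"source": "reuters", "year": {"$gte": 2024}})
```

A helper like this is mainly useful in unit tests, where you want to confirm that a filter selects the documents you expect without a round-trip to the service.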

Get an API Key

Web: Sign up with GitHub at api.daseinai.ai/auth/github — no credit card required. You'll get an API key instantly.

CLI / Agents:

import httpx, time

resp = httpx.post("https://api.daseinai.ai/auth/device/start").json()
print(f"Go to {resp['verification_uri']} and enter code: {resp['user_code']}")

while True:
    time.sleep(resp.get("interval", 5))
    poll = httpx.post(
        "https://api.daseinai.ai/auth/device/poll",
        json={"device_code": resp["device_code"]},
    ).json()
    if poll.get("api_key"):
        print(f"API key: {poll['api_key']}")
        break

Features

Agentic Search — agentic_search=True on index.query() runs a managed multi-hop pipeline against your own index: sub-question decomposition, 3–5 retrieval hops, intermediate reading, in ~1.0–1.6 s. All your retrieval settings (mode, alpha, Dynamic Hybrid, Dynamic Top-K, filter, BM25 modifiers) apply to every hop.

Dynamic Top-K — dynamic_top_k=True (alongside dynamic_hybrid=True) trims the result set per query — 1–3 results on easy queries, full budget on hard ones. Drops downstream LLM token spend without giving up recall. Works under agentic_search=True for compounding savings on every hop.

Dynamic Hybrid — dynamic_hybrid=True lets Dasein pick the dense/BM25 α per query on hybrid indexes. For BYO retrieval stacks, client.predict_dynamic(text, query_vector=...) returns the same per-query α plus the matched K cutoffs in one call. Across encoders, no retraining.

Hybrid Search — Switch between dense and hybrid retrieval per query. No reindexing, no separate BM25 infrastructure.

Managed embedding — Pass raw text, we embed with open-source models (BGE, Nomic, E5, GTE). No embedding infrastructure to manage.

Bring your own vectors — Already have embeddings? Pass them directly with any dimension.

Metadata filtering — Attach metadata to documents and filter at query time with operators like $in, $ne, $gte, $lte, and $or. True pre-filters with no recall penalty.

Automatic retries — The SDK retries with exponential backoff:

Error	Read / query	Upsert	Build / delete
429 (rate limit)	Retried (up to `max_retries`)	Retried	Retried
503 (transient)	Retried	Retried (upserts are idempotent by doc ID)	Not retried
504 (gateway timeout)	Retried	Retried	Not retried
Connection error	Retried	Retried	Not retried
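
The SDK handles retries for you, so the following is only a sketch of what exponential backoff with jitter typically looks like, plus a lookup that mirrors the status-code rows of the table above. The base delay, cap, jitter scheme, and function names are assumptions for illustration; the SDK's actual schedule is not documented here.

```python
import random


def backoff_delays(max_retries=3, base=0.5, cap=8.0, retry_after=None, rng=random.random):
    """Yield one sleep duration (seconds) per retry attempt.

    base, cap, and the jitter scheme are illustrative assumptions.
    If the server supplied a retry_after hint, never sleep less than that.
    """
    for attempt in range(max_retries):
        ceiling = min(cap, base * (2 ** attempt))
        delay = ceiling * rng()  # "full jitter": uniform in [0, ceiling)
        if retry_after is not None:
            delay = max(delay, retry_after)
        yield delay


# status -> operation kinds that are retried, following the table above
# ("build" stands for the "Build / delete" column).
RETRYABLE = {
    429: {"read", "upsert", "build"},
    503: {"read", "upsert"},
    504: {"read", "upsert"},
}


def should_retry(status, operation):
    """Return True if the table above marks this status/operation pair as retried."""
    return operation in RETRYABLE.get(status, set())


delays = list(backoff_delays(max_retries=4, rng=lambda: 1.0))
```

Passing rng=lambda: 1.0 removes the randomness so the upper bound of each delay is visible; in normal use the default random jitter spreads concurrent clients apart.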

Embedding Models

Model	Dimensions	Matryoshka dims	Notes
`bge-large-en-v1.5`	1024	512, 256, 128, 64	Strong general-purpose English model
`nomic-embed-text-v1.5`	768	512, 384, 256, 128, 64	Good balance of speed and quality
`e5-large-v2`	1024	—	Microsoft's E5 family (no MRL support)
`gte-large-en-v1.5`	1024	512, 256, 128, 64	Alibaba's GTE family

Or skip the model parameter and pass your own vectors of any dimension.

Matryoshka Dimension Truncation

Models trained with Matryoshka Representation Learning (MRL) can be truncated to lower dimensions with minimal recall loss, cutting RAM and storage proportionally. Pass dim at index creation:

index = client.create_index("my-docs", index_type="hybrid", model="bge-large-en-v1.5", dim=256)

Embeddings are generated at full dimension and truncated + L2-renormalized before indexing. Queries are truncated the same way automatically. The first build at a non-native dimension takes slightly longer than a build at the model's native dimension.
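
If you bring your own vectors from an MRL-trained model and want the same treatment client-side, the truncate-and-renormalize step is short. This is a generic sketch using only the standard library; the function name is hypothetical and it is not the service's own code.

```python
import math


def truncate_and_renormalize(vector, dim):
    """Keep the first `dim` components and rescale to unit L2 norm."""
    if not 0 < dim <= len(vector):
        raise ValueError("dim must be in 1..len(vector)")
    head = vector[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)  # zero vector: nothing to normalize
    return [x / norm for x in head]


full = [3.0, 4.0, 12.0, 0.5]
short = truncate_and_renormalize(full, 2)  # first two components, unit length
```

Renormalizing matters because cosine and dot-product scores assume unit-length vectors, and dropping trailing components shrinks the norm.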

API Reference

Client

from dasein import Client

client = Client(
    api_key="dsk_...",       # required
    base_url=None,           # override API URL (default: Dasein Cloud)
    timeout=30.0,            # request timeout in seconds
    max_retries=3,           # max retries on 429/503/504 and connection errors
)

Create Index

index = client.create_index(
    name="my-index",
    index_type="hybrid",             # REQUIRED CHOICE: "dense" or "hybrid"
    model="bge-large-en-v1.5",      # None for bring-your-own-vectors
    dim=None,                        # truncate to lower dim for MRL models (e.g., 256)
)

index_type determines what search capabilities the index has:

"hybrid" — builds both a dense vector index and a BM25 inverted index. Supports mode="dense" and mode="hybrid" queries.
"dense" — builds a dense vector index only. Supports mode="dense" queries only.

List Indexes

indexes = client.list_indexes()
for idx in indexes:
    print(idx["index_id"], idx["name"], idx["status"], idx["vector_count"])

Get Existing Index

index = client.get_index("index_id")

Delete Index

client.delete_index("index_id")

Predict Dynamic (per-query retrieval plan)

p = client.predict_dynamic(
    text="who founded apple?",
    query_vector=my_dense_vec,  # strongly recommended: your encoder's output
    model_id=None,              # used to embed `text` ONLY if query_vector is None
)
# p.alpha          : float in [0.0, 1.0]  (1=dense, 0=BM25)
# p.top_k_dense    : int   in [1, 10]
# p.top_k_hybrid   : int   in [1, 10]

Returns a DynamicPrediction(alpha, top_k_dense, top_k_hybrid) — one HTTP call, one GPU forward, three scalars. Drop alpha into your RRF / convex combination as the dense weight; clip your candidate set to top_k_dense (pure-dense stack) or top_k_hybrid (hybrid stack). No Dasein index required.

Pass your own query_vector so the prediction is valid for your retriever. If omitted, Dasein embeds text with its default encoder and the result will only be meaningful for that encoder's geometry.

See BYO retrieval stack — client.predict_dynamic for usage and quotas.

Cross-Index Query Batch

responses = client.query_batch([
    {"index_id": "abc", "text":   "hello",       "top_k": 10},
    {"index_id": "def", "vector": my_vec,        "top_k": 5, "include_vectors": True},
    {"index_id": "ghi", "text":   "rate limit",  "top_k": 20, "mode": "hybrid"},
])

See Query Batch below for the full feature surface, per-slot error semantics, and limits.

Upsert Documents

index.upsert([
    {"id": "doc1", "text": "Hello world", "metadata": {"type": "greeting"}},
    {"id": "doc2", "text": "Goodbye world", "metadata": {"type": "farewell"}},
])

Each document can have:

id (required) — unique document ID (string or int)
text — raw text (embedded automatically if the index has a model)
vector — pre-computed embedding (list of floats)
metadata — dict[str, str | int | float] for filtering

Max 5,000 documents per call for model-backed indexes (10,000 for bring-your-own-vectors). The SDK automatically batches larger lists.

You can also use the typed UpsertItem class instead of raw dicts:

from dasein import UpsertItem

index.upsert([
    UpsertItem(id="doc1", text="Hello world", metadata={"type": "greeting"}),
    UpsertItem(id="doc2", vector=[0.1, 0.2, ...]),
])

Query

results = index.query(
    text="search query",         # or vector=[0.1, 0.2, ...]
    top_k=10,
    mode="hybrid",               # "dense" or "hybrid" (hybrid requires index_type="hybrid")
    filter={"key": "value"},     # optional metadata filter (supports operators — see Filtering)
    exact=False,                 # exact keyword matching (hybrid only)
    phrase=False,                # exact phrase matching (hybrid only)
    fuzzy=False,                 # typo-tolerant matching (hybrid only)
    alpha=0.5,                   # dense vs BM25 balance (1=dense, 0=BM25)
    dynamic_hybrid=False,        # let Dasein pick alpha per query (hybrid only, top_k<=100)
    include_text=False,          # return stored text (off by default)
    include_metadata=False,      # return stored metadata (off by default)
    include_vectors=False,       # return approximate vectors (off by default)
)

What you get back depends on your settings:

Setting	Returns	I/O cost
Default	`id`, `score`	Zero — pure RAM, no SSD reads
`include_metadata=True`	+ `metadata`	Small SSD read per result (page-cached for hot indexes)
`include_text=True`	+ `text`	Larger SSD read per result
`include_vectors=True`	+ `vector`	Zero — reconstructed from RAM (approximate)

# Default — IDs and scores only, pure RAM, maximum QPS
results = index.query("quarterly earnings", top_k=10)
for r in results:
    print(r.id, r.score)

# Include metadata
results = index.query("quarterly earnings", top_k=10, include_metadata=True)
for r in results:
    print(r.id, r.score, r.metadata)

# Full hydration — metadata + original text
results = index.query("quarterly earnings", top_k=10, include_text=True, include_metadata=True)
for r in results:
    print(r.id, r.score, r.text, r.metadata)

# Include approximate vectors (reconstructed from RAM, no disk I/O)
results = index.query("quarterly earnings", top_k=10, include_vectors=True)
for r in results:
    # r.vector is a numpy.ndarray (float32) when numpy is installed,
    # or a list[float] otherwise. np.asarray(r.vector) works for both.
    print(r.id, r.score, len(r.vector))

Text and metadata are stored on SSD and only fetched when you opt in. Vectors are reconstructed from a compact in-RAM representation — no disk I/O. The default query path is entirely RAM-resident.

Wire format for `include_vectors`

When numpy is available, the SDK automatically asks the server for vectors as base64-encoded little-endian float32 bytes, then decodes them with np.frombuffer outside the GIL. This avoids allocating thousands of Python float objects per response and is the path that unlocks high throughput under concurrent use. If numpy isn't installed, the SDK falls back to the legacy JSON-array-of-floats wire format transparently.
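
The encoding itself is simple to reproduce. The sketch below round-trips a vector through base64-encoded little-endian float32 bytes using only the standard library. The function names are hypothetical, and the JSON field that carries the payload in the actual response is not specified here.

```python
import base64
import struct


def encode_f32_le(values):
    """Pack floats as little-endian float32 and base64-encode them."""
    raw = struct.pack(f"<{len(values)}f", *values)
    return base64.b64encode(raw).decode("ascii")


def decode_f32_le(payload):
    """Decode a base64 string of little-endian float32 values into a list."""
    raw = base64.b64decode(payload)
    if len(raw) % 4:
        raise ValueError("payload length is not a multiple of 4 bytes")
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


payload = encode_f32_le([1.0, -2.5, 0.25])
vector = decode_f32_le(payload)
```

With numpy installed, np.frombuffer(raw, dtype="<f4") performs the same decode without creating a Python float per component, which is the saving the paragraph above describes.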

Query Batch

For workloads that run many queries back-to-back — training loops, evaluation suites, mining — use batch queries to amortize HTTP / TLS / router overhead across a single round-trip. Two flavors:

index.query_batch(queries) — many queries, one index.
client.query_batch(queries) — many queries, many indexes in one request.

Both return list[QueryResponse] in request order, both accept the full set of query() kwargs per entry, and both cap out at 4096 sub-queries per call.

Single-index: `index.query_batch`

# Each entry takes the same keys as Index.query(...)
batch = [
    {"vector": v, "top_k": 10, "include_vectors": True}
    for v in my_query_vectors
]  # up to 4096
responses = index.query_batch(batch)

for q_idx, resp in enumerate(responses):
    for r in resp:
        print(q_idx, r.id, r.score)

index.query_batch is functionally identical to calling query() N times — same server-side search path, same hybrid RRF fusion, same filter operators, same flags. The only difference is that many queries travel on one TCP connection in one JSON payload. You can mix dense and hybrid queries, different top_k, different filter, different include_* choices in the same batch.

# Every key that query() takes works inside query_batch() entries:
batch = [
    {"text": "rocket launch",    "top_k": 5,  "mode": "hybrid"},
    {"text": "quarterly earnings","top_k": 10, "filter": {"year": {"$gte": 2024}},
     "include_metadata": True},
    {"vector": my_vec,           "top_k": 20, "include_vectors": True},
]
responses = index.query_batch(batch)

Multi-index: `client.query_batch`

client.query_batch takes a list where each entry carries its own index_id and fans out across every index it touches inside the router. Same feature surface as Index.query() per entry — text / vector, dense / hybrid, filters, exact / phrase / fuzzy / alpha, include_text / include_metadata / include_vectors.

# 256 queries scattered across many indexes in one round-trip.
batch = []
for idx_id, qvec in zip(index_ids, query_vectors):
    batch.append({
        "index_id":        idx_id,
        "vector":          qvec,
        "top_k":           10,
        "mode":            "hybrid",
        "include_vectors": True,
    })
responses = client.query_batch(batch)

for sent, resp in zip(batch, responses):
    if resp.error:              # per-slot failure — doesn't fail the batch
        print(sent["index_id"], "FAILED:", resp.error)
        continue
    for r in resp:
        print(sent["index_id"], r.id, r.score, r.vector[:4])

Text auto-embeds just like query() — the router looks up each index's model, coalesces all sub-queries that share a model into one embed call, and splices the resulting vectors back into their slots. A batch of 256 texts across 60 indexes that all use bge-large-en-v1.5 costs one embed round-trip, not 256.

batch = [
    {"index_id": "abc-001", "text": "climate policy", "top_k": 10, "mode": "hybrid"},
    {"index_id": "def-002", "text": "interest rates", "top_k": 5,  "mode": "dense"},
    {"index_id": "ghi-003", "vector": pre_embedded,   "top_k": 20, "include_vectors": True},
]
responses = client.query_batch(batch)

Under the hood the router:

Authenticates the API key once, then checks per-slot authorization against each index_id.
Groups text sub-queries by the index's model_id and fires one parallel /embed call per distinct model.
Groups every (index_id, host) pair into a bucket and fires one pod-side /batch_query per bucket — in parallel across up to 24 pods.
Reassembles the response in original slot order.
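
The embed grouping and reassembly steps above follow a common scatter-gather pattern. As a rough sketch of that pattern only (not the router's code), with model_of and embed_many as hypothetical stand-ins for the model lookup and the embed service:

```python
from collections import defaultdict


def embed_grouped(batch, model_of, embed_many):
    """Embed text slots with one call per distinct model, preserving slot order.

    batch: list of dicts with "index_id" and either "text" or "vector".
    model_of: hypothetical callable, index_id -> model name.
    embed_many: hypothetical callable, (model, [texts]) -> [vectors].
    """
    groups = defaultdict(list)  # model -> [(slot, text), ...]
    for slot, entry in enumerate(batch):
        if "vector" not in entry and "text" in entry:
            groups[model_of(entry["index_id"])].append((slot, entry["text"]))

    # Slots that already carry a vector keep it; text slots are filled below.
    vectors = [entry.get("vector") for entry in batch]
    for model, items in groups.items():
        slots, texts = zip(*items)
        for slot, vec in zip(slots, embed_many(model, list(texts))):
            vectors[slot] = vec  # splice back into the original position
    return vectors
```

Tracking the original slot number alongside each text is what lets the results be written back in request order regardless of how the groups are processed.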

Per-slot errors (multi-index only)

Multi-index batches never throw for one bad slot — the whole batch always comes back 200 if the envelope made it. Bad slots come back as an empty results list with resp.error set:

`resp.error`	Meaning
`"missing index_id"` / `"missing text or vector"`	Malformed entry
`"invalid api key"` / `"api key not authorized for this index"`	Auth failure
`"index not loaded"`	Index is placing/migrating or unknown
`"embed failed"` / `"embed service not configured"`	Embed path failure
`"backend error (status N)"`	Pod returned non-2xx

Single-index index.query_batch also isolates failures per slot, but it does not set resp.error: malformed sub-queries simply come back as empty result sets.

Response ordering always matches request ordering: responses[i] corresponds to batch[i].
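
Because ordering is preserved, failed slots can be collected by position and resubmitted on their own. A small sketch, assuming only that each response exposes the error attribute described above; the helper names and the choice of which errors count as transient are assumptions:

```python
def failed_slots(batch, responses):
    """Return [(slot, entry, error)] for every sub-query whose response has error set."""
    if len(batch) != len(responses):
        raise ValueError("responses must align one-to-one with batch")
    return [
        (i, entry, resp.error)
        for i, (entry, resp) in enumerate(zip(batch, responses))
        if getattr(resp, "error", None)
    ]


# Heuristic (an assumption): load, embed, and backend failures may succeed on retry;
# malformed entries and auth failures will not.
TRANSIENT = ("index not loaded", "embed failed", "backend error")


def retryable(error):
    """Return True if the error string starts with one of the TRANSIENT prefixes."""
    return error.startswith(TRANSIENT)
```

A retry pass would then rebuild a smaller batch from the entries whose error is retryable and write the new responses back into the recorded slot positions.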

Limits

Max 4096 queries per call (either flavor).
Request body capped at 16 MiB on the router's inbound side. With 1024-dim JSON-encoded query vectors, that's roughly 1500 queries per batch before you need to split; bring-your-own-vector batches at smaller top_k and fewer include_* can stretch further.
Multi-index: one batch can span up to 1024 distinct (index_id, host) buckets and up to 32 distinct embedding models.
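
For workloads that can exceed either cap, one option is to split the batch client-side before sending. The sketch below estimates each entry's size from its compact JSON encoding and starts a new chunk whenever the count or byte limit would be exceeded. The split_batch helper and the envelope overhead allowance are assumptions, and entries must hold plain lists rather than numpy arrays for json.dumps to accept them.

```python
import json

MAX_QUERIES = 4096                 # per-call cap stated above
MAX_BODY_BYTES = 16 * 1024 * 1024  # 16 MiB router limit stated above


def split_batch(queries, max_queries=MAX_QUERIES, max_bytes=MAX_BODY_BYTES, overhead=1024):
    """Yield sub-batches that respect both the count cap and the size cap.

    Sizes are estimated from compact JSON; `overhead` is an assumed
    allowance for the request envelope, not a documented figure.
    """
    chunk, chunk_bytes = [], overhead
    for q in queries:
        # +1 accounts for the comma separating entries in the JSON array.
        q_bytes = len(json.dumps(q, separators=(",", ":")).encode("utf-8")) + 1
        if chunk and (len(chunk) >= max_queries or chunk_bytes + q_bytes > max_bytes):
            yield chunk
            chunk, chunk_bytes = [], overhead
        chunk.append(q)
        chunk_bytes += q_bytes
    if chunk:
        yield chunk


queries = [{"text": f"q{i}", "top_k": 10} for i in range(10000)]
chunk_sizes = [len(c) for c in split_batch(queries)]
```

Sending the chunks in order and concatenating the returned lists keeps responses aligned with the original batch, since each call returns results in request order.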

Optional: faster JSON parsing

If orjson is importable the SDK will use it for query / query_batch response parsing automatically. It's strictly optional — no changes to your code — but installing it noticeably reduces CPU on the query hot path, especially for large batch responses:

pip install orjson

Delete Documents

index.delete(["doc1", "doc2"])

Upsert and Wait

result = index.upsert_and_wait(documents, timeout=120.0)

Upserts documents and polls until the index becomes queryable. Useful for scripts where you want to upsert and immediately query.

Build (BYOV only)

index.build()

Only needed for bring-your-own-vectors with unrecognized models. Known-model indexes build automatically after the first upsert.

Compact

index.compact()

Triggers a compaction rebuild that removes deleted document tombstones from the graph. Run this after large batch deletions to reclaim performance.

Index Status

info = index.status()
print(info.status)        # created, building, built, active, etc.
print(info.vector_count)

Exceptions

from dasein.exceptions import (
    DaseinError,             # base — catch-all for any Dasein error, including plain 403 Forbidden
    DaseinAuthError,         # 401, or 403 mentioning credentials / API key / revoked
    DaseinQuotaError,        # 403 — billing/plan/trial/subscription/embed limit
    DaseinNotFoundError,     # 404 — index doesn't exist
    DaseinRateLimitError,    # 429 — transient rate limit exceeded (has retry_after)
    DaseinUnavailableError,  # 503/504 — service temporarily unavailable (has retry_after)
    DaseinBuildError,        # build failed
)

DaseinAuthError is raised only for credential issues (bad API key, revoked key, authentication failure). DaseinQuotaError covers trial limits, plan vector caps, expired/past-due subscriptions, and embed token quotas (including 429s that indicate a non-transient monthly embed cap). DaseinRateLimitError is raised for transient per-second rate limits that the SDK retries automatically. A generic 403 (e.g., accessing a resource you don't own) raises DaseinError — catch it separately if you need to distinguish resource authorization from credential errors.

License

MIT

Release history

0.4.17

May 11, 2026

This version

0.4.16

May 11, 2026

0.4.15

May 11, 2026

0.4.14

May 11, 2026

0.4.13

May 11, 2026

0.4.12

May 11, 2026

0.4.11

May 11, 2026

0.4.10

May 9, 2026

0.4.9

Apr 30, 2026

0.4.8

Apr 30, 2026

0.4.7

Apr 21, 2026

0.4.6

Apr 20, 2026

0.4.5

Apr 20, 2026

0.4.4

Apr 18, 2026

0.4.3

Apr 18, 2026

0.4.2

Apr 14, 2026

0.4.1

Apr 13, 2026

0.4.0

Apr 12, 2026

0.3.1

Apr 6, 2026

0.3.0

Apr 6, 2026

0.2.1

Apr 6, 2026

0.2.0

Apr 6, 2026

0.1.1

Apr 5, 2026

0.1.0

Apr 4, 2026

Download files

Source Distribution

dasein_ai-0.4.16.tar.gz (61.7 kB)

Uploaded May 11, 2026 Source

Built Distribution

dasein_ai-0.4.16-py3-none-any.whl (34.3 kB)

Uploaded May 11, 2026 Python 3

File details

Details for the file dasein_ai-0.4.16.tar.gz.

File metadata

Download URL: dasein_ai-0.4.16.tar.gz
Upload date: May 11, 2026
Size: 61.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for dasein_ai-0.4.16.tar.gz
Algorithm	Hash digest
SHA256	`47046b1c78c50b8c793414609ed92f69e76151cbcc7df50f4fdfdecc5a2b6f1a`
MD5	`e4c2d3794c3b6b5070efb65e8046e2cb`
BLAKE2b-256	`51ad739fd10905e39b2508d46109bcf6ee8d1713106433012df2084d1a0533fb`

File details

Details for the file dasein_ai-0.4.16-py3-none-any.whl.

File metadata

Download URL: dasein_ai-0.4.16-py3-none-any.whl
Upload date: May 11, 2026
Size: 34.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for dasein_ai-0.4.16-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d516f8b53a213ef773c15b21c3078fea66e7c0b025bde9e47c8262f554ae595a`
MD5	`dd6afdf1addf9e0de6464c34d3a81fbc`
BLAKE2b-256	`0fe2430e33744076249b3179f8e45850a8b4ff8a40579664ed425a42cb2e66e7`
