BabyAPI client.
BabyAPI (Python SDK)
A tiny Python client for BabyAPI — an OpenAI-compatible API for hosted open-weight models.
Minimal surface area. Calm defaults. You bring an API key — we handle the GPUs.
Endpoints
- OpenAI-compatible:
  - POST /v1/chat/completions
  - POST /v1/completions
  - POST /v1/embeddings
  - POST /v1/rerank
- BabyAPI convenience:
  - POST /infer (simple text-in, text-out)
Install
pip install babyapi
Quick start (the easy path): client.baby.infer(...)
If you just want text in → text out, start here.
import os
from babyapi import BabyAPI
client = BabyAPI(
    api_key=os.getenv("BABYAPI_API_KEY"),
    default_model="mistral",  # so you can call baby.infer("...") without specifying model
)
out = client.baby.infer(
    {
        "prompt": "Write a 1-line release note title for BabyAPI.",
        "maxTokens": 40,
        "temperature": 0.5,
    }
)
print(out["output"])
print(out.get("usage"))
You can also pass a raw string:
out = client.baby.infer("Explain BabyAPI in one sentence.")
print(out["output"])
Supported options (aliases accepted)
You can pass options directly or inside "options": {...}:
- max_tokens / maxTokens
- temperature
- top_p / topP
- top_k / topK
- stop
- presence_penalty / presencePenalty
- frequency_penalty / frequencyPenalty
Example with aliases + nested options:
out = client.baby.infer(
    {
        "model": "mistral",
        "prompt": "Give 3 calm API principles.",
        "options": {"topP": 0.9, "max_tokens": 80},
    }
)
print(out["output"])
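The alias list above suggests that camelCase and snake_case keys end up as one canonical set before a request is sent. Here is a minimal sketch of that kind of normalization. `normalize_options` is a hypothetical helper, not part of the SDK, and two behaviours are assumptions: snake_case is treated as canonical, and top-level keys override keys inside "options".

```python
# Hypothetical helper, not part of babyapi: folds camelCase aliases to snake_case.
ALIASES = {
    "maxTokens": "max_tokens",
    "topP": "top_p",
    "topK": "top_k",
    "presencePenalty": "presence_penalty",
    "frequencyPenalty": "frequency_penalty",
}
# Keys that describe the request itself rather than sampling options.
RESERVED = {"model", "prompt", "options"}


def normalize_options(payload: dict) -> dict:
    """Merge nested "options" with top-level keys and canonicalize the names."""
    merged = {}
    # Nested options are applied first so top-level keys win (an assumption).
    for source in (payload.get("options") or {}, payload):
        for key, value in source.items():
            if key in RESERVED:
                continue
            merged[ALIASES.get(key, key)] = value
    return merged


print(normalize_options({
    "prompt": "Give 3 calm API principles.",
    "maxTokens": 40,
    "options": {"topP": 0.9, "max_tokens": 80},
}))
```

If the real client resolves conflicts differently, only the iteration order in the loop would need to change.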
One method for both OpenAI endpoints: client.infer(...)
client.infer(...) picks the endpoint from the shape of an OpenAI-style payload:
- If you pass messages → routes to chat completions
- If you pass prompt → routes to completions
chat_res = client.infer(
    {
        "model": "mistral",
        "messages": [{"role": "user", "content": "One-line slogan for BabyAPI?"}],
    }
)
print(chat_res["choices"][0]["message"]["content"])
comp_res = client.infer(
    model="mistral",
    prompt="Give 3 product names for a tiny LLM SDK.",
    max_tokens=60,
)
print(comp_res["choices"][0]["text"])
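The routing rule can be pictured as a small dispatch on the payload keys. The sketch below is illustrative only. `pick_endpoint` is a hypothetical name, and both the precedence of messages when both keys are present and the error raised when neither is present are assumptions rather than documented SDK behaviour.

```python
def pick_endpoint(payload: dict) -> str:
    """Return the OpenAI-compatible path a payload would be routed to."""
    # Assumption: "messages" takes precedence if both keys are present.
    if "messages" in payload:
        return "/v1/chat/completions"
    if "prompt" in payload:
        return "/v1/completions"
    # Assumption: the real client may handle this case differently.
    raise ValueError("payload needs either 'messages' or 'prompt'")


print(pick_endpoint({"model": "mistral", "messages": [{"role": "user", "content": "Hi"}]}))
print(pick_endpoint({"model": "mistral", "prompt": "Hi"}))
```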
OpenAI-compatible: Chat Completions
res = client.chat.completions.create(
    model="mixtral",
    messages=[
        {"role": "system", "content": "You are concise."},
        {"role": "user", "content": "Give me 3 tagline ideas for a tiny LLM API."},
    ],
    temperature=0.7,
)
print(res["choices"][0]["message"]["content"])
OpenAI-compatible: Completions
res = client.completions.create(
    model="mistral",
    prompt="Write a friendly release note opener for BabyAPI.",
    max_tokens=120,
    temperature=0.7,
)
print(res["choices"][0]["text"])
OpenAI-compatible: Embeddings
res = client.embeddings.create(
    model="qwen3-embedding",
    input="BabyAPI makes LLMs easy.",
)
print(res["data"][0]["embedding"][:5])  # first 5 dimensions
print(res["usage"])
You can also embed multiple texts at once:
res = client.embeddings.create(
    model="qwen3-embedding",
    input=[
        "First document to embed.",
        "Second document to embed.",
    ],
)
for item in res["data"]:
    print(f"Index {item['index']}: {len(item['embedding'])} dimensions")
Supported parameters
| Parameter | Type | Description |
|---|---|---|
| model | str | Required. The embedding model to use. |
| input | str \| list[str] | Required. Text(s) to embed. |
| encoding_format | str | Optional. "float" (default) or "base64". |
| dimensions | int | Optional. Truncate embeddings to this many dimensions. |
| truncate_prompt_tokens | int | Optional. Max tokens to keep (vLLM-specific). |
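A common next step with the returned vectors is comparing two texts by cosine similarity. The sketch below uses only the standard library and a hand-written stand-in response shaped like res["data"] in the examples above. The vectors are made up for illustration and `cosine_similarity` is not an SDK function.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


# Stand-in for an embeddings response; real vectors are much longer.
fake_res = {
    "data": [
        {"index": 0, "embedding": [0.1, 0.3, 0.5]},
        {"index": 1, "embedding": [0.2, 0.1, 0.4]},
    ]
}
first, second = (item["embedding"] for item in fake_res["data"])
print(round(cosine_similarity(first, second), 4))
```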
Reranking
res = client.rerank.create(
    model="qwen3-reranker",
    query="What is BabyAPI?",
    documents=[
        "BabyAPI is a tiny hosted LLM API.",
        "The weather is nice today.",
        "BabyAPI supports OpenAI-compatible endpoints.",
    ],
)
for result in res["results"]:
    print(f"Index {result['index']}: relevance_score={result['relevance_score']:.4f}")
Supported parameters
| Parameter | Type | Description |
|---|---|---|
| model | str | Required. The reranker model to use. |
| query | str | Required. The query to rank documents against. |
| documents | list[str] | Required. Documents to rerank. |
| top_n | int | Optional. Return only the top N results. |
| return_documents | bool | Optional. Include document text in results. |
| truncate_prompt_tokens | int | Optional. Max tokens to keep (vLLM-specific). |
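When return_documents is not set, each result carries an index into the documents list you sent, so mapping scores back to text happens on the client side. Here is a minimal sketch, assuming the response shape used in the example above (results with index and relevance_score). The scores are invented, and `top_documents` is a hypothetical helper.

```python
def top_documents(documents, response, k=2):
    """Pair the k highest-scoring results with their original document text."""
    # Sort defensively; whether the API already returns sorted results is not assumed.
    ranked = sorted(
        response["results"],
        key=lambda r: r["relevance_score"],
        reverse=True,
    )
    return [(documents[r["index"]], r["relevance_score"]) for r in ranked[:k]]


documents = [
    "BabyAPI is a tiny hosted LLM API.",
    "The weather is nice today.",
    "BabyAPI supports OpenAI-compatible endpoints.",
]
# Stand-in response with made-up scores.
fake_res = {
    "results": [
        {"index": 1, "relevance_score": 0.02},
        {"index": 0, "relevance_score": 0.97},
        {"index": 2, "relevance_score": 0.81},
    ]
}
for text, score in top_documents(documents, fake_res):
    print(f"{score:.2f}  {text}")
```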
Streaming (SSE)
.stream(...) yields SSEEvent objects:
- event.done → True when the stream is finished ([DONE])
- event.data → parsed JSON when possible (otherwise None)
- event.raw → raw data: payload string
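To make the three fields concrete, here is a rough sketch of how a single data: line could map onto them. It is not the SDK's parser. `SketchEvent` and `parse_sse_line` are hypothetical names, and details such as skipping non-data lines are assumptions.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class SketchEvent:
    """Stand-in with the same three fields described for SSEEvent."""
    done: bool
    data: Optional[Any]
    raw: str


def parse_sse_line(line: str) -> Optional[SketchEvent]:
    """Turn one 'data: ...' line into an event; ignore any other line."""
    if not line.startswith("data:"):
        return None  # comments, blank lines, other SSE fields
    raw = line[len("data:"):].strip()
    if raw == "[DONE]":
        return SketchEvent(done=True, data=None, raw=raw)
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = None  # payload was not JSON; raw is still available
    return SketchEvent(done=False, data=data, raw=raw)


print(parse_sse_line('data: {"choices": [{"delta": {"content": "Hi"}}]}'))
print(parse_sse_line("data: [DONE]"))
```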
Streaming: chat
import os
from babyapi import BabyAPI
client = BabyAPI(api_key=os.getenv("BABYAPI_API_KEY"))
for event in client.chat.completions.stream(
    model="mistral",
    messages=[{"role": "user", "content": "Write a short poem about servers."}],
):
    if event.done:
        break
    delta = (event.data or {}).get("choices", [{}])[0].get("delta", {})
    chunk = delta.get("content")
    if chunk:
        print(chunk, end="", flush=True)
print()
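If you need the full reply as one string instead of printing chunks as they arrive, the same loop can accumulate deltas. The sketch below runs against fake events built with SimpleNamespace, so it needs no network access. `collect_chat_text` is a hypothetical helper, and the fake events only mimic the done and data fields described above.

```python
from types import SimpleNamespace


def collect_chat_text(events) -> str:
    """Join the content deltas of a chat stream into one string."""
    parts = []
    for event in events:
        if event.done:
            break
        delta = (event.data or {}).get("choices", [{}])[0].get("delta", {})
        chunk = delta.get("content")
        if chunk:
            parts.append(chunk)
    return "".join(parts)


# Fake events standing in for client.chat.completions.stream(...).
fake_events = [
    SimpleNamespace(done=False, data={"choices": [{"delta": {"content": "Hel"}}]}),
    SimpleNamespace(done=False, data=None),  # unparseable payload is skipped
    SimpleNamespace(done=False, data={"choices": [{"delta": {"content": "lo"}}]}),
    SimpleNamespace(done=True, data=None),
]
print(collect_chat_text(fake_events))  # Hello
```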
Streaming: completions
for event in client.completions.stream(
    model="mistral",
    prompt="List 5 calm API-building tips.",
):
    if event.done:
        break
    text = (event.data or {}).get("choices", [{}])[0].get("text")
    if text:
        print(text, end="", flush=True)
print()
Note: like many SDKs, streaming requests are not retried. If you want retries for streams, wrap your call at the application level.
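One possible shape for such an application-level wrapper is sketched below. It re-opens the stream only if the failure happens before anything has been yielded, because retrying after partial output would repeat text. `stream_with_retry` is a hypothetical helper. The attempt count, delays, and the choice to catch Exception by default are all assumptions you would tune.

```python
import time


def stream_with_retry(open_stream, attempts=3, base_delay_s=0.25, retry_on=(Exception,)):
    """Yield events from open_stream(), re-opening it if it fails before the first event."""
    for attempt in range(attempts):
        yielded = False
        try:
            for event in open_stream():
                yielded = True
                yield event
            return  # stream finished normally
        except retry_on:
            # Give up after partial output or once the attempts are used up.
            if yielded or attempt == attempts - 1:
                raise
            time.sleep(base_delay_s * (2 ** attempt))


# Demo with a fake stream that fails once, then succeeds.
calls = {"n": 0}


def flaky_stream():
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConnectionError("simulated connection drop")
    yield "chunk-1"
    yield "chunk-2"


print(list(stream_with_retry(flaky_stream, base_delay_s=0.0)))
```

With the real client, open_stream would be a zero-argument callable such as a lambda wrapping client.chat.completions.stream(...).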
Multimodal (vision) examples (OpenAI-style)
If the model you select supports vision, you can send images using OpenAI-style message content.
Vision: non-streaming
res = client.chat.completions.create(
    model="pixtral",  # or another vision-capable model you expose
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe the image in 2 sentences. Then list 3 objects you see."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://api.babyapi.org/images/banner.png"},
                },
            ],
        }
    ],
)
print(res["choices"][0]["message"]["content"])
Vision: streaming
for event in client.chat.completions.stream(
    model="pixtral",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is this image trying to communicate?"},
                {"type": "image_url", "image_url": {"url": "https://api.babyapi.org/images/banner.png"}},
            ],
        }
    ],
):
    if event.done:
        break
    delta = (event.data or {}).get("choices", [{}])[0].get("delta", {})
    chunk = delta.get("content")
    if chunk:
        print(chunk, end="", flush=True)
print()
Image support depends on the model you choose. If the model is text-only, the API may reject image inputs.
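Because the nested content structure is easy to mistype, a small builder can keep call sites short. This is a convenience sketch only. `vision_message` is a hypothetical helper, and whether a given model accepts more than one image per message is an assumption to check against that model.

```python
def vision_message(text: str, *image_urls: str) -> dict:
    """Build one OpenAI-style user message with a text part and image parts."""
    content = [{"type": "text", "text": text}]
    for url in image_urls:
        content.append({"type": "image_url", "image_url": {"url": url}})
    return {"role": "user", "content": content}


message = vision_message(
    "Describe the image in 2 sentences.",
    "https://api.babyapi.org/images/banner.png",
)
print(message)
```

The result can be passed as messages=[message] to the create(...) or stream(...) calls shown above.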
Configuration
import os
from babyapi import BabyAPI
client = BabyAPI(
    api_key=os.getenv("BABYAPI_API_KEY"),  # required (or BABY_API_KEY)
    base_url=os.getenv("BABYAPI_BASE_URL"),  # optional (default: https://api.babyapi.org)
    timeout_s=60.0,  # JSON requests only
    max_retries=2,  # retry transient failures
    retry_base_delay_s=0.25,  # exponential backoff base
    default_model="mistral",  # used by client.baby.infer when model omitted
    default_headers={"x-app": "my-sideproject"},  # extra headers for every request
)
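To get a feel for what max_retries and retry_base_delay_s imply for worst-case latency, the sketch below computes a plain doubling schedule. The exact formula the SDK uses is not stated here, so the doubling factor and the absence of jitter are assumptions, and `backoff_delays` is a hypothetical name.

```python
def backoff_delays(max_retries: int = 2, base_delay_s: float = 0.25) -> list:
    """Delay before each retry, assuming base * 2**attempt with no jitter."""
    return [base_delay_s * (2 ** attempt) for attempt in range(max_retries)]


delays = backoff_delays()
print(delays)       # [0.25, 0.5]
print(sum(delays))  # 0.75 seconds of waiting on top of request time
```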
Environment variables supported:
- BABYAPI_API_KEY (or BABY_API_KEY)
- BABYAPI_BASE_URL
- BABYAPI_DEFAULT_MODEL
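The variables above could be resolved along these lines. This is a sketch, not the SDK's code. `resolve_settings` is a hypothetical helper, and the precedence of BABYAPI_API_KEY over BABY_API_KEY is an assumption. The default base URL is the one quoted in the configuration comment.

```python
import os


def resolve_settings(env=None) -> dict:
    """Read client settings from a mapping of environment variables."""
    env = os.environ if env is None else env
    return {
        # Assumption: the longer name wins when both are set.
        "api_key": env.get("BABYAPI_API_KEY") or env.get("BABY_API_KEY"),
        "base_url": env.get("BABYAPI_BASE_URL") or "https://api.babyapi.org",
        "default_model": env.get("BABYAPI_DEFAULT_MODEL"),
    }


print(resolve_settings({"BABY_API_KEY": "sk-demo", "BABYAPI_DEFAULT_MODEL": "mistral"}))
```

Passing a plain dict instead of os.environ keeps the function easy to test.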
Per-call overrides (RequestOptions)
Every .create(...) / .stream(...) accepts request_options.
import os
from babyapi import BabyAPI, RequestOptions
client = BabyAPI(api_key=os.getenv("BABYAPI_API_KEY"))
res = client.chat.completions.create(
    request_options=RequestOptions(
        timeout_s=30.0,
        max_retries=0,
        headers={"x-trace": "abc123"},
    ),
    model="mistral",
    messages=[{"role": "user", "content": "Hello."}],
)
You can also pass a plain dict:
res = client.chat.completions.create(
    request_options={"timeout_s": 10.0, "headers": {"x-app": "demo"}},
    model="mistral",
    messages=[{"role": "user", "content": "Hi again."}],
)
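One way to think about per-call overrides is as a merge on top of the client defaults. The sketch below shows that idea with plain dicts. `merge_request_options` is a hypothetical helper, and the rule that per-call headers are merged with default headers (instead of replacing them) is an assumption about the SDK, not documented behaviour.

```python
def merge_request_options(defaults: dict, overrides=None) -> dict:
    """Overlay per-call options on client defaults; headers are merged key by key."""
    merged = dict(defaults)
    for key, value in (overrides or {}).items():
        if key == "headers":
            # Assumption: per-call headers extend the defaults and win on conflicts.
            merged["headers"] = {**defaults.get("headers", {}), **value}
        else:
            merged[key] = value
    return merged


defaults = {"timeout_s": 60.0, "max_retries": 2, "headers": {"x-app": "my-sideproject"}}
print(merge_request_options(defaults, {"timeout_s": 10.0, "headers": {"x-trace": "abc123"}}))
```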
Timeouts & cancellation
- JSON requests use timeout_s (default: 60s).
- Streaming requests default to no timeout (infinite), matching common SSE usage.
  - If you want a stream timeout, pass request_options={"timeout_s": 30.0}.
- To stop a stream early, break your loop.
Errors
SDK errors raise BabyAPIError when possible.
import os
from babyapi import BabyAPI, BabyAPIError
client = BabyAPI(api_key=os.getenv("BABYAPI_API_KEY"))
try:
    client.chat.completions.create(model="mistral", messages=[])
except BabyAPIError as err:
    print(
        {
            "message": err.message,
            "status": err.status,
            "code": err.code,
            "type": err.type,
            "request_id": err.request_id,
        }
    )
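Once the SDK's own retries are exhausted, application code may still want to decide whether an error is worth retrying later. The sketch below branches on the status attribute shown above, using conventional HTTP semantics (408, 429, and 5xx as transient). `is_retryable` is a hypothetical helper, the fake errors are stand-ins, and which statuses BabyAPI actually returns is an assumption.

```python
from types import SimpleNamespace


def is_retryable(err) -> bool:
    """Classify an error by HTTP status; unknown status is treated as not retryable."""
    status = getattr(err, "status", None)
    if status is None:
        return False
    return status in (408, 429) or 500 <= status <= 599


# Stand-ins exposing only the status attribute of an SDK error.
for fake_err in (SimpleNamespace(status=429), SimpleNamespace(status=400), SimpleNamespace(status=503)):
    print(fake_err.status, is_retryable(fake_err))
```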
Context manager / cleanup
The client maintains an httpx.Client. Use it as a context manager to ensure clean shutdown:
import os
from babyapi import BabyAPI
with BabyAPI(api_key=os.getenv("BABYAPI_API_KEY")) as client:
    res = client.completions.create(model="mistral", prompt="Ping")
    print(res["choices"][0]["text"])
License
MIT.