BabyAPI client (OpenAI-compatible chat/completions/embeddings/rerank plus /docling document conversion).

BabyAPI (Python SDK)

A tiny Python client for BabyAPI — an OpenAI-compatible API for hosted open-weight models.

Minimal surface area. Calm defaults. You bring an API key — we handle the GPUs.

Endpoints

OpenAI-compatible:
- POST /v1/chat/completions
- POST /v1/completions
- POST /v1/embeddings
- POST /v1/rerank
BabyAPI convenience:
- POST /infer (simple text-in, text-out)
Document conversion (Docling):
- POST /docling/v1/convert/source
- POST /docling/v1/convert/file
- Async variants and chunking (hybrid / hierarchical)

Install

pip install babyapi

Quick start (the easy path): `client.baby.infer(...)`

If you just want text in → text out, start here.

import os
from babyapi import BabyAPI

client = BabyAPI(
    api_key=os.getenv("BABYAPI_API_KEY"),
    default_model="mistral",  # so you can call baby.infer("...") without specifying model
)

out = client.baby.infer(
    {
        "prompt": "Write a 1-line release note title for BabyAPI.",
        "maxTokens": 40,
        "temperature": 0.5,
    }
)

print(out["output"])
print(out.get("usage"))

You can also pass a raw string:

out = client.baby.infer("Explain BabyAPI in one sentence.")
print(out["output"])

Supported options (aliases accepted)

You can pass options directly or inside "options": {...}:

- max_tokens / maxTokens
- temperature
- top_p / topP
- top_k / topK
- stop
- presence_penalty / presencePenalty
- frequency_penalty / frequencyPenalty

Example with aliases + nested options:

out = client.baby.infer(
    {
        "model": "mistral",
        "prompt": "Give 3 calm API principles.",
        "options": {"topP": 0.9, "max_tokens": 80},
    }
)
print(out["output"])
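
To picture what "aliases accepted" means, here is a minimal sketch of a normalization step. It is not the SDK's actual implementation: `normalize_options`, the alias table, and the rule that top-level keys override nested "options" are all assumptions made for illustration.

```python
# Hypothetical helper: illustrates alias handling, not BabyAPI's internal code.
ALIASES = {
    "maxTokens": "max_tokens",
    "topP": "top_p",
    "topK": "top_k",
    "presencePenalty": "presence_penalty",
    "frequencyPenalty": "frequency_penalty",
}
OPTION_KEYS = {
    "max_tokens", "temperature", "top_p", "top_k",
    "stop", "presence_penalty", "frequency_penalty",
}


def normalize_options(payload: dict) -> dict:
    """Collect sampling options from the top level and from "options" as snake_case."""
    merged = {}
    # Nested options are read first so top-level keys override them
    # (assumed precedence; the README does not specify it).
    for source in (payload.get("options") or {}, payload):
        for key, value in source.items():
            name = ALIASES.get(key, key)
            if name in OPTION_KEYS:
                merged[name] = value
    return merged


example = {
    "model": "mistral",
    "prompt": "Give 3 calm API principles.",
    "maxTokens": 40,
    "options": {"topP": 0.9, "max_tokens": 80},
}
print(normalize_options(example))
```

Keys that are not sampling options, such as model and prompt, are left out of the result.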

One method for both OpenAI endpoints: `client.infer(...)`

client.infer(...) accepts OpenAI-style payloads and picks the endpoint from their shape:

- If you pass messages → routes to chat completions
- If you pass prompt → routes to completions

chat_res = client.infer(
    {
        "model": "mistral",
        "messages": [{"role": "user", "content": "One-line slogan for BabyAPI?"}],
    }
)
print(chat_res["choices"][0]["message"]["content"])

comp_res = client.infer(
    model="mistral",
    prompt="Give 3 product names for a tiny LLM SDK.",
    max_tokens=60,
)
print(comp_res["choices"][0]["text"])
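
The routing rule can be sketched in a few lines. This is an illustration under assumptions, not the client's real code: `pick_endpoint` is a hypothetical name, and both the preference for messages when both keys are present and the error when neither is present are guesses.

```python
# Hypothetical sketch of the routing rule; the real client may behave differently.
def pick_endpoint(payload: dict) -> str:
    """Return the endpoint path implied by an OpenAI-style payload."""
    if "messages" in payload:
        # Assumed: messages wins if both keys are supplied.
        return "/v1/chat/completions"
    if "prompt" in payload:
        return "/v1/completions"
    raise ValueError("payload needs either 'messages' or 'prompt'")


print(pick_endpoint({"model": "mistral", "messages": [{"role": "user", "content": "Hi"}]}))
print(pick_endpoint({"model": "mistral", "prompt": "Hi"}))
```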

OpenAI-compatible: Chat Completions

res = client.chat.completions.create(
    model="mixtral",
    messages=[
        {"role": "system", "content": "You are concise."},
        {"role": "user", "content": "Give me 3 tagline ideas for a tiny LLM API."},
    ],
    temperature=0.7,
)

print(res["choices"][0]["message"]["content"])

OpenAI-compatible: Completions

res = client.completions.create(
    model="mistral",
    prompt="Write a friendly release note opener for BabyAPI.",
    max_tokens=120,
    temperature=0.7,
)

print(res["choices"][0]["text"])

OpenAI-compatible: Embeddings

res = client.embeddings.create(
    model="qwen3-embedding",
    input="BabyAPI makes LLMs easy.",
)

print(res["data"][0]["embedding"][:5])  # first 5 dimensions
print(res["usage"])

You can also embed multiple texts at once:

res = client.embeddings.create(
    model="qwen3-embedding",
    input=[
        "First document to embed.",
        "Second document to embed.",
    ],
)

for item in res["data"]:
    print(f"Index {item['index']}: {len(item['embedding'])} dimensions")

Supported parameters

Parameter	Type	Description
`model`	`str`	Required. The embedding model to use.
`input`	`str | list[str]`	Required. Text(s) to embed.
`encoding_format`	`str`	Optional. `"float"` (default) or `"base64"`.
`dimensions`	`int`	Optional. Truncate embeddings to this many dimensions.
`truncate_prompt_tokens`	`int`	Optional. Max tokens to keep (vLLM-specific).
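
A common next step is comparing two embeddings. The sketch below computes cosine similarity over a response-shaped dict. The vectors are made up and `cosine_similarity` is an illustrative helper, not part of the SDK.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Stand-in for the dict returned by client.embeddings.create(...); values are invented.
res = {
    "data": [
        {"index": 0, "embedding": [1.0, 0.0, 0.0]},
        {"index": 1, "embedding": [0.6, 0.8, 0.0]},
    ]
}

# Sort by index so vectors line up with the order of the input texts.
vectors = [item["embedding"] for item in sorted(res["data"], key=lambda d: d["index"])]
score = cosine_similarity(vectors[0], vectors[1])
print(round(score, 3))
```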

Reranking

res = client.rerank.create(
    model="qwen3-reranker",
    query="What is BabyAPI?",
    documents=[
        "BabyAPI is a tiny hosted LLM API.",
        "The weather is nice today.",
        "BabyAPI supports OpenAI-compatible endpoints.",
    ],
)

for result in res["results"]:
    print(f"Index {result['index']}: relevance_score={result['relevance_score']:.4f}")

Supported parameters

Parameter	Type	Description
`model`	`str`	Required. The reranker model to use.
`query`	`str`	Required. The query to rank documents against.
`documents`	`list[str]`	Required. Documents to rerank.
`top_n`	`int`	Optional. Return only the top N results.
`return_documents`	`bool`	Optional. Include document text in results.
`truncate_prompt_tokens`	`int`	Optional. Max tokens to keep (vLLM-specific).
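
To turn rerank results back into text, map each result's index onto the list you submitted. The sketch assumes that index refers to the position in the original documents list (the usual convention for this response shape). `top_documents` is a hypothetical helper and the scores are invented.

```python
def top_documents(documents, response, limit=None):
    """Pair each rerank result with its source text, best match first."""
    ranked = sorted(response["results"], key=lambda r: r["relevance_score"], reverse=True)
    if limit is not None:
        ranked = ranked[:limit]
    return [(documents[r["index"]], r["relevance_score"]) for r in ranked]


documents = [
    "BabyAPI is a tiny hosted LLM API.",
    "The weather is nice today.",
    "BabyAPI supports OpenAI-compatible endpoints.",
]

# Stand-in for the dict returned by client.rerank.create(...); scores are invented.
response = {
    "results": [
        {"index": 2, "relevance_score": 0.71},
        {"index": 0, "relevance_score": 0.93},
        {"index": 1, "relevance_score": 0.02},
    ]
}

best = top_documents(documents, response, limit=2)
for text, score in best:
    print(f"{score:.2f}  {text}")
```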

Streaming (SSE)

.stream(...) yields SSEEvent objects:

- event.done → True when the stream is finished ([DONE])
- event.data → parsed JSON when possible (otherwise None)
- event.raw → raw data: payload string
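
The three fields can be illustrated with a small parser for a single SSE data: line. This is a sketch of the semantics described above, not the SDK's parser. `Event` is a stand-in for SSEEvent and `parse_data_line` is a hypothetical name.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Event:
    """Stand-in for the SDK's SSEEvent; not the real class."""
    raw: str
    data: Optional[Any] = None
    done: bool = False


def parse_data_line(line: str) -> Optional[Event]:
    """Turn one 'data: ...' line of an SSE stream into an Event."""
    if not line.startswith("data:"):
        return None  # comments, keep-alives, and other SSE fields are skipped here
    raw = line[len("data:"):].strip()
    if raw == "[DONE]":
        return Event(raw=raw, done=True)
    try:
        return Event(raw=raw, data=json.loads(raw))
    except json.JSONDecodeError:
        # Payload is not JSON: keep the raw string, leave data as None.
        return Event(raw=raw)


for line in ['data: {"choices": []}', "data: not json", "data: [DONE]", ": keep-alive"]:
    print(parse_data_line(line))
```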

Streaming: chat

import os
from babyapi import BabyAPI

client = BabyAPI(api_key=os.getenv("BABYAPI_API_KEY"))

for event in client.chat.completions.stream(
    model="mistral",
    messages=[{"role": "user", "content": "Write a short poem about servers."}],
):
    if event.done:
        break

    # Tolerate chunks whose "choices" list is missing or empty.
    choices = (event.data or {}).get("choices") or [{}]
    chunk = choices[0].get("delta", {}).get("content")
    if chunk:
        print(chunk, end="", flush=True)

print()
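
If you want the full reply as one string rather than printing chunks, the loop can be wrapped in a helper. This is a minimal sketch: `collect_chat_text` is a hypothetical name, and the fake events only mimic the done / data fields described above.

```python
from types import SimpleNamespace


def collect_chat_text(events) -> str:
    """Join the content deltas of a chat stream into a single string."""
    parts = []
    for event in events:
        if event.done:
            break
        # Some chunks may carry no JSON or an empty "choices" list.
        choices = (event.data or {}).get("choices") or [{}]
        content = (choices[0].get("delta") or {}).get("content")
        if content:
            parts.append(content)
    return "".join(parts)


# Fake events standing in for client.chat.completions.stream(...).
fake_events = [
    SimpleNamespace(done=False, data={"choices": [{"delta": {"role": "assistant"}}]}),
    SimpleNamespace(done=False, data={"choices": [{"delta": {"content": "Hello"}}]}),
    SimpleNamespace(done=False, data=None),
    SimpleNamespace(done=False, data={"choices": []}),
    SimpleNamespace(done=False, data={"choices": [{"delta": {"content": ", world"}}]}),
    SimpleNamespace(done=True, data=None),
]

text = collect_chat_text(fake_events)
print(text)
```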

Streaming: completions

for event in client.completions.stream(
    model="mistral",
    prompt="List 5 calm API-building tips.",
):
    if event.done:
        break

    # Tolerate chunks whose "choices" list is missing or empty.
    choices = (event.data or {}).get("choices") or [{}]
    text = choices[0].get("text")
    if text:
        print(text, end="", flush=True)

print()

Note: like many SDKs, streaming requests are not retried. If you want retries for streams, wrap your call at the application level.
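
One way to add application-level retries is a generator that reopens the stream only if it failed before producing any output, since retrying after partial output would repeat text. The sketch below is an assumption-laden example: `stream_with_retry` is a hypothetical helper, the doubling backoff is a choice made here, and catching Exception is deliberately broad (narrow it in real code).

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def stream_with_retry(
    open_stream: Callable[[], Iterable[T]],
    max_attempts: int = 3,
    base_delay_s: float = 0.25,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[T]:
    """Yield events from open_stream(), retrying only failures that occur before any output."""
    for attempt in range(max_attempts):
        yielded = False
        try:
            for event in open_stream():
                yielded = True
                yield event
            return
        except Exception:
            # Once output has been emitted, a retry would duplicate it, so give up.
            if yielded or attempt == max_attempts - 1:
                raise
            sleep(base_delay_s * (2 ** attempt))


attempts = {"count": 0}


def flaky_stream():
    """Fails twice before producing data, to exercise the retry path."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("simulated dropped connection")
    return iter(["chunk-1", "chunk-2"])


chunks = list(stream_with_retry(flaky_stream, sleep=lambda _s: None))
print(chunks)
```

With the real client, open_stream would be a lambda that calls client.chat.completions.stream(...) with your arguments.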

Multimodal (vision) examples (OpenAI-style)

If the model you select supports vision, you can send images using OpenAI-style message content.

Vision: non-streaming

res = client.chat.completions.create(
    model="pixtral",  # or another vision-capable model you expose
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe the image in 2 sentences. Then list 3 objects you see."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://api.babyapi.org/images/banner.png"},
                },
            ],
        }
    ],
)

print(res["choices"][0]["message"]["content"])

Vision: streaming

for event in client.chat.completions.stream(
    model="pixtral",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is this image trying to communicate?"},
                {"type": "image_url", "image_url": {"url": "https://api.babyapi.org/images/banner.png"}},
            ],
        }
    ],
):
    if event.done:
        break

    # Tolerate chunks whose "choices" list is missing or empty.
    choices = (event.data or {}).get("choices") or [{}]
    chunk = choices[0].get("delta", {}).get("content")
    if chunk:
        print(chunk, end="", flush=True)

print()

Image support depends on the model you choose. If the model is text-only, the API may reject image inputs.
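
OpenAI-style APIs commonly accept base64 data: URLs in place of an http(s) image URL. Whether BabyAPI and a given model accept them is not stated here, so treat the sketch below as an assumption to verify. `image_part` is a hypothetical helper.

```python
import base64


def image_part(data: bytes, mime_type: str = "image/png") -> dict:
    """Build an OpenAI-style image_url content part from raw image bytes."""
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


# Tiny placeholder payload; in practice read the bytes from a real image file.
part = image_part(b"abc")
message = {
    "role": "user",
    "content": [{"type": "text", "text": "Describe the image."}, part],
}
print(part["image_url"]["url"])
```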

Configuration

import os
from babyapi import BabyAPI

client = BabyAPI(
    api_key=os.getenv("BABYAPI_API_KEY"),          # required (or BABY_API_KEY)
    base_url=os.getenv("BABYAPI_BASE_URL"),        # optional (default: https://api.babyapi.org)
    timeout_s=60.0,                                # JSON requests only
    max_retries=2,                                 # retry transient failures
    retry_base_delay_s=0.25,                       # exponential backoff base
    default_model="mistral",                       # used by client.baby.infer when model omitted
    default_headers={"x-app": "my-sideproject"},   # extra headers for every request
)

Environment variables supported:

- BABYAPI_API_KEY (or BABY_API_KEY)
- BABYAPI_BASE_URL
- BABYAPI_DEFAULT_MODEL
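
If you prefer to resolve these variables yourself before constructing the client, a sketch like the following works. `settings_from_env` is a hypothetical helper, and the order in which the two key names are tried is an assumption. The default base URL is the one noted in the configuration example.

```python
import os


def settings_from_env(env=os.environ) -> dict:
    """Read BabyAPI settings from a mapping of environment variables."""
    return {
        # Assumed precedence: BABYAPI_API_KEY first, then BABY_API_KEY.
        "api_key": env.get("BABYAPI_API_KEY") or env.get("BABY_API_KEY"),
        "base_url": env.get("BABYAPI_BASE_URL") or "https://api.babyapi.org",
        "default_model": env.get("BABYAPI_DEFAULT_MODEL"),
    }


settings = settings_from_env({"BABY_API_KEY": "sk-demo", "BABYAPI_DEFAULT_MODEL": "mistral"})
print(settings)
```

The resulting dict uses the same keyword names as the constructor, so it can be passed as BabyAPI(**settings).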

Per-call overrides (RequestOptions)

Every .create(...) / .stream(...) accepts request_options.

import os
from babyapi import BabyAPI, RequestOptions

client = BabyAPI(api_key=os.getenv("BABYAPI_API_KEY"))

res = client.chat.completions.create(
    request_options=RequestOptions(
        timeout_s=30.0,
        max_retries=0,
        headers={"x-trace": "abc123"},
    ),
    model="mistral",
    messages=[{"role": "user", "content": "Hello."}],
)

You can also pass a plain dict:

res = client.chat.completions.create(
    request_options={"timeout_s": 10.0, "headers": {"x-app": "demo"}},
    model="mistral",
    messages=[{"role": "user", "content": "Hi again."}],
)

Timeouts & cancellation

- JSON requests use timeout_s (default: 60s).
- Streaming requests default to no timeout (infinite), matching common SSE usage.
  - If you want a stream timeout, pass request_options={"timeout_s": 30.0}.
- To stop a stream early, break out of your loop.

Docling: document conversion & chunking

The client.docling.* namespace wraps BabyAPI's Docling proxy. Use it to convert PDFs / DOCX / PPTX / images into Markdown, JSON, HTML, text, or doctags, and to chunk documents for downstream RAG pipelines.

All calls authenticate with your standard BABYAPI_API_KEY.

Health / version

client.docling.health()    # {"status": "ok"}
client.docling.ready()
client.docling.version()

Convert from a URL (synchronous)

res = client.docling.convert_source({
    "sources": [{"kind": "http", "url": "https://arxiv.org/pdf/2408.09869"}],
    "options": {"to_formats": ["md"], "do_ocr": False, "page_range": [1, 10]},
})

print(res["document"]["md_content"])

Convert a local file (synchronous)

The files argument accepts a path, bytes, a file-like object, or a structured dict:

# path
client.docling.convert_file(files="./invoice.pdf")

# multiple files + options
client.docling.convert_file(
    files=["./a.pdf", "./b.docx"],
    options={"to_formats": ["md", "json"]},
)

# structured dict carrying raw bytes
with open("./report.pdf", "rb") as fp:
    client.docling.convert_file(
        files={"filename": "report.pdf", "content": fp.read(), "content_type": "application/pdf"},
    )

Convert asynchronously (recommended for large docs)

submitted = client.docling.convert_file_async(files="./big.pdf")
task_id = submitted["task_id"]

# Easiest: poll + fetch in one call.
result = client.docling.wait_for_result(
    task_id,
    interval_s=2.0,
    timeout_s=600.0,
    on_poll=lambda s: print("status:", s.get("status")),
)

print(result["document"]["md_content"])

Low-level alternative:

import time

submitted = client.docling.convert_source_async({
    "sources": [{"kind": "http", "url": "https://arxiv.org/pdf/2408.09869"}],
})
task_id = submitted["task_id"]

while True:
    status = client.docling.poll_status(task_id)
    if status["status"] in ("success", "failure", "error"):
        break
    time.sleep(2)

if status["status"] == "success":
    print(client.docling.get_result(task_id))
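
The manual loop above never gives up. A variant with an overall deadline might look like the sketch below. `poll_until_done` is a hypothetical helper, the terminal states are the ones used in the loop above, and the "pending" / "started" states in the demo are invented.

```python
import time


def poll_until_done(fetch_status, interval_s=2.0, timeout_s=600.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until it reports a terminal state or the deadline passes."""
    terminal = ("success", "failure", "error")
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status.get("status") in terminal:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"task still {status.get('status')!r} after {timeout_s}s")
        sleep(interval_s)


# Fake status sequence standing in for client.docling.poll_status(task_id).
states = iter([{"status": "pending"}, {"status": "started"}, {"status": "success"}])
final = poll_until_done(lambda: next(states), sleep=lambda _s: None)
print(final)
```

With the real client, pass lambda: client.docling.poll_status(task_id) as fetch_status.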

Chunking

Same file/source shapes as conversion — output is a list of chunks suitable for embeddings.

# Hybrid chunking from a URL
client.docling.chunk.hybrid_source({
    "sources": [{"kind": "http", "url": "https://arxiv.org/pdf/2408.09869"}],
})

# Hierarchical chunking from a file
client.docling.chunk.hierarchical_file(files="./handbook.pdf")

Conversion options

Pass any docling-serve options via options. Common ones:

Option	Default	Description
`to_formats`	`["md"]`	`md`, `json`, `html`, `text`, `doctags`
`do_ocr`	`True`	Run OCR on images/scanned pages
`force_ocr`	`False`	Force OCR even on text-layer PDFs
`do_table_structure`	`True`	Detect and extract table structure
`table_mode`	`"accurate"`	`accurate` or `fast`
`page_range`	full	e.g. `[1, 10]`
`image_export_mode`	`"embedded"`	`embedded` or `referenced`
`do_formula_enrichment`	`False`
`do_picture_classification`	`False`

See the Docling endpoints reference for the full list.

Errors

When a request fails, the SDK raises BabyAPIError where possible.

import os
from babyapi import BabyAPI, BabyAPIError

client = BabyAPI(api_key=os.getenv("BABYAPI_API_KEY"))

try:
    client.chat.completions.create(model="mistral", messages=[])
except BabyAPIError as err:
    print(
        {
            "message": err.message,
            "status": err.status,
            "code": err.code,
            "type": err.type,
            "request_id": err.request_id,
        }
    )

Context manager / cleanup

The client maintains an httpx.Client. Use it as a context manager to ensure clean shutdown:

import os
from babyapi import BabyAPI

with BabyAPI(api_key=os.getenv("BABYAPI_API_KEY")) as client:
    res = client.completions.create(model="mistral", prompt="Ping")
    print(res["choices"][0]["text"])

License

MIT.

Release history

- 0.3.3: May 30, 2026
- 0.3.2: May 28, 2026
- 0.3.1: May 28, 2026 (this version)
- 0.3.0: May 28, 2026
- 0.2.0: Apr 19, 2026
- 0.1.0: Dec 17, 2025
- 0.0.3: Dec 8, 2025
- 0.0.2: Dec 8, 2025
- 0.0.1: Dec 8, 2025

Download files

Source Distribution

babyapi-0.3.1.tar.gz (14.1 kB)

Uploaded May 28, 2026 Source

Built Distribution

babyapi-0.3.1-py3-none-any.whl (15.6 kB)

Uploaded May 28, 2026 Python 3

File details

Details for the file babyapi-0.3.1.tar.gz.

File metadata

Download URL: babyapi-0.3.1.tar.gz
Upload date: May 28, 2026
Size: 14.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.8

File hashes

Hashes for babyapi-0.3.1.tar.gz
Algorithm	Hash digest
SHA256	`1e3f883a7e8ff1573897de76c20bde4c1b8359eeb6d50cfff2ca6900c7f45e84`
MD5	`20351d5b253a5a5ddc25ac2b888fe394`
BLAKE2b-256	`d8edab57ef2555c2dd31b73368d94032dc6062c7e0fdcbc5e0b5fb6fa688fc6e`

File details

Details for the file babyapi-0.3.1-py3-none-any.whl.

File metadata

Download URL: babyapi-0.3.1-py3-none-any.whl
Upload date: May 28, 2026
Size: 15.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.8

File hashes

Hashes for babyapi-0.3.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d2207496666e4c9aeb95f6c39d0f655812e7b5bc5cf28dea1513827a132b636c`
MD5	`60c9f324bc683bf2bf79b062753c34ce`
BLAKE2b-256	`4ecade219f60e85d027ee86edcc92c78f103d10e74fafeec8b05425b81fecfeb`
