Official Python SDK for Octen API - Web Search, URL Extract, Text/VL Embeddings, and LLM Chat

Octen Python SDK

Official Python SDK for the Octen API: web search, URL extraction, text and VL (multimodal) embeddings, and multi-model LLM chat in one package.

✨ Features

🔍 Web Search — search and retrieve ranked web results with filtering, highlighting, and full content
🌐 URL Extract — fetch and parse 1-20 URLs in a single batch with markdown / text output, query-driven highlights, and media (images / videos / audio / favicon)
💬 Multi-model Chat — access 10+ LLMs (GPT, Claude, Gemini, Kimi, MiniMax) through a single unified API
🧮 Text Embeddings — convert text into high-quality vector representations
🖼️ VL (Multimodal) Embeddings — encode text, images, and videos into a single fused vector or independent per-element vectors
⚡ Streaming (SSE) — real-time token streaming with typed event objects
🔄 Auto Retry — exponential backoff for transient errors
🛡️ Type Safe — full Pydantic models with IDE auto-completion
🔀 Async Support — native asyncio client for concurrent workloads
📦 HTTP/2 — connection pooling and keep-alive out of the box

📦 Installation

pip install octen

Requires Python 3.8 or higher.

Development Version

pip install "octen[dev]"

Async Support

pip install "octen[async]"

🚀 Quick Start

Search

from octen import Octen

with Octen(api_key="your-api-key") as client:
    response = client.search.search(query="Python programming", count=5)

    for result in response.results:
        print(f"Title: {result['title']}")
        print(f"URL: {result['url']}")
        print(f"Highlight: {result.get('highlight', '')}")

Extract

from octen import Octen

with Octen(api_key="your-api-key") as client:
    response = client.extract.extract(
        urls=["https://example.com", "https://octen.ai"],
        format="markdown",
    )

    for item in response.items:
        if item.status == "success":
            print(f"{item.title} — {item.url}")
        else:
            print(f"FAILED {item.url}: {item.error_message}")

Chat

from octen import Octen, ChatMessage

with Octen(api_key="your-api-key") as client:
    response = client.chat.create(
        model="openai/gpt-5.4",
        messages=[ChatMessage(role="user", content="Hello!")],
        web_search="on"
    )
    print(response.text)

Embeddings

from octen import Octen

with Octen(api_key="your-api-key") as client:
    embedding = client.embedding.create(
        input=["Hello, world!"],
        model="octen-embedding-4b"
    )
    vector = embedding.get_first_embedding()
    print(f"Vector dimension: {len(vector)}")

VL (Multimodal) Embeddings

from octen import Octen

with Octen(api_key="your-api-key") as client:
    # Fuse text + image into one vector
    response = client.vl_embedding.create(
        model="octen-vl-embedding-large",
        contents=[
            {"text": "A cute orange cat on a wooden chair"},
            {"image": "https://example.com/cat.jpg"},
        ],
        enable_fusion=True,
    )
    vector = response.get_first_embedding()
    if vector is not None:
        print(f"Fused vector dimension: {len(vector)}")

🔍 Search API

Advanced Search

from octen import Octen, HighlightOptions, FullContentOptions

with Octen(api_key="your-api-key") as client:
    response = client.search.search(
        query="machine learning best practices",
        count=10,
        search_type="semantic",  # Semantic search
        include_domains=["github.com", "arxiv.org"],  # Search only these domains
        start_time="2024-01-01T00:00:00Z",  # Time filtering
        highlight=HighlightOptions(
            enable=True,
            max_tokens=500
        ),
        full_content=FullContentOptions(
            enable=True,
            max_tokens=2000
        ),
        timeout=60.0  # Custom timeout
    )

    print(f"Found {len(response.results)} results")
    print(f"Actual search type: {response.search_type}")
    print(f"Token usage: {response.usage}")
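
The start_time and end_time filters take ISO 8601 strings like the one in the example above. A minimal sketch of building them from datetime objects, assuming the API accepts second-precision UTC timestamps with a trailing Z (the only form shown in this README). to_iso_utc and last_n_days are illustrative helpers, not part of the SDK:

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def to_iso_utc(moment: datetime) -> str:
    """Format a timezone-aware datetime as ISO 8601 in UTC with a trailing Z."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def last_n_days(days: int, now: datetime) -> Tuple[str, str]:
    """Return (start_time, end_time) strings covering the last `days` days."""
    return to_iso_utc(now - timedelta(days=days)), to_iso_utc(now)


start_time, end_time = last_n_days(7, now=datetime(2024, 1, 8, tzinfo=timezone.utc))
print(start_time, end_time)
```

The resulting pair can be passed as start_time=... and end_time=... to client.search.search().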

🌐 Extract API

Fetch and parse one or more URLs into structured content. A single request accepts 1-20 URLs and is served as one upstream batch.
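
Lists longer than 20 URLs have to be split across several requests. A minimal sketch of that client-side batching, assuming the 20-URL and 2048-character limits documented for the urls parameter. chunk_urls is a hypothetical helper, not an SDK function:

```python
from typing import List, Sequence

MAX_URLS_PER_REQUEST = 20   # documented per-request limit
MAX_URL_LENGTH = 2048       # documented per-URL length limit


def chunk_urls(urls: Sequence[str], size: int = MAX_URLS_PER_REQUEST) -> List[List[str]]:
    """Split `urls` into consecutive batches of at most `size` entries."""
    if not 1 <= size <= MAX_URLS_PER_REQUEST:
        raise ValueError("size must be between 1 and 20")
    for url in urls:
        if len(url) > MAX_URL_LENGTH:
            raise ValueError(f"URL longer than {MAX_URL_LENGTH} characters: {url[:40]}...")
    return [list(urls[start:start + size]) for start in range(0, len(urls), size)]


batches = chunk_urls([f"https://example.com/page/{i}" for i in range(45)])
print([len(batch) for batch in batches])
```

Each batch can then be passed as urls=batch to client.extract.extract().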

Batch Extract

from octen import Octen

with Octen(api_key="your-api-key") as client:
    response = client.extract.extract(
        urls=[
            "https://example.com",
            "https://octen.ai",
            "https://www.iana.org/about",
        ],
        format="markdown",          # "text" or "markdown" (default markdown)
        max_age_seconds=600,        # accept results cached within 10 minutes
        timeout=30,                 # per-URL upstream fetch budget (1-60s)
        include_favicon=True,
        include_images=True,
        request_timeout=90.0,       # local httpx deadline; >= timeout + overhead
    )

    print(f"OK: {response.successful_urls}/{response.total_urls}  ({response.latency}ms)")

    for item in response.items:
        if item.status == "success":
            print(f"  ✓ {item.title} — {item.url}")
            if item.favicon:
                print(f"    favicon: {item.favicon}")
        else:
            # Partial-success is first-class: failed URLs don't poison siblings.
            print(f"  ✗ {item.url}: {item.error_message}")

Note: response.items is parsed lazily and raises OctenAPIError if any row fails to parse (signals server schema drift). The raw dicts remain accessible via response.results as an escape hatch.
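
Because failures are reported per URL, a caller can keep the successful items and collect the failed URLs for a later retry. A minimal sketch of that split, using SimpleNamespace stand-ins that only mimic the url, status, and error_message fields of ExtractItem. split_by_status is a hypothetical helper:

```python
from types import SimpleNamespace
from typing import Iterable, List, Tuple


def split_by_status(items: Iterable) -> Tuple[list, List[str]]:
    """Return (successful items, URLs whose status is not "success")."""
    succeeded, failed_urls = [], []
    for item in items:
        if item.status == "success":
            succeeded.append(item)
        else:
            failed_urls.append(item.url)
    return succeeded, failed_urls


# Stand-ins for response.items; real items come from client.extract.extract().
demo_items = [
    SimpleNamespace(url="https://example.com", status="success", error_message=None),
    SimpleNamespace(url="https://octen.ai", status="failed", error_message="upstream timeout"),
]

succeeded, retry_urls = split_by_status(demo_items)
print(len(succeeded), retry_urls)
```

The retry_urls list can be sent again in a follow-up extract() call, for example with a larger timeout.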

Query-driven Highlights

from octen import Octen

with Octen(api_key="your-api-key") as client:
    response = client.extract.extract(
        urls=["https://en.wikipedia.org/wiki/Python_(programming_language)"],
        query="async programming",   # max 500 chars
    )

    for item in response.items:
        for snippet in item.highlights or []:
            print(f"• {snippet}")

Single URL (Convenience Shortcut)

from octen import Octen

with Octen(api_key="your-api-key") as client:
    response = client.extract.simple_extract("https://example.com")
    print(response.items[0].title)

Two Timeouts, Two Layers

timeout and request_timeout operate at different layers — easy to confuse, important to get right:

Parameter	Layer	Controls	On timeout
`timeout` (int, 1-60s)	server	upstream per-URL fetch budget	the slow URL is reported with `status="failed"` and an `error_message`; sibling URLs in the same batch are returned normally as long as the upstream responds within the bounded round trip
`request_timeout` (float)	client (httpx)	local socket deadline for the whole HTTP call	raises `OctenTimeoutError`

Rule of thumb: request_timeout >= timeout + network_overhead.
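
The rule of thumb can be written as a small helper. This is a sketch only: local_deadline is a hypothetical name, and the 10-second default for network overhead is an assumption, not a value from the SDK:

```python
def local_deadline(upstream_timeout: int, network_overhead: float = 10.0) -> float:
    """Derive a client-side request_timeout from the server-side timeout."""
    if not 1 <= upstream_timeout <= 60:
        raise ValueError("timeout must be between 1 and 60 seconds")
    if network_overhead < 0:
        raise ValueError("network_overhead must not be negative")
    return float(upstream_timeout) + network_overhead


# timeout=30 with 60s of headroom gives the request_timeout=90.0 used in Batch Extract.
print(local_deadline(30, network_overhead=60.0))
```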

💬 Chat API

Non-streaming

from octen import Octen, ChatMessage, WebSearchOptions

with Octen(api_key="your-api-key") as client:
    response = client.chat.create(
        model="openai/gpt-5.4",
        messages=[
            ChatMessage(role="system", content="You are a helpful assistant."),
            ChatMessage(role="user", content="What happened in tech today?"),
        ],
        web_search="on",
        web_search_options=WebSearchOptions(safesearch="off", count=5),
        max_tokens=500,
        temperature=0.7
    )

    print(response.text)
    print(f"Tokens used: {response.usage.total_tokens}")

    # Access search results
    if response.search_results:
        for group in response.search_results:
            for item in group.results:
                print(f"  - {item.title}: {item.url}")

Streaming

from octen import Octen, ChatMessage

with Octen(api_key="your-api-key") as client:
    for event in client.chat.create(
        model="openai/gpt-5.4",
        messages=[ChatMessage(role="user", content="Tell me a story")],
        stream=True,
        web_search="on"
    ):
        if event.type == "search_done":
            print(f"[{len(event.search_results or [])} search groups]")

        elif event.type == "content" and event.choices:
            print(event.choices[0].delta.content or "", end="", flush=True)

        elif event.type == "finish":
            print()  # newline

        elif event.type == "usage" and event.usage:
            print(f"[total tokens: {event.usage.total_tokens}]")
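
When the full reply is needed after streaming finishes, the content deltas can be accumulated as they arrive. A minimal sketch, using SimpleNamespace stand-ins that mimic only the type, choices, and error fields of StreamEvent. collect_text is a hypothetical helper, and raising RuntimeError on an "error" event is an illustrative choice, not SDK behavior:

```python
from types import SimpleNamespace


def collect_text(events) -> str:
    """Join the delta.content pieces of all "content" events into one string."""
    parts = []
    for event in events:
        if event.type == "error":
            raise RuntimeError(f"stream failed: {event.error.message}")
        if event.type == "content" and event.choices:
            parts.append(event.choices[0].delta.content or "")
    return "".join(parts)


def _content(text):
    # Build a stand-in "content" event with a single choice.
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(type="content", choices=[SimpleNamespace(delta=delta)])


demo_events = [
    SimpleNamespace(type="search_done", choices=None),
    _content("Once upon "),
    _content(None),  # empty deltas are skipped safely
    _content("a time."),
    SimpleNamespace(type="finish", choices=None),
]

story = collect_text(demo_events)
print(story)
```

With the real client, the iterator returned by client.chat.create(..., stream=True) would take the place of demo_events.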

Tool Calling

from octen import Octen
from octen.models import ChatMessage, Tool, ToolFunction

weather_tool = Tool(
    function=ToolFunction(
        name="get_weather",
        description="Get current weather for a city",
        parameters={
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        }
    )
)

with Octen(api_key="your-api-key") as client:
    response = client.chat.create(
        model="openai/gpt-5.4",
        messages=[ChatMessage(role="user", content="What's the weather in London?")],
        tools=[weather_tool],
        tool_choice="auto"
    )
    if response.choices[0].finish_reason == "tool_calls":
        tc = response.choices[0].message.tool_calls[0]
        print(f"Tool: {tc.function.name}, Args: {tc.function.arguments}")
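
Once the model requests a tool, the caller runs it locally. A minimal sketch of that dispatch step, assuming tc.function.arguments arrives as a JSON-encoded string (the OpenAI-style convention; this README does not state the type, so the sketch also accepts a dict). TOOL_REGISTRY, run_tool, and the placeholder get_weather body are illustrative:

```python
import json
from typing import Any, Callable, Dict, Union


def get_weather(city: str) -> str:
    # Placeholder implementation; a real tool would query a weather service.
    return f"(placeholder) weather for {city}"


TOOL_REGISTRY: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool(name: str, arguments: Union[str, dict]) -> Any:
    """Look up a tool by name and call it with the decoded arguments."""
    if name not in TOOL_REGISTRY:
        raise KeyError(f"unknown tool: {name}")
    kwargs = json.loads(arguments) if isinstance(arguments, str) else dict(arguments)
    return TOOL_REGISTRY[name](**kwargs)


result = run_tool("get_weather", '{"city": "London"}')
print(result)
```

In the example above, the call would be run_tool(tc.function.name, tc.function.arguments).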

JSON Output Mode

from octen import Octen, ChatMessage
from octen.models import ResponseFormat

with Octen(api_key="your-api-key") as client:
    response = client.chat.create(
        model="google/gemini-3-flash-preview",
        messages=[ChatMessage(role="user", content="Return a JSON list of 3 programming languages")],
        response_format=ResponseFormat(type="json_object"),
        web_search="off"
    )
    print(response.text)
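
JSON mode is meant to return parseable output, but parsing defensively is still cheap. A minimal sketch, assuming response.text holds the raw JSON string. parse_json_reply is a hypothetical helper:

```python
import json
from typing import Any, Optional


def parse_json_reply(text: Optional[str]) -> Optional[Any]:
    """Return the decoded JSON value, or None when the text is not valid JSON."""
    try:
        return json.loads(text)
    except (TypeError, ValueError):  # json.JSONDecodeError is a ValueError
        return None


languages = parse_json_reply('["Python", "Rust", "Go"]')
print(languages)
```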

Web Search with Full Page Content

from octen import Octen, ChatMessage, WebSearchOptions
from octen.models.chat import ChatFullContentOptions

with Octen(api_key="your-api-key") as client:
    response = client.chat.create(
        model="openai/gpt-5.4",
        messages=[ChatMessage(role="user", content="Latest Python 3.13 features?")],
        web_search="on",
        web_search_options=WebSearchOptions(
            safesearch="off",
            full_content=ChatFullContentOptions(enable=True, max_tokens=1000)
        )
    )
    print(f"Full content tokens: {response.usage.full_content_tokens}")

🤖 Supported Chat Models

For the full and up-to-date list of supported models, visit the Octen official website.

🧮 Embeddings API

Batch Embeddings

from octen import Octen

with Octen(api_key="your-api-key") as client:
    # Process multiple texts
    texts = [
        "Artificial intelligence is transforming the world",
        "Applications of deep learning",
        "Natural language processing technology"
    ]

    response = client.embedding.create(
        input=texts,
        model="octen-embedding-8b",
        input_type="document"
    )

    vectors = response.get_embeddings()
    print(f"Generated {len(vectors)} vectors")

    # Or use convenience methods
    query_vector = client.embedding.embed_query("search query")
    doc_vectors = client.embedding.embed_documents(["document 1", "document 2"])
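
A common next step is ranking documents against a query by cosine similarity. A minimal pure-Python sketch, assuming embed_query returns one list of floats and embed_documents returns a list of such lists (the return types are not spelled out in this README). The tiny 2-D vectors stand in for real embeddings:

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query_vector: Sequence[float], doc_vectors: Sequence[Sequence[float]]) -> List[int]:
    """Return document indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query_vector, vector) for vector in doc_vectors]
    return sorted(range(len(scores)), key=lambda index: scores[index], reverse=True)


demo_query = [1.0, 0.0]
demo_docs = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
order = rank_documents(demo_query, demo_docs)
print(order)
```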

Multimodal (VL) Embeddings

The vl_embedding resource encodes text, images, and videos through the /vl-embedding endpoint. Two output modes:

Fusion (enable_fusion=True) — all contents collapse into a single vector (type="fusion")
Independent (enable_fusion omitted or False) — one vector per content element (type="vl"). Omitting the field inherits the server default (False).

Default-value semantics: enable_fusion, fps, and instruct default to None in the SDK and are dropped from the request body (exclude_none=True). When a field is omitted, the server applies its own default (shown in the table below). Passing enable_fusion=False and omitting the field produce different wire payloads but currently the same behavior; pass False explicitly only if you want independent vectors to stay pinned even if the server default changes later.

Parameter	Type	Required	Description
`model`	string	✓	`octen-vl-embedding` (max dim 2048) or `octen-vl-embedding-large` (max dim 4096)
`contents`	list[dict | VLEmbeddingContent]	✓	Each item has exactly one of `text` / `image` / `video` (non-empty). Max 20 total, 5 images, 1 video
`enable_fusion`	bool	—	Single fused vector vs. independent vectors. Omit → server default `False`
`dimension`	int	—	Output dim, ≤ model's max. Omit → model's max
`fps`	float (0-1)	—	Video frame sampling density. Omit → server default `1`; lower values reduce video token cost
`instruct`	string	—	Custom task description. Omit → server default `"Represent the user's input."`

Content element constraints:

text — string, ≤ 32,000 tokens per entry
image — URL or Base64 string, ≤ 5MB, formats: JPEG/PNG/WEBP/BMP/TIFF/ICO/DIB/ICNS/SGI
video — URL only, ≤ 50MB, formats: MP4/AVI/MOV
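
The count limits above can be checked locally before a request is sent. A minimal sketch for dict-style contents, assuming the limits stated in the table and list above. validate_contents is a hypothetical helper, and it does not check token counts, file sizes, or formats:

```python
from typing import Dict, List

ALLOWED_KEYS = ("text", "image", "video")


def validate_contents(contents: List[Dict[str, str]]) -> Dict[str, int]:
    """Check the documented count limits and return how many of each kind were seen."""
    if len(contents) > 20:
        raise ValueError("at most 20 content elements per request")
    counts = {key: 0 for key in ALLOWED_KEYS}
    for position, element in enumerate(contents):
        unknown = set(element) - set(ALLOWED_KEYS)
        present = [key for key in ALLOWED_KEYS if element.get(key)]
        if unknown or len(present) != 1:
            raise ValueError(
                f"element {position} must set exactly one non-empty text / image / video"
            )
        counts[present[0]] += 1
    if counts["image"] > 5:
        raise ValueError("at most 5 images per request")
    if counts["video"] > 1:
        raise ValueError("at most 1 video per request")
    return counts


print(validate_contents([{"text": "tent"}, {"image": "https://example.com/tent.jpg"}]))
```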

from octen import Octen, VLEmbeddingContent

with Octen(api_key="your-api-key") as client:
    # 1) Text-only embedding
    response = client.vl_embedding.create(
        model="octen-vl-embedding",
        contents=[{"text": "What is multimodal vector search?"}],
    )
    vector = response.get_first_embedding()
    if vector is not None:
        print(f"text-only dim={len(vector)}")  # 2048

    # 2) Fused multimodal embedding (text + images + video)
    response = client.vl_embedding.create(
        model="octen-vl-embedding-large",
        contents=[
            {"text": "Outdoor tent, 3-4 person, waterproof"},
            {"image": "https://example.com/tent_setup.jpg"},
            {"image": "https://example.com/tent_inside.jpg"},
            {"video": "https://example.com/tent_demo.mp4"},
        ],
        enable_fusion=True,
        dimension=2048,
        fps=0.3,
        instruct="Represent the outdoor product for retrieval",
    )
    print(f"fusion type={response.items[0].type} dim={response.items[0].dimension}")

    # 3) Independent per-image embeddings
    response = client.vl_embedding.create(
        model="octen-vl-embedding",
        contents=[
            {"image": "https://example.com/product_1.jpg"},
            {"image": "https://example.com/product_2.jpg"},
        ],
        enable_fusion=False,
    )
    for item in response.items:
        print(f"index={item.index} type={item.type} dim={item.dimension}")

    # 4) Typed content objects (alternative to dicts)
    response = client.vl_embedding.create(
        model="octen-vl-embedding",
        contents=[
            VLEmbeddingContent(text="A photo of a cat"),
            VLEmbeddingContent(image="https://example.com/cat.jpg"),
        ],
        enable_fusion=True,
    )

    # Usage breakdown
    if response.usage:
        print(
            f"input_tokens={response.usage.input_tokens} "
            f"text_tokens={response.usage.text_tokens} "
            f"image_tokens={response.usage.image_tokens} "
            f"image_count={response.usage.image_count} "
            f"duration={response.usage.duration}s"
        )

Async usage:

import asyncio
from octen import AsyncOcten

async def main():
    async with AsyncOcten(api_key="your-api-key") as client:
        response = await client.vl_embedding.create(
            model="octen-vl-embedding-large",
            contents=[
                {"text": "modern art exhibition"},
                {"image": "https://example.com/painting.jpg"},
            ],
            enable_fusion=True,
        )
        vector = response.get_first_embedding()
        if vector is not None:  # guard against None, as in the sync examples
            print(vector[:3])

asyncio.run(main())

📖 Full API reference: https://docs.octen.ai/api-reference/vl-embedding

Custom Configuration

from octen import Octen

client = Octen(
    api_key="your-api-key",
    base_url="https://api.octen.ai",  # Custom API endpoint
    timeout=10.0,  # Global default timeout (seconds)
    max_retries=3,  # Maximum retry attempts
    http2=True  # Enable HTTP/2
)

try:
    # This request uses the global timeout (10 seconds)
    response1 = client.search.search(query="query 1")

    # This request overrides the timeout to 30 seconds
    response2 = client.search.search(query="complex query", timeout=30.0)
finally:
    client.close()  # Release connection pool resources
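
For intuition about what max_retries implies, here is a generic sketch of an exponential backoff schedule. The SDK's actual base delay, cap, and jitter are not documented in this README, so the 0.5-second base and 8-second cap are assumptions, and backoff_delays is a hypothetical helper:

```python
from typing import List


def backoff_delays(max_retries: int, base: float = 0.5, cap: float = 8.0) -> List[float]:
    """Return the wait before each retry: base * 2**attempt, limited to `cap` seconds."""
    if max_retries < 0:
        raise ValueError("max_retries must not be negative")
    return [min(cap, base * (2 ** attempt)) for attempt in range(max_retries)]


print(backoff_delays(3))
```

Real implementations usually add random jitter to each delay so that many clients do not retry at the same instant.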

📚 API Documentation

Search API

`client.search.search()`

Perform a web search query.

Parameters:

query (str, required): Search query string, max 500 characters
count (int, optional): Number of results to return, range 1-100, default 5
search_type (str, optional): Search type, options:
- "auto" - Automatically select (default)
- "keyword" - Keyword search
- "semantic" - Semantic search
include_domains (List[str], optional): Include only results from these domains
exclude_domains (List[str], optional): Exclude results from these domains
include_text (List[str], optional): Results must contain these texts
exclude_text (List[str], optional): Results must exclude these texts
time_basis (str, optional): Time basis, options: "auto", "published", "crawled"
start_time (str, optional): Start time in ISO 8601 format
end_time (str, optional): End time in ISO 8601 format
highlight (HighlightOptions, optional): Highlight options configuration
format (str, optional): Content format, options: "text", "markdown"
safesearch (str, optional): Safe search, options: "off", "strict" (default)
full_content (FullContentOptions, optional): Full content options configuration
timeout (float, optional): Request timeout in seconds

Returns: SearchResponse object

Response Properties:

results - List of search results
query - The actual query used
search_type - The actual search type used
usage - Token usage information
latency - Latency information

Extract API

`client.extract.extract()`

Fetch and parse one or more URLs in a single batch.

Parameters:

urls (List[str], required): URLs to extract — 1-20 per request, each ≤ 2048 characters
format (str, optional): Content format — "text" or "markdown" (server default "markdown")
max_age_seconds (int, optional): Cache window in seconds. Server clamps into [300, 31_536_000] (5 min – 1 year) and defaults to 86_400 (24h) when omitted
query (str, optional): Query for per-result highlight extraction, max 500 characters. When set, each item's highlights field is populated
timeout (int, optional): Per-URL upstream fetch budget in seconds, range 1-60, default 30. A URL that exceeds this returns status="failed" — siblings continue
include_images (bool, optional): Return image objects (default False)
include_favicon (bool, optional): Return favicon URL (default False)
include_videos (bool, optional): Return video objects (default False)
include_audio (bool, optional): Return audio objects (default False)
request_timeout (float, optional): Local HTTP socket deadline (httpx), distinct from the upstream timeout above. Set >= timeout + network_overhead

Caller typos (e.g. url= instead of urls=, inculde_images=) are rejected at construction time by Pydantic — they will not silently reach the server.

Returns: ExtractResponse object

Response Properties:

code (int) — 0 on success
msg (str) — Server message
request_id (str | None) — Server-generated request id; echoed from X-Request-Id if you supplied that header
results (List[dict]) — Raw per-URL result dicts
items (List[ExtractItem]) — Parsed, typed per-URL results
usage (dict | None) — {"total_urls": int, "successful_urls": int}
total_urls (int | None) — Convenience accessor
successful_urls (int | None) — Number of URLs with status="success" in this batch
latency (int | None) — End-to-end server latency in ms
warning (str | None) — Non-fatal warning

ExtractItem Fields:

url (str) — The requested URL
status (Literal["success", "failed"]) — Extraction outcome
title (str | None) — Page title
full_content (str | None) — Extracted page content
highlights (List[str] | None) — Snippets when query was set
time_published / time_last_crawled (str | None) — ISO 8601 timestamps
error_message (str | None) — Populated only when status="failed"
favicon (str | None) — When include_favicon=True
images / videos / audio (List[Any] | None) — Passthrough media objects; schema is upstream-defined
category (ExtractCategory | None) — primary / secondary classification labels (when classifier enabled server-side)
page_structure (ExtractPageStructure | None) — primary / secondary structure labels

`client.extract.simple_extract(url)`

Convenience shortcut for a single URL with default parameters.

response = client.extract.simple_extract("https://example.com")

Chat API

`client.chat.create()`

Create a chat completion (non-streaming or streaming).

Parameters:

messages (List[ChatMessage | dict], required): Conversation history. Each item can be a ChatMessage object or a plain dict {"role": ..., "content": ...}
model (str, required): Model ID (e.g. "openai/gpt-5.4"). See Supported Chat Models for the full list
stream (bool, optional): If True, return a Stream iterator of StreamEvent objects. Default False
web_search (str, optional): "on" to augment with live web search, "off" to disable
web_search_options (WebSearchOptions, optional): Fine-grained search configuration
- safesearch (str): "off" or "strict" (default "off")
- count (int): Number of search results, range 1-100
- country (str): Country code for localised results (e.g. "CN")
- include_domains / exclude_domains (List[str]): Domain filtering
- include_text / exclude_text (List[str]): Text filtering
- time_basis (str): "auto", "published", or "crawled"
- start_time / end_time (str): ISO 8601 time range
- format (str): "text" or "markdown"
- full_content (ChatFullContentOptions): Full page content options
- highlight (ChatHighlightOptions): Highlight snippet options
max_tokens (int, optional): Maximum number of output tokens
max_completion_tokens (int, optional): Alternative max-token parameter
temperature (float, optional): Sampling temperature [0, 2]
top_p (float, optional): Nucleus sampling probability (0, 1]
frequency_penalty (float, optional): Frequency penalty [-2, 2]
presence_penalty (float, optional): Presence penalty [-2, 2]
response_format (ResponseFormat, optional): Output format — ResponseFormat(type="text"), ResponseFormat(type="json_object"), or ResponseFormat(type="json_schema", json_schema=...)
stop (List[str], optional): Up to 4 stop sequences
seed (int, optional): Integer seed for deterministic sampling
reasoning_effort (str, optional): Chain-of-thought effort: "low", "medium", or "high"
logprobs (bool, optional): Whether to return log probabilities
top_logprobs (int, optional): Number of most-likely tokens [0, 20]. Requires logprobs=True
logit_bias (Dict[str, float], optional): Token ID to bias value mapping
tools (List[Tool | dict], optional): Tool/function definitions available to the model
tool_choice (str | dict, optional): "none", "auto", "required", or a dict specifying a particular tool
user (str, optional): Opaque end-user identifier
timeout (float, optional): Per-request timeout in seconds (default 60s for chat)

Returns:

ChatCompletion when stream=False
Stream (iterable of StreamEvent) when stream=True

ChatCompletion Properties:

id - Unique completion ID
model - Model used for generation
choices - List of Choice objects
text - Convenience accessor for the first choice's content
usage - Usage object (prompt_tokens, completion_tokens, total_tokens, num_search_queries, reasoning_tokens)
search_results - List of ChatSearchResult (when web_search="on")
citations - Citation string referencing search results
warning - Optional warning message

StreamEvent Properties:

type - Event type: "search_done", "content", "finish", "usage", "error"
choices - List of StreamChoice (with delta.content for incremental text)
search_results - Web search results (on search_done event)
usage - Token usage (on usage event)
citations - Citation string (on search_done event)
error - StreamError with message and code (on error event)

Embedding API

`client.embedding.create()`

Create text embedding vectors.

Parameters:

input (str | List[str], required): Input text or list of texts
model (str, optional): Model name, options:
- "octen-embedding-0.6b" - Lightweight model
- "octen-embedding-4b" - Balanced performance
- "octen-embedding-8b" - Highest quality
dimension (int, optional): Vector dimension
input_type (str, optional): Input type, options: "query" or "document"
truncation (bool, optional): Whether to truncate long inputs, default True
timeout (float, optional): Request timeout in seconds

Returns: EmbeddingResponse object

Response Methods:

get_embeddings() - Get all vectors
get_first_embedding() - Get first vector (for single input)

Convenience Methods:

embed_query(text) - Embed a single query text
embed_documents(texts) - Batch embed document texts

VL Embedding API

`client.vl_embedding.create()`

Create multimodal embedding vectors from text, images, and videos.

Parameters:

model (str, required): Model name, options:
- "octen-vl-embedding" - Max dimension 2048
- "octen-vl-embedding-large" - Max dimension 4096
contents (List[dict | VLEmbeddingContent], required): Content elements. Each item must set exactly one of:
- text (str) - Text input, ≤ 32k tokens
- image (str) - Image URL or Base64, ≤ 5MB (JPEG/PNG/WEBP/BMP/TIFF/ICO/DIB/ICNS/SGI)
- video (str) - Video URL, ≤ 50MB (MP4/AVI/MOV)
Limits: ≤ 20 total elements, ≤ 5 images, ≤ 1 video per request.
enable_fusion (bool, optional): Fuse all contents into a single vector. Omit → server default False (independent vectors). The SDK uses exclude_none=True, so omitting and passing False produce different wire payloads
dimension (int, optional): Output dimension, ≤ model max. Omit → model max
fps (float, optional): Video frame sampling density, 0 ≤ x ≤ 1. Omit → server default 1
instruct (str, optional): Custom task description for the encoder. Omit → server default "Represent the user's input."
timeout (float, optional): Request timeout in seconds

Returns: VLEmbeddingResponse object

Response Properties:

code (int) - Response code; 0 means success. Non-zero responses raise OctenAPIError at parse time
msg (str) - Server-provided result message
request_id - Server-assigned request id (use this to grep server logs)
items - List of VLEmbeddingResult with .index, .embedding, .type ("vl" or "fusion"), .dimension. Raises OctenAPIError on schema drift
results - Raw result dicts (escape hatch when items raises)
model - Server-reported model used
usage - VLEmbeddingUsage with input_tokens / text_tokens / image_tokens (covers sampled video frames too) / image_count (excludes video frames) / duration (video seconds)
warning - Optional server warning (string only; non-string payloads return None)

Response Methods:

get_embeddings() - Get all vectors
get_first_embedding() - Get first vector (useful in fusion mode)

🔧 Async Support

import asyncio
from octen import AsyncOcten, ChatMessage

async def main():
    async with AsyncOcten(api_key="your-api-key") as client:
        # Concurrent chat requests
        task1 = client.chat.create(
            model="openai/gpt-5.4",
            messages=[ChatMessage(role="user", content="Explain deep learning")],
            web_search="off"
        )
        task2 = client.chat.create(
            model="anthropic/claude-sonnet-4.6",
            messages=[ChatMessage(role="user", content="Explain reinforcement learning")],
            web_search="off"
        )
        r1, r2 = await asyncio.gather(task1, task2)
        print(r1.text)
        print(r2.text)

        # Async streaming
        stream = await client.chat.create(
            model="openai/gpt-5.4",
            messages=[ChatMessage(role="user", content="Hello!")],
            stream=True
        )
        async for event in stream:
            if event.type == "content" and event.choices:
                print(event.choices[0].delta.content or "", end="", flush=True)

        # Search, extract, and embeddings also work async
        results = await client.search.search(query="AI")
        extracted = await client.extract.extract(urls=["https://example.com"])
        embedding = await client.embedding.create(input=["Hello"], model="octen-embedding-4b")

asyncio.run(main())
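
asyncio.gather starts every request at once, which can trip rate limits on large batches. A minimal sketch of capping concurrency with asyncio.Semaphore. fake_request stands in for an SDK call such as client.chat.create(...), and gather_limited is a hypothetical helper:

```python
import asyncio
from typing import Awaitable, Callable, Sequence

state = {"active": 0, "peak": 0}  # tracks how many stand-in requests overlap


async def fake_request(label: str) -> str:
    state["active"] += 1
    state["peak"] = max(state["peak"], state["active"])
    await asyncio.sleep(0.01)  # simulate network latency
    state["active"] -= 1
    return f"done:{label}"


async def gather_limited(factories: Sequence[Callable[[], Awaitable]], limit: int = 5) -> list:
    """Run the coroutine factories with at most `limit` in flight; results keep input order."""
    semaphore = asyncio.Semaphore(limit)

    async def run(factory):
        async with semaphore:
            return await factory()

    return await asyncio.gather(*(run(factory) for factory in factories))


async def demo() -> list:
    labels = ["a", "b", "c", "d", "e"]
    factories = [lambda label=label: fake_request(label) for label in labels]
    return await gather_limited(factories, limit=2)


results = asyncio.run(demo())
print(results, "peak concurrency:", state["peak"])
```

Factories (zero-argument callables) are used instead of ready-made coroutines so that each request is only created once a slot is free.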

⚠️ Error Handling

from octen import (
    Octen,
    ChatMessage,
    OctenAPIError,
    OctenTimeoutError,
    OctenConnectionError,
    OctenRateLimitError,
    OctenAuthenticationError,
    OctenStreamError,
)

with Octen(api_key="your-api-key") as client:
    try:
        response = client.chat.create(
            model="openai/gpt-5.4",
            messages=[ChatMessage(role="user", content="Hello")]
        )
    except OctenAuthenticationError:
        print("Invalid or missing API key")
    except OctenRateLimitError as e:
        print(f"Rate limited — retry after {e.retry_after}s")
    except OctenStreamError as e:
        print(f"Stream error: {e.message} (code {e.code})")
    except OctenTimeoutError as e:
        print(f"Request timed out after {e.timeout}s")
    except OctenConnectionError as e:
        print(f"Connection failed: {e}")
    except OctenAPIError as e:
        print(f"API error {e.status_code}: {e.message}")
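
When a rate-limit error does reach the caller, the retry_after hint can drive a simple wait-and-retry loop. A minimal sketch with a stand-in exception class. FakeRateLimitError only mimics the retry_after attribute of OctenRateLimitError, call_with_retry is a hypothetical helper, and whether the built-in auto retry already covers rate limits is not stated in this README:

```python
import time


class FakeRateLimitError(Exception):
    """Stand-in for OctenRateLimitError; carries only retry_after."""

    def __init__(self, retry_after=None):
        super().__init__("rate limited")
        self.retry_after = retry_after


def call_with_retry(func, error_type, max_attempts=3, default_wait=1.0, sleep=time.sleep):
    """Call func(); on error_type, wait retry_after seconds and try again."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except error_type as exc:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            wait = getattr(exc, "retry_after", None)
            sleep(default_wait if wait is None else wait)


calls = {"count": 0}


def flaky():
    # Fails twice with a rate-limit error, then succeeds.
    calls["count"] += 1
    if calls["count"] < 3:
        raise FakeRateLimitError(retry_after=0.25)
    return "ok"


waits = []  # record waits instead of sleeping, to keep the demo instant
outcome = call_with_retry(flaky, FakeRateLimitError, sleep=waits.append)
print(outcome, waits)
```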

🧪 Development

Install Development Dependencies

# Install development version from source
pip install -e ".[dev]"

Run Tests

pytest tests/

Code Formatting

black octen/
ruff check octen/ --fix

Type Checking

mypy octen/

📝 License

MIT License - See LICENSE file for details

📧 Support

For questions or help, please:

Check the Documentation
Email us at support@octen.ai

Project details

Release history

Version	Released
0.4.0 (this version)	May 22, 2026
0.3.0	May 18, 2026
0.2.1	Mar 30, 2026
0.2.0	Mar 25, 2026
0.1.4	Mar 12, 2026
0.1.3	Mar 10, 2026
0.1.2	Mar 6, 2026
0.1.1	Mar 6, 2026
0.1.1b5 (pre-release)	Mar 5, 2026
0.1.1b4 (pre-release)	Mar 3, 2026

Download files

File	Type	Size	Uploaded
octen-0.4.0.tar.gz	Source distribution	82.5 kB	May 22, 2026
octen-0.4.0-py3-none-any.whl	Built distribution (Python 3)	57.9 kB	May 22, 2026

Both files were uploaded via twine/6.2.0 CPython/3.9.6, without Trusted Publishing.

Hashes for octen-0.4.0.tar.gz

Algorithm	Hash digest
SHA256	`2323909615fb1648c2ad61e815d47884e3f87d05ac3659dcaa823e17f4e4c816`
MD5	`46c5420398d314839c1df805c039df20`
BLAKE2b-256	`21d23a25723dd30fd516b03f0abfe733e302396907a3e9d678924c0d21cdbf72`

Hashes for octen-0.4.0-py3-none-any.whl

Algorithm	Hash digest
SHA256	`ae48d3acbb8da900eb6e187b00a28e34d2c8bf44359e649a1b71dbdc8a11999c`
MD5	`366d991b573a64b177c676d3112d6cf9`
BLAKE2b-256	`e02d18acc2d8778f6a4d09fc89ef8c2b3525dd2df3152621d058da201b07de0d`
