Skip to main content

Official Python SDK for Octen API - Web Search, URL Extract, Text Embeddings, and LLM Chat

Project description

Octen Python SDK

PyPI version Python Support License: MIT

Official Python SDK for the Octen API โ€” web search, URL extraction, text embeddings, and multi-model LLM chat in one package.

โœจ Features

  • ๐Ÿ” Web Search โ€” search and retrieve ranked web results with filtering, highlighting, and full content
  • ๐ŸŒ URL Extract โ€” fetch and parse 1-20 URLs in a single batch with markdown / text output, query-driven highlights, and media (images / videos / audio / favicon)
  • ๐Ÿ’ฌ Multi-model Chat โ€” access 10+ LLMs (GPT, Claude, Gemini, Kimi, MiniMax) through a single unified API
  • ๐Ÿงฎ Text Embeddings โ€” convert text into high-quality vector representations
  • โšก Streaming (SSE) โ€” real-time token streaming with typed event objects
  • ๐Ÿ”„ Auto Retry โ€” exponential backoff for transient errors
  • ๐Ÿ›ก๏ธ Type Safe โ€” full Pydantic models with IDE auto-completion
  • ๐Ÿ”€ Async Support โ€” native asyncio client for concurrent workloads
  • ๐Ÿ“ฆ HTTP/2 โ€” connection pooling and keep-alive out of the box

๐Ÿ“ฆ Installation

pip install octen

Requires Python 3.8 or higher.

Development Version

pip install octen[dev]

Async Support

pip install octen[async]

๐Ÿš€ Quick Start

Search

from octen import Octen

with Octen(api_key="your-api-key") as client:
    response = client.search.search(query="Python programming", count=5)

    for result in response.results:
        print(f"Title: {result['title']}")
        print(f"URL: {result['url']}")
        print(f"Highlight: {result.get('highlight', '')}")

Extract

from octen import Octen

with Octen(api_key="your-api-key") as client:
    response = client.extract.extract(
        urls=["https://example.com", "https://octen.ai"],
        format="markdown",
    )

    for item in response.items:
        if item.status == "success":
            print(f"{item.title} โ€” {item.url}")
        else:
            print(f"FAILED {item.url}: {item.error_message}")

Chat

from octen import Octen, ChatMessage

with Octen(api_key="your-api-key") as client:
    response = client.chat.create(
        model="openai/gpt-5.4",
        messages=[ChatMessage(role="user", content="Hello!")],
        web_search="on"
    )
    print(response.text)

Embeddings

from octen import Octen

with Octen(api_key="your-api-key") as client:
    embedding = client.embedding.create(
        input=["Hello, world!"],
        model="octen-embedding-4b"
    )
    vector = embedding.get_first_embedding()
    print(f"Vector dimension: {len(vector)}")

๐Ÿ” Search API

Advanced Search

from octen import Octen, HighlightOptions, FullContentOptions

with Octen(api_key="your-api-key") as client:
    response = client.search.search(
        query="machine learning best practices",
        count=10,
        search_type="semantic",  # Semantic search
        include_domains=["github.com", "arxiv.org"],  # Search only these domains
        start_time="2024-01-01T00:00:00Z",  # Time filtering
        highlight=HighlightOptions(
            enable=True,
            max_tokens=500
        ),
        full_content=FullContentOptions(
            enable=True,
            max_tokens=2000
        ),
        timeout=60.0  # Custom timeout
    )

    print(f"Found {len(response.results)} results")
    print(f"Actual search type: {response.search_type}")
    print(f"Token usage: {response.usage}")

๐ŸŒ Extract API

Fetch and parse one or more URLs into structured content. A single request accepts 1-20 URLs and is served as one upstream batch.

Batch Extract

from octen import Octen

with Octen(api_key="your-api-key") as client:
    response = client.extract.extract(
        urls=[
            "https://example.com",
            "https://octen.ai",
            "https://www.iana.org/about",
        ],
        format="markdown",          # "text" or "markdown" (default markdown)
        max_age_seconds=600,        # accept results cached within 10 minutes
        timeout=30,                 # per-URL upstream fetch budget (1-60s)
        include_favicon=True,
        include_images=True,
        request_timeout=90.0,       # local httpx deadline; >= timeout + overhead
    )

    print(f"OK: {response.successful_urls}/{response.total_urls}  ({response.latency}ms)")

    for item in response.items:
        if item.status == "success":
            print(f"  โœ“ {item.title} โ€” {item.url}")
            if item.favicon:
                print(f"    favicon: {item.favicon}")
        else:
            # Partial-success is first-class: failed URLs don't poison siblings.
            print(f"  โœ— {item.url}: {item.error_message}")

Note: response.items is parsed lazily and raises OctenAPIError if any row fails to parse (signals server schema drift). The raw dicts remain accessible via response.results as an escape hatch.

Query-driven Highlights

from octen import Octen

with Octen(api_key="your-api-key") as client:
    response = client.extract.extract(
        urls=["https://en.wikipedia.org/wiki/Python_(programming_language)"],
        query="async programming",   # max 500 chars
    )

    for item in response.items:
        for snippet in item.highlights or []:
            print(f"โ€ข {snippet}")

Single URL (Convenience Shortcut)

from octen import Octen

with Octen(api_key="your-api-key") as client:
    response = client.extract.simple_extract("https://example.com")
    print(response.items[0].title)

Two Timeouts, Two Layers

timeout and request_timeout operate at different layers โ€” easy to confuse, important to get right:

Parameter Layer Controls On timeout
timeout (int, 1-60s) server upstream per-URL fetch budget the slow URL is reported with status="failed" and an error_message; sibling URLs in the same batch are returned normally as long as the upstream responds within the bounded round trip
request_timeout (float) client (httpx) local socket deadline for the whole HTTP call raises OctenTimeoutError

Rule of thumb: request_timeout >= timeout + network_overhead.

๐Ÿ’ฌ Chat API

Non-streaming

from octen import Octen, ChatMessage, WebSearchOptions

with Octen(api_key="your-api-key") as client:
    response = client.chat.create(
        model="openai/gpt-5.4",
        messages=[
            ChatMessage(role="system", content="You are a helpful assistant."),
            ChatMessage(role="user", content="What happened in tech today?"),
        ],
        web_search="on",
        web_search_options=WebSearchOptions(safesearch="off", count=5),
        max_tokens=500,
        temperature=0.7
    )

    print(response.text)
    print(f"Tokens used: {response.usage.total_tokens}")

    # Access search results
    if response.search_results:
        for group in response.search_results:
            for item in group.results:
                print(f"  - {item.title}: {item.url}")

Streaming

from octen import Octen, ChatMessage

with Octen(api_key="your-api-key") as client:
    for event in client.chat.create(
        model="openai/gpt-5.4",
        messages=[ChatMessage(role="user", content="Tell me a story")],
        stream=True,
        web_search="on"
    ):
        if event.type == "search_done":
            print(f"[{len(event.search_results or [])} search groups]")

        elif event.type == "content" and event.choices:
            print(event.choices[0].delta.content or "", end="", flush=True)

        elif event.type == "finish":
            print()  # newline

        elif event.type == "usage" and event.usage:
            print(f"[total tokens: {event.usage.total_tokens}]")

Tool Calling

from octen import Octen
from octen.models import ChatMessage, Tool, ToolFunction

weather_tool = Tool(
    function=ToolFunction(
        name="get_weather",
        description="Get current weather for a city",
        parameters={
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        }
    )
)

with Octen(api_key="your-api-key") as client:
    response = client.chat.create(
        model="openai/gpt-5.4",
        messages=[ChatMessage(role="user", content="What's the weather in London?")],
        tools=[weather_tool],
        tool_choice="auto"
    )
    if response.choices[0].finish_reason == "tool_calls":
        tc = response.choices[0].message.tool_calls[0]
        print(f"Tool: {tc.function.name}, Args: {tc.function.arguments}")

JSON Output Mode

from octen import Octen, ChatMessage
from octen.models import ResponseFormat

with Octen(api_key="your-api-key") as client:
    response = client.chat.create(
        model="google/gemini-3-flash-preview",
        messages=[ChatMessage(role="user", content="Return a JSON list of 3 programming languages")],
        response_format=ResponseFormat(type="json_object"),
        web_search="off"
    )
    print(response.text)

Web Search with Full Page Content

from octen import Octen, ChatMessage, WebSearchOptions
from octen.models.chat import ChatFullContentOptions

with Octen(api_key="your-api-key") as client:
    response = client.chat.create(
        model="openai/gpt-5.4",
        messages=[ChatMessage(role="user", content="Latest Python 3.13 features?")],
        web_search="on",
        web_search_options=WebSearchOptions(
            safesearch="off",
            full_content=ChatFullContentOptions(enable=True, max_tokens=1000)
        )
    )
    print(f"Full content tokens: {response.usage.full_content_tokens}")

๐Ÿค– Supported Chat Models

For the full and up-to-date list of supported models, visit the Octen official website.

๐Ÿงฎ Embeddings API

Batch Embeddings

from octen import Octen

with Octen(api_key="your-api-key") as client:
    # Process multiple texts
    texts = [
        "Artificial intelligence is transforming the world",
        "Applications of deep learning",
        "Natural language processing technology"
    ]

    response = client.embedding.create(
        input=texts,
        model="octen-embedding-8b",
        input_type="document"
    )

    vectors = response.get_embeddings()
    print(f"Generated {len(vectors)} vectors")

    # Or use convenience methods
    query_vector = client.embedding.embed_query("search query")
    doc_vectors = client.embedding.embed_documents(["document 1", "document 2"])

Custom Configuration

from octen import Octen

client = Octen(
    api_key="your-api-key",
    base_url="https://api.octen.ai",  # Custom API endpoint
    timeout=10.0,  # Global default timeout (seconds)
    max_retries=3,  # Maximum retry attempts
    http2=True  # Enable HTTP/2
)

try:
    # This request uses global timeout (10 seconds)
    response1 = client.search.search("query 1")

    # This request overrides timeout to 30 seconds
    response2 = client.search.search("complex query", timeout=30.0)
finally:
    client.close()  # Release connection pool resources

๐Ÿ“š API Documentation

Search API

client.search.search()

Perform a web search query.

Parameters:

  • query (str, required): Search query string, max 500 characters
  • count (int, optional): Number of results to return, range 1-100, default 5
  • search_type (str, optional): Search type, options:
    • "auto" - Automatically select (default)
    • "keyword" - Keyword search
    • "semantic" - Semantic search
  • include_domains (List[str], optional): Include only results from these domains
  • exclude_domains (List[str], optional): Exclude results from these domains
  • include_text (List[str], optional): Results must contain these texts
  • exclude_text (List[str], optional): Results must exclude these texts
  • time_basis (str, optional): Time basis, options: "auto", "published", "crawled"
  • start_time (str, optional): Start time in ISO 8601 format
  • end_time (str, optional): End time in ISO 8601 format
  • highlight (HighlightOptions, optional): Highlight options configuration
  • format (str, optional): Content format, options: "text", "markdown"
  • safesearch (str, optional): Safe search, options: "off", "strict" (default)
  • full_content (FullContentOptions, optional): Full content options configuration
  • timeout (float, optional): Request timeout in seconds

Returns: SearchResponse object

Response Properties:

  • results - List of search results
  • query - The actual query used
  • search_type - The actual search type used
  • usage - Token usage information
  • latency - Latency information

Extract API

client.extract.extract()

Fetch and parse one or more URLs in a single batch.

Parameters:

  • urls (List[str], required): URLs to extract โ€” 1-20 per request, each โ‰ค 2048 characters
  • format (str, optional): Content format โ€” "text" or "markdown" (server default "markdown")
  • max_age_seconds (int, optional): Cache window in seconds. Server clamps into [300, 31_536_000] (5 min โ€“ 1 year) and defaults to 86_400 (24h) when omitted
  • query (str, optional): Query for per-result highlight extraction, max 500 characters. When set, each item's highlights field is populated
  • timeout (int, optional): Per-URL upstream fetch budget in seconds, range 1-60, default 30. A URL that exceeds this returns status="failed" โ€” siblings continue
  • include_images (bool, optional): Return image objects (default False)
  • include_favicon (bool, optional): Return favicon URL (default False)
  • include_videos (bool, optional): Return video objects (default False)
  • include_audio (bool, optional): Return audio objects (default False)
  • request_timeout (float, optional): Local HTTP socket deadline (httpx), distinct from the upstream timeout above. Set >= timeout + network_overhead

Caller typos (e.g. url= instead of urls=, inculde_images=) are rejected at construction time by Pydantic โ€” they will not silently reach the server.

Returns: ExtractResponse object

Response Properties:

  • code (int) โ€” 0 on success
  • msg (str) โ€” Server message
  • request_id (str | None) โ€” Server-generated request id; echoed from X-Request-Id if you supplied that header
  • results (List[dict]) โ€” Raw per-URL result dicts
  • items (List[ExtractItem]) โ€” Parsed, typed per-URL results
  • usage (dict | None) โ€” {"total_urls": int, "successful_urls": int}
  • total_urls (int | None) โ€” Convenience accessor
  • successful_urls (int | None) โ€” Number of URLs with status="success" in this batch
  • latency (int | None) โ€” End-to-end server latency in ms
  • warning (str | None) โ€” Non-fatal warning

ExtractItem Fields:

  • url (str) โ€” The requested URL
  • status (Literal["success", "failed"]) โ€” Extraction outcome
  • title (str | None) โ€” Page title
  • full_content (str | None) โ€” Extracted page content
  • highlights (List[str] | None) โ€” Snippets when query was set
  • time_published / time_last_crawled (str | None) โ€” ISO 8601 timestamps
  • error_message (str | None) โ€” Populated only when status="failed"
  • favicon (str | None) โ€” When include_favicon=True
  • images / videos / audio (List[Any] | None) โ€” Passthrough media objects; schema is upstream-defined
  • category (ExtractCategory | None) โ€” primary / secondary classification labels (when classifier enabled server-side)
  • page_structure (ExtractPageStructure | None) โ€” primary / secondary structure labels

client.extract.simple_extract(url)

Convenience shortcut for a single URL with default parameters.

response = client.extract.simple_extract("https://example.com")

Chat API

client.chat.create()

Create a chat completion (non-streaming or streaming).

Parameters:

  • messages (List[ChatMessage | dict], required): Conversation history. Each item can be a ChatMessage object or a plain dict {"role": ..., "content": ...}
  • model (str, required): Model ID (e.g. "openai/gpt-5.4"). See Supported Chat Models for the full list
  • stream (bool, optional): If True, return a Stream iterator of StreamEvent objects. Default False
  • web_search (str, optional): "on" to augment with live web search, "off" to disable
  • web_search_options (WebSearchOptions, optional): Fine-grained search configuration
    • safesearch (str): "off" or "strict" (default "off")
    • count (int): Number of search results, range 1-100
    • country (str): Country code for localised results (e.g. "CN")
    • include_domains / exclude_domains (List[str]): Domain filtering
    • include_text / exclude_text (List[str]): Text filtering
    • time_basis (str): "auto", "published", or "crawled"
    • start_time / end_time (str): ISO 8601 time range
    • format (str): "text" or "markdown"
    • full_content (ChatFullContentOptions): Full page content options
    • highlight (ChatHighlightOptions): Highlight snippet options
  • max_tokens (int, optional): Maximum number of output tokens
  • max_completion_tokens (int, optional): Alternative max-token parameter
  • temperature (float, optional): Sampling temperature [0, 2]
  • top_p (float, optional): Nucleus sampling probability (0, 1]
  • frequency_penalty (float, optional): Frequency penalty [-2, 2]
  • presence_penalty (float, optional): Presence penalty [-2, 2]
  • response_format (ResponseFormat, optional): Output format โ€” ResponseFormat(type="text"), ResponseFormat(type="json_object"), or ResponseFormat(type="json_schema", json_schema=...)
  • stop (List[str], optional): Up to 4 stop sequences
  • seed (int, optional): Integer seed for deterministic sampling
  • reasoning_effort (str, optional): Chain-of-thought effort: "low", "medium", or "high"
  • logprobs (bool, optional): Whether to return log probabilities
  • top_logprobs (int, optional): Number of most-likely tokens [0, 20]. Requires logprobs=True
  • logit_bias (Dict[str, float], optional): Token ID to bias value mapping
  • tools (List[Tool | dict], optional): Tool/function definitions available to the model
  • tool_choice (str | dict, optional): "none", "auto", "required", or a dict specifying a particular tool
  • user (str, optional): Opaque end-user identifier
  • timeout (float, optional): Per-request timeout in seconds (default 60s for chat)

Returns:

  • ChatCompletion when stream=False
  • Stream (iterable of StreamEvent) when stream=True

ChatCompletion Properties:

  • id - Unique completion ID
  • model - Model used for generation
  • choices - List of Choice objects
  • text - Convenience accessor for the first choice's content
  • usage - Usage object (prompt_tokens, completion_tokens, total_tokens, num_search_queries, reasoning_tokens)
  • search_results - List of ChatSearchResult (when web_search="on")
  • citations - Citation string referencing search results
  • warning - Optional warning message

StreamEvent Properties:

  • type - Event type: "search_done", "content", "finish", "usage", "error"
  • choices - List of StreamChoice (with delta.content for incremental text)
  • search_results - Web search results (on search_done event)
  • usage - Token usage (on usage event)
  • citations - Citation string (on search_done event)
  • error - StreamError with message and code (on error event)

Embedding API

client.embedding.create()

Create text embedding vectors.

Parameters:

  • input (str | List[str], required): Input text or list of texts
  • model (str, optional): Model name, options:
    • "octen-embedding-0.6b" - Lightweight model
    • "octen-embedding-4b" - Balanced performance
    • "octen-embedding-8b" - Highest quality
  • dimension (int, optional): Vector dimension
  • input_type (str, optional): Input type, options: "query" or "document"
  • truncation (bool, optional): Whether to truncate long inputs, default True
  • timeout (float, optional): Request timeout in seconds

Returns: EmbeddingResponse object

Response Methods:

  • get_embeddings() - Get all vectors
  • get_first_embedding() - Get first vector (for single input)

Convenience Methods:

  • embed_query(text) - Embed a single query text
  • embed_documents(texts) - Batch embed document texts

๐Ÿ”ง Async Support

import asyncio
from octen import AsyncOcten, ChatMessage

async def main():
    async with AsyncOcten(api_key="your-api-key") as client:
        # Concurrent chat requests
        task1 = client.chat.create(
            model="openai/gpt-5.4",
            messages=[ChatMessage(role="user", content="Explain deep learning")],
            web_search="off"
        )
        task2 = client.chat.create(
            model="anthropic/claude-sonnet-4.6",
            messages=[ChatMessage(role="user", content="Explain reinforcement learning")],
            web_search="off"
        )
        r1, r2 = await asyncio.gather(task1, task2)
        print(r1.text)
        print(r2.text)

        # Async streaming
        stream = await client.chat.create(
            model="openai/gpt-5.4",
            messages=[ChatMessage(role="user", content="Hello!")],
            stream=True
        )
        async for event in stream:
            if event.type == "content" and event.choices:
                print(event.choices[0].delta.content or "", end="", flush=True)

        # Search, extract, and embeddings also work async
        results = await client.search.search(query="AI")
        extracted = await client.extract.extract(urls=["https://example.com"])
        embedding = await client.embedding.create(input=["Hello"], model="octen-embedding-4b")

asyncio.run(main())

โš ๏ธ Error Handling

from octen import (
    Octen,
    ChatMessage,
    OctenAPIError,
    OctenTimeoutError,
    OctenConnectionError,
    OctenRateLimitError,
    OctenAuthenticationError,
    OctenStreamError,
)

with Octen(api_key="your-api-key") as client:
    try:
        response = client.chat.create(
            model="openai/gpt-5.4",
            messages=[ChatMessage(role="user", content="Hello")]
        )
    except OctenAuthenticationError:
        print("Invalid or missing API key")
    except OctenRateLimitError as e:
        print(f"Rate limited โ€” retry after {e.retry_after}s")
    except OctenStreamError as e:
        print(f"Stream error: {e.message} (code {e.code})")
    except OctenTimeoutError as e:
        print(f"Request timed out after {e.timeout}s")
    except OctenAPIError as e:
        print(f"API error {e.status_code}: {e.message}")

๐Ÿงช Development

Install Development Dependencies

# Install development version from source
pip install -e ".[dev]"

Run Tests

pytest tests/

Code Formatting

black octen/
ruff check octen/ --fix

Type Checking

mypy octen/

๐Ÿ“ License

MIT License - See LICENSE file for details

๐Ÿ”— Links

๐Ÿ“ง Support

For questions or help, please:

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

octen-0.3.0.tar.gz (58.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

octen-0.3.0-py3-none-any.whl (47.1 kB view details)

Uploaded Python 3

File details

Details for the file octen-0.3.0.tar.gz.

File metadata

  • Download URL: octen-0.3.0.tar.gz
  • Upload date:
  • Size: 58.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for octen-0.3.0.tar.gz
Algorithm Hash digest
SHA256 6b03c23d7d397c52df4f980db3929b56cc87a988991f8e3a3d20aedbc417ffe5
MD5 8ab939a8eaa35d5c7fed03133e5b8b9b
BLAKE2b-256 4f9c9025af0985bac935829b7ce51fa3154b95f4b3ec0b6254f34ece48c81808

See more details on using hashes here.

File details

Details for the file octen-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: octen-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 47.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for octen-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 cadd55e767480d591142379ad5d4a8d89e1d91852964c0bad68dd7eb7080d89e
MD5 53c6585932d2b3ff45e229ef82609a15
BLAKE2b-256 9ac8d21cfffdc22e6d64bc052f4da6f4216d827548fc6beac83b0d5b41f619b9

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page