Web Search Proxy implementation

Unique Search Proxy

Unified web egress proxy for search engines and crawlers. Three publishable packages in this repo:

PyPI name                 Module                          Role
unique-search-proxy       unique_search_proxy_client.web  FastAPI server (proxy pod)
unique-search-proxy-sdk   unique_search_proxy_sdk         Async HTTP client for callers
unique-search-proxy-core  unique_search_proxy_core        Shared Pydantic types (no FastAPI)

flowchart LR
  subgraph caller["Caller pod"]
    SDK["unique_search_proxy_sdk"]
  end
  subgraph proxy["Proxy pod"]
    API["unique_search_proxy_client.web"]
    Pool["HttpClientPool"]
  end
  Core["unique_search_proxy_core"]
  Internet["Google / public web"]
  SDK --> Core
  API --> Core
  SDK -->|"POST /v1/search"| API
  API --> Pool
  Pool --> Internet

  • Server owns registry, secrets, Prometheus, and egress (HttpClientPool).
  • SDK wraps the OpenAPI contract; depends on core for GoogleConfig, errors, etc.
  • Core is server-free and safe to install without FastAPI/uvicorn.

Quick Start

Prerequisites

  • Python 3.12+
  • uv for dependency management

Installation

uv sync
cp .env.example .env
# Edit .env: set GOOGLE_SEARCH_API_KEY and GOOGLE_SEARCH_ENGINE_ID for live /v1/search

Running

uv run python -m unique_search_proxy_client.web.app
# or
uv run uvicorn unique_search_proxy_client.web.app:app --reload --port 2349

Python SDK (unique-search-proxy-sdk)

Workspace path: connectors/unique_search_proxy/unique_search_proxy_sdk/. Generated from the server OpenAPI spec via openapi-python-client.

Path                                                                    Role
unique_search_proxy_sdk/_generated/                                     Regenerated httpx client + attrs models
unique_search_proxy_sdk/client.py                                       UniqueSearchProxyClient facade
connectors/unique_search_proxy/unique_search_proxy_client/openapi.json  Exported spec (codegen input)

Regenerate after API changes

cd connectors/unique_search_proxy/unique_search_proxy_client
uv sync
uv run python scripts/generate_sdk.py

Usage

import asyncio

from unique_search_proxy_sdk import UniqueSearchProxyClient


async def main() -> None:
    async with UniqueSearchProxyClient("http://unique-search-proxy:2349") as client:
        await client.health()  # GET /health
        result = await client.search.search("unique ag", engine="google", fetchSize=10)
        crawl = await client.crawl.crawl(["https://example.com"], crawler="basic")

        # Low-level: one generated function per route
        raw = client.openapi  # OpenAPIClient from _generated


asyncio.run(main())

Facade method       HTTP
health()            GET /health
ready()             GET /ready
search.search(...)  POST /v1/search
crawl.crawl(...)    POST /v1/crawl

Deployment config JSON Schema, defaults, and LLM call-schema projection live in unique_search_proxy_core (not HTTP). Assistants-core and tooling import those helpers directly.

Non-success responses raise the same ProxyError subclasses as the service. Generated request/response models live under unique_search_proxy_sdk._generated.models.

For tests, pass an httpx.AsyncClient with ASGITransport(app=create_app()) and run the app lifespan so in-app egress is initialized.

Other OpenAPI codegen tools

Tool                      Notes
OpenAPI Generator         Broad language support; verbose Python output
openapi-python-client     Used here (async httpx + attrs)
datamodel-code-generator  Pydantic models only
Kiota                     Multi-language SDKs

API (application)

Endpoint                         Description
GET /health                      Liveness
GET /ready                       Readiness (httpx pool + registered providers)
GET /v1/configuration/providers  Registered search engine and crawler ids
POST /v1/search                  Execute search (flat request: engine, query, provider params, timeout)
POST /v1/crawl                   Crawl URLs via configured crawler (flat request: crawler, urls, timeout, …)
GET /metrics                     Prometheus scrape endpoint (when enabled)
/docs                            OpenAPI (Swagger UI); use Try it out and the request-body Examples dropdown on /v1/search and /v1/crawl

Set PROMETHEUS_ENABLED=false (the ENABLED field on the monitoring settings, PrometheusSettings) to disable metrics. With WORKERS > 1, the entrypoint sets PROMETHEUS_MULTIPROC_DIR so that metrics are aggregated correctly across uvicorn workers.
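The multi-worker rule can be pictured with a small Python sketch. The real container entrypoint is a shell script that is not shown here, so the function name prepare_metrics_env, the temporary-directory location, and the choice to respect an already-set value are all assumptions made for illustration.

```python
import tempfile


def prepare_metrics_env(env: dict) -> dict:
    """Return a copy of env with PROMETHEUS_MULTIPROC_DIR set when WORKERS > 1.

    Hypothetical helper: it mirrors the documented behaviour, not the actual
    entrypoint code.
    """
    result = dict(env)
    workers = int(result.get("WORKERS", "1"))
    if workers > 1 and "PROMETHEUS_MULTIPROC_DIR" not in result:
        # Every worker process writes its samples into this shared directory,
        # which the scrape endpoint can then aggregate.
        result["PROMETHEUS_MULTIPROC_DIR"] = tempfile.mkdtemp(prefix="prometheus-")
    return result


single = prepare_metrics_env({"WORKERS": "1"})
multi = prepare_metrics_env({"WORKERS": "4"})
```

With a single worker the variable is left unset, since one process can serve its own metrics without a shared directory.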

Settings are colocated with each component and use env prefixes:

Component             Prefix / vars  Example
Google search         (no prefix)    GOOGLE_SEARCH_API_KEY, GOOGLE_SEARCH_ENGINE_ID
HTTP client           HTTP_CLIENT_   HTTP_CLIENT_PROXY_HOST, HTTP_CLIENT_POOL_TIMEOUT_SECONDS
Prometheus            PROMETHEUS_    PROMETHEUS_ENABLED
Container entrypoint  (shell)        HOST, PORT, WORKERS, LOG_LEVEL, PROMETHEUS_MULTIPROC_DIR

Copy .env.example to .env for an annotated template of all settings. Shared helpers live in web/settings/.
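To make the prefix convention concrete, here is a stdlib-only sketch of how a prefix scopes environment variables to one component. The server's own settings classes are not reproduced here; settings_for_prefix is a hypothetical helper, and lower-casing the remaining field name is an assumption.

```python
import os
from collections.abc import Mapping


def settings_for_prefix(prefix: str, env: Mapping = os.environ) -> dict:
    """Collect variables that start with prefix, keyed by the lower-cased remainder."""
    return {
        name[len(prefix):].lower(): value
        for name, value in env.items()
        if name.startswith(prefix)
    }


# Example environment using variable names from the table above.
example_env = {
    "HTTP_CLIENT_PROXY_HOST": "proxy.internal",
    "HTTP_CLIENT_POOL_TIMEOUT_SECONDS": "5",
    "PROMETHEUS_ENABLED": "false",
}
http_settings = settings_for_prefix("HTTP_CLIENT_", example_env)
prometheus_settings = settings_for_prefix("PROMETHEUS_", example_env)
```

Values stay as strings in this sketch; type coercion (for example "false" to a boolean) is left to whatever settings layer is actually in use.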

Runtime discovery (GET /v1/configuration/providers)

Lists search engine and crawler ids registered in the proxy pod (depends on env/secrets). Use this for health checks and capability discovery at runtime.

Deployment config JSON Schema, defaults, and LLM call-schema projection are core library concerns — import from unique_search_proxy_core.providers.schema and unique_search_proxy_core.search_engines.call_schema (or the crawl equivalents). Assistants-core embeds those shapes in tool manifests rather than calling extra HTTP routes on the proxy.

Search (POST /v1/search)

Flat request body: all execution fields at the top level (engine, query, optional provider knobs, timeout). Tooling merges deployment config with LLM invocation in core (merge_config_and_invocation) before calling the proxy.

{
  "engine": "google",
  "query": "example query",
  "fetchSize": 10,
  "gl": "de",
  "dateRestrict": "d7",
  "timeout": 30
}

  • engine: registered search engine id (the discriminator that selects the provider)
  • query, fetchSize, optional provider knobs, timeout: the flat execution payload sent to POST /v1/search
  • Deployment config (ExposableParam with expose + value): resolved in core before the flat search request is built; it is not a separate HTTP surface on the proxy
  • LLM call schema: unique_search_proxy_core.search_engines.call_schema.resolve_search_call_schema(...), with optional strict=False for nullable exposed fields
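The split between deployment config and LLM invocation can be sketched as follows. This is not the core implementation of merge_config_and_invocation: ExposableParamSketch, merge_sketch, and build_search_body are hypothetical names, and the rule that only exposed parameters may be overridden by the LLM call is an assumption about what expose means.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ExposableParamSketch:
    """Hypothetical stand-in for the core ExposableParam (expose + value)."""

    value: Any
    expose: bool = False


def merge_sketch(config: dict, invocation: dict) -> dict:
    """Resolve each knob; only exposed knobs accept a value from the LLM call."""
    resolved = {}
    for name, param in config.items():
        chosen = param.value
        if param.expose and invocation.get(name) is not None:
            chosen = invocation[name]
        if chosen is not None:
            # Unset knobs are omitted so the flat body carries no nulls.
            resolved[name] = chosen
    return resolved


def build_search_body(engine: str, query: str, knobs: dict, timeout: int = 30) -> dict:
    """Assemble the flat POST /v1/search body from already-resolved knobs."""
    return {"engine": engine, "query": query, **knobs, "timeout": timeout}


config = {
    "fetchSize": ExposableParamSketch(10, expose=True),
    "gl": ExposableParamSketch("de"),  # fixed by the deployment
    "dateRestrict": ExposableParamSketch(None, expose=True),
}
invocation = {"fetchSize": 5, "gl": "us", "dateRestrict": "d7"}
body = build_search_body("google", "example query", merge_sketch(config, invocation))
```

In this sketch the LLM's attempt to set gl is ignored because that knob is not exposed, while fetchSize and dateRestrict take the invocation values.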

Response:

{
  "engine": "google",
  "query": "example query",
  "raw": {
    "pages": [
      {
        "pageIndex": 1,
        "offset": 1,
        "requestedCount": 10,
        "response": {}
      }
    ]
  },
  "curated": [
    {
      "url": "https://example.com",
      "title": "Example",
      "snippet": "...",
      "content": ""
    }
  ]
}
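
For callers that read the JSON directly rather than through the SDK models, a minimal sketch of pulling the curated entries out of the response above might look like this. CuratedResult, parse_curated, and requested_total are hypothetical names; treating title, snippet, and content as optional is an assumption.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class CuratedResult:
    url: str
    title: str
    snippet: str
    content: str


def parse_curated(payload: str) -> list:
    """Decode a /v1/search response body and return its curated entries."""
    data = json.loads(payload)
    return [
        CuratedResult(
            url=item["url"],
            title=item.get("title", ""),
            snippet=item.get("snippet", ""),
            content=item.get("content", ""),
        )
        for item in data.get("curated", [])
    ]


def requested_total(payload: str) -> int:
    """Sum requestedCount over the provider pages recorded under raw.pages."""
    pages = json.loads(payload).get("raw", {}).get("pages", [])
    return sum(page.get("requestedCount", 0) for page in pages)


sample = json.dumps({
    "engine": "google",
    "query": "example query",
    "raw": {"pages": [{"pageIndex": 1, "offset": 1, "requestedCount": 10, "response": {}}]},
    "curated": [{"url": "https://example.com", "title": "Example", "snippet": "...", "content": ""}],
})
results = parse_curated(sample)
```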

Crawl (POST /v1/crawl)

{
  "urls": ["https://example.com"],
  "crawler": "basic",
  "timeout": 30
}
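
A caller building this body by hand could validate it before sending. The sketch below is illustrative only: build_crawl_body is a hypothetical helper, and the checks (absolute http/https URLs, positive timeout) are assumptions about sensible client-side validation, not rules confirmed for the server.

```python
from urllib.parse import urlsplit


def build_crawl_body(urls: list, crawler: str = "basic", timeout: int = 30) -> dict:
    """Assemble a flat POST /v1/crawl body, rejecting malformed input early."""
    for url in urls:
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"not an absolute http(s) URL: {url!r}")
    if timeout <= 0:
        raise ValueError("timeout must be positive")
    return {"urls": list(urls), "crawler": crawler, "timeout": timeout}


crawl_body = build_crawl_body(["https://example.com"])
```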

Errors

Non-2xx responses use a structured envelope:

{
  "error": {
    "code": "ENGINE_NOT_CONFIGURED",
    "message": "Engine 'google' is not registered or not configured",
    "engine": "google",
    "retryable": false
  }
}
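
The SDK already maps this envelope to ProxyError subclasses. For callers that talk to the proxy without the SDK, a minimal sketch of decoding the envelope and acting on retryable could look like the following; ProxyErrorInfo, parse_error_envelope, and should_retry are hypothetical names, and the three-attempt limit is an arbitrary assumption.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProxyErrorInfo:
    code: str
    message: str
    engine: Optional[str]
    retryable: bool


def parse_error_envelope(body: str) -> ProxyErrorInfo:
    """Decode the structured error envelope of a non-2xx response."""
    error = json.loads(body)["error"]
    return ProxyErrorInfo(
        code=error["code"],
        message=error["message"],
        engine=error.get("engine"),
        retryable=bool(error.get("retryable", False)),
    )


def should_retry(info: ProxyErrorInfo, attempt: int, max_attempts: int = 3) -> bool:
    """Retry only when the proxy marks the error retryable and attempts remain."""
    return info.retryable and attempt < max_attempts


envelope = json.dumps({
    "error": {
        "code": "ENGINE_NOT_CONFIGURED",
        "message": "Engine 'google' is not registered or not configured",
        "engine": "google",
        "retryable": False,
    }
})
info = parse_error_envelope(envelope)
```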

Project Structure

connectors/unique_search_proxy/
├── unique_search_proxy/
│   ├── sdk/                    # HTTP SDK (callers → proxy API)
│   │   ├── _generated/         # openapi-python-client output (regenerate via scripts/)
│   │   ├── client.py           # UniqueSearchProxyClient facade
│   │   ├── converters.py       # App Pydantic config → generated models
│   │   └── errors.py           # Maps API error envelope → ProxyError
│   ├── openapi.json            # Exported OpenAPI (codegen input)
│   ├── scripts/generate_sdk.py
│   └── web/                    # FastAPI application (proxy pod)
│       ├── app.py              # App factory + lifespan (HttpClientPool)
│       ├── settings/
│       ├── api/
│       │   ├── health.py
│       │   └── v1/
│       │       ├── configuration.py
│       │       ├── search.py
│       │       └── crawl.py
│       ├── monitoring/
│       └── core/
│           ├── client/         # Egress pool — application only, not SDK
│           ├── search_engines/
│           └── crawlers/
├── tests/
└── deploy/

Engines and crawlers register via web/core/registry.py at application startup.
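
As a rough picture of startup registration, the sketch below shows a generic decorator-based registry. It is not the contents of web/core/registry.py: register_engine, get_engine, and the toy echo engine are hypothetical and only illustrate the pattern of mapping ids to implementations.

```python
from typing import Callable, Dict

SearchFn = Callable[[str], list]

# Module-level mapping from engine id to implementation (illustrative only).
_ENGINES: Dict[str, SearchFn] = {}


def register_engine(engine_id: str) -> Callable[[SearchFn], SearchFn]:
    """Decorator that records a search function under engine_id."""

    def decorator(fn: SearchFn) -> SearchFn:
        if engine_id in _ENGINES:
            raise ValueError(f"engine {engine_id!r} already registered")
        _ENGINES[engine_id] = fn
        return fn

    return decorator


def get_engine(engine_id: str) -> SearchFn:
    """Look up a registered engine or raise LookupError for unknown ids."""
    try:
        return _ENGINES[engine_id]
    except KeyError:
        raise LookupError(
            f"Engine {engine_id!r} is not registered or not configured"
        ) from None


@register_engine("echo")
def echo_engine(query: str) -> list:
    """Toy engine that returns the query as its only result."""
    return [query]


registered_ids = sorted(_ENGINES)
```

A lookup for an id that was never registered fails in this sketch, which corresponds to the ENGINE_NOT_CONFIGURED case in the error envelope.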

Development

uv run ruff check .
uv run ruff format .
uv run pytest
uv run basedpyright

License

Proprietary - Unique AG

