Omni meta-search engine for agentic AI.

Project description

RaySearch

RaySearch is an async-first search orchestration engine for building AI-overview-style workflows on top of multiple search providers, crawlers, extractors, rankers, and LLM backends.

It exposes four high-level pipelines:

  • search: multi-provider retrieval with optional fetch and rerank stages
  • fetch: page crawling, extraction, abstracting, overview generation, and related links
  • answer: search plus grounded answer generation with citations
  • research: multi-round research reports with synthesis and structured output

Why RaySearch

  • Component-based architecture with pluggable providers, crawlers, extractors, rankers, caches, and LLM clients
  • Async-only runtime with a single Engine entry point
  • YAML/JSON settings loader plus environment injection for provider and model secrets
  • Built-in tracking and metering sinks for observability
  • Designed for search-heavy and research-heavy agent workflows rather than chat-only use cases

Installation

Core install:

uv pip install raysearch

Common full install:

uv pip install "raysearch[extract,extract_pdf,crawl,rank,cache,api,overview,tracking]"

When using Playwright-based crawling, install browser binaries separately:

playwright install

Public API

from raysearch import Engine, SearchRequest, load_settings

Primary entry points:

  • load_settings(path=None, env=None)
  • Engine.from_settings(setting_file=None, *, settings=None, overrides=None)
  • await engine.search(request)
  • await engine.fetch(request)
  • await engine.answer(request)
  • await engine.research(request)

Quick Start

import asyncio

from raysearch import Engine, SearchRequest

async def main() -> None:
    async with Engine.from_settings("demo/search_config_example.yaml") as engine:
        response = await engine.search(
            SearchRequest(
                query="latest multimodal model papers",
                mode="deep",
                max_results=8,
            )
        )
        for item in response.results:
            print(item.title, item.url)

asyncio.run(main())

Configuration

RaySearch loads settings in this order:

  1. Explicit path passed to load_settings(...)
  2. RAYSEARCH_CONFIG_PATH
  3. raysearch.yaml
  4. In-code defaults

The main configuration groups are:

  • components: provider, crawl, extract, rank, llm, cache, tracking, metering, http, and rate limiting
  • telemetry: tracking and metering emitter behavior
  • search: search-mode profiles and query-expansion behavior
  • fetch: extraction, abstract, and overview tuning
  • answer: planning and generation model selection
  • research: report-generation budgets and model routing
  • runner: concurrency and queue limits

Component families use a simple default-plus-instance shape:

components:
  provider:
    default: google
    google:
      enabled: true
      cookies:
        CONSENT: "YES+"
    duckduckgo:
      enabled: true
      base_url: https://html.duckduckgo.com/html
      allow_redirects: false

Reference configuration:

  • demo/search_config_example.yaml

Providers And Pipelines

Built-in provider coverage includes:

  • google
  • google_news
  • duckduckgo
  • searxng
  • github
  • reddit
  • reuters
  • openalex
  • semantic_scholar
  • wikidata
  • wikipedia
  • arxiv
  • marginalia
  • blend for combining providers

Built-in pipeline support includes:

  • Search result expansion and reranking
  • Markdown-first fetch extraction
  • Abstract generation and page overview synthesis
  • Citation-grounded answer generation
  • Multi-round research report generation

Environment Variables

The loader preserves the full process environment in AppSettings.runtime_env, and component config models pull values from there as needed.

Common examples:

  • OPENAI_API_KEY
  • OPENAI_BASE_URL
  • GEMINI_API_KEY
  • GEMINI_BASE_URL
  • DASHSCOPE_API_KEY
  • DASHSCOPE_BASE_URL
  • Provider-specific overrides such as GITHUB_TOKEN or SEARXNG_BASE_URL

Tracking And Metering

Tracking and metering are configured independently from the request pipelines.

Default artifact file names follow the package name:

  • tracking JSONL: .raysearch_tracking.jsonl
  • metering JSONL: .raysearch_metering.jsonl
  • metering SQLite: .raysearch_metering.sqlite3
  • cache SQLite: .raysearch_cache.sqlite3
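Since the metering sink writes JSONL, each line is one JSON record that can be tailed or aggregated with the standard library. The record fields are whatever the sink emits; only the default file name comes from the docs:

```python
import json
from pathlib import Path

def load_metering_records(path: str = ".raysearch_metering.jsonl") -> list[dict]:
    """Parse one JSON object per line, skipping blank lines."""
    file = Path(path)
    if not file.exists():
        return []
    return [
        json.loads(line)
        for line in file.read_text().splitlines()
        if line.strip()
    ]
```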

Development

The repo includes runnable demos:

  • demo/search.py
  • demo/fetch.py
  • demo/answer.py
  • demo/research.py

Example settings:

  • demo/search_config_example.yaml

Notes

  • search.mode supports fast, auto, and deep
  • RaySearch is async-only
  • Component discovery loads from raysearch.components
  • JS-heavy crawling requires Playwright plus installed browsers

Download files

Source distribution:

  • raysearch-0.1.0.tar.gz (283.9 kB)

Built distribution:

  • raysearch-0.1.0-py3-none-any.whl (365.0 kB)

File details: raysearch-0.1.0.tar.gz

  • Size: 283.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.9

Hashes:

  • SHA256: ccc3faa1542c1c8bf0626e115f81ff76f6cca6bf195a892211d715db4d18c646
  • MD5: 977a9fcc8b483b9950ba9e6b16392b6a
  • BLAKE2b-256: e53c3637e6abca2b90cafe25fb7878b7230a5e6470bfbe14301596f4d09b4182

File details: raysearch-0.1.0-py3-none-any.whl

  • Size: 365.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.9

Hashes:

  • SHA256: 3140d31f452c4bbdb71c40f81e920b7130d5ea5e3984d281c24e0dc391c45675
  • MD5: dc88d2ca58e46b099b5385e9f25002ff
  • BLAKE2b-256: 8f97e75e28717ba442c3451356b0b8b63ebe156be4e639cbb51dbebe23129db7
