Omni meta-search engine for agentic AI.

RaySearch is an async-first search orchestration engine for building AI-overview style workflows on top of multiple providers, crawlers, extractors, rankers, and LLM backends.

It exposes four high-level pipelines:

  • search: multi-provider retrieval with optional fetch and rerank stages
  • fetch: page crawling, extraction, abstracting, overview generation, and related links
  • answer: search plus grounded answer generation with citations
  • research: multi-round research reports with synthesis and structured output

Why RaySearch

  • Component-based architecture with pluggable providers, crawlers, extractors, rankers, caches, and LLM clients
  • Async-only runtime with a single Engine entry point
  • YAML/JSON settings loader plus environment injection for provider and model secrets
  • Built-in tracking and metering sinks for observability
  • Designed for search-heavy and research-heavy agent workflows rather than chat-only use cases

Installation

Core install:

uv pip install raysearch

Common full install:

uv pip install "raysearch[extract,extract_pdf,crawl,rank,cache,api,overview,tracking]"

When using Playwright-based crawling, install browser binaries separately:

playwright install

Public API

from raysearch import Engine, SearchRequest, load_settings

Primary entry points:

  • load_settings(path=None, env=None)
  • Engine.from_settings(setting_file=None, *, settings=None, overrides=None)
  • await engine.search(request)
  • await engine.fetch(request)
  • await engine.answer(request)
  • await engine.research(request)

Quick Start

import asyncio

from raysearch import Engine, SearchRequest

async def main() -> None:
    # The engine is an async context manager, so it is closed when the block exits.
    async with Engine.from_settings("demo/search_config_example.yaml") as engine:
        response = await engine.search(
            SearchRequest(
                query="latest multimodal model papers",
                mode="deep",
                max_results=8,
            )
        )
        for item in response.results:
            print(item.title, item.url)

if __name__ == "__main__":
    asyncio.run(main())
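
Because every pipeline call is a coroutine, several requests can be issued concurrently from one event loop. The following is a minimal sketch of bounded fan-out using only the standard library. `run_bounded` and `fake_search` are hypothetical names, and `fake_search` is a stand-in for a call such as `engine.search(...)` so the snippet runs without RaySearch installed; it is not how the `runner` settings are implemented.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def run_bounded(
    worker: Callable[[T], Awaitable[R]],
    items: Sequence[T],
    limit: int = 4,
) -> List[R]:
    """Run `worker` over `items` with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(item: T) -> R:
        # The semaphore caps how many workers run at the same time.
        async with semaphore:
            return await worker(item)

    # gather() returns results in the same order as the inputs.
    return list(await asyncio.gather(*(guarded(item) for item in items)))


async def fake_search(query: str) -> str:
    # Hypothetical stand-in for an awaitable pipeline call.
    await asyncio.sleep(0)
    return f"results for {query}"


if __name__ == "__main__":
    print(asyncio.run(run_bounded(fake_search, ["rag", "agents"], limit=2)))
```

In real use, the worker would wrap an awaitable engine call, and the limit would be chosen with provider rate limits in mind.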

Configuration

RaySearch loads settings in this order:

  1. Explicit path passed to load_settings(...)
  2. RAYSEARCH_CONFIG_PATH
  3. raysearch.yaml
  4. In-code defaults
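
The lookup order above can be pictured as a short precedence chain. This is an illustrative sketch only, not the loader's actual code: `resolve_config_path` is a hypothetical helper, and it assumes `raysearch.yaml` is looked up in the current working directory.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_config_path(
    explicit: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    cwd: Optional[Path] = None,
) -> Optional[Path]:
    """Return the settings file to load, or None to mean in-code defaults."""
    env = os.environ if env is None else env
    cwd = Path.cwd() if cwd is None else cwd

    # 1. An explicit path always wins.
    if explicit:
        return Path(explicit)

    # 2. Next, the RAYSEARCH_CONFIG_PATH environment variable.
    from_env = env.get("RAYSEARCH_CONFIG_PATH")
    if from_env:
        return Path(from_env)

    # 3. Next, raysearch.yaml (assumed here to live in the working directory).
    candidate = cwd / "raysearch.yaml"
    if candidate.is_file():
        return candidate

    # 4. Nothing found: the caller falls back to in-code defaults.
    return None
```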

The main configuration groups are:

  • components: provider, crawl, extract, rank, llm, cache, tracking, metering, http, and rate limiting
  • telemetry: tracking and metering emitter behavior
  • search: search-mode profiles and query-expansion behavior
  • fetch: extraction, abstract, and overview tuning
  • answer: planning and generation model selection
  • research: report-generation budgets and model routing
  • runner: concurrency and queue limits

Component families use a simple default-plus-instance shape:

components:
  provider:
    default: google
    google:
      enabled: true
      cookies:
        CONSENT: "YES+"
    duckduckgo:
      enabled: true
      base_url: https://html.duckduckgo.com/html
      allow_redirects: false
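
To make the default-plus-instance shape concrete, the sketch below treats the fragment above as the plain dict a YAML parser would produce and separates the `default` pointer from the named instances. `split_family` and `enabled_instances` are hypothetical helpers, and treating a missing `enabled` key as disabled is an assumption, not documented behavior.

```python
from typing import Any, Dict, List, Tuple

# The YAML fragment above, written as the dict a YAML parser would return.
components: Dict[str, Any] = {
    "provider": {
        "default": "google",
        "google": {"enabled": True, "cookies": {"CONSENT": "YES+"}},
        "duckduckgo": {
            "enabled": True,
            "base_url": "https://html.duckduckgo.com/html",
            "allow_redirects": False,
        },
    }
}


def split_family(family: Dict[str, Any]) -> Tuple[str, Dict[str, Dict[str, Any]]]:
    """Separate the `default` pointer from the named instance configs."""
    default_name = family["default"]
    instances = {name: cfg for name, cfg in family.items() if name != "default"}
    if default_name not in instances:
        raise KeyError(f"default {default_name!r} has no matching instance")
    return default_name, instances


def enabled_instances(family: Dict[str, Any]) -> List[str]:
    """List instance names whose config sets enabled to true."""
    _, instances = split_family(family)
    # Assumption: an instance without an `enabled` key counts as disabled.
    return [name for name, cfg in instances.items() if cfg.get("enabled", False)]


if __name__ == "__main__":
    default_name, _ = split_family(components["provider"])
    print(default_name, enabled_instances(components["provider"]))
```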

Reference configuration:

  • demo/search_config_example.yaml

Providers And Pipelines

Built-in provider coverage includes:

  • google
  • google_news
  • duckduckgo
  • searxng
  • github
  • reddit
  • reuters
  • openalex
  • semantic_scholar
  • wikidata
  • wikipedia
  • arxiv
  • marginalia
  • blend for combining providers
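
This README does not describe how `blend` merges results, so the following sketch uses reciprocal rank fusion (RRF), a common and simple way to combine ranked lists, purely as an illustration of the idea. The function name, the `k=60` constant, and the use of bare URLs as result identifiers are assumptions, not RaySearch behavior.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: Dict[str, List[str]], k: int = 60) -> List[str]:
    """Merge per-provider ranked URL lists into a single ranking using RRF."""
    scores: Dict[str, float] = defaultdict(float)
    for urls in rankings.values():
        for rank, url in enumerate(urls, start=1):
            # Each appearance contributes 1 / (k + rank); higher ranks add more.
            scores[url] += 1.0 / (k + rank)
    # Highest score first; ties are broken by URL so the output is deterministic.
    return sorted(scores, key=lambda url: (-scores[url], url))


if __name__ == "__main__":
    merged = reciprocal_rank_fusion(
        {
            "google": ["https://a.example", "https://b.example"],
            "duckduckgo": ["https://b.example", "https://c.example"],
        }
    )
    print(merged)
```

A result that appears in several provider lists accumulates score from each, so agreement between providers tends to push it upward.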

Built-in pipeline support includes:

  • Search result expansion and reranking
  • Markdown-first fetch extraction
  • Abstract generation and page overview synthesis
  • Citation-grounded answer generation
  • Multi-round research report generation

Environment Variables

The loader preserves the full process environment in AppSettings.runtime_env, and component config models pull values from there as needed.

Common examples:

  • OPENAI_API_KEY
  • OPENAI_BASE_URL
  • GEMINI_API_KEY
  • GEMINI_BASE_URL
  • DASHSCOPE_API_KEY
  • DASHSCOPE_BASE_URL
  • Provider-specific overrides such as GITHUB_TOKEN or SEARXNG_BASE_URL
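
The pattern of reading secrets from a captured environment mapping can be sketched as follows. `LLMEndpoint` and `endpoint_from_env` are hypothetical names used only for illustration; the real config models and the exact type of `AppSettings.runtime_env` are not shown in this README.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class LLMEndpoint:
    """Hypothetical holder for one backend's credentials."""

    api_key: Optional[str]
    base_url: Optional[str]


def endpoint_from_env(prefix: str, runtime_env: Mapping[str, str]) -> LLMEndpoint:
    """Read <PREFIX>_API_KEY and <PREFIX>_BASE_URL from an environment snapshot."""
    return LLMEndpoint(
        api_key=runtime_env.get(f"{prefix}_API_KEY"),
        base_url=runtime_env.get(f"{prefix}_BASE_URL"),
    )


if __name__ == "__main__":
    # Snapshot the process environment once, then read from the snapshot.
    runtime_env = dict(os.environ)
    openai_endpoint = endpoint_from_env("OPENAI", runtime_env)
    print("OpenAI key configured:", openai_endpoint.api_key is not None)
```

Passing the mapping explicitly, rather than reading `os.environ` inside each component, makes it straightforward to substitute a test environment.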

Tracking And Metering

Tracking and metering are configured independently from the request pipelines.

Default artifact names now follow the package name:

  • tracking JSONL: .raysearch_tracking.jsonl
  • metering JSONL: .raysearch_metering.jsonl
  • metering SQLite: .raysearch_metering.sqlite3
  • cache SQLite: .raysearch_cache.sqlite3
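
The JSONL artifacts can be inspected with the standard library alone. The sketch below assumes one JSON object per line and makes no assumption about field names, since the record schema is not documented here; `iter_jsonl` is a hypothetical helper.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterator


def iter_jsonl(path: Path) -> Iterator[Dict[str, Any]]:
    """Yield one parsed record per non-empty line of a JSONL file."""
    with path.open("r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


if __name__ == "__main__":
    tracking_file = Path(".raysearch_tracking.jsonl")
    if tracking_file.is_file():
        records = list(iter_jsonl(tracking_file))
        print(f"{len(records)} tracking records")
```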

Development

The repo includes runnable demos:

  • demo/search.py
  • demo/fetch.py
  • demo/answer.py
  • demo/research.py

Example settings:

  • demo/search_config_example.yaml

Notes

  • search.mode supports fast, auto, and deep
  • RaySearch is async-only
  • Component discovery loads from raysearch.components
  • JS-heavy crawling requires Playwright plus installed browsers

Download files

Source Distribution

raysearch-0.1.1.tar.gz (275.9 kB)

Uploaded Source

Built Distribution

raysearch-0.1.1-py3-none-any.whl (354.5 kB)

Uploaded Python 3

File details

Details for the file raysearch-0.1.1.tar.gz.

File metadata

  • Download URL: raysearch-0.1.1.tar.gz
  • Size: 275.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.9

File hashes

Hashes for raysearch-0.1.1.tar.gz:

  • SHA256: d20dcec227f4d32b33cc4f20b1f7201c42c594e59884c5a61b3b7c0252132f5a
  • MD5: c70ffa37e61c890ca5bf95d4b49b115f
  • BLAKE2b-256: a443b1905efb069a1d40878374df405c2c891666bd983a7fedb037bc0d03226a

File details

Details for the file raysearch-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: raysearch-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 354.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.9

File hashes

Hashes for raysearch-0.1.1-py3-none-any.whl:

  • SHA256: 1e67d3b7ae5adddca59bf1d61a55216c5bac4a78f719414355240f488431b6fb
  • MD5: 3562b52a9dbef45dc49b36a6930ca2cd
  • BLAKE2b-256: dc4e6ca71a7297c35c38834c14183d84c2bf4ae4cfa0a4e9d6de2099ac19cc42
