
Unique Web Search

A powerful, configurable web search tool for retrieving and processing the latest information from the internet. This package provides intelligent search capabilities with support for multiple search engines, web crawlers, and content processing strategies.

Architecture

The following diagram illustrates the complete architecture and workflow of the unique_web_search package:

(Diagram: Web Search Tool Architecture)

Key Features

  • Dual Execution Modes:

    • V1 (Traditional): Query refinement with single or multiple search strategies
    • V2 (Step-based Planning): Advanced research planning with parallel execution
  • Multiple Search Engines:

    • Google Search
    • Bing Search
    • Brave Search
    • Jina Search
    • Tavily Search
    • Firecrawl Search
    • VertexAI (Gemini with Grounding)
    • Custom API (integrate any compatible web search API)
  • Multiple Web Crawlers:

    • Basic HTTP Crawler
    • Crawl4AI
    • Jina Reader
    • Tavily Crawler
    • Firecrawl Crawler
  • Intelligent Content Processing:

    • LLM-based summarization
    • Token-based truncation
    • Relevancy scoring and sorting
    • Content chunking and optimization
  • Query Refinement:

    • BASIC Mode: Single optimized search query
    • ADVANCED Mode: Multiple targeted search queries for complex research
  • Performance Optimized:

    • Parallel execution of search and crawl operations
    • Token limit management
    • Configurable timeouts and error handling
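
The pluggable design described above (multiple engines, crawlers, and processing strategies selected by name) can be sketched as a simple registry. This is an illustrative sketch only; the class and function names (`register_engine`, `get_engine`, `GoogleSearch`) are hypothetical and do not reflect the package's actual API.

```python
from dataclasses import dataclass

# Hypothetical registry sketch; the real package's classes and APIs may differ.
SEARCH_ENGINES = {}

def register_engine(name):
    """Decorator that registers a search engine class under a given name."""
    def wrap(cls):
        SEARCH_ENGINES[name] = cls
        return cls
    return wrap

@register_engine("google")
@dataclass
class GoogleSearch:
    api_key: str

    def search(self, query: str) -> list[str]:
        # Placeholder: a real implementation would call the engine's API.
        return [f"result for {query!r} via Google"]

def get_engine(name: str, **kwargs):
    """Look up a registered engine by name and instantiate it."""
    try:
        return SEARCH_ENGINES[name](**kwargs)
    except KeyError:
        raise ValueError(f"Unknown search engine: {name}") from None

engine = get_engine("google", api_key="dummy")
print(engine.search("latest news")[0])
```

A registry like this is what makes the Custom API engine possible: any backend that conforms to the expected interface can be registered and selected by configuration alone.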

Detailed Subsystem Docs

For deeper dives into each subsystem, see the dedicated READMEs:

  • Search Engines — full catalogue of supported engines, configuration, and usage examples.
  • Crawlers — comparison of crawling strategies (Basic, Crawl4AI, Tavily, Firecrawl, Jina) with setup guides.
  • Executors — orchestration layer (V1 & V2) covering query refinement, planning, logging, and best practices.

Configuration

The tool uses environment variables and configuration files to manage API keys and settings. Key configuration areas include:

  • Search engine selection and API keys
  • Crawler selection and configuration
  • Content processing strategies (SUMMARIZE, TRUNCATE, NONE)
  • Token limits and relevancy thresholds
  • Proxy configuration
  • Debug and monitoring options
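
As a rough illustration of environment-based configuration, a deployment might set variables like the following. The exact variable names are assumptions (only `active_search_engines` is mentioned in the changelog); consult the package's configuration schema for the real keys.

```shell
# Illustrative only: variable names are assumed, not taken from the package docs.
export ACTIVE_SEARCH_ENGINES="google,brave"      # select which engines are active
export TAVILY_API_KEY="tvly-..."                 # external crawlers auto-activate when keys are set
export CONTENT_PROCESSING_STRATEGY="SUMMARIZE"   # SUMMARIZE | TRUNCATE | NONE
```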

Workflow

  1. Input: User query or structured search plan
  2. Configuration: Load settings and initialize services
  3. Execution:
    • V1: Query refinement → Search → Crawl → Process
    • V2: Execute planned steps in parallel → Process
  4. Content Processing: Clean, summarize/truncate, and chunk content
  5. Optimization: Reduce to token limits and sort by relevance
  6. Output: Return structured content chunks optimized for LLM consumption
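
The six steps above can be sketched as a small async pipeline. This is a minimal sketch of the V1 flow (refine → search → crawl → process) with parallel fan-out via `asyncio.gather`; all function names and the character-based truncation stand-in are illustrative, not the package's actual API.

```python
import asyncio

async def refine(query: str) -> list[str]:
    # BASIC mode stand-in: produce a single "optimized" query.
    return [query.strip().lower()]

async def search(q: str) -> list[str]:
    # Placeholder for a real search engine call.
    return [f"https://example.com/{q.replace(' ', '-')}"]

async def crawl(url: str) -> str:
    # Placeholder for a real crawler fetch.
    return f"content of {url}"

def truncate(text: str, limit: int = 50) -> str:
    # Stand-in for token-based truncation (character-based here).
    return text[:limit]

async def run(query: str) -> list[str]:
    queries = await refine(query)
    # Fan out: search all refined queries in parallel, then crawl all hits in parallel.
    url_lists = await asyncio.gather(*(search(q) for q in queries))
    urls = [u for lst in url_lists for u in lst]
    pages = await asyncio.gather(*(crawl(u) for u in urls))
    return [truncate(p) for p in pages]

chunks = asyncio.run(run("Latest AI News"))
print(chunks[0])
```

The V2 mode generalizes this shape: instead of a single refine-then-search chain, a structured plan of mixed search and URL-read steps is executed in parallel before the shared processing stage.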

Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[1.7.2] - 2025-12-17

  • Update failsafe execution import path

[1.7.1] - 2025-12-03

  • Use strings instead of dicts to configure the payload, for better integration with the current frontend

[1.7.0] - 2025-12-01

  • Added full VertexAI search engine integration (Gemini + Google grounding) with service-account authentication and redirect resolution.
  • Introduced the pluggable Custom API search engine so customers can register any compliant web-search backend via simple GET/POST specs.

[1.6.1] - 2025-11-20

  • Cleaner logging of the tool name, using its display name, in the logger tool.

[1.6.0] - 2025-11-20

  • Include message log messages

[1.5.4] - 2025-11-12

  • Move pytest and pytest-asyncio to dev dependencies

[1.5.3] - 2025-11-10

  • Use SkipJsonSchema for mode under WebSearchMode config to prevent displaying an editable field

[1.5.2] - 2025-11-10

  • Separate the configuration of modes to prevent breaking the frontend

[1.5.1] - 2025-11-10

  • Flag V2 mode and Advanced Query Refinement as Beta

[1.5.0] - 2025-11-10

  • Add support for private endpoint transport (for Workload identity authentication)

[1.4.0] - 2025-11-10

  • Expose Search Mode Configuration

[1.3.6] - 2025-10-29

  • Fix minor notification display issue and remove unnecessary log

[1.3.5] - 2025-10-29

  • Upgrade azure-ai-projects to version 1.0.0 (relevant for Bing search)

[1.3.4] - 2025-10-28

  • Remove the unused tool-specific get_tool_call_result_for_loop_history function

[1.3.3] - 2025-10-14

  • Fix bug in selecting the refine query mode

[1.3.2] - 2025-10-10

  • Add possibility to switch proxy auth protocol (http or https)

[1.3.1] - 2025-10-09

  • Update loading path of DEFAULT_GPT_4o from unique_toolkit

[1.3.0] - 2025-10-06

  • Proxy Authentication Support: Route search engine and crawler requests through proxies with multiple authentication methods:
    • Username/Password authentication
    • Client Certificate authentication
  • Active Crawlers: Dynamic crawler activation system allowing selective enablement of crawling services:
    • In-house crawlers: Control activation via environment variables for internal crawlers (Basic, Crawl4AI)
    • External crawlers: Auto-activate when API keys are configured (Firecrawl, Jina, Tavily)
  • Test Coverage: Added comprehensive tests to ensure web search tool stability and reliability

[1.2.0] - 2025-09-29

  • Mark new crawlers as experimental

[1.1.0] - 2025-09-24

  • Set active search engine through active_search_engines env variable

[1.0.3] - 2025-09-23

  • Add field to track execution time of the executors

[1.0.2] - 2025-09-23

  • Parallelize step execution for V2 mode.

[1.0.1] - 2025-09-23

  • Add octet-stream to blacklisted content-types and allow to change the unwanted-types from config

[1.0.0] - 2025-09-18

  • Bump toolkit version to allow for both patch and minor updates

[0.2.0] - 2025-09-17

  • Add support for Brave and Grounding by Bing through azure

[0.1.4] - 2025-09-17

  • Updated to latest toolkit

[0.1.3] - 2025-09-17

  • Add UTF-8 content cleanup logic when processing content

[0.1.2] - 2025-09-15

  • Fix minor bug in transforming toolResponse to toolCallResult

[0.1.1] - 2025-09-15

Added

  • WebSearchV2Executor: New step-based execution model supporting both search and direct URL reading operations
  • BaseWebSearchExecutor: Abstract base class providing common functionality between executor versions
  • Enhanced Schema: New model WebSearchPlan for structured web search planning
  • Flexible Step Execution: Support for mixed search and URL reading operations in a single plan

Changed

  • Architecture Refactor: Improved executor structure with better separation of concerns
  • Configuration Enhancement: Added experimental features flag to switch between V1 and V2 modes
  • Progress Reporting: Enhanced with step-specific notifications and better user feedback

Maintained

  • Backward Compatibility: Existing V1 executor functionality preserved
  • API Consistency: No breaking changes to existing tool interfaces

[0.1.0] - 2025-09-12

  • Code simplification
  • Enable new crawlers
  • Default cleaning of search results
  • Refactor of code structure and crawler location

[0.0.6] - 2025-09-05

  • Updated unique_web_search README.

[0.0.5] - 2025-09-04

  • Path change of loading local .env.

[0.0.4] - 2025-09-01

  • Reduce default crawler timeout to 10s.

[0.0.3] - 2025-08-18

  • Auto-register Tool in Factory.

[0.0.2] - 2025-08-18

  • Moved out of private repo to public repo.

[0.0.1] - 2025-08-18

  • Initial release of web_search.


Download files


Source Distribution

unique_web_search-1.7.2.tar.gz (63.0 kB)


Built Distribution


unique_web_search-1.7.2-py3-none-any.whl (85.7 kB)


File details

Details for the file unique_web_search-1.7.2.tar.gz.

File metadata

  • Download URL: unique_web_search-1.7.2.tar.gz
  • Size: 63.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.3 CPython/3.12.3 Linux/6.11.0-1018-azure

File hashes

Hashes for unique_web_search-1.7.2.tar.gz:

  • SHA256: a5d0c9d69db0f4a204510f7d742cd6e8df014c6fdf350cf9fa7bad8a0b794a6b
  • MD5: 4a1a35d2ae2f3c567749606fa6a765d0
  • BLAKE2b-256: cf77a59d16f49fe8052aa9fa3b8780cdc564d93f241b7d36bd9e3c9ec90c3b14


File details

Details for the file unique_web_search-1.7.2-py3-none-any.whl.

File metadata

  • Download URL: unique_web_search-1.7.2-py3-none-any.whl
  • Size: 85.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.3 CPython/3.12.3 Linux/6.11.0-1018-azure

File hashes

Hashes for unique_web_search-1.7.2-py3-none-any.whl:

  • SHA256: 2ff5d62b41cfee7b2c654f05be4049cdc44dc0d0e84b174dd0c4b831b093dd50
  • MD5: 0b49d4105971587da4802625bcddea76
  • BLAKE2b-256: 5ff532ceb0b87785a9cd464c029c73409376ef12917e9fd68dcb1f34f8d37ae9

