Unique Web Search

A powerful, configurable web search tool for retrieving and processing the latest information from the internet. This package provides intelligent search capabilities with support for multiple search engines, web crawlers, and content processing strategies.

Architecture

The following diagram illustrates the complete architecture and workflow of the unique_web_search package:

[Figure: Web Search Tool Architecture]

Key Features

  • Dual Execution Modes:

    • V1 (Traditional): Query refinement with single or multiple search strategies
    • V2 (Step-based Planning): Advanced research planning with parallel execution
  • Multiple Search Engines:

    • Google Search
    • Bing Search
    • Brave Search
    • Jina Search
    • Tavily Search
    • Firecrawl Search
    • VertexAI (Gemini with Grounding)
    • Custom API (integrate any compatible web search API)
  • Multiple Web Crawlers:

    • Basic HTTP Crawler
    • Crawl4AI
    • Jina Reader
    • Tavily Crawler
    • Firecrawl Crawler
  • Intelligent Content Processing:

    • LLM-based summarization
    • Token-based truncation
    • Relevancy scoring and sorting
    • Content chunking and optimization
  • Query Refinement:

    • BASIC Mode: Single optimized search query
    • ADVANCED Mode: Multiple targeted search queries for complex research
  • Performance Optimized:

    • Parallel execution of search and crawl operations
    • Token limit management
    • Configurable timeouts and error handling
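To make the "token-based truncation" and "relevancy scoring and sorting" features above more concrete, here is a minimal sketch of how a token budget might be applied to scored content. It is not the package's implementation: the `Chunk` class, `fit_to_budget`, and the whitespace tokenizer are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical container for one piece of crawled content."""
    url: str
    text: str
    relevancy: float  # assumed score, higher means more relevant


def count_tokens(text: str) -> int:
    # Stand-in tokenizer: whitespace-separated words. A real setup would
    # more likely use the tokenizer of the target LLM.
    return len(text.split())


def truncate(text: str, max_tokens: int) -> str:
    # Keep only the first max_tokens whitespace tokens.
    return " ".join(text.split()[:max_tokens])


def fit_to_budget(chunks: list[Chunk], budget: int) -> list[Chunk]:
    """Sort by relevancy, then keep content until the token budget is spent."""
    kept: list[Chunk] = []
    remaining = budget
    for chunk in sorted(chunks, key=lambda c: c.relevancy, reverse=True):
        if remaining <= 0:
            break
        text = truncate(chunk.text, remaining)
        kept.append(Chunk(chunk.url, text, chunk.relevancy))
        remaining -= count_tokens(text)
    return kept
```

With this sketch, the most relevant chunks are kept whole where possible and the last chunk that fits is cut to the remaining budget, so the total never exceeds the limit.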

Detailed Subsystem Docs

For deeper dives into each subsystem, see the dedicated READMEs:

  • Search Engines — full catalogue of supported engines, configuration, and usage examples.
  • Crawlers — comparison of crawling strategies (Basic, Crawl4AI, Tavily, Firecrawl, Jina) with setup guides.
  • Executors — orchestration layer (V1 & V2) covering query refinement, planning, logging, and best practices.

Configuration

The tool uses environment variables and configuration files to manage API keys and settings. Key configuration areas include:

  • Search engine selection and API keys
  • Crawler selection and configuration
  • Content processing strategies (SUMMARIZE, TRUNCATE, NONE)
  • Token limits and relevancy thresholds
  • Proxy configuration
  • Debug and monitoring options
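As an illustration of the configuration areas listed above, the following sketch loads a few settings from environment variables. The variable names (`ACTIVE_SEARCH_ENGINES`, `CONTENT_PROCESSING_STRATEGY`, `MAX_TOKENS`), the comma-separated format, the defaults, and the `WebSearchSettings` class are assumptions made for this example; only the strategy values SUMMARIZE, TRUNCATE, and NONE come from this document. Consult the subsystem READMEs for the real settings.

```python
import os
from dataclasses import dataclass
from enum import Enum
from typing import Mapping, Optional, Tuple


class ContentProcessingStrategy(Enum):
    SUMMARIZE = "SUMMARIZE"
    TRUNCATE = "TRUNCATE"
    NONE = "NONE"


@dataclass(frozen=True)
class WebSearchSettings:
    """Hypothetical settings object; not the package's actual config class."""
    active_search_engines: Tuple[str, ...]
    strategy: ContentProcessingStrategy
    max_tokens: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> WebSearchSettings:
    # Read from the process environment unless a mapping is passed in,
    # which keeps the function easy to test.
    env = os.environ if env is None else env

    # Assumed format: comma-separated engine names.
    raw_engines = env.get("ACTIVE_SEARCH_ENGINES", "google")
    engines = tuple(e.strip().lower() for e in raw_engines.split(",") if e.strip())

    # Enum lookup raises ValueError for an unknown strategy name.
    raw_strategy = env.get("CONTENT_PROCESSING_STRATEGY", "TRUNCATE")
    strategy = ContentProcessingStrategy(raw_strategy.strip().upper())

    max_tokens = int(env.get("MAX_TOKENS", "4000"))
    if max_tokens <= 0:
        raise ValueError("MAX_TOKENS must be a positive integer")

    return WebSearchSettings(engines, strategy, max_tokens)
```

Validating values at load time, as sketched here, surfaces a misspelled strategy or a non-positive token limit at startup instead of in the middle of a search.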

Workflow

  1. Input: User query or structured search plan
  2. Configuration: Load settings and initialize services
  3. Execution:
    • V1: Query refinement → Search → Crawl → Process
    • V2: Execute planned steps in parallel → Process
  4. Content Processing: Clean, summarize/truncate, and chunk content
  5. Optimization: Reduce to token limits and sort by relevance
  6. Output: Return structured content chunks optimized for LLM consumption
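The V1 flow above (query refinement, search, crawl, process) can be sketched with `asyncio`, running the search and crawl stages concurrently. Every function here is a hypothetical stub: a real refinement step would call an LLM, and real search and crawl steps would call the configured engine and crawler. The example URLs are placeholders.

```python
import asyncio


async def refine_query(query: str) -> list[str]:
    # Stub: only normalizes the query and returns it as one search query.
    return [query.strip()]


async def search(query: str) -> list[str]:
    # Stub search engine returning two placeholder URLs.
    encoded = query.replace(" ", "+")
    return [f"https://example.com/{i}?q={encoded}" for i in range(2)]


async def crawl(url: str) -> str:
    # Stub crawler: yields control once, then returns fake page content.
    await asyncio.sleep(0)
    return f"content of {url}"


def process(pages: list[str], max_chars: int = 200) -> list[str]:
    # Stub content processing: truncate each page to max_chars characters.
    return [page[:max_chars] for page in pages]


async def run_v1(query: str) -> list[str]:
    queries = await refine_query(query)
    # Run all searches concurrently, then flatten the per-query URL lists.
    results = await asyncio.gather(*(search(q) for q in queries))
    urls = [url for urls_for_query in results for url in urls_for_query]
    # Crawl all URLs concurrently; gather preserves the input order.
    pages = await asyncio.gather(*(crawl(u) for u in urls))
    return process(list(pages))


if __name__ == "__main__":
    for chunk in asyncio.run(run_v1("latest web search tools")):
        print(chunk)
```

A V2-style run would differ mainly in its input: instead of refining a single query, it would receive a list of planned steps and pass them to `asyncio.gather` in the same way.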

Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[1.7.1] - 2025-12-03

  • Use strings instead of dicts to configure the payload, for better integration with the current frontend

[1.7.0] - 2025-12-01

  • Added full VertexAI search engine integration (Gemini + Google grounding) with service-account authentication and redirect resolution.
  • Introduced the pluggable Custom API search engine so customers can register any compliant web-search backend via simple GET/POST specs.

[1.6.1] - 2025-11-20

  • Cleaner call of tool name with display name in logger tool.

[1.6.0] - 2025-11-20

  • Include message log messages

[1.5.4] - 2025-11-12

  • Move pytest and pytest-asyncio to dev dependencies

[1.5.3] - 2025-11-10

  • Use SkipJsonSchema for mode under WebSearchMode config to prevent displaying an editable field

[1.5.2] - 2025-11-10

  • Separate the configuration of modes to prevent breaking the frontend

[1.5.1] - 2025-11-10

  • Flag V2 mode and Advanced Query Refinement as Beta

[1.5.0] - 2025-11-10

  • Add support for private endpoint transport (for Workload identity authentication)

[1.4.0] - 2025-11-10

  • Expose Search Mode Configuration

[1.3.6] - 2025-10-29

  • Fix minor notification display issue and remove unnecessary log

[1.3.5] - 2025-10-29

  • Upgrade azure-ai-projects to version 1.0.0 (relevant for Bing search)

[1.3.4] - 2025-10-28

  • Remove unused tool-specific get_tool_call_result_for_loop_history function

[1.3.3] - 2025-10-14

  • Fix bug in selecting the refine query mode

[1.3.2] - 2025-10-10

  • Add possibility to switch proxy auth protocol (http or https)

[1.3.1] - 2025-10-09

  • Update loading path of DEFAULT_GPT_4o from unique_toolkit

[1.3.0] - 2025-10-06

  • Proxy Authentication Support: Route search engine and crawler requests through proxies with multiple authentication methods:
    • Username/Password authentication
    • Client Certificate authentication
  • Active Crawlers: Dynamic crawler activation system allowing selective enablement of crawling services:
    • In-house crawlers: Control activation via environment variables for internal crawlers (Basic, Crawl4AI)
    • External crawlers: Auto-activate when API keys are configured (Firecrawl, Jina, Tavily)
  • Test Coverage: Added comprehensive tests to ensure web search tool stability and reliability
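The two crawler activation rules described in this release (in-house crawlers toggled by environment variables, external crawlers enabled by the presence of an API key) can be sketched as follows. The environment variable names and the set of accepted truthy values are hypothetical; the package's actual names may differ.

```python
from typing import Mapping

# Hypothetical environment variable names, chosen only for illustration.
IN_HOUSE_FLAGS = {
    "basic": "ENABLE_BASIC_CRAWLER",
    "crawl4ai": "ENABLE_CRAWL4AI",
}
EXTERNAL_API_KEYS = {
    "firecrawl": "FIRECRAWL_API_KEY",
    "jina": "JINA_API_KEY",
    "tavily": "TAVILY_API_KEY",
}
TRUTHY = {"1", "true", "yes", "on"}


def active_crawlers(env: Mapping[str, str]) -> list[str]:
    """Return the names of crawlers that would be active for this environment."""
    active = []
    # In-house crawlers need an explicit opt-in flag.
    for name, flag in IN_HOUSE_FLAGS.items():
        if env.get(flag, "").strip().lower() in TRUTHY:
            active.append(name)
    # External crawlers activate as soon as a non-empty API key is set.
    for name, key in EXTERNAL_API_KEYS.items():
        if env.get(key, "").strip():
            active.append(name)
    return active
```

For example, under these assumed names, setting `ENABLE_BASIC_CRAWLER=true` and `JINA_API_KEY` to a non-empty value would activate the Basic and Jina crawlers and leave the others disabled.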

[1.2.0] - 2025-09-29

  • Mark new crawlers as experimental

[1.1.0] - 2025-09-24

  • Set active search engine through active_search_engines env variable

[1.0.3] - 2025-09-23

  • Add field to track execution time of the executors

[1.0.2] - 2025-09-23

  • Parallelize step execution for V2 mode.

[1.0.1] - 2025-09-23

  • Add octet-stream to blacklisted content-types and allow changing the unwanted types from config

[1.0.0] - 2025-09-18

  • Bump toolkit version to allow for both patch and minor updates

[0.2.0] - 2025-09-17

  • Add support for Brave and Grounding by Bing through Azure

[0.1.4] - 2025-09-17

  • Updated to latest toolkit

[0.1.3] - 2025-09-17

  • Add UTF-8 cleanup logic when processing content

[0.1.2] - 2025-09-15

  • Fix minor bug in transforming toolResponse to toolCallResult

[0.1.1] - 2025-09-15

Added

  • WebSearchV2Executor: New step-based execution model supporting both search and direct URL reading operations
  • BaseWebSearchExecutor: Abstract base class providing common functionality between executor versions
  • Enhanced Schema: New model WebSearchPlan for structured web search planning
  • Flexible Step Execution: Support for mixed search and URL reading operations in a single plan

Changed

  • Architecture Refactor: Improved executor structure with better separation of concerns
  • Configuration Enhancement: Added experimental features flag to switch between V1 and V2 modes
  • Progress Reporting: Enhanced with step-specific notifications and better user feedback

Maintained

  • Backward Compatibility: Existing V1 executor functionality preserved
  • API Consistency: No breaking changes to existing tool interfaces

[0.1.0] - 2025-09-12

  • Code simplification
  • Enable new crawlers
  • Default cleaning of search results
  • Refactor of code structure and crawler location

[0.0.6] - 2025-09-05

  • Updated unique_web_search README.

[0.0.5] - 2025-09-04

  • Path change of loading local .env.

[0.0.4] - 2025-09-01

  • Reduce default crawler timeout to 10s.

[0.0.3] - 2025-08-18

  • Auto-register Tool in Factory.

[0.0.2] - 2025-08-18

  • Moved out of private repo to public repo.

[0.0.1] - 2025-08-18

  • Initial release of web_search.
