Unique Web Search

A configurable web search tool for retrieving and processing up-to-date information from the internet. The package supports multiple search engines, multiple web crawlers, and several content processing strategies.

Architecture

The following diagram illustrates the complete architecture and workflow of the unique_web_search package:

[Figure: Web Search Tool Architecture]

Key Features

  • Dual Execution Modes:

    • V1 (Traditional): Query refinement with single or multiple search strategies
    • V2 (Step-based Planning): Advanced research planning with parallel execution
  • Multiple Search Engines:

    • Google Search
    • Bing Search
    • Brave Search
    • Jina Search
    • Tavily Search
    • Firecrawl Search
  • Multiple Web Crawlers:

    • Basic HTTP Crawler
    • Crawl4AI
    • Jina Reader
    • Tavily Crawler
    • Firecrawl Crawler
  • Intelligent Content Processing:

    • LLM-based summarization
    • Token-based truncation
    • Relevancy scoring and sorting
    • Content chunking and optimization
  • Query Refinement:

    • BASIC Mode: Single optimized search query
    • ADVANCED Mode: Multiple targeted search queries for complex research
  • Performance Optimized:

    • Parallel execution of search and crawl operations
    • Token limit management
    • Configurable timeouts and error handling
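
To illustrate how token-based truncation and relevancy sorting can work together, here is a minimal sketch. It is not the package's implementation: the `Chunk` class, the `select_chunks` function, and the whitespace-based token count are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    url: str
    text: str
    relevancy: float  # hypothetical score, higher means more relevant


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def truncate(text: str, max_tokens: int) -> str:
    # Keep only the first max_tokens words.
    return " ".join(text.split()[:max_tokens])


def select_chunks(
    chunks: list[Chunk], token_budget: int, min_relevancy: float = 0.0
) -> list[Chunk]:
    """Sort by relevancy (highest first) and keep chunks until the budget is spent."""
    kept: list[Chunk] = []
    remaining = token_budget
    for chunk in sorted(chunks, key=lambda c: c.relevancy, reverse=True):
        if chunk.relevancy < min_relevancy or remaining <= 0:
            continue
        # The last chunk that fits is cut down to whatever budget is left.
        text = truncate(chunk.text, remaining)
        kept.append(Chunk(chunk.url, text, chunk.relevancy))
        remaining -= count_tokens(text)
    return kept
```

A real deployment would likely swap the word count for the tokenizer of the target model, since word counts and token counts can differ considerably.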

Configuration

The tool uses environment variables and configuration files to manage API keys and settings. Key configuration areas include:

  • Search engine selection and API keys
  • Crawler selection and configuration
  • Content processing strategies (SUMMARIZE, TRUNCATE, NONE)
  • Token limits and relevancy thresholds
  • Proxy configuration
  • Debug and monitoring options
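
As a rough sketch of environment-driven configuration covering the areas listed above: every variable name (`EXAMPLE_*`), every default value, and the `ExampleSettings` class are hypothetical and are not taken from the package. Only the three strategy names SUMMARIZE, TRUNCATE, and NONE come from this document.

```python
import os
from dataclasses import dataclass
from enum import Enum
from typing import Mapping, Optional


class ContentProcessing(Enum):
    SUMMARIZE = "SUMMARIZE"
    TRUNCATE = "TRUNCATE"
    NONE = "NONE"


@dataclass(frozen=True)
class ExampleSettings:
    search_engine: str
    crawler: str
    processing: ContentProcessing
    max_tokens: int
    proxy_url: Optional[str]
    debug: bool


def load_settings(env: Optional[Mapping[str, str]] = None) -> ExampleSettings:
    """Build settings from a mapping; defaults to the process environment."""
    env = os.environ if env is None else env
    return ExampleSettings(
        # All variable names and defaults below are illustrative assumptions.
        search_engine=env.get("EXAMPLE_SEARCH_ENGINE", "google"),
        crawler=env.get("EXAMPLE_CRAWLER", "basic"),
        # Raises ValueError for a strategy that is not one of the three above.
        processing=ContentProcessing(
            env.get("EXAMPLE_CONTENT_PROCESSING", "TRUNCATE").upper()
        ),
        max_tokens=int(env.get("EXAMPLE_MAX_TOKENS", "4000")),
        proxy_url=env.get("EXAMPLE_PROXY_URL") or None,
        debug=env.get("EXAMPLE_DEBUG", "false").lower() in {"1", "true", "yes"},
    )
```

Passing a plain dictionary instead of reading `os.environ` directly keeps the loader easy to test without touching the real environment.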

Workflow

  1. Input: User query or structured search plan
  2. Configuration: Load settings and initialize services
  3. Execution:
    • V1: Query refinement → Search → Crawl → Process
    • V2: Execute planned steps in parallel → Process
  4. Content Processing: Clean, summarize/truncate, and chunk content
  5. Optimization: Reduce to token limits and sort by relevance
  6. Output: Return structured content chunks optimized for LLM consumption
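
The V1 flow above (query refinement, search, crawl, process) can be sketched with `asyncio`. This is only an illustration under stated assumptions: the functions `refine_query`, `search`, `crawl`, `process`, and `run_v1` are hypothetical stand-ins that return fake data, and none of them are the package's API.

```python
import asyncio


async def refine_query(query: str) -> list[str]:
    # Stand-in for LLM-based refinement; here the query is only normalized.
    return [query.strip()]


async def search(query: str) -> list[str]:
    # Stand-in for a search engine call; returns fake result URLs.
    slug = query.replace(" ", "-")
    return [f"https://example.com/{slug}/{i}" for i in range(2)]


async def crawl(url: str) -> str:
    # Stand-in for a crawler; returns fake page text.
    return f"content of {url}"


def process(pages: list[str], max_words: int) -> list[str]:
    # Truncate each page to a word limit as a simple processing strategy.
    return [" ".join(page.split()[:max_words]) for page in pages]


async def run_v1(query: str, max_words: int = 50) -> list[str]:
    queries = await refine_query(query)
    # Run all searches concurrently, then flatten the per-query result lists.
    results = await asyncio.gather(*(search(q) for q in queries))
    urls = [url for batch in results for url in batch]
    # Crawl all result URLs concurrently; gather preserves input order.
    pages = await asyncio.gather(*(crawl(u) for u in urls))
    return process(list(pages), max_words)


if __name__ == "__main__":
    print(asyncio.run(run_v1("web search")))
```

Because `asyncio.gather` returns results in the order of its inputs, the processed chunks line up with the search results even though the crawls run concurrently.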

Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[1.5.2] - 2025-11-10

  • Separate the configuration of modes to prevent breaking the frontend

[1.5.1] - 2025-11-10

  • Flag V2 mode and Advanced Query Refinement as Beta

[1.5.0] - 2025-11-10

  • Add support for private endpoint transport (for Workload identity authentication)

[1.4.0] - 2025-11-10

  • Expose Search Mode Configuration

[1.3.6] - 2025-10-29

  • Fix minor notification display issue and remove unnecessary log

[1.3.5] - 2025-10-29

  • Upgrade azure-ai-projects to version 1.0.0 (relevant for Bing Search)

[1.3.4] - 2025-10-28

  • Remove unused tool-specific get_tool_call_result_for_loop_history function

[1.3.3] - 2025-10-14

  • Fix bug in selecting the refine query mode

[1.3.2] - 2025-10-10

  • Add option to switch the proxy auth protocol (HTTP or HTTPS)

[1.3.1] - 2025-10-09

  • Update loading path of DEFAULT_GPT_4o from unique_toolkit

[1.3.0] - 2025-10-06

  • Proxy Authentication Support: Route search engine and crawler requests through proxies with multiple authentication methods:
    • Username/Password authentication
    • Client Certificate authentication
  • Active Crawlers: Dynamic crawler activation system allowing selective enablement of crawling services:
    • In-house crawlers: Control activation via environment variables for internal crawlers (Basic, Crawl4AI)
    • External crawlers: Auto-activate when API keys are configured (Firecrawl, Jina, Tavily)
  • Test Coverage: Added comprehensive tests to ensure web search tool stability and reliability
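
The crawler activation rule described in this release can be sketched as follows: in-house crawlers are switched on by a flag, while external crawlers become active once an API key is present. The environment variable names below are hypothetical placeholders, and the function `active_crawlers` is an assumption for illustration, not the package's code.

```python
from typing import Mapping

# Hypothetical variable names; the package's real names may differ.
IN_HOUSE_FLAGS = {
    "basic": "EXAMPLE_ENABLE_BASIC_CRAWLER",
    "crawl4ai": "EXAMPLE_ENABLE_CRAWL4AI",
}
EXTERNAL_KEYS = {
    "firecrawl": "EXAMPLE_FIRECRAWL_API_KEY",
    "jina": "EXAMPLE_JINA_API_KEY",
    "tavily": "EXAMPLE_TAVILY_API_KEY",
}


def active_crawlers(env: Mapping[str, str]) -> list[str]:
    """Return the crawler names that should be active for this environment."""
    active: list[str] = []
    # In-house crawlers need an explicit opt-in flag.
    for name, flag in IN_HOUSE_FLAGS.items():
        if env.get(flag, "").lower() in {"1", "true", "yes"}:
            active.append(name)
    # External crawlers activate as soon as a non-empty API key is configured.
    for name, key in EXTERNAL_KEYS.items():
        if env.get(key, "").strip():
            active.append(name)
    return active
```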

[1.2.0] - 2025-09-29

  • Mark new crawlers as experimental

[1.1.0] - 2025-09-24

  • Set active search engine through active_search_engines env variable

[1.0.3] - 2025-09-23

  • Add field to track execution time of the executors

[1.0.2] - 2025-09-23

  • Parallelize step execution for V2 mode.

[1.0.1] - 2025-09-23

  • Add octet-stream to blacklisted content types and allow changing the unwanted types from config
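
A content-type blacklist of this kind might look like the sketch below. The names `DEFAULT_UNWANTED_TYPES` and `is_unwanted` are assumptions for illustration; only the `octet-stream` entry comes from this changelog, and the package's actual default list is not shown here.

```python
from typing import Iterable

# Only octet-stream is known from the changelog; the real default list may be longer.
DEFAULT_UNWANTED_TYPES = frozenset({"application/octet-stream"})


def is_unwanted(
    content_type_header: str, unwanted: Iterable[str] = DEFAULT_UNWANTED_TYPES
) -> bool:
    """Return True if a response with this Content-Type header should be skipped."""
    # Drop parameters such as "; charset=utf-8" and compare case-insensitively.
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    return media_type in {t.lower() for t in unwanted}
```

Passing a custom `unwanted` collection mirrors the idea of overriding the list from configuration.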

[1.0.0] - 2025-09-18

  • Bump toolkit version to allow for both patch and minor updates

[0.2.0] - 2025-09-17

  • Add support for Brave and Grounding by Bing through Azure

[0.1.4] - 2025-09-17

  • Updated to latest toolkit

[0.1.3] - 2025-09-17

  • Add content utf8 cleanup logic when processing content

[0.1.2] - 2025-09-15

  • Fix minor bug in transforming toolResponse to toolCallResult

[0.1.1] - 2025-09-15

Added

  • WebSearchV2Executor: New step-based execution model supporting both search and direct URL reading operations
  • BaseWebSearchExecutor: Abstract base class providing common functionality between executor versions
  • Enhanced Schema: New model WebSearchPlan for structured web search planning
  • Flexible Step Execution: Support for mixed search and URL reading operations in a single plan
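
To give a sense of what a plan mixing search and URL reading steps could look like, here is a small sketch using dataclasses. The class names `SearchStep`, `ReadUrlStep`, and `ExampleWebSearchPlan`, along with all of their fields, are hypothetical; the actual WebSearchPlan schema is not shown in this document and may differ.

```python
from dataclasses import dataclass, field
from typing import Literal, Union


@dataclass
class SearchStep:
    query: str
    kind: Literal["search"] = "search"


@dataclass
class ReadUrlStep:
    url: str
    kind: Literal["read_url"] = "read_url"


@dataclass
class ExampleWebSearchPlan:
    # Hypothetical fields: a research objective plus an ordered list of steps.
    objective: str
    steps: list[Union[SearchStep, ReadUrlStep]] = field(default_factory=list)


def describe(plan: ExampleWebSearchPlan) -> list[str]:
    """Render each step as a short human-readable line."""
    lines = []
    for step in plan.steps:
        if isinstance(step, SearchStep):
            lines.append(f"search: {step.query}")
        else:
            lines.append(f"read: {step.url}")
    return lines
```

A plan built this way can hold both step types in one list, which is the kind of mixed execution the entry above describes.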

Changed

  • Architecture Refactor: Improved executor structure with better separation of concerns
  • Configuration Enhancement: Added experimental features flag to switch between V1 and V2 modes
  • Progress Reporting: Enhanced with step-specific notifications and better user feedback

Maintained

  • Backward Compatibility: Existing V1 executor functionality preserved
  • API Consistency: No breaking changes to existing tool interfaces

[0.1.0] - 2025-09-12

  • Code simplification
  • Enable new crawlers
  • Default cleaning of search results
  • Refactor of code structure and crawler location

[0.0.6] - 2025-09-05

  • Updated unique_web_search README.

[0.0.5] - 2025-09-04

  • Path change of loading local .env.

[0.0.4] - 2025-09-01

  • Reduce default crawler timeout to 10s.

[0.0.3] - 2025-08-18

  • Auto-register Tool in Factory.

[0.0.2] - 2025-08-18

  • Moved out of private repo to public repo.

[0.0.1] - 2025-08-18

  • Initial release of web_search.
