
Web search plugin for twat



this_file: README.md

Twat Search: multi-engine web search aggregator

Executive summary

Twat Search is an asynchronous Python package that provides a unified interface for querying multiple search engines simultaneously. It aggregates, normalizes, and processes results from various search providers through a consistent API. This documentation covers both CLI and Python usage of the package.

Key features

  • Multi-Engine Search: A single query can simultaneously search across multiple providers including Brave, Google (via SerpAPI/HasData), Tavily, Perplexity, You.com, Bing (via web scraping), and more
  • Asynchronous Operation: Leverages asyncio for concurrent searches, maximizing speed and efficiency
  • Rate Limiting: Built-in mechanisms to prevent exceeding API limits of individual search providers
  • Strong Typing: Full type annotations and Pydantic validation for improved code reliability and maintainability
  • Robust Error Handling: Custom exception classes for graceful error management, with improved engine initialization and search process error handling
  • Flexible Configuration: Configure search engines via environment variables, .env files, or directly in code
  • Extensible Architecture: Designed for easy addition of new search engines
  • Command-Line Interface: Rich, interactive CLI for searching and exploring engine configurations
  • JSON Output: Supports JSON output for easy integration with other tools
  • Modern Path Handling: Uses pathlib.Path for robust and platform-independent file operations
  • Secure Temporary File Operations: Implements secure temporary file handling using the standard library's tempfile module

Recent improvements

  • Enhanced Error Handling: Improved error handling in engine initialization and search processes to prevent failures in one engine from affecting others
  • Standardized Engine Names: Added standardization of engine names for more consistent lookups and better backward compatibility
  • Detailed Logging: Added comprehensive logging throughout the search process for better debugging and monitoring
  • Graceful Fallbacks: Implemented graceful fallbacks when specific engines fail to initialize or return results
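The engine-name standardization mentioned above can be illustrated with a small sketch. The function name and exact rules here are assumptions for illustration, not the package's actual internals:

```python
def standardize_engine_name(name: str) -> str:
    """Normalize an engine name so variant spellings resolve to one
    registry key, e.g. "Bing-Scraper" and "bing_scraper" both map to
    "bing_scraper". Illustrative only."""
    return name.strip().lower().replace("-", "_").replace(" ", "_")
```

With a rule like this, lookups stay consistent regardless of how the caller spells the engine name.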

Installation options

Full installation

uv pip install --system twat-search[all]

or, using pip directly:

pip install "twat-search[all]"

Selective installation

Install only specific engine dependencies:

# Example: install only brave and duckduckgo dependencies
pip install "twat-search[brave,duckduckgo]"

# Example: install duckduckgo and bing scraper
pip install "twat-search[duckduckgo,bing_scraper]"

After installation, both the twat-search and twat-search-web commands should be available on your PATH. Alternatively, you can run:

python -m twat_search.__main__
python -m twat_search.web.cli

Project Documentation

The project maintains several key documentation files:

  • README.md: This file, containing an overview of the project, installation instructions, and usage examples.
  • CHANGELOG.md: Documents all notable changes to the project, organized by version.
  • TODO.md: Contains a prioritized list of tasks and improvements planned for the project.
  • LICENSE: The project's license information.

Development Workflow

When contributing to this project, please follow these guidelines:

  1. Check the TODO.md file for prioritized tasks that need attention.
  2. Run ./cleanup.py status regularly to check for linting errors and test failures.
  3. Document all changes in CHANGELOG.md under the appropriate version section.
  4. Add comprehensive tests for new features and bug fixes.
  5. Ensure all code passes linting and type checking before submitting.

Code Quality Tools

The project uses several tools to maintain code quality:

  • Ruff: For linting and formatting Python code
  • Mypy: For static type checking
  • Pytest: For running tests
  • Pre-commit hooks: To ensure code quality before commits

Run these tools regularly during development:

# Format code
ruff format --respect-gitignore --target-version py312 .

# Lint code
ruff check --output-format=github --fix --unsafe-fixes .

# Run tests
python -m pytest

Quick start guide

Python API

import asyncio
from twat_search.web import search

async def main():
    # Search across all configured engines
    results = await search("quantum computing applications")

    # Print results
    for result in results:
        print(f"[{result.source}] {result.title}")
        print(f"URL: {result.url}")
        print(f"Snippet: {result.snippet}\n")

# Run the async function
asyncio.run(main())

Command line interface

# Search using all available engines
twat-search q "climate change solutions"

# Search with specific engines
twat-search q "machine learning frameworks" --engines brave,tavily

# Get JSON output
twat-search q "renewable energy" --json

# Use an engine-specific command
twat-search brave "web development trends" --count 10

Core architecture

Module structure

twat_search/
└── web/
    ├── engines/            # Individual search engine implementations
    │   ├── __init__.py     # Engine registration and availability checks
    │   ├── base.py         # Base SearchEngine class definition
    │   ├── brave.py        # Brave search implementation
    │   ├── bing_scraper.py # Bing scraper implementation
    │   ├── ...             # Other engine implementations
    │   └── lib_falla/      # Falla-based search engine implementations
    │       └── core/       # Core Falla functionality
    │           ├── falla.py    # Base Falla class
    │           ├── google.py   # Google search implementation
    │           └── ...         # Other Falla-based implementations
    ├── __init__.py         # Module exports
    ├── api.py              # Main search API
    ├── cli.py              # Command-line interface
    ├── config.py           # Configuration handling
    ├── exceptions.py       # Custom exceptions
    ├── models.py           # Data models
    └── utils.py            # Utility functions

Supported search engines

Twat Search provides a consistent interface to the following search engines:

| Engine | Module | API Key Required | Description | Package Extra |
|---|---|---|---|---|
| Brave | brave | Yes | Web search via Brave Search API | brave |
| Brave News | brave_news | Yes | News search via Brave API | brave |
| You.com | you | Yes | Web search via You.com API | - |
| You.com News | you_news | Yes | News search via You.com API | - |
| Tavily | tavily | Yes | Research-focused search API | tavily |
| Perplexity | pplx | Yes | AI-powered search with detailed answers | pplx |
| SerpAPI | serpapi | Yes | Google search results via SerpAPI | serpapi |
| HasData Google | hasdata-google | Yes | Google search results via HasData API | hasdata |
| HasData Google Light | hasdata-google-light | Yes | Light version of HasData API | hasdata |
| Critique | critique | Yes | Visual and textual search capabilities | - |
| DuckDuckGo | duckduckgo | No | Privacy-focused search results | duckduckgo |
| Bing Scraper | bing_scraper | No | Web scraping of Bing search results | bing_scraper |
| Google Falla | google_falla | No | Google search via Playwright-based scraping | falla |

Configuration management

Environment variables

Configure engines using environment variables:

# API keys
BRAVE_API_KEY=your_brave_api_key
TAVILY_API_KEY=your_tavily_api_key
PERPLEXITY_API_KEY=your_perplexity_api_key
YOU_API_KEY=your_you_api_key
SERPAPI_API_KEY=your_serpapi_api_key
CRITIQUE_API_KEY=your_critique_api_key
HASDATA_API_KEY=your_hasdata_api_key

# Engine enablement
BRAVE_ENABLED=true
TAVILY_ENABLED=true
PERPLEXITY_ENABLED=true
YOU_ENABLED=true
SERPAPI_ENABLED=true
CRITIQUE_ENABLED=true
DUCKDUCKGO_ENABLED=true
BING_SCRAPER_ENABLED=true
HASDATA_GOOGLE_ENABLED=true

# Default parameters (JSON format)
BRAVE_DEFAULT_PARAMS={"count": 10, "safesearch": "off"}
TAVILY_DEFAULT_PARAMS={"max_results": 5, "search_depth": "basic"}
PERPLEXITY_DEFAULT_PARAMS={"model": "pplx-7b-online"}
YOU_DEFAULT_PARAMS={"safe_search": true, "count": 8}
SERPAPI_DEFAULT_PARAMS={"num": 10, "gl": "us"}
HASDATA_GOOGLE_DEFAULT_PARAMS={"location": "Austin,Texas,United States", "device_type": "desktop"}
DUCKDUCKGO_DEFAULT_PARAMS={"max_results": 10, "safesearch": "moderate", "time": "d"}
BING_SCRAPER_DEFAULT_PARAMS={"max_retries": 3, "delay_between_requests": 1.0}

# Global default for all engines
NUM_RESULTS=5

You can store these in a .env file in your project directory, which is automatically loaded by the library via python-dotenv.

Programmatic configuration

Configure engines programmatically when using the Python API:

from twat_search.web import Config, EngineConfig, search

# Create custom configuration
config = Config(
    engines={
        "brave": EngineConfig(
            api_key="your_brave_api_key",
            enabled=True,
            default_params={"count": 10, "country": "US"}
        ),
        "bing_scraper": EngineConfig(
            enabled=True,
            default_params={"max_retries": 3, "delay_between_requests": 1.0}
        ),
        "tavily": EngineConfig(
            api_key="your_tavily_api_key",
            enabled=True,
            default_params={"search_depth": "advanced"}
        )
    }
)

# Use the configuration (run inside an async function)
results = await search("quantum computing", config=config)

Engine-specific parameters

Each search engine accepts different parameters. Here's a reference for commonly used ones:

Brave search

await brave(
    query="search term",
    count=10,              # Number of results (default: 10)
    country="US",          # Country code (ISO 3166-1 alpha-2)
    search_lang="en",      # Search language
    ui_lang="en",          # UI language
    safe_search=True,      # Safe search (True/False)
    freshness="day"        # Time frame (day, week, month)
)

Bing scraper

await bing_scraper(
    query="search term",
    num_results=10,                # Number of results
    max_retries=3,                 # Maximum retry attempts
    delay_between_requests=1.0     # Delay between requests (seconds)
)

Tavily

await tavily(
    query="search term",
    max_results=5,               # Number of results (default: 5)
    search_depth="basic",        # Search depth (basic, advanced)
    include_domains=["example.com"],  # Domains to include
    exclude_domains=["spam.com"],     # Domains to exclude
    include_answer=True,         # Include AI-generated answer
    search_type="search"         # Search type (search, news, etc.)
)

Perplexity (pplx)

await pplx(
    query="search term",
    model="pplx-70b-online"      # Model to use for search
)

You.com

await you(
    query="search term",
    num_results=10,              # Number of results
    country_code="US",           # Country code
    safe_search=True             # Safe search (True/False)
)

DuckDuckGo

await duckduckgo(
    query="search term",
    max_results=10,              # Number of results
    region="us-en",              # Region code
    safesearch=True,             # Safe search (True/False)
    timelimit="m",               # Time limit (d=day, w=week, m=month)
    timeout=10                   # Request timeout (seconds)
)

Critique (with image)

await critique(
    query="Is this image real?",
    image_url="https://example.com/image.jpg",  # URL to image
    # OR
    image_base64="base64_encoded_image_data",   # Base64 encoded image
    source_whitelist=["trusted-site.com"],      # Optional domain whitelist
    source_blacklist=["untrusted-site.com"],    # Optional domain blacklist
    output_format="text"                        # Output format
)

Error handling framework

Twat Search provides custom exception classes for proper error handling:

from twat_search.web.exceptions import SearchError, EngineError

try:
    results = await search("quantum computing")
except EngineError as e:
    print(f"Engine-specific error: {e}")
    # e.g., "Engine 'brave': API key is required"
except SearchError as e:
    print(f"General search error: {e}")
    # e.g., "No search engines configured"

The exception hierarchy:

  • SearchError: Base class for all search-related errors
  • EngineError: Subclass for engine-specific errors, includes the engine name in the message
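The hierarchy can be sketched as follows. Only the class relationship and the message format ("Engine 'X': ...") are documented above; the constructor signature is an assumption:

```python
class SearchError(Exception):
    """Base class for all search-related errors."""

class EngineError(SearchError):
    """Engine-specific error; prefixes the message with the engine name."""

    def __init__(self, engine: str, message: str) -> None:
        self.engine = engine
        super().__init__(f"Engine '{engine}': {message}")
```

Because EngineError subclasses SearchError, a single `except SearchError` also catches engine-specific failures.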

Typical error scenarios:

  • Missing API keys
  • Network errors
  • Rate limiting
  • Invalid responses
  • Configuration errors

Advanced usage techniques

Concurrent searches

Search across multiple engines concurrently:

import asyncio
from twat_search.web.engines.brave import brave
from twat_search.web.engines.tavily import tavily

async def search_multiple(query):
    brave_task = brave(query)
    tavily_task = tavily(query)

    results = await asyncio.gather(brave_task, tavily_task, return_exceptions=True)

    brave_results, tavily_results = [], []
    if isinstance(results[0], list):
        brave_results = results[0]
    if isinstance(results[1], list):
        tavily_results = results[1]

    return brave_results + tavily_results

# Usage
results = await search_multiple("artificial intelligence")

Custom engine parameters

Specify engine-specific parameters in the unified search function:

from twat_search.web import search

results = await search(
    "machine learning",
    engines=["brave", "tavily", "bing_scraper"],
    # Common parameters
    num_results=10,
    country="US",

    # Engine-specific parameters
    brave_count=15,
    brave_freshness="week",
    tavily_search_depth="advanced",
    bing_scraper_max_retries=5
)
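Under the hood, a dispatcher might split such keyword arguments by engine prefix. A hypothetical sketch (the function name and behavior are assumptions, not the package's actual API):

```python
def split_engine_kwargs(engines: list[str], **kwargs) -> tuple[dict, dict[str, dict]]:
    """Split kwargs into common parameters and per-engine parameters
    keyed by the engine-name prefix, e.g. brave_count=15 becomes
    per_engine["brave"]["count"] = 15."""
    common: dict = {}
    per_engine: dict[str, dict] = {engine: {} for engine in engines}
    for key, value in kwargs.items():
        for engine in engines:
            prefix = engine + "_"
            if key.startswith(prefix):
                per_engine[engine][key[len(prefix):]] = value
                break
        else:
            # No engine prefix matched: treat as a common parameter
            common[key] = value
    return common, per_engine
```

Any key without a recognized engine prefix is applied to every engine as a common parameter.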

Rate limiting

Use the built-in rate limiter to avoid hitting API limits:

from twat_search.web.utils import RateLimiter

# Create a rate limiter with 5 calls per second
limiter = RateLimiter(calls_per_second=5)

# Use in an async context
async def rate_limited_search():
    for query in ["python", "javascript", "rust", "golang"]:
        limiter.wait_if_needed()  # Wait if necessary
        results = await search(query)
        # Process results...
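For illustration, a minimal calls-per-second limiter could look like the following. This is an async sketch, not the package's actual RateLimiter (which, as shown above, is called without await):

```python
import asyncio
import time

class SimpleRateLimiter:
    """Minimal sketch: enforce a minimum interval between calls."""

    def __init__(self, calls_per_second: float) -> None:
        self.min_interval = 1.0 / calls_per_second
        self._last_call = 0.0

    async def wait_if_needed(self) -> None:
        # Sleep only if the previous call was too recent
        elapsed = time.monotonic() - self._last_call
        if elapsed < self.min_interval:
            await asyncio.sleep(self.min_interval - elapsed)
        self._last_call = time.monotonic()
```

Spacing calls this way keeps bursts of queries under a provider's per-second quota.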

Development guide

Running tests

# Install test dependencies
pip install "twat-search[test]"

# Run tests
pytest

# Run with coverage
pytest --cov=src/twat_search

# Run tests in parallel
pytest -n auto

Adding a new search engine

To add a new search engine:

  1. Create a new file in src/twat_search/web/engines/
  2. Implement a class that inherits from SearchEngine
  3. Implement the required methods and register the engine

Example:

from pydantic import HttpUrl
from twat_search.web.engines.base import SearchEngine, register_engine
from twat_search.web.models import SearchResult
from twat_search.web.config import EngineConfig


@register_engine
class MyNewSearchEngine(SearchEngine):
    engine_code = "my_new_engine"
    env_api_key_names = ["MY_NEW_ENGINE_API_KEY"]

    def __init__(self, config: EngineConfig, **kwargs) -> None:
        super().__init__(config, **kwargs)
        # Initialize engine-specific parameters

    async def search(self, query: str) -> list[SearchResult]:
        # Implement search logic
        return [
            SearchResult(
                title="My Result",
                url=HttpUrl("https://example.com"),
                snippet="Result snippet",
                source=self.name
            )
        ]


# Convenience function
async def my_new_engine(query: str, **kwargs) -> list[SearchResult]:
    # Instantiate the engine from configuration and delegate to search()
    ...

Development setup

To contribute to Twat Search, follow these steps:

  1. Clone the repository:
   git clone https://github.com/twardoch/twat-search.git
   cd twat-search
  2. Set up the virtual environment with uv:
   uv venv
   source .venv/bin/activate
  3. Install development dependencies:
   uv pip install -e ".[test,dev]"
  4. Run tests:
   uv run pytest
  5. Run type checking:
   uv run mypy src tests
  6. Run linting:
   uv run ruff check src tests
  7. Use cleanup.py for project maintenance:
   python cleanup.py status

Troubleshooting guide

API key issues

If you're encountering API key errors:

  1. Verify the API key is set correctly in environment variables
  2. Check the API key format is valid for the specific provider
  3. Ensure the API key has the necessary permissions
  4. For engines that require API keys, verify the key is set via one of these methods:
    • Environment variable (e.g., BRAVE_API_KEY )
    • .env file
    • Programmatic configuration

Rate limiting problems

If you're being rate limited by search providers:

  1. Reduce the number of concurrent requests
  2. Use the RateLimiter utility to space out requests
  3. Consider upgrading your API plan with the provider
  4. Add delay between requests for engines that support it (e.g., delay_between_requests for Bing Scraper)
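A generic retry-with-backoff wrapper illustrates how to absorb transient rate-limit errors. This is a sketch, not part of the package, and the function name is an assumption:

```python
import asyncio
import random

async def with_backoff(coro_factory, max_retries: int = 3, base_delay: float = 1.0):
    """Retry an async call with exponential backoff plus a little jitter.
    coro_factory is a zero-argument callable returning a fresh coroutine."""
    for attempt in range(max_retries + 1):
        try:
            return await coro_factory()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            # Delay doubles each attempt: base, 2*base, 4*base, ...
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            await asyncio.sleep(delay)
```

In practice you would catch a narrower exception type (e.g. a rate-limit error) rather than bare Exception.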

No results returned

If you're not getting results:

  1. Check that the engine is enabled (ENGINE_ENABLED=true)
  2. Verify your query is not empty or too restrictive
  3. Try with safe search disabled to see if content filtering is the issue
  4. Check for engine-specific errors in the logs (use --verbose flag with CLI)
  5. Ensure you have the required dependencies installed for the engine

Common error messages

  • "Engine 'X': API key is required": The engine requires an API key that hasn't been configured
  • "No search engines configured": No engines are enabled or available
  • "Unknown search engine: X": The specified engine name is invalid
  • "Engine 'X': is disabled": The engine is registered but disabled in configuration

Development status

Version: 1.8.1

Twat Search is actively developed. See PROGRESS.md for completed tasks and TODO.md for planned features and improvements.

Contributing

Contributions are welcome! Please check TODO.md for areas that need work. Submit pull requests or open issues on GitHub. Key areas for contribution:

  • Adding new search engines
  • Improving test coverage
  • Enhancing documentation
  • Optimizing performance
  • Implementing advanced features (e.g., caching, result normalization)

License

Twat Search is released under the MIT License. See the LICENSE file for details.


Appendix: available engines and requirements

| Engine | Package Extra | API Key Required | Environment Variable | Notes |
|---|---|---|---|---|
| Brave | brave | Yes | BRAVE_API_KEY | General web search engine |
| Brave News | brave | Yes | BRAVE_API_KEY | News-specific search |
| You.com | - | Yes | YOU_API_KEY | AI-powered web search |
| You.com News | - | Yes | YOU_API_KEY | News-specific search |
| Tavily | tavily | Yes | TAVILY_API_KEY | Research-focused search |
| Perplexity | pplx | Yes | PPLX_API_KEY | AI-powered search with detailed answers |
| SerpAPI | serpapi | Yes | SERPAPI_API_KEY | Google search results API |
| HasData Google | hasdata | Yes | HASDATA_API_KEY | Google search results API |
| HasData Google Light | hasdata | Yes | HASDATA_API_KEY | Lightweight Google search API |
| Critique | - | Yes | CRITIQUE_API_KEY | Supports image analysis |
| DuckDuckGo | duckduckgo | No | - | Privacy-focused search |
| Bing Scraper | bing_scraper | No | - | Uses web scraping techniques |
