Async web search client using Brave Search API with built-in caching, rate limiting, and batch processing

Searcherator

Searcherator is a Python package that provides a convenient way to perform web searches using the Brave Search API with built-in caching, automatic rate limiting, and efficient batch processing capabilities.

Features

  • Async/await support for modern Python applications
  • Automatic caching with configurable TTL
  • Optional DynamoDB backend for cross-machine cache sharing
  • Built-in rate limiting to respect API quotas
  • Efficient batch processing for multiple concurrent searches
  • Progress callbacks for real-time search events
  • Support for multiple languages and countries
  • Comprehensive exception hierarchy for robust error handling
  • Real-time rate limit tracking and monitoring

Installation

pip install searcherator

Requirements

Quick Start

from searcherator import Searcherator
import asyncio

async def main():
    # Basic search
    search = Searcherator("Python programming")
    
    # Get URLs from search results
    urls = await search.urls()
    print(urls)
    
    # Get detailed results
    results = await search.detailed_search_result()
    for result in results:
        print(f"{result['title']}: {result['url']}")
    
    # Clean up
    await Searcherator.close_session()

if __name__ == "__main__":
    asyncio.run(main())

Usage Examples

Basic Search

from searcherator import Searcherator
import asyncio

async def main():
    search = Searcherator("Python tutorials", num_results=10)
    results = await search.search_result()
    print(results)
    await Searcherator.close_session()

asyncio.run(main())

Localized Search

# German search (run inside an async function, as in Basic Search)
german_search = Searcherator(
    "Zusammenfassung Buch 'Demian' von 'Hermann Hesse'",
    language="de",
    country="de",
    num_results=10
)
results = await german_search.search_result()

Batch Processing

import asyncio
from searcherator import Searcherator

async def batch_search():
    queries = ["Python", "JavaScript", "Rust", "Go", "TypeScript"]
    
    try:
        # Create search instances
        searches = [Searcherator(q, num_results=5) for q in queries]
        
        # Run all searches concurrently (rate limiting handled automatically)
        results = await asyncio.gather(
            *[s.search_result() for s in searches],
            return_exceptions=True
        )
        
        # Process results
        for query, result in zip(queries, results):
            if isinstance(result, dict):
                print(f"{query}: {len(result.get('web', {}).get('results', []))} results")
    finally:
        await Searcherator.close_session()

asyncio.run(batch_search())
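
With return_exceptions=True, failed searches come back as exception objects in the results list, and the loop above skips them silently. A minimal sketch of separating successes from failures so errors can be reported; partition_results is a hypothetical helper, not part of Searcherator:

```python
from typing import Any


def partition_results(
    queries: list[str], results: list[Any]
) -> tuple[dict[str, Any], dict[str, BaseException]]:
    """Split gather(..., return_exceptions=True) output into successes and failures.

    Hypothetical helper for illustration; not part of the Searcherator API.
    """
    successes: dict[str, Any] = {}
    failures: dict[str, BaseException] = {}
    for query, result in zip(queries, results):
        if isinstance(result, BaseException):
            failures[query] = result  # the search raised; keep the exception
        else:
            successes[query] = result  # the search returned normally
    return successes, failures
```

Calling it as successes, failures = partition_results(queries, results) right after the gather call lets you log or retry the failed queries by name.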

Error Handling

from searcherator import (
    Searcherator,
    SearcheratorAuthError,
    SearcheratorRateLimitError,
    SearcheratorTimeoutError,
    SearcheratorAPIError
)

async def safe_search():
    try:
        search = Searcherator("Python", timeout=10)
        results = await search.search_result()
    except SearcheratorAuthError:
        print("Invalid API key")
    except SearcheratorRateLimitError as e:
        print(f"Rate limited. Resets in {e.reset_per_second}s")
    except SearcheratorTimeoutError:
        print("Request timed out")
    except SearcheratorAPIError as e:
        print(f"API error: {e.status_code} - {e.message}")
    finally:
        await Searcherator.close_session()

Monitoring Rate Limits

# Run inside an async function; the attributes are read after a search completes
search = Searcherator("Python")
results = await search.search_result()

print(f"Rate limit (per second): {search.rate_limit_per_second}")
print(f"Remaining (per second): {search.rate_remaining_per_second}")
print(f"Rate limit (per month): {search.rate_limit_per_month}")
print(f"Remaining (per month): {search.rate_remaining_per_month}")

Progress Callbacks

Track search progress with real-time event callbacks. Supports both sync and async callables:

import asyncio
from searcherator import Searcherator

async def main():
    # Sync callback
    search = Searcherator(
        "Python programming",
        on_progress=lambda e: print(f"{e['event']}: {e.get('cached', 'N/A')}")
    )
    await search.search_result()
    
    # Async callback (log_to_database is a placeholder for your own coroutine)
    async def on_progress(event):
        await log_to_database(event)
    
    search = Searcherator(
        "Python tutorials",
        on_progress=on_progress
    )
    await search.search_result()
    
    await Searcherator.close_session()

asyncio.run(main())

Event structure:

All events include event (str) and ts (float, Unix timestamp).

Event           Additional Fields                          Description
search_started  query                                      Fired before search begins
search_done     query, num_results, cached, cache_source   Fired after search completes
error           query, message                             Fired on exception

Cache source values:

  • "miss" — API call was made
  • "l1" — Returned from local JSON cache
  • "l2" — Returned from DynamoDB cache
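
As an illustration of consuming these events, the sketch below builds a sync callback that counts how often results came from each cache source. It relies only on the event and cache_source fields described above; make_cache_tally and the "unknown" and "error" bucket names are hypothetical:

```python
from collections import Counter


def make_cache_tally():
    """Return (callback, counts); the callback tallies search_done and error events.

    Hypothetical helper for illustration; not part of the Searcherator API.
    """
    counts = Counter()

    def on_progress(event: dict) -> None:
        kind = event.get("event")
        if kind == "search_done":
            # cache_source is "miss", "l1", or "l2" per the list above
            counts[event.get("cache_source", "unknown")] += 1
        elif kind == "error":
            counts["error"] += 1
        # search_started events are ignored

    return on_progress, counts
```

Pass the returned callback as on_progress when constructing each search, then inspect counts once the searches finish to see the cache hit ratio.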

API Reference

Searcherator

Searcherator(
    search_term: str = "",
    num_results: int = 5,
    country: str | None = "us",
    language: str | None = "en",
    api_key: str | None = None,
    spellcheck: bool = False,
    timeout: int = 30,
    clear_cache: bool = False,
    ttl: int = 7,
    logging: bool = False,
    dynamodb_table: str | None = None,
    on_progress: Callable | None = None
)

Parameters

  • search_term (str): The query string to search for (default: "")
  • num_results (int): Maximum number of results to return (default: 5)
  • country (str | None): Country code for search results (default: "us")
  • language (str | None): Language code for search results (default: "en")
  • api_key (str | None): Brave Search API key (default: None, uses the BRAVE_API_KEY environment variable)
  • spellcheck (bool): Enable spell checking on queries (default: False)
  • timeout (int): Request timeout in seconds (default: 30)
  • clear_cache (bool): Clear existing cached results (default: False)
  • ttl (int): Time-to-live for cached results in days (default: 7)
  • logging (bool): Enable cache operation logging (default: False)
  • dynamodb_table (str | None): DynamoDB table name for cross-machine cache sharing (default: None)
  • on_progress (Callable | None): Callback for progress events; accepts both sync and async callables (default: None)

Methods

async search_result() -> dict

Returns the full search results as a dictionary from the Brave Search API.

async urls() -> list[str]

Returns a list of URLs from the search results.

async detailed_search_result() -> list[dict]

Returns detailed information for each search result including title, URL, description, and metadata.

async print() -> None

Pretty prints the full search results.

@classmethod async close_session()

Closes the shared aiohttp session. Call this when done with all searches.

Authentication

Set your Brave Search API key as an environment variable:

# Linux/macOS
export BRAVE_API_KEY="your-api-key-here"

# Windows (Command Prompt)
set BRAVE_API_KEY=your-api-key-here

Or provide it directly:

search = Searcherator("My search term", api_key="your-api-key-here")

Exception Hierarchy

SearcheratorError (base exception)
├── SearcheratorAuthError (authentication failures)
├── SearcheratorRateLimitError (rate limit exceeded)
├── SearcheratorTimeoutError (request timeout)
└── SearcheratorAPIError (other API errors)
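
Because the exceptions share a base class, a caller can choose exactly which failures are worth retrying. The following is a generic sketch of retrying with exponential backoff; retry_async and its parameters are hypothetical and not provided by Searcherator:

```python
import asyncio
import random


async def retry_async(make_call, retry_on, attempts=3, base_delay=1.0):
    """Await make_call(), retrying on `retry_on` with exponential backoff plus jitter.

    Hypothetical helper for illustration; not part of the Searcherator API.
    `make_call` is a zero-argument callable returning a fresh awaitable each time.
    """
    for attempt in range(attempts):
        try:
            return await make_call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt; a little jitter avoids synchronized retries
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay / 10)
            await asyncio.sleep(delay)
```

For example, await retry_async(search.search_result, SearcheratorRateLimitError) would retry only rate limit failures and let authentication errors propagate immediately.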

Rate Limiting

Searcherator automatically handles rate limiting to respect Brave Search API quotas:

  • Automatic throttling - Requests are automatically spaced to stay within limits
  • Concurrent control - Built-in semaphore limits concurrent requests
  • Rate limit tracking - Monitor your usage via instance attributes

The default configuration safely handles up to ~13 requests per second, well under typical API limits.
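
To make the two mechanisms concrete, here is a rough sketch of how a semaphore and a minimum spacing between request starts can be combined with asyncio. It is not Searcherator's actual implementation; the Throttle class and its max_concurrent and min_interval parameters are assumptions made for illustration:

```python
import asyncio
import time


class Throttle:
    """Cap concurrency with a semaphore and space request starts by a minimum interval.

    Illustrative sketch only; not Searcherator's internal code.
    Create the instance inside a running event loop.
    """

    def __init__(self, max_concurrent: int = 5, min_interval: float = 0.1):
        self._semaphore = asyncio.Semaphore(max_concurrent)
        self._lock = asyncio.Lock()
        self._min_interval = min_interval
        self._next_start = 0.0  # earliest monotonic time the next call may start

    async def run(self, make_call):
        async with self._semaphore:  # at most max_concurrent calls in flight
            async with self._lock:  # start slots are handed out one at a time
                now = time.monotonic()
                wait = self._next_start - now
                if wait > 0:
                    await asyncio.sleep(wait)
                self._next_start = max(now, self._next_start) + self._min_interval
            return await make_call()
```

With min_interval set to 0.1, starts are spaced roughly 100 ms apart, which corresponds to about 10 requests per second regardless of how many tasks are waiting.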

Caching

Results are automatically cached to disk:

  • Location: data/search/ directory
  • Format: JSON files
  • TTL: Configurable (default: 7 days)
  • Cache key: Based on search term, language, country, and num_results
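
The exact file naming and expiry logic are not documented here. As a sketch of the general idea, assuming a hash of the four key fields as the file name and a stored Unix timestamp for expiry (cache_path and is_fresh are hypothetical names):

```python
import hashlib
import json
import time
from pathlib import Path
from typing import Optional


def cache_path(search_term, language, country, num_results, root="data/search"):
    """Derive a stable JSON file path from the parameters that define a search.

    Assumed scheme for illustration; Searcherator's real file names may differ.
    """
    key = json.dumps([search_term, language, country, num_results], ensure_ascii=False)
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return Path(root) / f"{digest}.json"


def is_fresh(saved_at: float, ttl_days: float, now: Optional[float] = None) -> bool:
    """True while an entry saved at `saved_at` (Unix seconds) is younger than the TTL."""
    now = time.time() if now is None else now
    return (now - saved_at) < ttl_days * 86400  # 86400 seconds per day
```

Under this scheme, two searches share a cache entry only when all four fields match, and a TTL of 0 means no entry is ever considered fresh.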

DynamoDB Backend (Optional)

Enable cross-machine cache sharing via DynamoDB:

search = Searcherator(
    "Python tutorials",
    dynamodb_table="my-search-cache"
)
results = await search.search_result()

How it works:

  • L1 (local JSON): Checked first for instant access
  • L2 (DynamoDB): Checked on L1 miss, synced across machines
  • No table specified: Works as local-only cache

Requirements:

  • Install boto3: pip install boto3
  • AWS credentials configured (environment variables, IAM role, or ~/.aws/credentials)
  • DynamoDB table auto-created if missing (requires IAM permissions)

To bypass the cache for a specific search, clear any existing entry and set the TTL to zero:

search = Searcherator("Python", clear_cache=True, ttl=0)

Best Practices

  1. Always close the session when done:

    try:
        ...  # Your searches
    finally:
        await Searcherator.close_session()
    
  2. Use batch processing for multiple searches:

    results = await asyncio.gather(*[s.search_result() for s in searches])
    
  3. Handle exceptions appropriately:

    try:
        results = await search.search_result()
    except SearcheratorRateLimitError:
        ...  # Wait and retry
    
  4. Monitor rate limits for high-volume applications:

    if search.rate_remaining_per_month < 1000:
        ...  # Alert or throttle
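
One way to make the first practice hard to forget is to wrap the cleanup in an async context manager. This is a sketch, not a feature of the package; closing_session is a hypothetical helper that works with any class exposing an async close_session() classmethod:

```python
from contextlib import asynccontextmanager


@asynccontextmanager
async def closing_session(client_cls):
    """Yield client_cls and always await client_cls.close_session() on exit.

    Hypothetical helper for illustration; not part of the Searcherator API.
    """
    try:
        yield client_cls
    finally:
        # Runs on normal exit and when the body raises
        await client_cls.close_session()
```

Used as async with closing_session(Searcherator): around your searches, the shared session is closed even if one of them raises.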
    

Testing

Run the test suite:

# Install test dependencies
pip install pytest pytest-asyncio

# Run all tests
pytest test_searcherator.py -v

# Run with coverage
pip install pytest-cov
pytest test_searcherator.py --cov=searcherator --cov-report=html

License

MIT License

Author

Arved Klöhn - GitHub
