Async web search client using Brave Search API with built-in caching, rate limiting, and batch processing

Searcherator

Searcherator is a Python package that provides a convenient way to perform web searches using the Brave Search API with built-in caching, automatic rate limiting, and efficient batch processing capabilities.

Features

  • Async/await support for modern Python applications
  • Automatic caching with configurable TTL
  • Optional DynamoDB backend for cross-machine cache sharing
  • Built-in rate limiting to respect API quotas
  • Efficient batch processing for multiple concurrent searches
  • Support for multiple languages and countries
  • Comprehensive exception hierarchy for robust error handling
  • Real-time rate limit tracking and monitoring

Installation

pip install searcherator

Quick Start

from searcherator import Searcherator
import asyncio

async def main():
    # Basic search
    search = Searcherator("Python programming")
    
    # Get URLs from search results
    urls = await search.urls()
    print(urls)
    
    # Get detailed results
    results = await search.detailed_search_result()
    for result in results:
        print(f"{result['title']}: {result['url']}")
    
    # Clean up
    await Searcherator.close_session()

if __name__ == "__main__":
    asyncio.run(main())

Usage Examples

Basic Search

from searcherator import Searcherator
import asyncio

async def main():
    search = Searcherator("Python tutorials", num_results=10)
    results = await search.search_result()
    print(results)
    await Searcherator.close_session()

asyncio.run(main())

Localized Search

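import asyncio
from searcherator import Searcherator

async def main():
    # German search: query text, result language, and country all set to German
    german_search = Searcherator(
        "Zusammenfassung Buch 'Demian' von 'Hermann Hesse'",
        language="de",
        country="de",
        num_results=10
    )
    results = await german_search.search_result()
    await Searcherator.close_session()

asyncio.run(main())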

Batch Processing

import asyncio
from searcherator import Searcherator

async def batch_search():
    queries = ["Python", "JavaScript", "Rust", "Go", "TypeScript"]
    
    try:
        # Create search instances
        searches = [Searcherator(q, num_results=5) for q in queries]
        
        # Run all searches concurrently (rate limiting handled automatically)
        results = await asyncio.gather(
            *[s.search_result() for s in searches],
            return_exceptions=True
        )
        
        # Process results
        for query, result in zip(queries, results):
            if isinstance(result, dict):
                print(f"{query}: {len(result.get('web', {}).get('results', []))} results")
    finally:
        await Searcherator.close_session()

asyncio.run(batch_search())
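
With return_exceptions=True, failed searches come back as exception objects in the results list, and the loop above silently skips them. A minimal sketch of separating successes from failures is shown below. partition_results is a hypothetical helper, not part of Searcherator, and it assumes the response shape {"web": {"results": [...]}} used in the example above.

```python
from typing import Any


def partition_results(
    queries: list[str], results: list[Any]
) -> tuple[dict[str, int], dict[str, BaseException]]:
    """Split asyncio.gather(..., return_exceptions=True) output into hits and failures."""
    counts: dict[str, int] = {}
    failures: dict[str, BaseException] = {}
    for query, result in zip(queries, results):
        if isinstance(result, BaseException):
            # gather() returned the raised exception instead of a result
            failures[query] = result
        elif isinstance(result, dict):
            # Assumed response shape: {"web": {"results": [...]}}
            counts[query] = len(result.get("web", {}).get("results", []))
    return counts, failures


if __name__ == "__main__":
    demo = [{"web": {"results": [{}, {}]}}, TimeoutError("slow"), {}]
    counts, failures = partition_results(["Python", "Rust", "Go"], demo)
    print(counts)          # {'Python': 2, 'Go': 0}
    print(list(failures))  # ['Rust']
```

Keeping the failures in their own mapping makes it straightforward to log them or retry only the queries that did not succeed.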

Error Handling

from searcherator import (
    Searcherator,
    SearcheratorAuthError,
    SearcheratorRateLimitError,
    SearcheratorTimeoutError,
    SearcheratorAPIError
)

async def safe_search():
    try:
        search = Searcherator("Python", timeout=10)
        results = await search.search_result()
    except SearcheratorAuthError:
        print("Invalid API key")
    except SearcheratorRateLimitError as e:
        print(f"Rate limited. Resets in {e.reset_per_second}s")
    except SearcheratorTimeoutError:
        print("Request timed out")
    except SearcheratorAPIError as e:
        print(f"API error: {e.status_code} - {e.message}")
    finally:
        await Searcherator.close_session()

Monitoring Rate Limits

import asyncio
from searcherator import Searcherator

async def main():
    search = Searcherator("Python")
    results = await search.search_result()
    print(f"Rate limit (per second): {search.rate_limit_per_second}")
    print(f"Remaining (per second): {search.rate_remaining_per_second}")
    print(f"Rate limit (per month): {search.rate_limit_per_month}")
    print(f"Remaining (per month): {search.rate_remaining_per_month}")
    await Searcherator.close_session()

asyncio.run(main())

API Reference

Searcherator

Searcherator(
    search_term: str = "",
    num_results: int = 5,
    country: str | None = "us",
    language: str | None = "en",
    api_key: str | None = None,
    spellcheck: bool = False,
    timeout: int = 30,
    clear_cache: bool = False,
    ttl: int = 7,
    logging: bool = False,
    dynamodb_table: str | None = None
)

Parameters

  • search_term (str): The query string to search for
  • num_results (int): Maximum number of results to return (default: 5)
  • country (str | None): Country code for search results (default: "us")
  • language (str | None): Language code for search results (default: "en")
  • api_key (str | None): Brave Search API key (default: None, which falls back to the BRAVE_API_KEY environment variable)
  • spellcheck (bool): Enable spell checking on queries (default: False)
  • timeout (int): Request timeout in seconds (default: 30)
  • clear_cache (bool): Clear existing cached results (default: False)
  • ttl (int): Time-to-live for cached results in days (default: 7)
  • logging (bool): Enable cache operation logging (default: False)
  • dynamodb_table (str | None): DynamoDB table name for cross-machine cache sharing (default: None, which keeps the cache local-only)

Methods

async search_result() -> dict

Returns the full search results as a dictionary from the Brave Search API.

async urls() -> list[str]

Returns a list of URLs from the search results.

async detailed_search_result() -> list[dict]

Returns detailed information for each search result including title, URL, description, and metadata.

async print() -> None

Pretty prints the full search results.

@classmethod async close_session()

Closes the shared aiohttp session. Call this when done with all searches.

Authentication

Set your Brave Search API key as an environment variable:

# Linux/macOS
export BRAVE_API_KEY="your-api-key-here"

# Windows (Command Prompt)
set BRAVE_API_KEY=your-api-key-here

# Windows (PowerShell)
$env:BRAVE_API_KEY = "your-api-key-here"

Or provide it directly:

search = Searcherator("My search term", api_key="your-api-key-here")
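
A missing key is easier to diagnose before any search is created than after the first request fails. The sketch below checks for a key up front. resolve_api_key is a hypothetical helper, not part of Searcherator, and the precedence it applies (explicit argument first, then BRAVE_API_KEY) is an assumption based on the two options described above.

```python
from __future__ import annotations

import os


def resolve_api_key(explicit_key: str | None = None) -> str:
    """Return the explicit key if given, else BRAVE_API_KEY; fail early if neither is set."""
    key = explicit_key or os.environ.get("BRAVE_API_KEY", "")
    if not key.strip():
        raise RuntimeError(
            "No Brave Search API key found: set BRAVE_API_KEY or pass api_key=..."
        )
    return key.strip()


if __name__ == "__main__":
    try:
        print("Key found, length:", len(resolve_api_key()))
    except RuntimeError as exc:
        print(exc)
```

The returned value could then be passed as api_key when constructing a search.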

Exception Hierarchy

SearcheratorError (base exception)
├── SearcheratorAuthError (authentication failures)
├── SearcheratorRateLimitError (rate limit exceeded)
├── SearcheratorTimeoutError (request timeout)
└── SearcheratorAPIError (other API errors)
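
Because every exception derives from SearcheratorError, callers can catch the base class for a catch-all or a subclass for targeted handling. One common use of the rate-limit subclass is retrying with exponential backoff, sketched below. The two exception classes here are stand-ins that mirror the tree above so the example runs without the package or an API key; retry_on_rate_limit and its parameters are hypothetical and not part of Searcherator.

```python
import asyncio


class SearcheratorError(Exception):
    """Stand-in for the package's base exception (illustration only)."""


class SearcheratorRateLimitError(SearcheratorError):
    """Stand-in for the rate-limit subclass (illustration only)."""


async def retry_on_rate_limit(make_call, attempts: int = 3, base_delay: float = 1.0):
    """Await make_call(), retrying with exponential backoff on rate-limit errors."""
    for attempt in range(attempts):
        try:
            return await make_call()
        except SearcheratorRateLimitError:
            if attempt == attempts - 1:
                raise  # out of retries: let the caller handle it
            # Delay doubles each time: base_delay, 2 * base_delay, ...
            await asyncio.sleep(base_delay * 2 ** attempt)


async def demo():
    calls = 0

    async def flaky():
        # Fails twice with a rate-limit error, then succeeds
        nonlocal calls
        calls += 1
        if calls < 3:
            raise SearcheratorRateLimitError("slow down")
        return {"web": {"results": []}}

    result = await retry_on_rate_limit(flaky, attempts=3, base_delay=0.01)
    return calls, result


if __name__ == "__main__":
    print(asyncio.run(demo()))  # (3, {'web': {'results': []}})
```

With the real package, the exception would be imported from searcherator instead of defined locally, and a bound method such as search.search_result could be passed as make_call.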

Rate Limiting

Searcherator automatically handles rate limiting to respect Brave Search API quotas:

  • Automatic throttling - Requests are automatically spaced to stay within limits
  • Concurrent control - Built-in semaphore limits concurrent requests
  • Rate limit tracking - Monitor your usage via instance attributes

The default configuration safely handles up to ~13 requests per second, well under typical API limits.
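
To make the two mechanisms above concrete, here is a minimal sketch of how a semaphore and a minimum spacing between request starts can be combined in asyncio. It illustrates the general technique only and is not Searcherator's actual implementation; the Throttle class, its parameters, and the numbers used are all hypothetical.

```python
import asyncio
import time


class Throttle:
    """Cap concurrency with a semaphore and space request starts by a minimum interval."""

    def __init__(self, max_concurrent: int, min_interval: float) -> None:
        self._semaphore = asyncio.Semaphore(max_concurrent)
        self._lock = asyncio.Lock()
        self._min_interval = min_interval
        self._next_start = 0.0  # earliest monotonic time the next request may start

    async def run(self, make_call):
        async with self._semaphore:      # limits how many calls are in flight
            async with self._lock:       # serializes the spacing bookkeeping
                now = time.monotonic()
                wait = self._next_start - now
                if wait > 0:
                    await asyncio.sleep(wait)
                self._next_start = max(now, self._next_start) + self._min_interval
            return await make_call()


async def demo():
    # Created inside the coroutine so the primitives belong to the running loop
    throttle = Throttle(max_concurrent=2, min_interval=0.05)
    starts = []

    async def fake_request():
        starts.append(time.monotonic())
        await asyncio.sleep(0.01)  # pretend network latency
        return "ok"

    results = await asyncio.gather(*[throttle.run(fake_request) for _ in range(4)])
    gaps = [later - earlier for earlier, later in zip(starts, starts[1:])]
    return results, gaps


if __name__ == "__main__":
    results, gaps = asyncio.run(demo())
    print(results)
    print([round(gap, 3) for gap in gaps])
```

The semaphore bounds the number of in-flight requests, while the lock-protected timestamp guarantees that consecutive requests start at least min_interval apart even when many tasks are launched at once.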

Caching

Results are automatically cached to disk:

  • Location: data/search/ directory
  • Format: JSON files
  • TTL: Configurable (default: 7 days)
  • Cache key: Based on search term, language, country, and num_results
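
The sketch below shows one way a disk cache with these properties could derive a file path from the four key fields and decide whether an entry is still within its TTL. The hashing scheme, file naming, and use of the file modification time are assumptions for illustration; Searcherator's actual on-disk layout may differ.

```python
from __future__ import annotations

import hashlib
import json
import time
from pathlib import Path


def cache_path(term: str, language: str, country: str, num_results: int,
               root: str = "data/search") -> Path:
    """Map the search parameters to a JSON file path (hypothetical naming scheme)."""
    raw = json.dumps([term, language, country, num_results], ensure_ascii=False)
    digest = hashlib.sha256(raw.encode("utf-8")).hexdigest()
    return Path(root) / f"{digest}.json"


def is_fresh(path: Path, ttl_days: float, now: float | None = None) -> bool:
    """Return True if the file exists and is younger than ttl_days."""
    if not path.exists():
        return False
    current = time.time() if now is None else now
    age_seconds = current - path.stat().st_mtime
    return age_seconds < ttl_days * 86400  # 86400 seconds per day


if __name__ == "__main__":
    path = cache_path("Python tutorials", "en", "us", 10)
    print(path)
    print(is_fresh(path, ttl_days=7))
```

Changing any of the four fields yields a different path, so a search for the same term with a different num_results does not reuse a stale entry.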

DynamoDB Backend (Optional)

Enable cross-machine cache sharing via DynamoDB:

search = Searcherator(
    "Python tutorials",
    dynamodb_table="my-search-cache"
)
results = await search.search_result()

How it works:

  • L1 (local JSON): Checked first for instant access
  • L2 (DynamoDB): Checked on L1 miss, synced across machines
  • No table specified: Works as local-only cache
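
The lookup order described above can be sketched with plain dictionaries standing in for the local JSON cache and the DynamoDB table. two_tier_get is a hypothetical function, and copying an L2 hit into L1 and writing fresh results to both tiers are assumptions about how such a cache would typically behave, not confirmed details of Searcherator.

```python
import asyncio


async def two_tier_get(key, local, remote, fetch):
    """Look up key in local (L1), then remote (L2), then fall back to fetch()."""
    if key in local:
        return local[key], "L1"
    if remote is not None and key in remote:
        local[key] = remote[key]  # assumed: copy the L2 hit into L1
        return local[key], "L2"
    value = await fetch()
    local[key] = value
    if remote is not None:
        remote[key] = value  # assumed: write through so other machines can reuse it
    return value, "api"


async def demo():
    shared = {}                    # stand-in for the DynamoDB table
    machine_a, machine_b = {}, {}  # stand-ins for each machine's local JSON cache

    async def fetch():
        return {"web": {"results": ["r1"]}}

    first = await two_tier_get("python", machine_a, shared, fetch)
    second = await two_tier_get("python", machine_b, shared, fetch)
    third = await two_tier_get("python", machine_b, shared, fetch)
    return first[1], second[1], third[1]


if __name__ == "__main__":
    print(asyncio.run(demo()))  # ('api', 'L2', 'L1')
```

Passing remote=None models the local-only case: the function then skips the second tier entirely.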

Requirements:

  • Install boto3: pip install boto3
  • AWS credentials configured (environment variables, IAM role, or ~/.aws/credentials)
  • DynamoDB table auto-created if missing (requires IAM permissions)

To bypass the cache for a specific search (with or without DynamoDB), clear any existing entry and set the TTL to zero:

search = Searcherator("Python", clear_cache=True, ttl=0)

Best Practices

  1. Always close the session when done:

    try:
        ...  # your searches go here
    finally:
        await Searcherator.close_session()
    
  2. Use batch processing for multiple searches:

    results = await asyncio.gather(*[s.search_result() for s in searches])
    
  3. Handle exceptions appropriately:

    try:
        results = await search.search_result()
    except SearcheratorRateLimitError:
        ...  # wait and retry
    
  4. Monitor rate limits for high-volume applications:

    if search.rate_remaining_per_month < 1000:
        ...  # alert or throttle
    

Testing

Run the test suite:

# Install test dependencies
pip install pytest pytest-asyncio

# Run all tests
pytest test_searcherator.py -v

# Run with coverage
pip install pytest-cov
pytest test_searcherator.py --cov=searcherator --cov-report=html

License

MIT License

Author

Arved Klöhn - GitHub
