
# SERPEngine

**SERPEngine** – Production-grade search module to find links through search engines.

* uses the Google Custom Search API  
* built for production use – API keys are required  
* includes various filters (including an LLM-based one) so you can filter links by domain, metadata, and more  
* returns structured dataclasses (`SearchHit`, `SERPMethodOp`, `SerpEngineOp`)

---

## 1. Installation

```bash
pip install serpengine
```

## 2. Environment variables

Create a `.env` file (or export the variables manually):

```
GOOGLE_SEARCH_API_KEY=XXXXXXXXXXXXXXXXXXXXXXXXXXXX
GOOGLE_CSE_ID=yyyyyyyyyyyyyyyyyyyyyyyyy:zzz
```

Both values are required; the engine raises an error if either is missing.


## 3. Dataclass cheat-sheet

| Class | What it represents |
| --- | --- |
| `SearchHit` | One URL result (`link`, `title`, `metadata`). |
| `UsageInfo` | Billing info (currently just `cost: float`). |
| `SERPMethodOp` | Output of one search method. Fields: `name`, `results: List[SearchHit]`, `usage`, `elapsed_time`. |
| `SerpEngineOp` | Aggregated result of a full `collect()` call. Fields: `usage`, `methods: List[SERPMethodOp]`, `results`, `elapsed_time`. Helper: `all_links() -> List[str]`. |
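The shapes above can be sketched as plain dataclasses (a minimal illustration of the cheat-sheet, not the library's actual definitions; field types and defaults are inferred):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SearchHit:
    link: str
    title: str = ""
    metadata: str = ""

@dataclass
class UsageInfo:
    cost: float = 0.0

@dataclass
class SERPMethodOp:
    name: str
    results: List[SearchHit] = field(default_factory=list)
    usage: UsageInfo = field(default_factory=UsageInfo)
    elapsed_time: float = 0.0

@dataclass
class SerpEngineOp:
    usage: UsageInfo = field(default_factory=UsageInfo)
    methods: List[SERPMethodOp] = field(default_factory=list)
    results: List[SearchHit] = field(default_factory=list)
    elapsed_time: float = 0.0

    def all_links(self) -> List[str]:
        # Flatten the aggregated hits into a plain list of URLs.
        return [hit.link for hit in self.results]

op = SerpEngineOp(results=[SearchHit(link="https://example.com")])
print(op.all_links())  # ['https://example.com']
```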

## 4. Quick-start (sync)

```python
from serpengine.serpengine import SERPEngine

engine = SERPEngine()

op = engine.collect(
    query="best pizza in Helsinki",
    num_urls=5,
    search_sources=["google_search_via_api"],   # or add "google_search_via_request_module"
    output_format="object"                      # default
)

print(op.elapsed_time, "sec")
print(op.all_links())
```

## 5. Quick-start (async)

```python
import asyncio
from serpengine.serpengine import SERPEngine

async def main():
    eng = SERPEngine()
    op = await eng.collect_async(
        query="python tutorials",
        num_urls=6,
        search_sources=["google_search_via_api", "google_search_via_request_module"],
        output_format="object"
    )
    print(op.all_links())

asyncio.run(main())
```

LLM-based filtering is applied automatically.


## Getting Google Credentials

### 1. Create or Select a Google Cloud Project

1. Open the Google Cloud Console.
2. Either create a new project or select an existing one from the project picker.

### 2. Enable the Custom Search API

1. In the left-hand menu, navigate to **APIs & Services → Library**.
2. Search for "Custom Search API".
3. Click the result, then press **Enable**.

### 3. Create Credentials (API Key)

This becomes your `GOOGLE_SEARCH_API_KEY`.

1. Still under **APIs & Services**, choose **Credentials**.
2. Click **Create Credentials → API key**.
3. Copy the key shown in the dialog and keep it safe.

## Getting the Custom Search Engine ID (GOOGLE_CSE_ID)

### 1. Open Google Custom Search Engine

Go to cse.google.com/cse.

### 2. Create a New Search Engine

1. Click **Add** (or "New Search Engine").
2. In "Sites to search", you can:
   - enter a specific domain (e.g., `example.com`), or
   - use a wildcard like `*.com` (if you intend to search the entire web; you can later enable **Search the entire web** in the control panel).
3. Give your CSE a name, then click **Create**.

### 3. Retrieve Your CSE ID

1. Open the Control Panel for the search engine you just created.
2. Locate the "Search engine ID" (sometimes labeled `cx`).
3. Copy that string; this is your `GOOGLE_CSE_ID`.

## Next Steps

Add both credentials to your environment, e.g. in a `.env` file:

```
GOOGLE_SEARCH_API_KEY=YOUR_API_KEY_HERE
GOOGLE_CSE_ID=YOUR_CSE_ID_HERE
```

Then load them in your code (SERPEngine does this automatically with `python-dotenv`).
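If you prefer to verify the variables yourself before constructing the engine, a minimal check might look like this (a sketch only; SERPEngine performs its own validation internally, and the dummy values below are set in-process purely for illustration):

```python
import os

REQUIRED_VARS = ("GOOGLE_SEARCH_API_KEY", "GOOGLE_CSE_ID")

def check_credentials() -> dict:
    """Return the two credentials, raising early if either is missing."""
    # In a real project you would call dotenv.load_dotenv() first;
    # here we read straight from the process environment.
    missing = [name for name in REQUIRED_VARS if not os.getenv(name)]
    if missing:
        raise EnvironmentError(f"Missing required variables: {missing}")
    return {name: os.environ[name] for name in REQUIRED_VARS}

# Dummy values for demonstration only:
os.environ["GOOGLE_SEARCH_API_KEY"] = "dummy-key"
os.environ["GOOGLE_CSE_ID"] = "dummy-cx"
print(check_credentials())
```

Failing fast like this surfaces a missing key at startup instead of on the first search call.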


## Output

**JSON format:**

```json
{
    "operation_result": {
        "total_time": 1.234,
        "errors": []
    },
    "results": [
        {
            "link": "https://digikey.com/product1",
            "metadata": "",
            "title": ""
        }
    ]
}
```

**Filters:**

- **Allowed domains** – restricts results to specified domains, e.g. `allowed_domains=["digikey.com"]`.
- **Keyword-match link validation** – ensures links contain certain keywords, e.g. `keyword_match_based_link_validation=["STM32"]`.
- **Allowed countries** (optional) – filters links by TLD to include only specified countries.
- **Forbidden countries** (optional) – excludes links from specified countries based on their TLD.
- **Additional validation conditions** – custom logic to further filter links.

**Output formats:**

- **JSON** – structured dictionary with operation results and links.
- **Objects** – list of `SearchHit` dataclass instances for flexible manipulation.

**Error handling and logging:** errors during search and filtering are captured and logged for easier debugging.

**Extensibility:** designed for easy extension; add new search sources or advanced filters as needed.
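The domain, keyword, and TLD filters described above can be sketched as plain post-filtering over result links (an illustration of the filtering idea, not SERPEngine's internal implementation; the function and parameter names are hypothetical, loosely mirroring the examples above):

```python
from typing import List, Optional
from urllib.parse import urlparse

def filter_links(
    links: List[str],
    allowed_domains: Optional[List[str]] = None,
    required_keywords: Optional[List[str]] = None,
    forbidden_tlds: Optional[List[str]] = None,
) -> List[str]:
    """Keep only links that pass every configured filter."""
    kept = []
    for link in links:
        host = urlparse(link).netloc.lower()
        # Domain filter: host must end with one of the allowed domains.
        if allowed_domains and not any(host.endswith(d) for d in allowed_domains):
            continue
        # Keyword filter: the URL must contain at least one keyword (case-insensitive).
        if required_keywords and not any(k.lower() in link.lower() for k in required_keywords):
            continue
        # Country filter: drop links whose TLD is forbidden.
        if forbidden_tlds and any(host.endswith("." + tld) for tld in forbidden_tlds):
            continue
        kept.append(link)
    return kept

links = [
    "https://www.digikey.com/stm32-board",
    "https://example.ru/stm32",
    "https://www.digikey.com/arduino",
]
print(filter_links(links, allowed_domains=["digikey.com"], required_keywords=["STM32"]))
# ['https://www.digikey.com/stm32-board']
```

Each filter only narrows the set, so combining them is a simple conjunction; an "additional validation condition" would be one more `continue` branch.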
