
# SERPEngine

**SERPEngine** – Production-grade search module to find links through search engines.

* uses the Google Custom Search API  
* built for production use – API keys are required  
* includes various filters (including an LLM-based one) so you can filter links by domain, metadata, and more  
* returns structured dataclasses (`SearchHit`, `SERPMethodOp`, `SerpEngineOp`)

---

## 1. Installation

```bash
pip install serpengine
```

## 2. Environment variables

Create a `.env` file (or export the variables manually):

```
GOOGLE_SEARCH_API_KEY=XXXXXXXXXXXXXXXXXXXXXXXXXXXX
GOOGLE_CSE_ID=yyyyyyyyyyyyyyyyyyyyyyyyy:zzz
```

Both values are required; the engine will raise if either is missing.
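That fail-fast behaviour is easy to reproduce in your own startup code before constructing the engine; a minimal sketch (the helper name is illustrative, not part of the library):

```python
import os

REQUIRED_VARS = ("GOOGLE_SEARCH_API_KEY", "GOOGLE_CSE_ID")

def check_search_credentials() -> None:
    # Raise early with a clear message instead of failing mid-request.
    missing = [name for name in REQUIRED_VARS if not os.environ.get(name)]
    if missing:
        raise RuntimeError(
            f"Missing required environment variables: {', '.join(missing)}"
        )
```

Calling this at application startup surfaces a missing key immediately rather than on the first search call.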


## 3. Dataclass cheat-sheet

| Class | What it represents |
|---|---|
| `SearchHit` | One URL result (link, title, metadata). |
| `UsageInfo` | Billing info (currently just `cost: float`). |
| `SERPMethodOp` | Output of one search method. Fields: `name`, `results: List[SearchHit]`, `usage`, `elapsed_time`. |
| `SerpEngineOp` | Aggregated result of a full `collect()` call. Fields: `usage`, `methods: List[SERPMethodOp]`, `results`, `elapsed_time`. ➕ helper `all_links() -> List[str]`. |
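The shapes in the table can be sketched as plain dataclasses. Treat this as an approximation built from the field names above, not the library's actual definitions:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SearchHit:
    link: str
    title: str = ""
    metadata: str = ""

@dataclass
class UsageInfo:
    cost: float = 0.0

@dataclass
class SERPMethodOp:
    name: str
    results: List[SearchHit] = field(default_factory=list)
    usage: UsageInfo = field(default_factory=UsageInfo)
    elapsed_time: float = 0.0

@dataclass
class SerpEngineOp:
    usage: UsageInfo = field(default_factory=UsageInfo)
    methods: List[SERPMethodOp] = field(default_factory=list)
    results: List[SearchHit] = field(default_factory=list)
    elapsed_time: float = 0.0

    def all_links(self) -> List[str]:
        # Mirrors the documented helper: flatten results to bare URLs.
        return [hit.link for hit in self.results]
```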

## 4. Quick-start (sync)

```python
from serpengine.serpengine import SERPEngine

engine = SERPEngine()

op = engine.collect(
    query="best pizza in Helsinki",
    num_urls=5,
    search_sources=["google_search_via_api"],   # or add "google_search_via_request_module"
    output_format="object"                      # default
)

print(op.elapsed_time, "sec")
print(op.all_links())
```

## 5. Quick-start (async)

```python
import asyncio
from serpengine.serpengine import SERPEngine

async def main():
    eng = SERPEngine()
    op = await eng.collect_async(
        query="python tutorials",
        num_urls=6,
        search_sources=["google_search_via_api", "google_search_via_request_module"],
        output_format="object"
    )
    print(op.all_links())

asyncio.run(main())
```

LLM-based filtering, when configured, is applied automatically.


## Getting Google Credentials

### 1. Create or Select a Google Cloud Project

1. Open the Google Cloud Console.
2. Either create a new project or select an existing one from the project picker.

### 2. Enable the Custom Search API

1. In the left-hand menu, navigate to **APIs & Services → Library**.
2. Search for "Custom Search API".
3. Click the result, then press **Enable**.

### 3. Create Credentials (API Key)

This becomes your `GOOGLE_SEARCH_API_KEY`.

1. Still under **APIs & Services**, choose **Credentials**.
2. Click **Create Credentials → API key**.
3. Copy the key shown in the dialog and keep it safe.

## Getting the Custom Search Engine ID (GOOGLE_CSE_ID)

### 1. Open Google Custom Search Engine

Go to cse.google.com/cse.

### 2. Create a New Search Engine

1. Click **Add** (or "New Search Engine").
2. In "Sites to search", you can either:
   * enter a specific domain (e.g., example.com), or
   * use a wildcard like `*.com` if you intend to search the entire web (you can later enable **Search the entire web** in the control panel).
3. Give your CSE a name, then click **Create**.

### 3. Retrieve Your CSE ID

1. Open the Control Panel for the search engine you just created.
2. Locate the "Search engine ID" (sometimes labeled `cx`).
3. Copy that string – this is your `GOOGLE_CSE_ID`.

## Next Steps

Add both credentials to your environment, e.g. in a `.env` file:

```
GOOGLE_SEARCH_API_KEY=YOUR_API_KEY_HERE
GOOGLE_CSE_ID=YOUR_CSE_ID_HERE
```

Then load them in your code (SERPEngine does this automatically via python-dotenv).
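If you want to see what that loading step amounts to without pulling in python-dotenv, here is a dependency-free sketch of a minimal `KEY=VALUE` parser (python-dotenv itself additionally handles quoting, comments, and interpolation):

```python
import os

def load_env_file(path: str = ".env") -> None:
    # Minimal .env reader: one KEY=VALUE per line, skipping blanks and comments.
    # Existing environment variables are not overwritten.
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```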


## Output

* **JSON format:**

  ```json
  {
      "operation_result": {
          "total_time": 1.234,
          "errors": []
      },
      "results": [
          {
              "link": "https://digikey.com/product1",
              "metadata": "",
              "title": ""
          }
      ]
  }
  ```
    
* **Filters:**
  * **Allowed domains** – restrict results to specified domains, e.g. `allowed_domains=["digikey.com"]`.
  * **Keyword-match link validation** – ensure links contain certain keywords, e.g. `keyword_match_based_link_validation=["STM32"]`.
  * **Allowed countries** (optional) – include only links whose TLD belongs to the specified countries.
  * **Forbidden countries** (optional) – exclude links whose TLD belongs to the specified countries.
  * **Additional validation conditions** – custom logic to further filter links.
* **Output formats:**
  * **JSON** – structured dictionary with operation results and links.
  * **Objects** – list of `SearchHit` dataclass instances for flexible manipulation.
* **Error handling and logging** – captures and logs errors during search and filtering for easier debugging.
* **Extensibility** – designed for easy extension; add new search sources or advanced filters as needed.
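The TLD-based country filters could be approximated like this. The mapping, helper name, and exact semantics are illustrative assumptions; the library's real implementation may differ:

```python
from urllib.parse import urlparse

# Illustrative subset of country-code TLDs; a real filter would use a fuller map.
COUNTRY_TLDS = {"fi": "Finland", "de": "Germany", "fr": "France"}

def passes_country_filters(url, allowed=None, forbidden=None):
    # Map the URL's top-level domain to a country, then apply both lists.
    host = urlparse(url).hostname or ""
    tld = host.rsplit(".", 1)[-1].lower()
    country = COUNTRY_TLDS.get(tld)
    if forbidden and country in forbidden:
        return False
    if allowed:
        return country in allowed
    return True  # no allowed list: anything not forbidden passes
```

Note that generic TLDs like `.com` carry no country signal, so with an `allowed` list they are rejected, and without one they pass.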
