
# SERPEngine

**SERPEngine** – Production-grade search module to find links through search engines.

* uses Google Search API  
* made for production – you need API keys  
* includes various filters (including an LLM-based one) so you can filter links by domain, metadata, etc.  
* returns structured dataclasses (`SearchHit`, `SERPMethodOp`, `SerpEngineOp`)

---

## 1. Installation

```bash
pip install serpengine
```

## 2. Environment variables

Create a `.env` file (or export the variables manually):

```
GOOGLE_SEARCH_API_KEY=XXXXXXXXXXXXXXXXXXXXXXXXXXXX
GOOGLE_CSE_ID=yyyyyyyyyyyyyyyyyyyyyyyyy:zzz
```

Both values are required; the engine will raise if either is missing.
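A quick sanity check before constructing the engine can fail fast with a clearer message. This is a minimal stdlib-only sketch; `check_credentials` is a hypothetical helper, not part of the package:

```python
import os

REQUIRED_VARS = ("GOOGLE_SEARCH_API_KEY", "GOOGLE_CSE_ID")

def check_credentials(env=None):
    """Raise early if either required variable is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return True
```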


## 3. Dataclass cheat-sheet

| Class | What it represents |
| --- | --- |
| `SearchHit` | One URL result (`link`, `title`, `metadata`). |
| `UsageInfo` | Billing info (currently just `cost: float`). |
| `SERPMethodOp` | Output of one search method. Fields: `name`, `results: List[SearchHit]`, `usage`, `elapsed_time`. |
| `SerpEngineOp` | Aggregated result of a full `collect()` call. Fields: `usage`, `methods: List[SERPMethodOp]`, `results`, `elapsed_time`. ➕ helper `all_links() -> List[str]`. |
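Reading the table as code, the shapes roughly look like this (a sketch reconstructed from the field lists above; the package's actual definitions may carry extra fields or defaults):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SearchHit:
    link: str
    title: str = ""
    metadata: str = ""

@dataclass
class UsageInfo:
    cost: float = 0.0

@dataclass
class SERPMethodOp:
    name: str
    results: List[SearchHit] = field(default_factory=list)
    usage: UsageInfo = field(default_factory=UsageInfo)
    elapsed_time: float = 0.0

@dataclass
class SerpEngineOp:
    usage: UsageInfo = field(default_factory=UsageInfo)
    methods: List[SERPMethodOp] = field(default_factory=list)
    results: List[SearchHit] = field(default_factory=list)
    elapsed_time: float = 0.0

    def all_links(self) -> List[str]:
        # Convenience helper: flatten the aggregated results to bare URLs.
        return [hit.link for hit in self.results]
```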

## 4. Quick-start (sync)

```python
from serpengine.serpengine import SERPEngine

engine = SERPEngine()

op = engine.collect(
    query="best pizza in Helsinki",
    num_urls=5,
    search_sources=["google_search_via_api"],   # or add "google_search_via_request_module"
    output_format="object"                      # default
)

print(op.elapsed_time, "sec")
print(op.all_links())
```

## 5. Quick-start (async)

```python
import asyncio
from serpengine.serpengine import SERPEngine

async def main():
    eng = SERPEngine()
    op = await eng.collect_async(
        query="python tutorials",
        num_urls=6,
        search_sources=["google_search_via_api", "google_search_via_request_module"],
        output_format="object"
    )
    print(op.all_links())

asyncio.run(main())
```

LLM-based filtering is applied automatically.


## Getting Google Credentials

### 1. Create or Select a Google Cloud Project

  1. Open the Google Cloud Console.
  2. Either create a new project or select an existing one from the project picker.

### 2. Enable the Custom Search API

  1. In the left-hand menu, navigate to APIs & Services → Library.
  2. Search for “Custom Search API”.
  3. Click the result, then press Enable.

### 3. Create Credentials (API Key)

This becomes your `GOOGLE_SEARCH_API_KEY`.

  1. Still under APIs & Services, choose Credentials.
  2. Click Create Credentials → API key.
  3. Copy the key shown in the dialog and keep it safe.

## Getting the Custom Search Engine ID (`GOOGLE_CSE_ID`)

### 1. Open Google Custom Search Engine

Go to cse.google.com/cse.

### 2. Create a New Search Engine

1. Click **Add** (or "New Search Engine").
2. In "Sites to search", either:
   * enter a specific domain (e.g., `example.com`), or
   * use a wildcard like `*.com` if you intend to search the entire web (you can later enable **Search the entire web** in the control panel).
3. Give your CSE a name, then click **Create**.

### 3. Retrieve Your CSE ID

1. Open the Control Panel for the search engine you just created.
2. Locate the "Search engine ID" (sometimes labeled `cx`).
3. Copy that string; this is your `GOOGLE_CSE_ID`.
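With both values in hand, you can sanity-check them directly against the Custom Search JSON API endpoint (`https://www.googleapis.com/customsearch/v1`). Below is a stdlib-only sketch that just builds the request URL; `build_search_url` is a hypothetical helper, not part of serpengine:

```python
import urllib.parse

CSE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"

def build_search_url(query: str, api_key: str, cse_id: str, num: int = 5) -> str:
    """Compose a Custom Search JSON API request URL for a manual check."""
    params = urllib.parse.urlencode({
        "key": api_key,   # your GOOGLE_SEARCH_API_KEY
        "cx": cse_id,     # your GOOGLE_CSE_ID
        "q": query,
        "num": num,       # the API returns at most 10 results per request
    })
    return f"{CSE_ENDPOINT}?{params}"
```

Fetching that URL (e.g. with `curl`) should return a JSON payload containing an `items` array if both credentials are valid.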

## Next Steps

Add both credentials to your environment, e.g. in a `.env` file:

```
GOOGLE_SEARCH_API_KEY=YOUR_API_KEY_HERE
GOOGLE_CSE_ID=YOUR_CSE_ID_HERE
```

Then load them in your code (`SERPEngine` does this automatically via `python-dotenv`).


## Output

* **JSON format:**

  ```json
  {
      "operation_result": {
          "total_time": 1.234,
          "errors": []
      },
      "results": [
          {
              "link": "https://digikey.com/product1",
              "metadata": "",
              "title": ""
          }
      ]
  }
  ```
* **Filters:**
  * **Allowed domains** – restrict results to specified domains, e.g. `allowed_domains=["digikey.com"]`.
  * **Keyword-match link validation** – ensure links contain certain keywords, e.g. `keyword_match_based_link_validation=["STM32"]`.
  * **Allowed countries** (optional) – include only links whose TLD matches the specified countries.
  * **Forbidden countries** (optional) – exclude links from specified countries based on their TLD.
  * **Additional validation conditions** – custom logic to further filter links.
* **Output formats:**
  * **JSON** – structured dictionary with operation results and links.
  * **Objects** – list of `SearchHit` dataclass instances for flexible manipulation.
* **Error handling and logging** – errors during search and filtering are captured and logged for easier debugging.
* **Extensibility** – designed for easy extension; add new search sources or advanced filters as needed.
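To illustrate how the domain, keyword, and TLD filters compose, here is a stand-alone predicate in the same spirit (a sketch only; `passes_filters` and its parameter names are hypothetical, not serpengine's actual API):

```python
from urllib.parse import urlparse

def passes_filters(link, allowed_domains=None, required_keywords=None,
                   forbidden_tlds=None):
    """Return True if the link survives all configured filters."""
    host = urlparse(link).netloc.lower()
    # Allowed domains: host must equal, or be a subdomain of, an allowed domain.
    if allowed_domains and not any(
        host == d or host.endswith("." + d) for d in allowed_domains
    ):
        return False
    # Keyword validation: the URL must contain at least one required keyword.
    if required_keywords and not any(
        kw.lower() in link.lower() for kw in required_keywords
    ):
        return False
    # Forbidden countries: reject hosts whose TLD is on the deny list.
    if forbidden_tlds and any(host.endswith("." + tld) for tld in forbidden_tlds):
        return False
    return True
```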
