# SERPEngine

**SERPEngine** – Production-grade search module to find links through search engines.

* uses the Google Custom Search API  
* made for production – you need API keys  
* includes various filters (including an LLM-based one) so you can filter links by domain, metadata, etc.  
* returns structured dataclasses (`SearchHit`, `SERPMethodOp`, `SerpEngineOp`)

---

## 1. Installation

```bash
pip install serpengine
```

## 2. Environment variables

Create a `.env` file (or export the variables manually):

```env
GOOGLE_SEARCH_API_KEY=XXXXXXXXXXXXXXXXXXXXXXXXXXXX
GOOGLE_CSE_ID=yyyyyyyyyyyyyyyyyyyyyyyyy:zzz
```

Both values are required; the engine raises an error if either is missing.


## 3. Dataclass cheat-sheet

| Class | What it represents |
|---|---|
| `SearchHit` | One URL result (`link`, `title`, `metadata`). |
| `UsageInfo` | Billing info (currently just `cost: float`). |
| `SERPMethodOp` | Output of one search method. Fields: `name`, `results: List[SearchHit]`, `usage`, `elapsed_time`. |
| `SerpEngineOp` | Aggregated result of a full `collect()` call. Fields: `usage`, `methods: List[SERPMethodOp]`, `results`, `elapsed_time`. Helper: `all_links() -> List[str]`. |
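
The shapes above can be sketched as plain dataclasses. This is an illustration of the documented fields, not the library's exact definitions; the field defaults here are assumptions.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SearchHit:
    link: str
    title: str = ""
    metadata: str = ""

@dataclass
class UsageInfo:
    cost: float = 0.0

@dataclass
class SERPMethodOp:
    name: str
    results: List[SearchHit] = field(default_factory=list)
    usage: UsageInfo = field(default_factory=UsageInfo)
    elapsed_time: float = 0.0

@dataclass
class SerpEngineOp:
    usage: UsageInfo = field(default_factory=UsageInfo)
    methods: List[SERPMethodOp] = field(default_factory=list)
    results: List[SearchHit] = field(default_factory=list)
    elapsed_time: float = 0.0

    def all_links(self) -> List[str]:
        # Flatten the aggregated hits down to their URLs.
        return [hit.link for hit in self.results]
```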

## 4. Quick-start (sync)

```python
from serpengine.serpengine import SERPEngine

engine = SERPEngine()

op = engine.collect(
    query="best pizza in Helsinki",
    num_urls=5,
    search_sources=["google_search_via_api"],   # or add "google_search_via_request_module"
    output_format="object"                      # default
)

print(op.elapsed_time, "sec")
print(op.all_links())
```

## 5. Quick-start (async)

```python
import asyncio
from serpengine.serpengine import SERPEngine

async def main():
    eng = SERPEngine()
    op = await eng.collect_async(
        query="python tutorials",
        num_urls=6,
        search_sources=["google_search_via_api", "google_search_via_request_module"],
        output_format="object"
    )
    print(op.all_links())

asyncio.run(main())
```

LLM-based filtering is applied automatically.
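
Several queries can be fanned out concurrently with `asyncio.gather`. The sketch below uses a stub coroutine (`fake_collect_async`, my own name) in place of the real `collect_async`, since the real call needs API keys:

```python
import asyncio

# Stub standing in for SERPEngine.collect_async; it just echoes the query.
async def fake_collect_async(query: str) -> list:
    await asyncio.sleep(0)  # yield control, as a real HTTP call would
    return ["https://example.com/" + query.replace(" ", "-")]

async def search_many(queries: list) -> list:
    # gather() runs the coroutines concurrently and preserves input order.
    return await asyncio.gather(*(fake_collect_async(q) for q in queries))

links = asyncio.run(search_many(["python tutorials", "rust tutorials"]))
```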


## Getting Google credentials

### 1. Create or select a Google Cloud project

1. Open the Google Cloud Console.
2. Either create a new project or select an existing one from the project picker.

### 2. Enable the Custom Search API

1. In the left-hand menu, navigate to **APIs & Services → Library**.
2. Search for "Custom Search API".
3. Click the result, then press **Enable**.

### 3. Create credentials (API key)

This becomes your `GOOGLE_SEARCH_API_KEY`.

1. Still under **APIs & Services**, choose **Credentials**.
2. Click **Create Credentials → API key**.
3. Copy the key shown in the dialog and keep it safe.

## Getting the Custom Search Engine ID (`GOOGLE_CSE_ID`)

### 1. Open Google Custom Search Engine

Go to [cse.google.com/cse](https://cse.google.com/cse).

### 2. Create a new search engine

1. Click **Add** (or "New Search Engine").
2. In "Sites to search", you can:
   * enter a specific domain (e.g., `example.com`), or
   * use a wildcard like `*.com` (if you intend to search the entire web; you can later enable **Search the entire web** in the control panel).
3. Give your CSE a name, then click **Create**.

### 3. Retrieve your CSE ID

1. Open the Control Panel for the search engine you just created.
2. Locate the "Search engine ID" (sometimes labeled `cx`).
3. Copy that string; this is your `GOOGLE_CSE_ID`.

## Next steps

Add both credentials to your environment, e.g. in a `.env` file:

```env
GOOGLE_SEARCH_API_KEY=YOUR_API_KEY_HERE
GOOGLE_CSE_ID=YOUR_CSE_ID_HERE
```

Then load them in your code (`SERPEngine` does this automatically via `python-dotenv`).
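
If you prefer to check the variables yourself rather than rely on `python-dotenv`, a plain-stdlib fail-fast check looks like the following (the variable names match what the engine expects; the helper name and error message are my own):

```python
import os

def load_google_credentials():
    """Read the two required variables, failing fast if either is unset."""
    api_key = os.environ.get("GOOGLE_SEARCH_API_KEY")
    cse_id = os.environ.get("GOOGLE_CSE_ID")
    if not api_key or not cse_id:
        raise RuntimeError(
            "Set GOOGLE_SEARCH_API_KEY and GOOGLE_CSE_ID before constructing SERPEngine."
        )
    return api_key, cse_id
```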


## Output

**JSON format:**

```json
{
    "operation_result": {
        "total_time": 1.234,
        "errors": []
    },
    "results": [
        {
            "link": "https://digikey.com/product1",
            "metadata": "",
            "title": ""
        }
    ]
}
```

**Filters:**

* **Allowed domains** – restricts results to specified domains, e.g. `allowed_domains=["digikey.com"]`.
* **Keyword-match link validation** – ensures links contain certain keywords, e.g. `keyword_match_based_link_validation=["STM32"]`.
* **Allowed countries (optional)** – filters links by TLD to include only specified countries.
* **Forbidden countries (optional)** – excludes links from specified countries based on their TLD.
* **Additional validation conditions** – custom logic to further filter links.

**Output formats:**

* **JSON** – structured dictionary with operation results and links.
* **Objects** – list of `SearchHit` dataclass instances for flexible manipulation.

**Error handling and logging** – captures and logs errors during search and filtering for easier debugging.

**Extensibility** – designed for easy extension; add new search sources or advanced filters as needed.
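
The TLD-based country filters can be illustrated with a small standalone function. This is not the library's implementation, just the idea: a link passes if its host's TLD is in the allowed set (when one is given) and not in the forbidden set. The `COUNTRY_TLDS` mapping is a made-up example.

```python
from urllib.parse import urlparse

# Hypothetical country -> TLD mapping; the real library's table may differ.
COUNTRY_TLDS = {"finland": (".fi",), "germany": (".de",)}

def passes_country_filters(url, allowed=None, forbidden=None):
    host = urlparse(url).netloc
    if forbidden and any(
        host.endswith(tld) for c in forbidden for tld in COUNTRY_TLDS.get(c, ())
    ):
        return False  # host carries a forbidden country's TLD
    if allowed:
        # With an allow-list, only matching TLDs pass.
        return any(host.endswith(tld) for c in allowed for tld in COUNTRY_TLDS.get(c, ()))
    return True
```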
