# SERPEngine

**SERPEngine** – Production-grade search module to find links through search engines.

* uses Google Search API  
* made for production – you need API keys  
* includes various filters (including an LLM-based one) so you can filter links by domain, metadata, etc.  
* returns structured dataclasses (`SearchHit`, `SERPMethodOp`, `SerpEngineOp`)

---

## 1. Installation

```bash
pip install serpengine
```

## 2. Environment variables

Create a `.env` file (or export the variables manually):

```
GOOGLE_SEARCH_API_KEY=XXXXXXXXXXXXXXXXXXXXXXXXXXXX
GOOGLE_CSE_ID=yyyyyyyyyyyyyyyyyyyyyyyyy:zzz
```

Both values are required; the engine raises an error if either is missing.


## 3. Dataclass cheat-sheet

| Class | What it represents |
|---|---|
| `SearchHit` | One URL result (link, title, metadata). |
| `UsageInfo` | Billing info (currently just `cost: float`). |
| `SERPMethodOp` | Output of one search method. Fields: `name`, `results: List[SearchHit]`, `usage`, `elapsed_time`. |
| `SerpEngineOp` | Aggregated result of a full `collect()` call. Fields: `usage`, `methods: List[SERPMethodOp]`, `results`, `elapsed_time`. Helper: `all_links() -> List[str]`. |
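
The cheat-sheet can be read as a rough shape for the dataclasses. A minimal sketch, using only the field names listed above (defaults and types beyond the cheat-sheet are assumptions, not the library's actual definitions):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SearchHit:
    # One URL result returned by a search method.
    link: str
    title: str = ""
    metadata: str = ""

@dataclass
class UsageInfo:
    # Billing info (currently just a cost figure).
    cost: float = 0.0

@dataclass
class SERPMethodOp:
    # Output of one search method (e.g. the Google API backend).
    name: str
    results: List[SearchHit] = field(default_factory=list)
    usage: UsageInfo = field(default_factory=UsageInfo)
    elapsed_time: float = 0.0

@dataclass
class SerpEngineOp:
    # Aggregated result of a full collect() call.
    usage: UsageInfo = field(default_factory=UsageInfo)
    methods: List[SERPMethodOp] = field(default_factory=list)
    results: List[SearchHit] = field(default_factory=list)
    elapsed_time: float = 0.0

    def all_links(self) -> List[str]:
        # Convenience helper: flatten the hits into a plain list of URLs.
        return [hit.link for hit in self.results]
```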

## 4. Quick-start (sync)

```python
from serpengine.serpengine import SERPEngine

engine = SERPEngine()

op = engine.collect(
    query="best pizza in Helsinki",
    num_urls=5,
    search_sources=["google_search_via_api"],   # or add "google_search_via_request_module"
    output_format="object"                      # default
)

print(op.elapsed_time, "sec")
print(op.all_links())
```

## 5. Quick-start (async)

```python
import asyncio
from serpengine.serpengine import SERPEngine

async def main():
    eng = SERPEngine()
    op = await eng.collect_async(
        query="python tutorials",
        num_urls=6,
        search_sources=["google_search_via_api", "google_search_via_request_module"],
        output_format="object"
    )
    print(op.all_links())

asyncio.run(main())
```

LLM-based filtering is applied automatically.


## Getting Google Credentials

### 1. Create or select a Google Cloud project

1. Open the Google Cloud Console.
2. Either create a new project or select an existing one from the project picker.

### 2. Enable the Custom Search API

1. In the left-hand menu, navigate to **APIs & Services → Library**.
2. Search for “Custom Search API”.
3. Click the result, then press **Enable**.

### 3. Create credentials (API key)

This becomes your `GOOGLE_SEARCH_API_KEY`.

1. Still under **APIs & Services**, choose **Credentials**.
2. Click **Create Credentials → API key**.
3. Copy the key shown in the dialog and keep it safe.

## Getting the Custom Search Engine ID (GOOGLE_CSE_ID)

### 1. Open Google Custom Search Engine

Go to cse.google.com/cse.

### 2. Create a new search engine

1. Click **Add** (or “New Search Engine”).
2. In “Sites to search”, you can:
   * enter a specific domain (e.g., `example.com`), or
   * use a wildcard like `*.com` (if you intend to search the entire web; you can later enable **Search the entire web** in the control panel).
3. Give your CSE a name, then click **Create**.

### 3. Retrieve your CSE ID

1. Open the Control Panel for the search engine you just created.
2. Locate the “Search engine ID” (sometimes labeled `cx`).
3. Copy that string; this is your `GOOGLE_CSE_ID`.
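
To sanity-check both credentials, you can hit Google's Custom Search JSON API endpoint directly, outside of SERPEngine. A stdlib-only sketch that builds the request URL (the endpoint and the `key`/`cx`/`q` parameters are Google's documented ones; the query here is just an example):

```python
from urllib.parse import urlencode

def build_cse_url(api_key: str, cse_id: str, query: str, num: int = 5) -> str:
    # Custom Search JSON API: `key` is the API key, `cx` is the
    # search engine ID, `q` is the query, `num` caps the result count.
    base = "https://www.googleapis.com/customsearch/v1"
    params = {"key": api_key, "cx": cse_id, "q": query, "num": num}
    return f"{base}?{urlencode(params)}"

url = build_cse_url("YOUR_API_KEY", "YOUR_CSE_ID", "best pizza in Helsinki")
# Fetch this URL with urllib.request or requests; a JSON response
# containing an "items" list means both credentials work.
```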

## Next steps

Add both credentials to your environment, e.g. in a `.env` file:

```
GOOGLE_SEARCH_API_KEY=YOUR_API_KEY_HERE
GOOGLE_CSE_ID=YOUR_CSE_ID_HERE
```

Then load them in your code (SERPEngine does this automatically with python-dotenv).
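
Since both variables are mandatory, you may want to fail fast before constructing the engine. A small stdlib-only check that mirrors that behavior (this is an illustrative helper, not part of SERPEngine's API):

```python
import os

def require_google_credentials() -> tuple:
    # Both variables are mandatory; raise a clear error naming
    # whichever ones are missing or empty.
    api_key = os.getenv("GOOGLE_SEARCH_API_KEY")
    cse_id = os.getenv("GOOGLE_CSE_ID")
    missing = [name for name, value in
               [("GOOGLE_SEARCH_API_KEY", api_key), ("GOOGLE_CSE_ID", cse_id)]
               if not value]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return api_key, cse_id
```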


## Output

* **JSON format:**

  ```json
  {
      "operation_result": {
          "total_time": 1.234,
          "errors": []
      },
      "results": [
          {
              "link": "https://digikey.com/product1",
              "metadata": "",
              "title": ""
          }
      ]
  }
  ```

* **Filters:**

  * **Allowed domains**: restricts search results to specified domains, e.g. `allowed_domains=["digikey.com"]`.
  * **Keyword-match link validation**: ensures links contain certain keywords, e.g. `keyword_match_based_link_validation=["STM32"]`.
  * **Allowed countries** (optional): includes only links whose TLD matches the specified countries.
  * **Forbidden countries** (optional): excludes links from specified countries based on their TLD.
  * **Additional validation conditions**: custom logic to further filter links.

* **Output formats:**

  * **JSON**: structured dictionary with operation results and links.
  * **Objects**: list of `SearchHit` dataclass instances for flexible manipulation.

* **Error handling and logging**: captures and logs errors during search and filtering for easier debugging.

* **Extensibility**: designed for easy extension; add new search sources or advanced filters as needed.
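
The domain and keyword filters amount to simple URL tests. An illustrative, standalone sketch of that logic (not the library's actual code; `passes_filters` and its parameter names are hypothetical):

```python
from urllib.parse import urlparse
from typing import List, Optional

def passes_filters(link: str,
                   allowed_domains: Optional[List[str]] = None,
                   required_keywords: Optional[List[str]] = None) -> bool:
    # Domain filter: keep the link only if its host equals, or is a
    # subdomain of, one of the allowed domains.
    host = urlparse(link).netloc.lower()
    if allowed_domains and not any(
            host == d or host.endswith("." + d) for d in allowed_domains):
        return False
    # Keyword filter: keep the link only if it contains every keyword
    # (case-insensitive).
    if required_keywords and not all(
            kw.lower() in link.lower() for kw in required_keywords):
        return False
    return True

links = [
    "https://www.digikey.com/stm32-board",
    "https://example.org/stm32",
]
kept = [l for l in links if passes_filters(
    l, allowed_domains=["digikey.com"], required_keywords=["STM32"])]
# kept → ["https://www.digikey.com/stm32-board"]
```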
