# SERPEngine
**SERPEngine** – Production-grade search module to find links through search engines.
* uses Google Search API
* made for production – you need API keys
* includes various filters (including an LLM-based one) so you can filter links by domain, metadata, etc.
* returns structured dataclasses (`SearchHit`, `SERPMethodOp`, `SerpEngineOp`)
---
## 1. Installation
```bash
pip install serpengine
```

## 2. Environment variables

Create a `.env` (or export the variables manually):

```bash
GOOGLE_SEARCH_API_KEY=XXXXXXXXXXXXXXXXXXXXXXXXXXXX
GOOGLE_CSE_ID=yyyyyyyyyyyyyyyyyyyyyyyyy:zzz
```
Both values are required; the engine will raise if either is missing.
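That fail-fast behavior can be pictured roughly as follows; this is a sketch of the check, not the engine's actual code, and the exact exception type is an assumption (only the variable names come from this README):

```python
import os

def require_google_creds():
    """Raise early if either Google credential is missing (mirrors the engine's check)."""
    api_key = os.environ.get("GOOGLE_SEARCH_API_KEY")
    cse_id = os.environ.get("GOOGLE_CSE_ID")
    if not api_key or not cse_id:
        raise RuntimeError(
            "Both GOOGLE_SEARCH_API_KEY and GOOGLE_CSE_ID must be set"
        )
    return api_key, cse_id
```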
## 3. Dataclass cheat-sheet
| Class | What it represents |
|---|---|
| `SearchHit` | One URL result (link, title, metadata). |
| `UsageInfo` | Billing info (currently just `cost: float`). |
| `SERPMethodOp` | Output of one search method. Fields: `name`, `results: List[SearchHit]`, `usage`, `elapsed_time`. |
| `SerpEngineOp` | Aggregated result of a full `collect()` call. Fields: `usage`, `methods: List[SERPMethodOp]`, `results`, `elapsed_time`. ➕ helper `all_links() -> List[str]` |
## 4. Quick-start (sync)

```python
from serpengine.serpengine import SERPEngine

engine = SERPEngine()

op = engine.collect(
    query="best pizza in Helsinki",
    num_urls=5,
    search_sources=["google_search_via_api"],  # or add "google_search_via_request_module"
    output_format="object"                     # default
)

print(op.elapsed_time, "sec")
print(op.all_links())
```
## 5. Quick-start (async)

```python
import asyncio
from serpengine.serpengine import SERPEngine

async def main():
    eng = SERPEngine()
    op = await eng.collect_async(
        query="python tutorials",
        num_urls=6,
        search_sources=["google_search_via_api", "google_search_via_request_module"],
        output_format="object"
    )
    print(op.all_links())

asyncio.run(main())
```
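Because `collect_async` is a coroutine, independent queries can be fanned out concurrently with `asyncio.gather`. A sketch with a stand-in coroutine in place of the real engine, so it runs without API keys (`collect_stub` and `search_many` are hypothetical names, not part of the library):

```python
import asyncio

async def collect_stub(query: str):
    # Stand-in for SERPEngine.collect_async; returns fake links for illustration.
    await asyncio.sleep(0)
    return [f"https://example.com/{query.replace(' ', '-')}"]

async def search_many(queries):
    # Run all searches concurrently and map each query to its links.
    results = await asyncio.gather(*(collect_stub(q) for q in queries))
    return dict(zip(queries, results))

links = asyncio.run(search_many(["python tutorials", "rust tutorials"]))
```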
LLM-based filtering is applied automatically.
## Getting Google Credentials

1. **Create or Select a Google Cloud Project**
   - Open the Google Cloud Console.
   - Either create a new project or select an existing one from the project picker.
2. **Enable the Custom Search API**
   - In the left-hand menu, navigate to APIs & Services → Library.
   - Search for “Custom Search API”.
   - Click the result, then press Enable.
3. **Create Credentials (API Key)**
   - Still under APIs & Services, choose Credentials.
   - Click Create Credentials → API key.
   - Copy the key shown in the dialog and keep it safe. This becomes your `GOOGLE_SEARCH_API_KEY`.
## Getting the Custom Search Engine ID (`GOOGLE_CSE_ID`)

1. **Open Google Custom Search Engine**
   - Go to cse.google.com/cse.
2. **Create a New Search Engine**
   - Click Add (or “New Search Engine”).
   - In “Sites to search”, enter a specific domain (e.g., `example.com`), or use a wildcard like `*.com` if you intend to search the entire web (later you can enable “Search the entire web” in the control panel).
   - Give your CSE a name, then click Create.
3. **Retrieve Your CSE ID**
   - Open the Control Panel for the search engine you just created.
   - Locate the “Search engine ID” (sometimes labeled `cx`).
   - Copy that string; this is your `GOOGLE_CSE_ID`.
## Next Steps

Add both credentials to your environment, e.g. in a `.env` file:

```bash
GOOGLE_SEARCH_API_KEY=YOUR_API_KEY_HERE
GOOGLE_CSE_ID=YOUR_CSE_ID_HERE
```
Then load them in your code (the SERPEngine does this automatically with python-dotenv).
## Output

- **JSON Format:**

  ```json
  {
    "operation_result": { "total_time": 1.234, "errors": [] },
    "results": [
      { "link": "https://digikey.com/product1", "metadata": "", "title": "" }
    ]
  }
  ```
- **Filters:**
  - Allowed Domains: restricts search results to specified domains. Example: `allowed_domains=["digikey.com"]`
  - Keyword Match Based Link Validation: ensures links contain certain keywords. Example: `keyword_match_based_link_validation=["STM32"]`
  - Allowed Countries (optional): filters links by TLD to include only specified countries.
  - Forbidden Countries (optional): excludes links from specified countries based on their TLD.
  - Additional Validation Conditions: custom logic to further filter links.
- **Output Formats:**
  - JSON: structured dictionary with operation results and links.
  - Objects: list of `SearchHit` dataclass instances for flexible manipulation.
- **Error Handling and Logging:** captures and logs errors during search and filtering for easier debugging.
- **Extensibility:** designed for easy extension; add new search sources or advanced filters as needed.
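The domain and keyword filters above can be sketched in plain Python. `filter_links` is a hypothetical illustration of the filtering logic, not the library's actual API:

```python
from urllib.parse import urlparse

def filter_links(links, allowed_domains=None, required_keywords=None):
    """Keep links whose host matches an allowed domain and whose URL
    contains at least one required keyword (case-insensitive)."""
    kept = []
    for link in links:
        host = urlparse(link).netloc.lower()
        if allowed_domains and not any(
            host == d or host.endswith("." + d) for d in allowed_domains
        ):
            continue  # host is outside every allowed domain
        if required_keywords and not any(
            kw.lower() in link.lower() for kw in required_keywords
        ):
            continue  # URL mentions none of the required keywords
        kept.append(link)
    return kept
```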