SERPEngine

A production-grade search module to find links through various search engines.

- Uses the Google Search API
- Made for production use; you need API keys
- Includes various filters, including an LLM-based one, so you can filter links by domain, metadata, and more
Installation

- Install from PyPI:

  pip install serpengine

- Activate your project environment:

  Ensure your Python environment is activated (e.g., using venv, virtualenv, or conda).

- Alternatively, clone the repository, navigate to the link search agent folder, and install:

  cd link_search_agent
  pip install -e .

  This installs the package in editable mode, so changes to the source code are immediately reflected.
Usage
Using the Link Search Agent is straightforward. Simply initialize the RelevantLinkSearcher and call the collect method with your desired query and parameters.
Example
from relevant_link_searcher import RelevantLinkSearcher

# Initialize the searcher
link_searcher = RelevantLinkSearcher()

# Collect links based on a query
result_data = link_searcher.collect(
    query="STM32 Microprocessor",
    num_urls=5,
    search_sources=["google_search_via_api", "google_search_via_request_module"],
    keyword_match_based_link_validation=["STM32"],
    allowed_domains=["digikey.com"],
    output_format="json"  # or "linksearch"
)

print(result_data)
Parameters
- query (str): The search query.
- validation_conditions (Dict, optional): Additional validation rules for filtering links.
- num_urls (int): Number of links to retrieve.
- search_sources (List[str]): Search sources to use (e.g., "google_search_via_api", "google_search_via_request_module").
- allowed_countries (List[str], optional): List of country codes to allow.
- forbidden_countries (List[str], optional): List of country codes to forbid.
- allowed_domains (List[str], optional): List of domains to allow.
- forbidden_domains (List[str], optional): List of domains to block.
- filter_llm (bool, optional): Whether to use AI-based filtering.
- output_format (str): Output format, either "json" or "linksearch".
Output
- JSON Format:

  {
    "operation_result": {
      "total_time": 1.234,
      "errors": []
    },
    "results": [
      {
        "link": "https://digikey.com/product1",
        "metadata": "",
        "title": ""
      },
      ...
    ]
  }

- LinkSearch Objects:

  A list of LinkSearch objects with attributes link, metadata, and title.
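The "linksearch" output can be sketched as a simple dataclass. The shape below is an assumption based only on the attribute names documented above (link, metadata, title), not the package's actual source:

```python
from dataclasses import dataclass

# Hypothetical stand-in for the LinkSearch dataclass; the package's
# actual definition may differ.
@dataclass
class LinkSearch:
    link: str
    metadata: str = ""
    title: str = ""

# With output_format="linksearch", results support plain attribute access:
results = [
    LinkSearch(link="https://digikey.com/product1", title="STM32 Nucleo Board"),
    LinkSearch(link="https://example.com/other", title="Unrelated page"),
]
stm32_links = [r.link for r in results if "STM32" in r.title]
```

Because results are plain dataclass instances rather than raw dicts, they can be filtered, sorted, and serialized with ordinary Python.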
Features

- Search Modules:
  - Simple Google Search Module: Scrapes Google search results directly from the HTML.
  - Google Search API Module: Utilizes the Google Custom Search API for fetching search results.
- Filters:
  - Allowed Domains: Restricts search results to specified domains. For example, setting allowed_domains=["digikey.com"] ensures only links from Digi-Key are collected.
  - Keyword Match Based Link Validation: Ensures that the collected links contain specific keywords. For instance, keyword_match_based_link_validation=["STM32"] filters out any links that do not include the keyword "STM32".
  - Allowed Countries (optional): Filters links based on the top-level domain (TLD) to include only those from specified countries.
  - Forbidden Countries (optional): Excludes links from specified countries based on their TLD.
  - Additional Validation Conditions: Allows custom validation logic to further filter links based on user-defined criteria.
- Output Formats:
  - JSON: A structured dictionary containing operation results and the list of collected links.
  - LinkSearch Objects: A list of LinkSearch dataclass instances for flexible manipulation within Python.
- Error Handling and Logging:
  - Captures and logs errors encountered during the search and filtering processes, making debugging and maintenance easier.
- Extensibility:
  - Designed to be easily extendable, allowing integration of additional search sources or more sophisticated filtering mechanisms as needed.
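The domain, keyword, and country filters above can be sketched with the standard library alone. The function and parameter names here are illustrative, not the module's actual internals:

```python
from urllib.parse import urlparse

def host_of(url: str) -> str:
    """Lowercased hostname of a URL, e.g. 'www.digikey.com'."""
    return (urlparse(url).hostname or "").lower()

def passes_domain_filter(url, allowed_domains=None, forbidden_domains=None):
    """Allow/forbid by domain, matching the domain itself or any subdomain."""
    host = host_of(url)
    if forbidden_domains and any(host == d or host.endswith("." + d) for d in forbidden_domains):
        return False
    if allowed_domains:
        return any(host == d or host.endswith("." + d) for d in allowed_domains)
    return True

def passes_keyword_filter(url, keywords):
    """Keep only links whose URL contains every required keyword."""
    return all(k.lower() in url.lower() for k in keywords)

def passes_country_filter(url, allowed_countries=None, forbidden_countries=None):
    """Country filtering by top-level domain (e.g. 'de', 'fr')."""
    tld = host_of(url).rsplit(".", 1)[-1]
    if forbidden_countries and tld in {c.lower() for c in forbidden_countries}:
        return False
    if allowed_countries:
        return tld in {c.lower() for c in allowed_countries}
    return True

urls = [
    "https://www.digikey.com/en/products/detail/stm32f407",
    "https://www.mouser.de/stm32-boards",
    "https://example.org/arduino",
]
kept = [
    u for u in urls
    if passes_domain_filter(u, allowed_domains=["digikey.com"])
    and passes_keyword_filter(u, ["stm32"])
]
```

Chaining the filters as simple predicates like this keeps each rule independently testable and mirrors how the collect parameters compose.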
Requirements
Ensure you have the following dependencies installed. They are listed in the requirements.txt file:
requests>=2.25.1
python-dotenv>=0.19.0
beautifulsoup4>=4.9.3
You can install them via:
pip install -r requirements.txt
Configuration
Before using the Link Search Agent, set up your environment variables:
- Create a .env file:

  GOOGLE_API_KEY=your_google_api_key
  GOOGLE_CSE_ID=your_custom_search_engine_id

- Ensure the .env file is in the root directory or the directory where the script runs.
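Once loaded (e.g. via python-dotenv, already listed in the requirements), these variables feed a Google Custom Search request. The endpoint and parameter names (key, cx, q, num) come from Google's public Custom Search JSON API; how the module issues the request internally may differ:

```python
import os
from urllib.parse import urlencode

GOOGLE_CSE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"

def build_search_url(query: str, num_urls: int = 5) -> str:
    """Build the Custom Search request URL from the variables set in .env."""
    params = {
        "key": os.environ["GOOGLE_API_KEY"],
        "cx": os.environ["GOOGLE_CSE_ID"],
        "q": query,
        "num": num_urls,
    }
    return f"{GOOGLE_CSE_ENDPOINT}?{urlencode(params)}"

# Fetching is then a single GET, e.g. with requests (also a dependency):
#   items = requests.get(build_search_url("STM32")).json().get("items", [])
#   links = [item["link"] for item in items]
```

If either variable is missing, os.environ raises a KeyError, which surfaces configuration problems early instead of producing failed API calls.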