Not another Google searching tool.
Not Another Google Search - Playwright
Not another Google searching library. Just kidding - it is.
Made for educational purposes. I hope it will help!
How to Install
Install Playwright and Chromium
pip3 install --upgrade playwright
playwright install chromium
Each time you upgrade the Playwright dependency, make sure to re-install Chromium; otherwise, you might get an error when using the headless browser.
Standard Install
pip3 install nagooglesearch-playwright
pip3 install --upgrade nagooglesearch-playwright
Build and Install From the Source
git clone https://github.com/ivan-sincek/nagooglesearch-playwright && cd nagooglesearch-playwright
python3 -m pip install --upgrade build
python3 -m build
python3 -m pip install dist/nagooglesearch_playwright-1.2-py3-none-any.whl
Usage
Standard
Default values:
nagooglesearch_playwright.GoogleClient(
tld = "com",
homepage_parameters = {
"btnK": "Google+Search",
"source": "hp"
},
search_parameters = {
},
cookies = {
},
user_agent = "",
proxy = "",
max_results = 100,
min_sleep = 8,
max_sleep = 18,
consent_selector = "xpath=//img[@alt='Google']/../../following-sibling::div[2]/div/button[1]",
headless = True,
humanize = False,
debug = False
)
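The max_results, min_sleep, and max_sleep defaults suggest a paging loop: request result pages one by one, pause a random interval between requests, and stop once enough unique URLs are collected. The following is a minimal sketch of that idea, not the library's actual implementation. `collect_urls` and `fetch_page` are hypothetical names, and the page size of 10 results per `start` offset is an assumption.

```python
import random
import time
from typing import Callable, List


def collect_urls(
    fetch_page: Callable[[int], List[str]],  # hypothetical: returns the URLs found at a 'start' offset
    max_results: int = 100,
    min_sleep: float = 8,
    max_sleep: float = 18,
    page_size: int = 10,  # assumption: each results page advances 'start' by 10
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Collect up to max_results unique URLs, pausing a random interval between page requests."""
    found: List[str] = []
    seen = set()
    start = 0
    while len(found) < max_results:
        if start > 0:
            # wait a random number of seconds in [min_sleep, max_sleep] before the next page
            sleep(random.uniform(min_sleep, max_sleep))
        new_urls = 0
        for url in fetch_page(start):
            if url in seen:
                continue
            seen.add(url)
            found.append(url)
            new_urls += 1
            if len(found) >= max_results:
                break
        if new_urls == 0:
            break  # empty page or only duplicates: stop paging
        start += page_size
    return found


# usage with a fake two-page result set and no real sleeping
pages = {
    0: ["https://a.example.com/1", "https://b.example.com/2"],
    10: ["https://b.example.com/2", "https://c.example.com/3"],
}
print(collect_urls(lambda start: pages.get(start, []), max_results=3, sleep=lambda seconds: None))
```

Injecting `fetch_page` and `sleep` keeps the loop testable without a browser or real delays; stopping on a page with no new URLs avoids looping forever when results repeat.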
Only domains that do not contain the keyword google and do not end with the keyword goo.gl are accepted as valid results. The final output is a unique and sorted list of URLs.
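The filtering rule and the unique, sorted output can be sketched with the standard library's urllib.parse. This is a minimal sketch under assumptions: the helper names `is_valid_result` and `finalize` are hypothetical, and matching on the URL's lowercased host name (without port) is an assumption about how the library interprets "domain".

```python
from typing import Iterable, List
from urllib.parse import urlsplit


def is_valid_result(url: str) -> bool:
    """Reject URLs whose host contains 'google' or ends with 'goo.gl'."""
    host = urlsplit(url).hostname or ""  # .hostname is lowercased and excludes any port
    return bool(host) and "google" not in host and not host.endswith("goo.gl")


def finalize(urls: Iterable[str]) -> List[str]:
    """Keep valid results only and return them as a unique, sorted list."""
    return sorted({url for url in urls if is_valid_result(url)})


print(finalize([
    "https://www.google.com/search?q=test",
    "https://goo.gl/maps/abc",
    "https://b.example.com/",
    "https://a.example.com/",
    "https://b.example.com/",
]))
```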
Example, standard:
import nagooglesearch_playwright, asyncio
# the following query string parameters are set only if the 'start' query string parameter is not set or is equal to zero
# simulate a homepage search
homepage_parameters = {
"btnK": "Google+Search",
"source": "hp"
}
# search the internet for additional query string parameters
# https://brightdata.com/blog/web-data/google-search-url-parameters
search_parameters = {
"q": "site:*.example.com intext:password", # search query
"tbs": "li:1", # specify 'li:1' for verbatim search (no alternate spellings, etc.)
"hl": "en",
"lr": "lang_en",
"cr": "countryUS",
"udm": "14", # only web results
"filter": "0", # specify '0' to display hidden results
"safe": "images" # specify 'images' to turn off safe search, or specify 'active' to turn on safe search
}
# specify custom cookies here
cookies = {
}
client = nagooglesearch_playwright.GoogleClient(
tld = "com", # top level domain, e.g., www.google.com or www.google.hr
homepage_parameters = homepage_parameters, # 'search_parameters' will override 'homepage_parameters'
search_parameters = search_parameters,
cookies = cookies,
user_agent = "curl/3.30.1", # a random user agent will be set if none is provided
proxy = "socks5://127.0.0.1:9050", # supported URL schemes are 'http[s]', 'socks4[h]', and 'socks5[h]'
max_results = 200, # maximum unique URLs to return
min_sleep = 15, # minimum sleep between page requests
max_sleep = 30, # maximum sleep between page requests
consent_selector = "xpath=//img[@alt='Google']/../../following-sibling::div[2]/div/button[1]", # 'button[1]' rejects all, 'button[2]' accepts all
headless = False, # show the web browser
humanize = True, # enable human-like web browser interactions
debug = True # enable debug output
)
urls = asyncio.run(client.search())
if client.get_error() == nagooglesearch_playwright.Error.PLAYWRIGHT:
    print("[ Playwright Exception ]")
    # do something
elif client.get_error() == nagooglesearch_playwright.Error.REQUEST:
    print("[ Request Exception ]")
    # do something
elif client.get_error() == nagooglesearch_playwright.Error.RATE_LIMIT:
    print("[ HTTP 429 Too Many Requests ]")
    # do something
for url in urls:
    print(url)
    # do something
Check the list of user agents here. For more user agents, check scrapeops.io.
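The comments in the standard example state two merge rules: homepage parameters apply only when 'start' is not set or is zero, and 'search_parameters' override 'homepage_parameters'. A minimal sketch of those rules, assuming a hypothetical `build_search_url` helper and the `https://www.google.<tld>/search` path; keeping '+' unescaped (so "Google+Search" stays as-is) is also an assumption about how the library encodes values.

```python
from typing import Dict, Optional
from urllib.parse import urlencode


def build_search_url(
    tld: str = "com",
    homepage_parameters: Optional[Dict[str, str]] = None,
    search_parameters: Optional[Dict[str, str]] = None,
) -> str:
    """Hypothetical: merge query string parameters and build a Google search URL."""
    search_parameters = search_parameters or {}
    params: Dict[str, str] = {}
    # homepage parameters are applied only if 'start' is not set or is zero
    if int(search_parameters.get("start") or 0) == 0:
        params.update(homepage_parameters or {})
    # search parameters override homepage parameters that share the same key
    params.update(search_parameters)
    # assumption: '+' in values is already form-encoded (e.g. "Google+Search"), so keep it as-is
    return f"https://www.google.{tld}/search?" + urlencode(params, safe="+")


print(build_search_url("hr", {"btnK": "Google+Search", "source": "hp"}, {"q": "test", "source": "custom"}))
```

Because `dict.update` keeps the position of an existing key, an overridden homepage parameter stays where it was in the query string while taking the search parameter's value.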
Shortest Possible
Example, shortest possible:
import nagooglesearch_playwright, asyncio
urls = asyncio.run(nagooglesearch_playwright.GoogleClient(search_parameters = {"q": "site:*.example.com intext:password"}).search())
# do something
Time Sensitive Search
Example, do not show results older than 6 months:
import datetime
import nagooglesearch_playwright, dateutil.relativedelta as relativedelta # 'dateutil' comes from the 'python-dateutil' package
def get_tbs(months: int):
    today = datetime.datetime.today()
    return nagooglesearch_playwright.get_tbs(today, today - relativedelta.relativedelta(months = months))
search_parameters = {
"tbs": get_tbs(6)
}
# do something
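Google's custom date range filter is commonly expressed in the 'tbs' parameter as cdr:1,cd_min:MM/DD/YYYY,cd_max:MM/DD/YYYY. The sketch below builds such a value with the standard library only, as an alternative to dateutil. The helpers `months_ago` and `date_range_tbs` are hypothetical, and whether nagooglesearch_playwright.get_tbs() emits exactly this format is an assumption this document does not confirm.

```python
import calendar
import datetime


def months_ago(day: datetime.date, months: int) -> datetime.date:
    """Go back a number of whole months, clamping the day to the target month's length."""
    index = day.year * 12 + (day.month - 1) - months
    year, month_zero_based = divmod(index, 12)
    month = month_zero_based + 1
    last_day = calendar.monthrange(year, month)[1]
    return day.replace(year=year, month=month, day=min(day.day, last_day))


def date_range_tbs(date_min: datetime.date, date_max: datetime.date) -> str:
    """Build a custom date range value for the 'tbs' parameter (format is an assumption)."""
    return f"cdr:1,cd_min:{date_min.strftime('%m/%d/%Y')},cd_max:{date_max.strftime('%m/%d/%Y')}"


# do not show results older than 6 months
today = datetime.date.today()
search_parameters = {
    "tbs": date_range_tbs(months_ago(today, 6), today)
}
print(search_parameters)
```

Clamping the day means, for example, that six months before August 31 becomes the last day of February rather than raising a ValueError.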
User Agents
Example, get all user agents:
import nagooglesearch_playwright
user_agents = nagooglesearch_playwright.get_all_user_agents()
print(user_agents)
# do something
Example, get a random user agent:
import nagooglesearch_playwright
user_agent = nagooglesearch_playwright.get_random_user_agent()
print(user_agent)
# do something
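The standard example notes that a random user agent is set if none is provided. A minimal sketch of that fallback, assuming a hypothetical `resolve_user_agent` helper and a placeholder pool (the two strings below are illustrative only, not the library's bundled list):

```python
import random
from typing import Optional, Sequence

# hypothetical placeholder pool; the library bundles its own list (see get_all_user_agents())
USER_AGENTS = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 Firefox/128.0",
    "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0",
)


def resolve_user_agent(
    user_agent: str = "",
    pool: Sequence[str] = USER_AGENTS,
    rng: Optional[random.Random] = None,
) -> str:
    """Return the given user agent, or pick a random one from the pool if it is empty."""
    if user_agent:
        return user_agent
    chooser = rng if rng is not None else random  # a seeded Random makes the choice reproducible
    return chooser.choice(pool)


print(resolve_user_agent("curl/3.30.1"))
print(resolve_user_agent())
```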
Download files
File details
Details for the file nagooglesearch_playwright-1.2.tar.gz.
File metadata
- Download URL: nagooglesearch_playwright-1.2.tar.gz
- Upload date:
- Size: 9.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | dd9e35a713a61e8d28147cbcecdc58eef745c7d6e29b0d5e847f51d661636e63 |
| MD5 | aec8352cb996aa962fe5c2d0bab6afba |
| BLAKE2b-256 | c069a3d4213daca96fa1696b20b6e6c5db651fa847bc7f81d95657037f623b7a |
File details
Details for the file nagooglesearch_playwright-1.2-py3-none-any.whl.
File metadata
- Download URL: nagooglesearch_playwright-1.2-py3-none-any.whl
- Upload date:
- Size: 8.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 47de2a1148ec75aa0b370a9c95a89940127d2a48ba76cc9956b2d6e2d540cdaf |
| MD5 | 4896ce43618ac436b25ec62b2ae9a4f9 |
| BLAKE2b-256 | 17fa5aeb805203e4070424545c9715c1e523867e60cb44ff375a9878908599a6 |