Not another Google searching tool.

Not Another Google Search - Playwright

Not another Google searching library. Just kidding - it is.

Made for educational purposes. I hope it will help!

How to Install

Install Playwright and Chromium

pip3 install --upgrade playwright

playwright install chromium

Each time you upgrade the Playwright dependency, re-install Chromium; otherwise, the headless browser might raise an error.

Standard Install

pip3 install nagooglesearch-playwright

pip3 install --upgrade nagooglesearch-playwright

Build and Install From the Source

git clone https://github.com/ivan-sincek/nagooglesearch-playwright && cd nagooglesearch-playwright

python3 -m pip install --upgrade build

python3 -m build

python3 -m pip install dist/nagooglesearch_playwright-1.2-py3-none-any.whl

Usage

Standard

Default values:

nagooglesearch_playwright.GoogleClient(
	tld = "com",
	homepage_parameters = {
		"btnK": "Google+Search",
		"source": "hp"
	},
	search_parameters = {
	},
	cookies = {
	},
	user_agent = "",
	proxy = "",
	max_results = 100,
	min_sleep = 8,
	max_sleep = 18,
	consent_selector = "xpath=//img[@alt='Google']/../../following-sibling::div[2]/div/button[1]",
	headless = True,
	humanize = False,
	debug = False
)

Only domains that do not contain the keyword google and do not end with the keyword goo.gl are accepted as valid results. The final output is a unique and sorted list of URLs.
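The filtering rule above can be illustrated with a short standalone sketch. This is not the library's actual implementation; the helper names `is_valid_result` and `clean_results` are hypothetical, and the check is assumed to apply to the URL's hostname only.

```python
from urllib.parse import urlsplit

def is_valid_result(url: str) -> bool:
    """Hypothetical check: reject hosts that contain 'google' or end with 'goo.gl'."""
    host = (urlsplit(url).hostname or "").lower()
    if not host:
        return False
    return "google" not in host and not host.endswith("goo.gl")

def clean_results(urls):
    """Filter the URLs, drop duplicates, and return them sorted."""
    return sorted({url for url in urls if is_valid_result(url)})

sample = [
    "https://www.example.com/login",
    "https://support.google.com/websearch",
    "https://goo.gl/abc123",
    "https://www.example.com/login",
    "https://blog.example.com/post",
]

print(clean_results(sample))
# ['https://blog.example.com/post', 'https://www.example.com/login']
```

Using a set before sorting gives both properties mentioned above (unique and sorted) in one step.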

Example, standard:

import nagooglesearch_playwright, asyncio

# the following query string parameters are set only if 'start' query string parameter is not set or is equal to zero
# simulate a homepage search
homepage_parameters = {
	"btnK": "Google+Search",
	"source": "hp"
}

# search the internet for additional query string parameters
# https://brightdata.com/blog/web-data/google-search-url-parameters
search_parameters = {
	"q": "site:*.example.com intext:password", # search query
	"tbs": "li:1", # specify 'li:1' for verbatim search (no alternate spellings, etc.)
	"hl": "en",
	"lr": "lang_en",
	"cr": "countryUS",
	"udm": "14", # only web results
	"filter": "0", # specify '0' to display hidden results
	"safe": "images" # specify 'images' to turn off safe search, or specify 'active' to turn on safe search
}

# specify custom cookies here
cookies = {
}

client = nagooglesearch_playwright.GoogleClient(
	tld = "com", # top level domain, e.g., www.google.com or www.google.hr
	homepage_parameters = homepage_parameters, # 'search_parameters' will override 'homepage_parameters'
	search_parameters = search_parameters,
	cookies = cookies,
	user_agent = "curl/3.30.1", # a random user agent will be set if none is provided
	proxy = "socks5://127.0.0.1:9050", # supported URL schemes are 'http[s]', 'socks4[h]', and 'socks5[h]'
	max_results = 200, # maximum unique URLs to return
	min_sleep = 15, # minimum sleep between page requests
	max_sleep = 30, # maximum sleep between page requests
	consent_selector = "xpath=//img[@alt='Google']/../../following-sibling::div[2]/div/button[1]", # 'button[1]' rejects all, 'button[2]' accepts all
	headless = False, # show the web browser
	humanize = True, # enable human-like web browser interactions
	debug = True # enable debug output
)

urls = asyncio.run(client.search())

if client.get_error() == nagooglesearch_playwright.Error.PLAYWRIGHT:
	print("[ Playwright Exception ]")
	# do something
elif client.get_error() == nagooglesearch_playwright.Error.REQUEST:
	print("[ Request Exception ]")
	# do something
elif client.get_error() == nagooglesearch_playwright.Error.RATE_LIMIT:
	print("[ HTTP 429 Too Many Requests ]")
	# do something

for url in urls:
	print(url)
	# do something

Check the list of user agents here. For more user agents, check scrapeops.io.
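The comments in the standard example say that the homepage parameters are set only when the 'start' parameter is missing or zero, and that 'search_parameters' override 'homepage_parameters'. A minimal sketch of that merge, using only the standard library, might look like the following. The helper name `build_search_url` and the `/search` path are assumptions made for illustration; the library may build its request differently.

```python
from urllib.parse import urlencode

def build_search_url(tld, homepage_parameters, search_parameters):
    """Hypothetical helper: merge both parameter sets into one query string."""
    start = str(search_parameters.get("start", "0"))
    params = {}
    # homepage parameters only apply to the first page of results
    if start in ("", "0"):
        params.update(homepage_parameters)
    # search parameters win when the same key appears in both
    params.update(search_parameters)
    # keep '+' unescaped so that values such as 'Google+Search' stay as written
    query = urlencode(params, safe="+")
    return f"https://www.google.{tld}/search?{query}"  # path is an assumption

homepage = {"btnK": "Google+Search", "source": "hp"}

first_page = build_search_url("com", homepage, {"q": "test", "source": "web"})
next_page = build_search_url("com", homepage, {"q": "test", "start": "10"})

print(first_page)
print(next_page)
```

On the first page the homepage value for 'source' is replaced by the one from 'search_parameters'; on later pages the homepage parameters are left out entirely.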

Shortest Possible

Example, shortest possible:

import nagooglesearch_playwright, asyncio

urls = asyncio.run(nagooglesearch_playwright.GoogleClient(search_parameters = {"q": "site:*.example.com intext:password"}).search())

# do something

Time Sensitive Search

Example, do not show results older than 6 months:

import nagooglesearch_playwright, datetime, dateutil.relativedelta as relativedelta

def get_tbs(months: int):
	today = datetime.datetime.today()
	return nagooglesearch_playwright.get_tbs(today, today - relativedelta.relativedelta(months = months))

search_parameters = {
	"tbs": get_tbs(6)
}

# do something
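For reference, a custom date range in Google's 'tbs' parameter is commonly written as `cdr:1,cd_min:MM/DD/YYYY,cd_max:MM/DD/YYYY`. The sketch below builds such a value with the standard library only. The helper name `build_custom_range_tbs` is hypothetical, and whether `nagooglesearch_playwright.get_tbs()` returns exactly this string is an assumption, not something the text above confirms.

```python
import datetime

def build_custom_range_tbs(date_from, date_to):
    """Hypothetical helper: format a custom date range as a 'tbs' value."""
    fmt = "%m/%d/%Y"
    return f"cdr:1,cd_min:{date_from.strftime(fmt)},cd_max:{date_to.strftime(fmt)}"

# fixed dates keep the example reproducible; use datetime.date.today() in practice
date_to = datetime.date(2024, 7, 1)
date_from = date_to - datetime.timedelta(days=182)  # roughly six months earlier

tbs = build_custom_range_tbs(date_from, date_to)
print(tbs)
# cdr:1,cd_min:01/01/2024,cd_max:07/01/2024
```

A fixed number of days is only an approximation of "six months"; `dateutil.relativedelta`, as used in the example above, handles calendar months exactly.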

User Agents

Example, get all user agents:

import nagooglesearch_playwright

user_agents = nagooglesearch_playwright.get_all_user_agents()
print(user_agents)

# do something

Example, get a random user agent:

import nagooglesearch_playwright

user_agent = nagooglesearch_playwright.get_random_user_agent()
print(user_agent)

# do something

Project details


This version

1.2

Download files

Source Distribution

nagooglesearch_playwright-1.2.tar.gz (9.7 kB view details)

Uploaded Source

Built Distribution

nagooglesearch_playwright-1.2-py3-none-any.whl (8.4 kB view details)

Uploaded Python 3

File details

Details for the file nagooglesearch_playwright-1.2.tar.gz.

File metadata

  • Download URL: nagooglesearch_playwright-1.2.tar.gz
  • Upload date:
  • Size: 9.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.8

File hashes

Hashes for nagooglesearch_playwright-1.2.tar.gz
Algorithm Hash digest
SHA256 dd9e35a713a61e8d28147cbcecdc58eef745c7d6e29b0d5e847f51d661636e63
MD5 aec8352cb996aa962fe5c2d0bab6afba
BLAKE2b-256 c069a3d4213daca96fa1696b20b6e6c5db651fa847bc7f81d95657037f623b7a

File details

Details for the file nagooglesearch_playwright-1.2-py3-none-any.whl.

File hashes

Hashes for nagooglesearch_playwright-1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 47de2a1148ec75aa0b370a9c95a89940127d2a48ba76cc9956b2d6e2d540cdaf
MD5 4896ce43618ac436b25ec62b2ae9a4f9
BLAKE2b-256 17fa5aeb805203e4070424545c9715c1e523867e60cb44ff375a9878908599a6
