Query and scrape search engines.

Project description

Query and scrape search engines (Google, Google News, Yahoo, Yahoo News, Bing, Bing News, Ask, Dogpile, Dogpile News)


Installation

pip install search_engines

Overview

Each search engine has a module, {engine_name}.py, which provides two functions:

extract_search_results(html: str, page_url: str) -> Tuple[List[Dict[str, str]], str]

and

get_search_url(query: str, latest: bool = True, country: str = 'us') -> str

Usage Example

Construct the URL for the first results page of a Bing search for "Tesla TSLA".

from search_engines import bing_search

url = bing_search.get_search_url('Tesla TSLA')

Load the URL using a simple HTTP client or web browser and extract the page HTML. This package places no restrictions on which client can be used. We'll use the requests library for this example.

import requests

resp = requests.get(url)
html = resp.text
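Since any client works, the same step can be done with the standard library alone. The sketch below is one possible helper, not part of this package: the name fetch_html, the User-Agent string, and the timeout default are all assumptions made for illustration.

```python
import urllib.request


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Fetch a URL and return the response body as text (hypothetical helper)."""
    # The User-Agent value is an arbitrary placeholder, not something
    # this package requires.
    request = urllib.request.Request(
        url,
        headers={"User-Agent": "Mozilla/5.0 (compatible; example-client)"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the response does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

The returned string can be passed to extract_search_results in place of resp.text.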

We can now extract search results from the HTML. The returned results are a list of dictionaries with the keys url, title, preview_text, and page_number. To scrape multiple pages, load the returned next_page_url and extract its results by calling extract_search_results again.

results, next_page_url = bing_search.extract_search_results(html, url)
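The multi-page case described above could be sketched as a small loop. This is an illustration under stated assumptions rather than part of the package: scrape_pages, fetch, and max_pages are hypothetical names, and the code assumes next_page_url is empty or None when there is no further page, which the package description does not confirm.

```python
from typing import Callable, Dict, List


def scrape_pages(
    engine,
    query: str,
    fetch: Callable[[str], str],
    max_pages: int = 3,
) -> List[Dict[str, str]]:
    """Collect results from up to max_pages result pages (hypothetical helper).

    engine is any module exposing get_search_url and extract_search_results;
    fetch is any callable that maps a URL to that page's HTML.
    """
    all_results: List[Dict[str, str]] = []
    url = engine.get_search_url(query)
    for _ in range(max_pages):
        # Assumption: a missing next page is signalled by an empty or None URL.
        if not url:
            break
        html = fetch(url)
        results, url = engine.extract_search_results(html, url)
        if not results:
            # Stop early if a page yields nothing.
            break
        all_results.extend(results)
    return all_results
```

With the objects from this example, a call might look like scrape_pages(bing_search, 'Tesla TSLA', lambda u: requests.get(u).text). Passing the fetch function in keeps the loop independent of the HTTP client, in line with the package's client-agnostic design.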

Contributions

Add new search engines! =)

Download files

Source Distribution

search_engines-1.0.7.tar.gz (5.8 kB)

Uploaded Source

File details

Details for the file search_engines-1.0.7.tar.gz.

File metadata

  • Download URL: search_engines-1.0.7.tar.gz
  • Upload date:
  • Size: 5.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.0 setuptools/45.2.0 requests-toolbelt/0.9.1 tqdm/4.57.0 CPython/3.8.5

File hashes

Hashes for search_engines-1.0.7.tar.gz
Algorithm Hash digest
SHA256 2ec8181daca92a085cccdb46ef28bfb59662f7eff3fbef206e50a32171b384fb
MD5 8d47fc3594d42bd78a392988fffd46da
BLAKE2b-256 ba326c85afb363fdd1bcc045371938449c4de430d1fa7e6d90087be164c35df6
