Query and scrape search engines.


Query and scrape search engines (Google, Google News, Yahoo, Yahoo News, Bing, Bing News, Ask, Dogpile, Dogpile News)


Installation

pip install search_engines

Overview

Each search engine has a module {engine_name}.py which exposes two functions:

extract_search_results(html: str, page_url: str) -> Tuple[List[Dict[str, str]], str]

and

get_search_url(query: str, latest: bool = True, country: str = 'us') -> str
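
For example, get_search_url accepts the optional latest and country parameters shown in the signature above. The exact effect of latest and the set of accepted country codes are defined by each engine module, so the values below are only illustrative:

from search_engines import bing_search

# Defaults: latest results, country 'us'.
url_default = bing_search.get_search_url('Tesla TSLA')

# Illustrative only: latest disabled and a different country code;
# each engine module defines which country codes it understands.
url_uk = bing_search.get_search_url('Tesla TSLA', latest=False, country='uk')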

Usage Example

Construct a URL for the first results page of a search for "Tesla TSLA" on Bing.

from search_engines import bing_search

url = bing_search.get_search_url('Tesla TSLA')

Load the URL using a simple HTTP client or web browser and extract the page HTML. This package does not place any restrictions on which HTTP client can be used. We'll use the requests library for this example.

import requests

resp = requests.get(url)
html = resp.text
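
Since the package is client-agnostic, the same page can just as well be fetched with only the standard library. A minimal sketch using urllib (decoding the response as UTF-8 is an assumption here):

from urllib.request import urlopen

# Fetch the same URL without any third-party dependency.
with urlopen(url) as resp:
    html = resp.read().decode('utf-8', errors='replace')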

We can now extract the search results from the HTML. The results are returned as a list of dictionaries with the keys url, title, preview_text, and page_number. To scrape multiple pages, load the next page using the returned next_page_url and extract its results with extract_search_results again (a full sketch follows the example below).

results, next_page_url = bing_search.extract_search_results(html, url)
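
Putting the pieces together, here is a minimal sketch that follows next_page_url across a few pages. It assumes (not stated above) that extract_search_results returns an empty or None next_page_url when no further page is available:

import requests

from search_engines import bing_search

url = bing_search.get_search_url('Tesla TSLA')
all_results = []

# Scrape up to three result pages by following next_page_url.
for _ in range(3):
    html = requests.get(url).text
    results, next_page_url = bing_search.extract_search_results(html, url)
    all_results.extend(results)
    # Assumption: next_page_url is empty/None when there is no further page.
    if not next_page_url:
        break
    url = next_page_url

for result in all_results:
    print(result['page_number'], result['title'], result['url'])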

Contributions

Add new search engines! =)
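
A new engine is just a module that implements the two functions from the Overview. A hypothetical skeleton (the duckduckgo_search name, URL, and parsing are illustrative, not part of this package):

# duckduckgo_search.py -- hypothetical sketch of a new engine module
# following the two-function interface described in the Overview.
from typing import Dict, List, Tuple
from urllib.parse import quote_plus


def get_search_url(query: str, latest: bool = True, country: str = 'us') -> str:
    """Build the engine-specific URL for the first results page."""
    # This sketch ignores `latest` and `country`; a real module would map
    # them onto the engine's own query parameters.
    return 'https://duckduckgo.com/html/?q=' + quote_plus(query)


def extract_search_results(html: str, page_url: str) -> Tuple[List[Dict[str, str]], str]:
    """Parse result dicts (url, title, preview_text, page_number) and the next page URL."""
    results: List[Dict[str, str]] = []
    next_page_url = ''
    # ... engine-specific HTML parsing goes here ...
    return results, next_page_url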
