Query and scrape search engines.


Query and scrape search engines (Google, Google News, Yahoo, Yahoo News, Bing, Bing News, Ask, Dogpile, Dogpile News)


Installation

pip install search_engines

Overview

Each search engine has a module {engine_name}.py which exposes two functions:

extract_search_results(html: str, page_url: str) -> Tuple[List[Dict[str, str]], str]

and

get_search_url(query: str, latest: bool = True, country: str = 'us') -> str
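
For example, get_search_url accepts the optional latest and country parameters shown in the signature above. The exact effect of latest and the set of accepted country codes are defined by each engine module, so the values below are only illustrative:

from search_engines import bing_search

# Defaults: latest results, country 'us'.
url_default = bing_search.get_search_url('Tesla TSLA')

# Illustrative only: latest disabled and a different country code;
# each engine module defines which country codes it understands.
url_uk = bing_search.get_search_url('Tesla TSLA', latest=False, country='uk')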

Usage Example

Construct a URL for the first results page of a search for "Tesla TSLA" on Bing.

from search_engines import bing_search

url = bing_search.get_search_url('Tesla TSLA')

Load the URL using a simple HTTP client or web browser and extract the page HTML. This package does not place any restrictions on which HTTP client can be used. We'll use the requests library for this example.

import requests

resp = requests.get(url)
html = resp.text
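
Since the package is client-agnostic, the same page can just as well be fetched with only the standard library. A minimal sketch using urllib (decoding the response as UTF-8 is an assumption here):

from urllib.request import urlopen

# Fetch the same URL without any third-party dependency.
with urlopen(url) as resp:
    html = resp.read().decode('utf-8', errors='replace')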

We can now extract the search results from the HTML. The results are returned as a list of dictionaries with the keys url, title, preview_text, and page_number. To scrape multiple pages, load the next page using the returned next_page_url and extract its results with extract_search_results again (a full sketch follows the example below).

results, next_page_url = bing_search.extract_search_results(html, url)
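
Putting the pieces together, here is a minimal sketch that follows next_page_url across a few pages. It assumes (not stated above) that extract_search_results returns an empty or None next_page_url when no further page is available:

import requests

from search_engines import bing_search

url = bing_search.get_search_url('Tesla TSLA')
all_results = []

# Scrape up to three result pages by following next_page_url.
for _ in range(3):
    html = requests.get(url).text
    results, next_page_url = bing_search.extract_search_results(html, url)
    all_results.extend(results)
    # Assumption: next_page_url is empty/None when there is no further page.
    if not next_page_url:
        break
    url = next_page_url

for result in all_results:
    print(result['page_number'], result['title'], result['url'])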

Contributions

Add new search engines! =)
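
A new engine is just a module that implements the two functions from the Overview. A hypothetical skeleton (the duckduckgo_search name, URL, and parsing are illustrative, not part of this package):

# duckduckgo_search.py -- hypothetical sketch of a new engine module
# following the two-function interface described in the Overview.
from typing import Dict, List, Tuple
from urllib.parse import quote_plus


def get_search_url(query: str, latest: bool = True, country: str = 'us') -> str:
    """Build the engine-specific URL for the first results page."""
    # This sketch ignores `latest` and `country`; a real module would map
    # them onto the engine's own query parameters.
    return 'https://duckduckgo.com/html/?q=' + quote_plus(query)


def extract_search_results(html: str, page_url: str) -> Tuple[List[Dict[str, str]], str]:
    """Parse result dicts (url, title, preview_text, page_number) and the next page URL."""
    results: List[Dict[str, str]] = []
    next_page_url = ''
    # ... engine-specific HTML parsing goes here ...
    return results, next_page_url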
