Query and scrape search engines.
Project description
Query and scrape search engines (Google, Google News, Yahoo, Yahoo News, Bing, Bing News, Ask, Dogpile, Dogpile News)
Installation
pip install search_engines
Overview
Each search engine has a module {engine_name}.py which exposes two functions:
extract_search_results(html: str, page_url: str) -> Tuple[List[Dict[str, str]], str]
and
get_search_url(query: str, latest: bool = True, country: str = 'us') -> str
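Because every engine module exposes the same two functions, code can be written once against that shared shape and reused with any engine. The sketch below expresses the shape as a typing.Protocol. The SearchEngineModule and first_page_url names and the fake_engine stand-in are hypothetical illustrations, not part of search_engines; only the two signatures come from the description above.

```python
import types
from typing import Dict, List, Protocol, Tuple, runtime_checkable
from urllib.parse import quote_plus


@runtime_checkable
class SearchEngineModule(Protocol):
    """Hypothetical structural type for modules such as bing_search.

    The two signatures are copied from the Overview; search_engines
    itself does not define this class.
    """

    def get_search_url(self, query: str, latest: bool = True,
                       country: str = 'us') -> str: ...

    def extract_search_results(
        self, html: str, page_url: str
    ) -> Tuple[List[Dict[str, str]], str]: ...


def first_page_url(engine: SearchEngineModule, query: str) -> str:
    """Build the first results URL with any module exposing both functions."""
    # isinstance on a runtime_checkable Protocol only checks that the
    # attributes exist; it does not validate their signatures.
    if not isinstance(engine, SearchEngineModule):
        raise TypeError(f"{engine!r} does not expose the engine functions")
    return engine.get_search_url(query)


# Stand-in module for demonstration only; in practice pass e.g. bing_search.
fake_engine = types.ModuleType("fake_search")
fake_engine.get_search_url = (
    lambda query, latest=True, country="us":
    "https://search.example/?q=" + quote_plus(query))
fake_engine.extract_search_results = lambda html, page_url: ([], "")
```

A module object satisfies the protocol simply by defining both functions at module level, so no base class or registration is needed.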
Usage Example
Construct a URL for the first results page of searching "Tesla TSLA" in Bing Search.
from search_engines import bing_search
url = bing_search.get_search_url('Tesla TSLA')
Load the URL using a simple HTTP client or web browser and extract the page HTML.
This package places no restrictions on which client can be used. We'll use the requests
library for this example.
import requests
resp = requests.get(url)
html = resp.text
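Since any client works, the requests call can also be replaced with the standard library. A minimal sketch, assuming a UTF-8 fallback when the response declares no charset; the fetch_html name and the User-Agent string are hypothetical placeholders, and whether a given engine expects a particular User-Agent is something to check yourself.

```python
import urllib.request

# Placeholder header value; not something search_engines prescribes.
DEFAULT_HEADERS = {
    "User-Agent": "Mozilla/5.0 (compatible; example-scraper/0.1)",
}


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download url with urllib and return the body decoded as text."""
    request = urllib.request.Request(url, headers=DEFAULT_HEADERS)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Use the charset from Content-Type if present, else assume UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

With this helper, html = fetch_html(url) stands in for the two requests lines. urlopen also accepts data: URLs, which is handy for testing without network access.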
We can now extract search results from the HTML.
The returned results list contains one dictionary per result, with the keys url, title, preview_text and page_number.
If we want to scrape multiple pages, we can load the next page from the returned next_page_url and extract its results again with extract_search_results.
results, next_page_url = bing_search.extract_search_results(html, url)
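The multi-page workflow described above can be wrapped in a loop. This is a sketch under stated assumptions: the scrape_pages helper, the max_pages cap, skipping results whose url was already seen, and treating an empty or missing next_page_url as the last page are all illustrative choices, not documented package behaviour. The fetch and extract callables are passed in so any client and any engine module can be used.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

# Matches the documented extract_search_results signature.
Extractor = Callable[[str, str], Tuple[List[Dict[str, str]], str]]


def scrape_pages(first_url: str,
                 fetch: Callable[[str], str],
                 extract: Extractor,
                 max_pages: int = 3) -> List[Dict[str, str]]:
    """Follow next_page_url for up to max_pages pages.

    Results whose url was already seen on an earlier page are skipped.
    """
    collected: List[Dict[str, str]] = []
    seen_urls: Set[str] = set()
    url: Optional[str] = first_url
    for _ in range(max_pages):
        if not url:
            # Assumption: an empty or missing next_page_url means no more pages.
            break
        html = fetch(url)
        results, next_url = extract(html, url)
        for result in results:
            if result["url"] not in seen_urls:
                seen_urls.add(result["url"])
                collected.append(result)
        url = next_url
    return collected
```

For example, scrape_pages(url, lambda u: requests.get(u).text, bing_search.extract_search_results) would collect up to three pages of the Bing results from the usage example.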
Contributions
Add new search engines! =)
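A new engine module needs only the two functions from the Overview. Below is a skeleton for a purely hypothetical engine at search.example: the base URL, the cc and sort query parameters, and the result and next CSS classes are invented for illustration, and a real engine's URL scheme and markup will differ. It uses only the standard library (html.parser and urllib.parse); this toy engine has no snippets, so preview_text is left empty.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple
from urllib.parse import parse_qs, urlencode, urljoin, urlparse

BASE_URL = "https://search.example/search"  # hypothetical engine endpoint


def get_search_url(query: str, latest: bool = True, country: str = 'us') -> str:
    params = {"q": query, "cc": country}  # hypothetical parameter names
    if latest:
        params["sort"] = "date"  # hypothetical "newest first" switch
    return BASE_URL + "?" + urlencode(params)


class _ResultParser(HTMLParser):
    """Collects <a class="result"> links and the <a class="next"> link."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Tuple[str, str]] = []  # (href, title) pairs
        self.next_href: Optional[str] = None
        self._current_href: Optional[str] = None
        self._title_parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if "result" in classes:
            self._current_href = attr.get("href")
            self._title_parts = []
        elif "next" in classes:
            self.next_href = attr.get("href")

    def handle_data(self, data):
        # Only text inside a result link contributes to its title.
        if self._current_href is not None:
            self._title_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            title = "".join(self._title_parts).strip()
            self.links.append((self._current_href, title))
            self._current_href = None


def extract_search_results(html: str, page_url: str) -> Tuple[List[Dict[str, str]], str]:
    parser = _ResultParser()
    parser.feed(html)
    parser.close()
    # Assumed convention: the page number lives in a "page" query parameter.
    page = parse_qs(urlparse(page_url).query).get("page", ["1"])[0]
    results = [
        {"url": urljoin(page_url, href), "title": title,
         "preview_text": "", "page_number": page}
        for href, title in parser.links
    ]
    # Relative links are resolved against the current page; "" means no next page.
    next_page_url = urljoin(page_url, parser.next_href) if parser.next_href else ""
    return results, next_page_url
```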
Download files
Source Distribution
File details
Details for the file search_engines-1.0.7.tar.gz.
File metadata
- Download URL: search_engines-1.0.7.tar.gz
- Upload date:
- Size: 5.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.0 setuptools/45.2.0 requests-toolbelt/0.9.1 tqdm/4.57.0 CPython/3.8.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 2ec8181daca92a085cccdb46ef28bfb59662f7eff3fbef206e50a32171b384fb
MD5 | 8d47fc3594d42bd78a392988fffd46da
BLAKE2b-256 | ba326c85afb363fdd1bcc045371938449c4de430d1fa7e6d90087be164c35df6