Search Engines Scraper
# search_engines
A Python library that queries Google, Bing, Yahoo and other search engines and collects the results from multiple search engine results pages.
Please note that web scraping may violate the terms of service of some search engines and may result in a temporary ban.
Supported search engines
Google
Bing
Yahoo
DuckDuckGo
Startpage
Aol
Dogpile
Ask
Mojeek
Brave
Torch
Features
- Creates output files (html, csv, json).
- Supports search filters (url, title, text).
- HTTP and SOCKS proxy support.
- Collects dark web links with Torch.
- Easy to add new search engines. You can add a new engine by creating a new class in `search_engines/engines/` and adding it to the `search_engines_dict` dictionary in `search_engines/engines/__init__.py`. The new class should subclass `SearchEngine` and override the following methods: `_selectors`, `_first_page`, `_next_page`.
- Compatible with Python 2 and Python 3.
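The steps for adding an engine can be sketched roughly as follows. To keep the example runnable without the library installed, `SearchEngine` here is a minimal stand-in defined in the snippet, not the library's actual base class. The method signatures, the `ExampleEngine` name, the URL, the CSS selectors, and the shape of the return values are all assumptions made for illustration; only the three method names and `search_engines_dict` come from the description above.

```python
# Hypothetical stand-in for the library's SearchEngine base class,
# defined here only so that the sketch runs on its own.
class SearchEngine(object):
    def _selectors(self, element):
        raise NotImplementedError

    def _first_page(self):
        raise NotImplementedError

    def _next_page(self, tags):
        raise NotImplementedError


class ExampleEngine(SearchEngine):
    """Hypothetical engine; the URL and selectors are placeholders."""

    def __init__(self, query=""):
        self._base_url = "https://search.example.com"
        self._query = query

    def _selectors(self, element):
        # Map a logical element name to a CSS selector (assumed page layout).
        selectors = {
            "links": "div.result",
            "url": "a[href]",
            "title": "h2",
            "text": "p",
            "next": "a.next",
        }
        return selectors[element]

    def _first_page(self):
        # Describe the request for the first results page.
        # The query is inserted as-is; a real engine would URL-encode it.
        url = "{}/search?q={}".format(self._base_url, self._query)
        return {"url": url, "data": None}

    def _next_page(self, tags):
        # `tags` is assumed to be a dict holding the next-page href, if any.
        href = tags.get("next")
        url = self._base_url + href if href else None
        return {"url": url, "data": None}


# Registration step: in the library this dictionary lives in
# search_engines/engines/__init__.py; a local dict stands in for it here.
search_engines_dict = {"example": ExampleEngine}
```

Check the real base class in `search_engines/engines/` for the actual signatures and return types before adapting this.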
Requirements
Python 2.7 - 3.x with Requests and BeautifulSoup
Installation
Run the setup file: `$ python setup.py install`.
Done!
Usage
As a library:
from search_engines import Google
engine = Google()
results = engine.search("my query")
links = results.links()
print(links)
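When several engines are queried for the same term, their link lists usually overlap. A minimal sketch of merging them while preserving order is shown below. `merge_links` is a hypothetical helper, not part of the library, and it assumes `links()` returns a list of URL strings, as the snippet above suggests. The sample lists stand in for real results.

```python
def merge_links(*link_lists):
    """Merge several lists of URLs, dropping duplicates and keeping first-seen order."""
    seen = set()
    merged = []
    for links in link_lists:
        for link in links:
            if link not in seen:
                seen.add(link)
                merged.append(link)
    return merged


# Sample data standing in for the output of `results.links()` from two engines.
google_links = ["https://a.example", "https://b.example"]
bing_links = ["https://b.example", "https://c.example"]
print(merge_links(google_links, bing_links))
```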
As a CLI script:
$ python search_engines_cli.py -e google,bing -q "my query" -o json,print
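The same command can be launched from another Python program. The sketch below only builds the argument list, using the `-e`, `-q` and `-o` flags shown above. `build_cli_command` is a hypothetical helper, and the script path is assumed to be relative to the current directory.

```python
import sys


def build_cli_command(engines, query, outputs, script="search_engines_cli.py"):
    """Build an argument list suitable for subprocess.call()."""
    return [
        sys.executable, script,
        "-e", ",".join(engines),   # comma-separated engine names
        "-q", query,               # the search query, passed as one argument
        "-o", ",".join(outputs),   # comma-separated output types
    ]


command = build_cli_command(["google", "bing"], "my query", ["json", "print"])
print(command[1:])
# To actually run it: import subprocess; subprocess.call(command)
```

Passing the arguments as a list avoids shell quoting problems with queries that contain spaces.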
Other versions
- async-search-scraper: a really cool asynchronous implementation, written by @soxoj