Search Engine Parser
"If it is a search engine, then it can be parsed" - Some random guy
Package to query popular search engines and scrape result titles, links and descriptions. It aims to support the widest possible range of search engines.
Popular Supported Engines
Some of the supported search engines include:
- DuckDuckGo
- GitHub
- StackOverflow
- Baidu
- YouTube
View all supported engines
Installation
# install the base package
pip install search-engine-parser

# install the package along with the `pysearch` CLI tool
pip install "search-engine-parser[cli]"
Development
Clone the repository
git clone git@github.com:bisoncorps/search-engine-parser.git
Create a virtual environment and install the requirements
mkvirtualenv search_engine_parser
pip install -r requirements/dev.txt
Code Documentation
Found on Read the Docs
Running the tests
pytest
Usage
Code
Query results can be scraped from popular search engines as shown in the example snippet below:
from search_engine_parser.engines.yahoo import Search as YahooSearch
from search_engine_parser.engines.google import Search as GoogleSearch
from search_engine_parser.engines.bing import Search as BingSearch
import pprint
search_args = ('preaching to the choir', 1)  # (query, page number)
gsearch = GoogleSearch()
ysearch = YahooSearch()
bsearch = BingSearch()
gresults = gsearch.search(*search_args)
yresults = ysearch.search(*search_args)
bresults = bsearch.search(*search_args)
a = {
    "Google": gresults,
    "Yahoo": yresults,
    "Bing": bresults,
}

# pretty print the result from each engine
for k, v in a.items():
    print(f"-------------{k}------------")
    pprint.pprint(v)
# print first title from google search
print(gresults["titles"][0])
# print 10th link from yahoo search
print(yresults["links"][9])
# print 6th description from bing search
print(bresults["descriptions"][5])
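The same pattern should work for the other supported engines. As a sketch, assuming the DuckDuckGo engine module follows the same naming convention as the imports above:

# assumption: the duckduckgo module mirrors the google/yahoo/bing modules shown above
from search_engine_parser.engines.duckduckgo import Search as DuckDuckGoSearch

dsearch = DuckDuckGoSearch()
dresults = dsearch.search('preaching to the choir', 1)
print(dresults["titles"][0])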
For localization, you can pass the `url` keyword with a localized URL. The query will be sent to that URL and parsed with the same engine's parser:
# Use google.de instead of google.com
results = gsearch.search(*search_args, url="google.de")
Command line
Search engine parser comes with a CLI tool known as pysearch. For example:
pysearch --engine bing search --query "Preaching to the choir" --type descriptions
Result
'Preaching to the choir' originated in the USA in the 1970s. It is a variant of the earlier 'preaching to the converted', which dates from England in the late 1800s and has the same meaning. Origin - the full story 'Preaching to the choir' (also sometimes spelled quire) is of US origin.
The CLI takes an engine argument (-e, which defaults to google) followed by one of two subcommands: search and summary.
usage: pysearch [-h] [-u URL] [-e ENGINE] {search,summary} ...

SearchEngineParser

positional arguments:
  {search,summary}     help for subcommands
    search             search help
    summary            summary help

optional arguments:
  -h, --help           show this help message and exit
  -u URL, --url URL    A custom link to use as base url for search e.g
                       google.de
  -e ENGINE, --engine ENGINE
                       Engine to use for parsing the query e.g google, yahoo,
                       bing, duckduckgo (default: google)
The summary subcommand shows a summary of each supported search engine, with a description of what it returns:
pysearch --engine google summary
The full arguments for the search subcommand are shown below:
usage: pysearch search [-h] -q QUERY [-p PAGE] [-t TYPE] [-r RANK]

optional arguments:
  -h, --help            show this help message and exit
  -q QUERY, --query QUERY
                        Query string to search engine for
  -p PAGE, --page PAGE  Page of the result to return details for (default: 1)
  -t TYPE, --type TYPE  Type of detail to return i.e full, links, descriptions
                        or titles (default: full)
  -r RANK, --rank RANK  ID of Detail to return e.g 5 (default: 0)
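As an illustration, these options can be combined (the engine name, query, page, type and rank values below are only examples):

# return only links from the second page of DuckDuckGo results, at rank 2
pysearch --engine duckduckgo search --query "Preaching to the choir" --page 2 --type links --rank 2

# use a localized base url for the search
pysearch --url google.de --engine google search --query "Preaching to the choir"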
Code of Conduct
All actions performed should adhere to the code of conduct
Contribution
Before making any contribution, please follow the contribution guide
License (MIT)
This project is licensed under the MIT License, which allows very broad use for both academic and commercial purposes.
Contributors ✨
Thanks goes to these wonderful people (emoji key):
- Ed Luff 💻
- Diretnan Domnan 🚇 ⚠️ 🔧 💻
- MeNsaaH 🚇 ⚠️ 🔧 💻
- Aditya Pal ⚠️ 💻 📖
- Avinash Reddy 🐛
- David Onuh 💻 ⚠️
- Panagiotis Simakis 💻 ⚠️
- reiarthur 💻
- Ashokkumar TA 💻
- Andreas Teuber 💻
- mi096684 🐛
- devajithvs 💻
This project follows the all-contributors specification. Contributions of any kind welcome!