scrapes search engine pages for query titles, descriptions and links
Search Engine Parser
"If it is a search engine, then it can be parsed" - Some random guy
A package to query popular search engines and scrape the result titles, links and descriptions. It aims to support the widest possible range of search engines.
Popular Supported Engines
Popular search engines supported include:
- DuckDuckGo
- GitHub
- StackOverflow
View all supported engines
Installation
pip install search-engine-parser
Development
Clone the repository
git clone git@github.com:bisoncorps/search-engine-parser.git
Create a virtual environment (mkvirtualenv is provided by virtualenvwrapper) and install the requirements
mkvirtualenv search_engine_parser
pip install -r requirements-dev.txt
Code Documentation
The code documentation is available on Read the Docs
Running the tests
pytest
Usage
Code
Query results can be scraped from popular search engines, as shown in the example snippet below:
import pprint

from search_engine_parser import YahooSearch, GoogleSearch, BingSearch

# (query string, page number)
search_args = ('preaching to the choir', 1)
gsearch = GoogleSearch()
ysearch = YahooSearch()
bsearch = BingSearch()
gresults = gsearch.search(*search_args)
yresults = ysearch.search(*search_args)
bresults = bsearch.search(*search_args)
a = {
    "Google": gresults,
    "Yahoo": yresults,
    "Bing": bresults}

# pretty print the result from each engine
for k, v in a.items():
    print(f"-------------{k}------------")
    pprint.pprint(v)

# print first title from google search
print(gresults["titles"][0])
# print 10th link from yahoo search
print(yresults["links"][9])
# print 6th description from bing search
print(bresults["descriptions"][5])
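The snippet above indexes straight into the result lists, which raises an IndexError when an engine returns fewer hits than expected. The following is a minimal sketch of a safer way to consume the results. It assumes, based only on the snippet above, that a result is a dict with the parallel lists "titles", "links" and "descriptions". The helper names to_records and nth are hypothetical and are not part of the package, and the sample data is hand-written so the block runs without network access.

```python
from typing import Dict, List, Optional

# Assumed shape of a result, inferred from the usage snippet:
# a dict holding three parallel lists.
Results = Dict[str, List[str]]


def to_records(results: Results) -> List[Dict[str, str]]:
    """Zip the parallel lists into one dict per search hit (hypothetical helper)."""
    return [
        {"title": title, "link": link, "description": description}
        for title, link, description in zip(
            results["titles"], results["links"], results["descriptions"]
        )
    ]


def nth(results: Results, key: str, index: int) -> Optional[str]:
    """Return results[key][index], or None if that position does not exist."""
    values = results.get(key, [])
    return values[index] if 0 <= index < len(values) else None


# Hand-written sample data standing in for a real search result.
sample: Results = {
    "titles": ["First hit", "Second hit"],
    "links": ["https://example.com/1", "https://example.com/2"],
    "descriptions": ["About the first hit", "About the second hit"],
}

for record in to_records(sample):
    print(record["title"], "->", record["link"])

# Only two hits are present, so asking for the 10th link yields None
# instead of raising IndexError.
print(nth(sample, "links", 9))
```

With real data, the same helpers would be called on gresults, yresults or bresults in place of sample.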
Command line
Use Python's module runner (python -m) to run the parser from the command line, e.g.
python -m search_engine_parser.core.cli --engine bing search --query "Preaching to the choir" --type descriptions
Result
'Preaching to the choir' originated in the USA in the 1970s. It is a variant of the earlier 'preaching to the converted', which dates from England in the late 1800s and has the same meaning. Origin - the full story 'Preaching to the choir' (also sometimes spelled quire) is of US origin.
The CLI has one top-level argument, -e/--engine (default: google), and two subcommands: search and summary.
SearchEngineParser

positional arguments:
  {search,summary}      help for subcommands
    search              search help
    summary             summary help

optional arguments:
  -h, --help            show this help message and exit
  -e ENGINE, --engine ENGINE
                        Engine to use for parsing the query e.g google, yahoo,
                        bing, duckduckgo (default: google)
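The help text above implies that --engine belongs to the main parser, so it has to appear before the subcommand name. The sketch below rebuilds that layout with argparse to illustrate the ordering. It is an illustration only and not the project's actual cli.py. The option names and defaults are copied from the help output in this README, while the int types for --page and --rank and the dest name "command" are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mimics the layout of the documented help text."""
    parser = argparse.ArgumentParser(prog="cli.py", description="SearchEngineParser")
    # Top-level option: it is attached to the main parser, so it must be
    # given before the subcommand on the command line.
    parser.add_argument(
        "-e", "--engine", default="google",
        help="Engine to use for parsing the query e.g google, yahoo, bing, duckduckgo",
    )
    subparsers = parser.add_subparsers(dest="command", help="help for subcommands")

    # "summary" takes no options of its own.
    subparsers.add_parser("summary", help="summary help")

    # "search" carries the query-related options.
    search = subparsers.add_parser("search", help="search help")
    search.add_argument("-q", "--query", required=True,
                        help="Query string to search engine for")
    search.add_argument("-p", "--page", type=int, default=1,  # int is an assumption
                        help="Page of the result to return details for")
    search.add_argument("-t", "--type", default="full",
                        help="Type of detail to return i.e full, links, descriptions or titles")
    search.add_argument("-r", "--rank", type=int, default=0,  # int is an assumption
                        help="ID of Detail to return e.g 5")
    return parser


# Same arguments as the command-line example in this README.
args = build_parser().parse_args(
    ["--engine", "bing", "search", "--query", "Preaching to the choir",
     "--type", "descriptions"]
)
print(args.engine, args.command, args.type)
```

In this sketch, placing --engine after the subcommand would hand it to the subparser, which does not know the option and would exit with an error.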
The summary subcommand prints a summary of each search engine that has been added, along with a description of what it returns
python -m search_engine_parser.core.cli --engine google summary
The full arguments for the search subcommand are shown below
usage: cli.py search [-h] -q QUERY [-p PAGE] [-t TYPE] [-r RANK]

optional arguments:
  -h, --help            show this help message and exit
  -q QUERY, --query QUERY
                        Query string to search engine for
  -p PAGE, --page PAGE  Page of the result to return details for (default: 1)
  -t TYPE, --type TYPE  Type of detail to return i.e full, links, descriptions
                        or titles (default: full)
  -r RANK, --rank RANK  ID of Detail to return e.g 5 (default: 0)
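The options listed above can also be assembled from Python when the CLI needs to be scripted. The following is a rough sketch that builds the argument list for python -m search_engine_parser.core.cli using only the flags documented in this README. The helper names build_search_command and run_search are hypothetical, and run_search is defined but not called here because it needs the package installed and network access.

```python
import shlex
import subprocess
import sys
from typing import List, Optional


def build_search_command(
    query: str,
    engine: str = "google",
    page: int = 1,
    result_type: str = "full",
    rank: Optional[int] = None,
) -> List[str]:
    """Assemble the argv list for the search subcommand (hypothetical helper)."""
    cmd = [
        sys.executable, "-m", "search_engine_parser.core.cli",
        "--engine", engine,  # top-level option, placed before the subcommand
        "search",
        "--query", query,
        "--page", str(page),
        "--type", result_type,
    ]
    if rank is not None:
        cmd += ["--rank", str(rank)]
    return cmd


def run_search(**kwargs) -> str:
    """Run the CLI and return its stdout; raises CalledProcessError on failure."""
    completed = subprocess.run(
        build_search_command(**kwargs), capture_output=True, text=True, check=True
    )
    return completed.stdout


cmd = build_search_command(
    "Preaching to the choir", engine="bing", result_type="descriptions"
)
# Show the arguments after the interpreter path, quoted for a POSIX shell.
print(shlex.join(cmd[1:]))
```

Passing the arguments as a list avoids shell quoting problems with queries that contain spaces or quotes.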
Code of Conduct
All actions performed should adhere to the code of conduct
Contribution
Before making any contribution, please follow the contribution guide
License (MIT)
This project is released under the MIT License, which allows very broad use for both academic and commercial purposes.
Contributors ✨
Thanks goes to these wonderful people (emoji key):
- Ed Luff 💻
- Diretnan Domnan 🚇 ⚠️ 🔧 💻
- MeNsaaH 🚇 ⚠️ 🔧 💻
- Aditya Pal ⚠️ 💻
This project follows the all-contributors specification. Contributions of any kind welcome!
Source Distribution
Hashes for search-engine-parser-0.5.0.tar.gz
Algorithm | Hash digest |
---|---|
SHA256 | b81fc36d74da55280df7ee62dcd0c294a1f3c17ab074993eef00f30c4ac919b4 |
MD5 | b624f748596deb3af2249ea32f364fb3 |
BLAKE2b-256 | ec5629a3e6e098674ffef9a12c360275e90146a96bde626f277d329248aafe76 |