Asyncio search engine scraping package

Project description

searchit

Searchit is a library for async scraping of search engines. The library supports multiple search engines (currently Google, Yandex, and Bing) with support for other search engines to come.

Install

pip install searchit

Searchit can be installed with pip by running the command above.

Using Searchit

import asyncio

from searchit import GoogleScraper, YandexScraper, BingScraper
from searchit import ScrapeRequest

request = ScrapeRequest("watch movies online", 30)
google = GoogleScraper(max_results_per_page=10)  # max_results_per_page = number of results per page
yandex = YandexScraper(max_results_per_page=10)

loop = asyncio.get_event_loop()

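# Keep each engine's results in its own variable so the second call does not overwrite the first
google_results = loop.run_until_complete(google.scrape(request))
yandex_results = loop.run_until_complete(yandex.scrape(request))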

To use Searchit, first create a ScrapeRequest object; the search term and the number of results are required fields. The same object can then be passed to several different search engine scrapers, each of which scrapes asynchronously.
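Because the example above passes scrape(request) to run_until_complete, scrape appears to return an awaitable, so several engines could be queried concurrently rather than one after another. The following is a minimal sketch of that idea, not part of searchit: scrape_all and StubScraper are hypothetical names, and the stub stands in for a real scraper so the sketch runs without network access.

```python
import asyncio


async def scrape_all(scrapers, request):
    """Run scraper.scrape(request) for every scraper concurrently.

    `scrapers` maps a label to any object with an async scrape(request) method.
    """
    # return_exceptions=True keeps one failing engine from discarding the other results
    outcomes = await asyncio.gather(
        *(scraper.scrape(request) for scraper in scrapers.values()),
        return_exceptions=True,
    )
    return dict(zip(scrapers.keys(), outcomes))


class StubScraper:
    """Hypothetical stand-in for a searchit scraper, so the sketch runs offline."""

    def __init__(self, name):
        self.name = name

    async def scrape(self, request):
        await asyncio.sleep(0)  # yield control, as a real network call would
        return [f"{self.name} result for {request!r}"]


demo = asyncio.run(
    scrape_all(
        {"google": StubScraper("google"), "yandex": StubScraper("yandex")},
        "watch movies online",
    )
)
```

With searchit installed, the same helper could presumably be given GoogleScraper and YandexScraper instances and a ScrapeRequest in place of the stubs and the plain string; any entry whose value is an exception marks an engine that failed.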

ScrapeRequest - Object

term - Required str - the term to be searched for
count - Required int - the total number of results to collect
domain - Optional[str] - the domain to search, e.g. .com
sleep - Optional[int] - time to wait between result pages - important to prevent getting blocked
proxy - Optional[str] - proxy to be used to make the request - default None
language - Optional[str] - language to conduct the search in (currently Google only)
yandex_geo - Optional[str] - Yandex location code to conduct the search from - defaults to the code for London
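The fields above can be pictured as a small data class, and count combined with max_results_per_page gives a rough idea of how many pages a request implies. The sketch below is illustrative only: ScrapeRequestSketch, pages_needed, and total_sleep are hypothetical names, the None defaults are placeholders rather than searchit's real defaults, sleep is assumed to be in seconds, and pagination is assumed to fetch one page of at most max_results_per_page results at a time.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScrapeRequestSketch:
    """Hypothetical mirror of the documented fields; not searchit's own class."""

    term: str                         # required: the term to search for
    count: int                        # required: total number of results wanted
    domain: Optional[str] = None      # placeholder default; real default not documented here
    sleep: Optional[int] = None       # wait between result pages (assumed seconds)
    proxy: Optional[str] = None       # documented default is none
    language: Optional[str] = None    # Google only, per the field list
    yandex_geo: Optional[str] = None  # real default is a London code; value not given here


def pages_needed(request, max_results_per_page):
    """Pages to fetch, assuming each page yields at most max_results_per_page results."""
    return math.ceil(request.count / max_results_per_page)


def total_sleep(request, max_results_per_page):
    """Total waiting time, assuming one sleep between consecutive pages."""
    pauses = max(pages_needed(request, max_results_per_page) - 1, 0)
    return pauses * (request.sleep or 0)


example = ScrapeRequestSketch("watch movies online", 30, sleep=5)
```

Under these assumptions, pages_needed(example, 10) is 3 and total_sleep(example, 10) is 10, which shows how a larger sleep value trades speed for a lower chance of being blocked.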

Roadmap

  • Add additional search engines
  • Tests
  • Blocking non-async scrape method
  • Add support for page rendering (Selenium and Puppeteer)

Files for searchit, version 2019.12.30.1

  • searchit-2019.12.30.1-py3-none-any.whl (20.2 kB) - Wheel, Python version py3
  • searchit-2019.12.30.1.tar.gz (5.0 kB) - Source, Python version None
