
Asyncio search engine scraping package

Project description

searchit

Searchit is a library for async scraping of search engines. The library supports multiple search engines (currently Google, Yandex, and Bing) with support for other search engines to come.

Using Searchit

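import asyncio

from searchit import GoogleScraper, YandexScraper, BingScraper
from searchit import ScrapeRequest


async def main():
    # Search term and total number of results to collect
    request = ScrapeRequest("watch movies online", 30)
    google = GoogleScraper(max_results=10)  # max_results = number of results per page
    yandex = YandexScraper(max_results=10)

    # The same request object can be passed to each engine
    google_results = await google.scrape(request)
    yandex_results = await yandex.scrape(request)


asyncio.run(main())  # await needs a running event loop (asyncio.run: Python 3.7+)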

To use Searchit, first create a ScrapeRequest object; the search term and the number of results are required fields. The same object can then be passed to several different search engines, each of which is scraped asynchronously.
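Since each scraper exposes an awaitable scrape method, one request can in principle be fanned out to several engines at once with asyncio.gather instead of being awaited one engine at a time. The sketch below illustrates only that pattern. StubScraper is a hypothetical stand-in (it is not part of searchit) so the example runs without network access; the assumption is that real GoogleScraper or YandexScraper instances could take its place.

```python
import asyncio


class StubScraper:
    """Hypothetical stand-in for a searchit scraper; not part of the library."""

    def __init__(self, name, delay):
        self.name = name
        self.delay = delay

    async def scrape(self, term):
        # Simulate network latency instead of issuing a real request.
        await asyncio.sleep(self.delay)
        return [f"{self.name} result for {term!r}"]


async def scrape_all(scrapers, term):
    # Start every scrape at once and wait for all of them to finish.
    # gather returns results in the same order as the scrapers passed in.
    return await asyncio.gather(*(s.scrape(term) for s in scrapers))


def run_demo():
    scrapers = [StubScraper("engine_a", 0.02), StubScraper("engine_b", 0.01)]
    return asyncio.run(scrape_all(scrapers, "watch movies online"))


if __name__ == "__main__":
    print(run_demo())
```

The results come back in the order the scrapers were listed, regardless of which one finishes first, which makes it easy to pair each result list with its engine.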

ScrapeRequest - Object

term - Required str - the term to be searched for
count - Required int - the total number of results
domain - Optional[str] - the domain to search, e.g. .com
sleep - Optional[int] - time to wait between page requests - important to prevent getting blocked
proxy - Optional[str] - proxy to be used to make request - default none
language - Optional[str] - language to conduct search in (currently Google only)
yandex_geo - Optional[str] - Yandex location code to conduct search from - default code for London
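To see how count, max_results and sleep relate, it helps to work through the pagination arithmetic: a count of 30 with 10 results per page implies three page requests, with a pause between consecutive pages. The following is a minimal sketch of that idea, not searchit's actual implementation. paginate, fetch_page and fake_fetch are hypothetical names, and treating sleep as seconds is an assumption.

```python
import asyncio
import math


def page_count(count, max_results):
    # Number of result pages needed to collect `count` results
    # when each page yields at most `max_results`.
    return math.ceil(count / max_results)


async def paginate(count, max_results, sleep, fetch_page):
    # fetch_page is a hypothetical coroutine: page index -> list of results.
    results = []
    pages = page_count(count, max_results)
    for page in range(pages):
        results.extend(await fetch_page(page))
        if page < pages - 1:
            # Pause between pages to lower the risk of being blocked.
            await asyncio.sleep(sleep)
    # Trim any surplus results from the last page.
    return results[:count]


async def fake_fetch(page):
    # Stand-in for a real request: returns ten placeholder results per page.
    return [f"result-{page * 10 + i}" for i in range(10)]


def run_demo():
    return asyncio.run(paginate(25, 10, 0.01, fake_fetch))
```

With these stand-ins, a count of 25 still needs three pages; the final page is fetched in full and the surplus is trimmed.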

Roadmap

  • Resolve issues with Yandex
  • Add additional search engines
  • Tests
  • Blocking non-async scrape method
  • Add support for page rendering (Selenium and Puppeteer)

Download files


Source Distribution

searchit-2019.12.29.1.tar.gz (4.9 kB)

Uploaded Source

Built Distribution

searchit-2019.12.29.1-py3-none-any.whl (7.5 kB)

Uploaded Python 3

File details

Details for the file searchit-2019.12.29.1.tar.gz.

File metadata

  • Download URL: searchit-2019.12.29.1.tar.gz
  • Size: 4.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/42.0.2 requests-toolbelt/0.9.1 tqdm/4.41.0 CPython/3.6.5

File hashes

Hashes for searchit-2019.12.29.1.tar.gz
Algorithm Hash digest
SHA256 754d72a7c74aefde2a39f24454141d11069a81cf67241af9661eac0e1030604f
MD5 c2b2ec980fc846a8f7dae7276bdf8a20
BLAKE2b-256 20f2fdf443f34618a4a6f4752e67e3030098cf80bc85666e6b27f663c586eb5e


File details

Details for the file searchit-2019.12.29.1-py3-none-any.whl.

File metadata

  • Download URL: searchit-2019.12.29.1-py3-none-any.whl
  • Size: 7.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/42.0.2 requests-toolbelt/0.9.1 tqdm/4.41.0 CPython/3.6.5

File hashes

Hashes for searchit-2019.12.29.1-py3-none-any.whl
Algorithm Hash digest
SHA256 47115c413a12221de08ed542ac1bb24d0855676388b0330d699f080fc57c8afe
MD5 fa19f053e21b8956bde113d5043c9706
BLAKE2b-256 02fd51cbdb75ecd0a6a0f8c81867a6c32b6669f6528fdf8520166597f71b21b7

