Asyncio search engine scraping package

Project description

searchit

Searchit is a library for asynchronous scraping of search engines. It currently supports Google, Yandex, and Bing, with support for other search engines to come.

Install

pip install searchit

Searchit can be installed with pip by running the command above.

Using Searchit

import asyncio

from searchit import GoogleScraper, YandexScraper, BingScraper
from searchit import ScrapeRequest

request = ScrapeRequest("watch movies online", 30)
google = GoogleScraper(max_results_per_page=10)  # max_results_per_page = number of results per page
yandex = YandexScraper(max_results_per_page=10)

loop = asyncio.get_event_loop()

# Keep each engine's results in its own variable so neither is overwritten
google_results = loop.run_until_complete(google.scrape(request))
yandex_results = loop.run_until_complete(yandex.scrape(request))

To use Searchit, first create a ScrapeRequest object; the search term and the number of results are required fields. The same object can then be passed to several search engine scrapers and scraped asynchronously.
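The example above runs one engine after the other. As a minimal sketch of running several at once, the helper below awaits each scraper's `scrape(request)` coroutine through `asyncio.gather`. The names `scrape_all`, `FakeScraper`, and `run_demo` are hypothetical and not part of Searchit; `FakeScraper` is a stand-in that makes no network calls, so the sketch runs without the library installed.

```python
import asyncio
from typing import Any, Dict, List


class FakeScraper:
    """Hypothetical stand-in for a Searchit scraper; makes no network calls."""

    def __init__(self, name: str) -> None:
        self.name = name

    async def scrape(self, request: Any) -> List[str]:
        # Yield control once, as a real scraper would while awaiting I/O.
        await asyncio.sleep(0)
        return [f"{self.name}: {request}"]


async def scrape_all(scrapers: Dict[str, Any], request: Any) -> Dict[str, Any]:
    """Await every scraper's scrape(request) concurrently, keyed by name."""
    names = list(scrapers)
    results = await asyncio.gather(
        *(scrapers[name].scrape(request) for name in names)
    )
    return dict(zip(names, results))


def run_demo() -> Dict[str, Any]:
    scrapers = {
        "google": FakeScraper("google"),
        "yandex": FakeScraper("yandex"),
    }
    return asyncio.run(scrape_all(scrapers, "watch movies online"))


if __name__ == "__main__":
    print(run_demo())
```

With the real library, the dictionary would presumably hold `GoogleScraper` and `YandexScraper` instances and the request would be a `ScrapeRequest`; this assumes only that `scrape` is awaitable, as the example above suggests.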

ScrapeRequest Object

term - Required str - the term to search for
count - Required int - the total number of results to return
domain - Optional[str] - the domain to search, e.g. .com
sleep - Optional[int] - time to wait between result pages; important to avoid getting blocked
proxy - Optional[str] - proxy to use for the request; defaults to none
language - Optional[str] - language to conduct the search in (currently Google only)
geo - Optional[str] - geo location to conduct the search from (Yandex and Qwant)
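To make the field list concrete, here is a small sketch that mirrors the documented fields in a dataclass and works out how many result pages a request would need. `ScrapeRequestSketch` and `pages_needed` are hypothetical names, not Searchit's own classes or functions. Treating the optional fields as keyword arguments with a default of `None`, and treating pagination as a simple ceiling division, are assumptions rather than documented behaviour.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScrapeRequestSketch:
    """Hypothetical mirror of the documented fields; not Searchit's class."""

    term: str                       # required: the term to search for
    count: int                      # required: total number of results
    domain: Optional[str] = None    # assumed default
    sleep: Optional[int] = None     # assumed default
    proxy: Optional[str] = None     # documented as defaulting to none
    language: Optional[str] = None  # assumed default
    geo: Optional[str] = None       # assumed default


def pages_needed(count: int, max_results_per_page: int) -> int:
    """Pages required to collect `count` results (assumed: ceiling division)."""
    if max_results_per_page <= 0:
        raise ValueError("max_results_per_page must be positive")
    return math.ceil(count / max_results_per_page)


request = ScrapeRequestSketch("watch movies online", 30, sleep=2)
pages = pages_needed(request.count, 10)
```

Under these assumptions, 30 results at 10 per page means three page requests, which is why the wait between pages matters for avoiding blocks.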

Roadmap

  • Add additional search engines
  • Tests
  • Blocking non-async scrape method
  • Add support for page rendering (Selenium and Puppeteer)

Download files

Source Distribution

searchit-2019.12.30.2.tar.gz (5.4 kB)

Uploaded Source

Built Distribution

searchit-2019.12.30.2-py3-none-any.whl (21.4 kB)

Uploaded Python 3

File details

Details for the file searchit-2019.12.30.2.tar.gz.

File metadata

  • Download URL: searchit-2019.12.30.2.tar.gz
  • Upload date:
  • Size: 5.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/42.0.2 requests-toolbelt/0.9.1 tqdm/4.41.0 CPython/3.6.5

File hashes

Hashes for searchit-2019.12.30.2.tar.gz
Algorithm    Hash digest
SHA256       6f10b39fb73851ad7b9d2c73ea508d98e0cba0e30a84da1ae30852fc6616b0a6
MD5          e823312414638b329c3b491f0f5854c6
BLAKE2b-256  48c24df79da90f2e1dbb7d888e72e4ba0e653c02a9f305935091fd658d1ef379

File details

Details for the file searchit-2019.12.30.2-py3-none-any.whl.

File metadata

  • Download URL: searchit-2019.12.30.2-py3-none-any.whl
  • Upload date:
  • Size: 21.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/42.0.2 requests-toolbelt/0.9.1 tqdm/4.41.0 CPython/3.6.5

File hashes

Hashes for searchit-2019.12.30.2-py3-none-any.whl
Algorithm    Hash digest
SHA256       37d12a45783e417bdda27ec699c3dd33525b4887bd6b9bb8264fdc418102618e
MD5          2fd4df05d806781f7a07d9ff2f80de07
BLAKE2b-256  710c20ce035bfbf9a015cb9d2e25c394d479b0efa0e594930c742af1212db028
