Asyncio search engine scraping package
Project description
searchit
Searchit is a library for async scraping of search engines. It currently supports Google, Yandex, and Bing, with more search engines to come.
Using Searchit
import asyncio

from searchit import GoogleScraper, YandexScraper, BingScraper
from searchit import ScrapeRequest

async def main():
    request = ScrapeRequest("watch movies online", 30)  # term, total number of results
    google = GoogleScraper(max_results=10)  # max_results = number of results per page
    yandex = YandexScraper(max_results=10)
    google_results = await google.scrape(request)
    yandex_results = await yandex.scrape(request)

asyncio.run(main())  # await only works inside a coroutine; asyncio.run needs Python 3.7+
To use Searchit, first create a ScrapeRequest object; the search term and the number of results are required fields. The same object can then be passed to several different search engines and scraped asynchronously.
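Passing one request to several engines at once can be sketched with asyncio.gather. The example below does not import searchit: FakeRequest and FakeScraper are hypothetical stand-ins that only copy the `await scraper.scrape(request)` shape shown above, and scrape_all is an illustrative helper, not part of the library.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeRequest:
    """Stand-in for searchit's ScrapeRequest (term and count only)."""
    term: str
    count: int


class FakeScraper:
    """Stand-in exposing the same `await scraper.scrape(request)` shape."""

    def __init__(self, name: str, delay: float) -> None:
        self.name = name
        self.delay = delay

    async def scrape(self, request: FakeRequest) -> list:
        await asyncio.sleep(self.delay)  # simulates network latency
        return [f"{self.name}: {request.term} #{i}" for i in range(request.count)]


async def scrape_all(scrapers, request):
    """Run every scraper on the same request concurrently."""
    # gather returns results in the order the coroutines were passed in
    results = await asyncio.gather(*(s.scrape(request) for s in scrapers))
    return {s.name: r for s, r in zip(scrapers, results)}


if __name__ == "__main__":
    request = FakeRequest("watch movies online", 3)
    scrapers = [FakeScraper("google", 0.02), FakeScraper("yandex", 0.01)]
    combined = asyncio.run(scrape_all(scrapers, request))
    for engine, hits in combined.items():
        print(engine, len(hits))
```

With real scrapers the same pattern would presumably apply by replacing the stand-ins with GoogleScraper, YandexScraper, or BingScraper instances, assuming their scrape coroutines are safe to run concurrently.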
ScrapeRequest - Object
term - Required str - the term to be searched for
count - Required int - the total number of results
domain - Optional[str] - the domain to search, e.g. .com
sleep - Optional[int] - time to wait between result pages - important to avoid getting blocked
proxy - Optional[str] - proxy to use for the request - default none
language - Optional[str] - language to conduct the search in (currently Google only)
yandex_geo - Optional[str] - Yandex location code to conduct search from - default code for London
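The relationship between count (total results) and a scraper's max_results (results per page) can be worked through with a little arithmetic. The sketch below is illustrative only: RequestSketch mirrors the documented field names but is not searchit's class, the None defaults are assumptions except where noted, and the unit of sleep is assumed to be seconds.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestSketch:
    """Illustrative mirror of the documented fields; not searchit's class."""
    term: str
    count: int
    domain: Optional[str] = None      # default is an assumption
    sleep: Optional[int] = None       # default is an assumption
    proxy: Optional[str] = None       # documented default: none
    language: Optional[str] = None    # default is an assumption
    yandex_geo: Optional[str] = None  # documented default is a London code, not reproduced here


def pages_needed(count: int, max_results: int) -> int:
    """Pages required to collect `count` results at `max_results` per page."""
    if count <= 0 or max_results <= 0:
        raise ValueError("count and max_results must be positive")
    return math.ceil(count / max_results)


def total_sleep(request: RequestSketch, max_results: int) -> int:
    """Total wait, assuming one pause between each pair of consecutive pages."""
    pauses = pages_needed(request.count, max_results) - 1
    return pauses * (request.sleep or 0)


if __name__ == "__main__":
    req = RequestSketch("watch movies online", 30, sleep=5)
    print(pages_needed(req.count, 10))  # 30 results at 10 per page
    print(total_sleep(req, 10))
```

Under these assumptions, the request from the usage example (30 results at 10 per page) spans three pages, so a sleep value adds two pauses to the total run time.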
Roadmap
- Resolve issues with Yandex
- Add additional search engines
- Tests
- Blocking non-async scrape method
- Add support for page rendering (Selenium and Puppeteer)
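Until a blocking scrape method is available, synchronous callers could wrap the coroutine themselves. This is a minimal sketch, not the planned API: DemoScraper is a hypothetical stand-in for a real scraper and scrape_blocking is an illustrative helper name.

```python
import asyncio


class DemoScraper:
    """Hypothetical stand-in with an async scrape method."""

    async def scrape(self, request):
        await asyncio.sleep(0)  # placeholder for real network I/O
        return [f"result for {request}"]


def scrape_blocking(scraper, request):
    """Run scraper.scrape(request) to completion from synchronous code.

    asyncio.run starts a fresh event loop, so this must not be called
    from code that is already running inside an event loop.
    """
    return asyncio.run(scraper.scrape(request))


if __name__ == "__main__":
    print(scrape_blocking(DemoScraper(), "watch movies online"))
```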
Download files
Source Distribution
searchit-2019.12.29.1.tar.gz (4.9 kB)
Built Distribution
searchit-2019.12.29.1-py3-none-any.whl (7.5 kB)
File details
Details for the file searchit-2019.12.29.1.tar.gz.
File metadata
- Download URL: searchit-2019.12.29.1.tar.gz
- Upload date:
- Size: 4.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/42.0.2 requests-toolbelt/0.9.1 tqdm/4.41.0 CPython/3.6.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 754d72a7c74aefde2a39f24454141d11069a81cf67241af9661eac0e1030604f
MD5 | c2b2ec980fc846a8f7dae7276bdf8a20
BLAKE2b-256 | 20f2fdf443f34618a4a6f4752e67e3030098cf80bc85666e6b27f663c586eb5e
File details
Details for the file searchit-2019.12.29.1-py3-none-any.whl.
File metadata
- Download URL: searchit-2019.12.29.1-py3-none-any.whl
- Upload date:
- Size: 7.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/42.0.2 requests-toolbelt/0.9.1 tqdm/4.41.0 CPython/3.6.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 47115c413a12221de08ed542ac1bb24d0855676388b0330d699f080fc57c8afe
MD5 | fa19f053e21b8956bde113d5043c9706
BLAKE2b-256 | 02fd51cbdb75ecd0a6a0f8c81867a6c32b6669f6528fdf8520166597f71b21b7