Python API to DATASHAKE reviews
Project description
datashakereviewsapi: python API to DATASHAKE reviews
Python API to DATASHAKE reviews (https://www.datashake.com/review-scraper-api).
This module makes it easier to schedule jobs and fetch the results.
Official web API documentation: https://api.datashake.com/#reviews
You need a DATASHAKE API key to use this module.
Installation
At the moment, the only way to install the module is to clone this repository.
Usage examples
Initiate API instance
from datashakereviewsapi.datashakereviewsapi import DatashakeReviewAPI
# Initiate API instance with your API key from DATASHAKE
api = DatashakeReviewAPI('your_datashake_reviews_scraper_api_key')
Schedule a single job with the URL of a review page. The DATASHAKE API takes several hours to crawl the page and collect the results.
response = api.schedule_job('https://uk.trustpilot.com/review/store.playstation.com')
# save job_id for querying the results later
first_job_id = response['job_id']
Get the job results - reviews
reviews = api.get_job_reviews(first_job_id)
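Because a crawl can take several hours, fetching the reviews right after scheduling may be premature. The sketch below shows one possible way to poll until results are ready. `wait_for_result`, `fetch`, `is_ready` and the timing parameters are hypothetical names that are not part of datashakereviewsapi. How the module signals an unfinished job is not documented here, so the readiness check is left to the caller.

```python
import time
from typing import Any, Callable, Optional


def wait_for_result(fetch: Callable[[], Any],
                    is_ready: Callable[[Any], bool],
                    interval_s: float = 1800.0,
                    timeout_s: float = 86400.0,
                    sleep: Callable[[float], None] = time.sleep,
                    clock: Callable[[], float] = time.monotonic) -> Optional[Any]:
    """Call fetch() repeatedly until is_ready(result) is true or the timeout passes.

    Hypothetical helper: returns the first ready result, or None on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        result = fetch()
        if is_ready(result):
            return result
        if clock() >= deadline:
            return None
        sleep(interval_s)


# Possible usage (assumption: an unfinished job yields an empty result):
# reviews = wait_for_result(lambda: api.get_job_reviews(first_job_id),
#                           is_ready=lambda r: r is not None and len(r) > 0)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting; in normal use the defaults apply.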
Schedule another job with a reference to the first one to get only the delta (new reviews)
response2 = api.schedule_job('https://uk.trustpilot.com/review/store.playstation.com',
                             previous_job_id=first_job_id)
Create a job list (one row in the example) and schedule jobs for all the URLs in the list
import pandas as pd
jobs_list = pd.DataFrame(columns=['Website', 'url', 'latest_job_id', 'status', 'last_crawl',
                                  'latest_schedule_message'])
jobs_list['url'] = ['https://uk.trustpilot.com/review/store.playstation.com']
updated_job_list = api.schedule_job_list(jobs_list)
Finally, fetch the reviews and save them to a CSV file, then reschedule all jobs in the jobs list
# Plug-n-Play block to schedule/update jobs and get/save results
# Prerequisite for running the snippet: two CSV files with the following structure must exist:
# jobs_list.csv columns: ['Website', 'url', 'latest_job_id', 'status', 'last_crawl', 'latest_schedule_message']
# reviews_list.csv columns: ['job_id', 'source_name', 'id', 'name', 'date', 'rating_value',
# 'review_text', 'url', 'profile_picture', 'location', 'review_title',
# 'verified_order', 'reviewer_title', 'language_code', 'meta_data']
# Code block to refresh review jobs and review results
import pandas as pd
jobs_list_filepath = 'jobs_list.csv'
reviews_list_filepath = 'reviews_list.csv'
df_jobs = pd.read_csv(jobs_list_filepath, index_col='id')
df_reviews = pd.read_csv(reviews_list_filepath, index_col='unique_id')
df_jobs_new, df_reviews_new = api.get_job_list_reviews(df_jobs, df_reviews)
df_jobs_new.to_csv(jobs_list_filepath, encoding='utf-8-sig')
df_reviews_new.to_csv(reviews_list_filepath, encoding='utf-8-sig')
# Code block to reschedule review jobs
df_jobs = pd.read_csv(jobs_list_filepath, index_col='id')
df_jobs_new = api.schedule_job_list(df_jobs)
df_jobs_new.to_csv(jobs_list_filepath, encoding='utf-8-sig')
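The two prerequisite CSV files can be bootstrapped with the standard library alone. This is a minimal sketch, not part of datashakereviewsapi: `create_csv_if_missing`, `JOBS_COLUMNS` and `REVIEWS_COLUMNS` are hypothetical names. The leading `id` and `unique_id` columns are an assumption inferred from the `index_col` arguments in the snippet above; the remaining column names are copied from the comments there.

```python
import csv
from pathlib import Path

# Assumption: the index columns ('id', 'unique_id') come first, as implied by
# the index_col arguments passed to pd.read_csv in the snippet above.
JOBS_COLUMNS = ['id', 'Website', 'url', 'latest_job_id', 'status', 'last_crawl',
                'latest_schedule_message']
REVIEWS_COLUMNS = ['unique_id', 'job_id', 'source_name', 'id', 'name', 'date',
                   'rating_value', 'review_text', 'url', 'profile_picture',
                   'location', 'review_title', 'verified_order',
                   'reviewer_title', 'language_code', 'meta_data']


def create_csv_if_missing(path, columns, rows=()):
    """Write a CSV with the given header (and optional rows) unless it exists.

    Returns True if the file was created, False if it was already present.
    """
    path = Path(path)
    if path.exists():
        return False
    # utf-8-sig matches the encoding used by the to_csv calls above
    with path.open('w', newline='', encoding='utf-8-sig') as f:
        writer = csv.writer(f)
        writer.writerow(columns)
        writer.writerows(rows)
    return True
```

For example, `create_csv_if_missing(jobs_list_filepath, JOBS_COLUMNS)` and `create_csv_if_missing(reviews_list_filepath, REVIEWS_COLUMNS)` would create empty files with the expected headers. Rows with the URLs to crawl still need to be added to the jobs file.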
Source Distribution
File details
Details for the file datashakereviewsapi-1.1.tar.gz.
File metadata
- Download URL: datashakereviewsapi-1.1.tar.gz
- Upload date:
- Size: 6.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.6.0 importlib_metadata/3.10.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.9.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | efec620c9663c70ef1ac0239f834549df7bb6d6ce72dad9593ff0e65c0e6ceca |
| MD5 | 1fc8946621c41dd870d4a89ff11d69dc |
| BLAKE2b-256 | a2edf1d15f2cca6d3b703a186fe75a1c9ca3c63b8a74e8f4da7de8a1dc43e8d3 |