Python API to DATASHAKE reviews

Project description

datashakereviewsapi: python API-wrapper for DATASHAKE reviews

Python API wrapper for the DATASHAKE reviews API (https://www.datashake.com/review-scraper-api). This module makes it easier to schedule jobs and fetch the results.

Official web API documentation: https://api.datashake.com/#reviews

You need a DATASHAKE API key to use this module.

Installation

pip install datashakereviewsapi

Usage examples

Initiate API instance

from datashakereviewsapi import DatashakeReviewAPI

# Initiate API instance with your API key from DATASHAKE
api = DatashakeReviewAPI('your_datashake_reviews_scraper_api_key')
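To keep the key out of source code, it could be read from an environment variable before the instance is created. This is a minimal sketch: the helper `load_api_key` and the variable name `DATASHAKE_API_KEY` are hypothetical and not part of the module.

```python
import os


def load_api_key(env_var: str = "DATASHAKE_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    Both this helper and the default variable name are illustrative
    assumptions; the module itself only needs the key as a string.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your DATASHAKE API key"
        )
    return key


# Usage with the constructor shown above:
# api = DatashakeReviewAPI(load_api_key())
```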

Schedule a single job with the URL of a review page. The DATASHAKE API takes several hours to crawl the page and collect the results.

response = api.schedule_job('https://uk.trustpilot.com/review/store.playstation.com')
# save job_id for querying the results later
first_job_id = response['job_id']

Get the job results - reviews

reviews = api.get_job_reviews(first_job_id)
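Since a crawl can take several hours, a script may need to check for results repeatedly rather than once. The helper below is a generic polling sketch: `poll_until_ready` is hypothetical, and what an unfinished job looks like in the return value of `get_job_reviews` is an assumption you would need to verify, so the readiness check is passed in as a function.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll_until_ready(
    fetch: Callable[[], T],
    is_ready: Callable[[T], bool],
    interval_seconds: float = 1800.0,
    max_attempts: int = 12,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call fetch() until is_ready(result) is true or attempts run out.

    Returns the first ready result, or None if every attempt failed the check.
    """
    for attempt in range(max_attempts):
        result = fetch()
        if is_ready(result):
            return result
        # Wait between attempts, but not after the last one
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    return None


# Hypothetical usage; the readiness test is an assumption, not documented behaviour:
# reviews = poll_until_ready(
#     lambda: api.get_job_reviews(first_job_id),
#     is_ready=lambda r: r is not None and len(r) > 0,
# )
```

The defaults (one check every 30 minutes, 12 attempts) are arbitrary; choose values that fit your own schedule.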

Schedule another job with a reference to the first one to fetch only the delta (new reviews)

response2 = api.schedule_job('https://uk.trustpilot.com/review/store.playstation.com',
                              previous_job_id=first_job_id)

Create a job list (one row in the example) and schedule jobs for all the urls from the list

import pandas as pd

# Empty job list with the columns used by the wrapper
jobs_list = pd.DataFrame(columns=['Website', 'url', 'latest_job_id', 'status', 'last_crawl',
                                  'latest_schedule_message'])
# One row per review page URL
jobs_list['url'] = ['https://uk.trustpilot.com/review/store.playstation.com']
updated_job_list = api.schedule_job_list(jobs_list)
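For more than one review page, the rows of the job list could be prepared from a plain list of URLs first. This is a sketch only: `build_job_rows` is a hypothetical helper, the second URL is a placeholder, and every column other than `url` is left empty, as in the one-row example.

```python
JOB_COLUMNS = ['Website', 'url', 'latest_job_id', 'status', 'last_crawl',
               'latest_schedule_message']


def build_job_rows(urls):
    """Return one dict per unique, non-empty URL with all job columns present."""
    rows = []
    seen = set()
    for url in urls:
        url = url.strip()
        if not url or url in seen:
            continue  # skip blanks and duplicates
        seen.add(url)
        row = {column: None for column in JOB_COLUMNS}
        row['url'] = url
        rows.append(row)
    return rows


rows = build_job_rows([
    'https://uk.trustpilot.com/review/store.playstation.com',
    'https://uk.trustpilot.com/review/example.com',  # placeholder URL
])
```

The rows can then be turned into a job list with `pd.DataFrame(rows, columns=JOB_COLUMNS)` and passed to `api.schedule_job_list`.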

Finally, fetch the reviews, save them to a CSV file, and reschedule all jobs in the jobs list

# Plug-and-play block to schedule/update jobs and get/save results
# Prerequisite: two CSV files with the following structure must already exist:
# jobs_list.csv columns: ['Website', 'url', 'latest_job_id', 'status', 'last_crawl', 'latest_schedule_message']
# reviews_list.csv columns: ['job_id', 'source_name', 'id', 'name', 'date', 'rating_value',
#                           'review_text', 'url', 'profile_picture', 'location', 'review_title',
#                           'verified_order', 'reviewer_title', 'language_code', 'meta_data']


# Code block to refresh review jobs and review results
jobs_list_filepath = 'jobs_list.csv'
reviews_list_filepath = 'reviews_list.csv'

df_jobs = pd.read_csv(jobs_list_filepath, index_col='id')
df_reviews = pd.read_csv(reviews_list_filepath, index_col='unique_id')

df_jobs_new, df_reviews_new = api.get_job_list_reviews(df_jobs, df_reviews)

df_jobs_new.to_csv(jobs_list_filepath, encoding='utf-8-sig')
df_reviews_new.to_csv(reviews_list_filepath, encoding='utf-8-sig')


# Code block to reschedule review jobs
df_jobs = pd.read_csv(jobs_list_filepath, index_col='id')
df_jobs_new = api.schedule_job_list(df_jobs)
df_jobs_new.to_csv(jobs_list_filepath, encoding='utf-8-sig')
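The two prerequisite CSV files described above could be created as header-only files before the first run. The sketch below uses only the standard library. The helper `create_empty_csv` is hypothetical, and prepending an index column (`id` for jobs, `unique_id` for reviews) is an assumption inferred from the `index_col` arguments in the snippet, not something the module documents.

```python
import csv
from pathlib import Path

JOB_COLUMNS = ['Website', 'url', 'latest_job_id', 'status', 'last_crawl',
               'latest_schedule_message']
REVIEW_COLUMNS = ['job_id', 'source_name', 'id', 'name', 'date', 'rating_value',
                  'review_text', 'url', 'profile_picture', 'location', 'review_title',
                  'verified_order', 'reviewer_title', 'language_code', 'meta_data']


def create_empty_csv(path, index_name, columns):
    """Write a header-only CSV; return False and do nothing if the file exists."""
    path = Path(path)
    if path.exists():
        return False
    # utf-8-sig matches the encoding used by the to_csv calls in the snippet
    with path.open('w', newline='', encoding='utf-8-sig') as f:
        csv.writer(f).writerow([index_name] + list(columns))
    return True


# Usage with the file path variables from the snippet (index names are assumptions):
# create_empty_csv(jobs_list_filepath, 'id', JOB_COLUMNS)
# create_empty_csv(reviews_list_filepath, 'unique_id', REVIEW_COLUMNS)
```

Existing files are left untouched, so the helper is safe to call at the start of every run.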
