Unofficial TrustPilot API to download review scores and contents

Project description

TrustPilotReader

Unofficial TrustPilot reviews collector. For Academic Use Only. READ: TrustPilot Terms of Use

Disclaimer:

You, and you alone, are responsible for following TrustPilot's terms when using this tool to gather their data. Respect their servers and be thoughtful when gathering large amounts of data.
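
One way to respect the servers is to pause between consecutive requests. The sketch below is a minimal illustration, not part of trustpilotreviews: the helper name `throttled`, its parameters, and the two-second default are all assumptions.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def throttled(items: Iterable[T], delay_seconds: float = 2.0,
              sleep: Callable[[float], None] = time.sleep) -> Iterator[T]:
    """Yield items one at a time, pausing between consecutive items.

    Hypothetical helper for illustration; not part of trustpilotreviews.
    """
    for index, item in enumerate(items):
        if index > 0:
            sleep(delay_seconds)  # pause before every item except the first
        yield item
```

Looping with `for site in throttled(['www.ford.dk', 'www.mate.bike']):` would then process one site, wait, and move on to the next. The `sleep` argument is injectable only so that the pause can be replaced in tests.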

Early-Stage Documentation :)

This code implements basic scraping of TrustPilot reviews (default language: Danish).

It is a prototype to be used for academic purposes only. TrustPilot itself offers APIs for gathering its data.

Get it from PyPI

pip install trustpilotreviews

How to use it:

Import package

from trustpilotreviews import GetReviews

1. Initiate the Class

Initiate the class either (a) by passing a dictionary with company names as keys and their TrustPilot business ids as values, or (b) by adding entries with dictionary syntax.

e.g.

# way a: Using dictionary with business ids
id_dict = {'Skat':'470bce96000064000501e32d','DR':'4690598c00006400050003ee'}
d = GetReviews(id_dict)

# ids dictionary can be loaded from text files e.g.
import numpy as np

lines = np.genfromtxt('data/business_ids.csv', delimiter=',',
                      dtype=str, skip_header=1)  # skip the header row
csv_dict = {key: item for key, item in lines}
d = GetReviews(csv_dict)

# way b: Using dictionary assignment 
d = GetReviews()
d['Skat'] = '470bce96000064000501e32d'
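
The ids file used in way a can also be read with the standard library alone, which avoids the NumPy dependency. This is a minimal sketch: the function name `read_business_ids` and the header names in the sample are assumptions, and an in-memory string stands in for 'data/business_ids.csv'.

```python
import csv
import io


def read_business_ids(handle):
    """Build a {company: business_id} dict from a two-column CSV with a header row."""
    reader = csv.reader(handle)
    next(reader, None)  # skip the header row
    ids = {}
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or malformed lines
        ids[row[0].strip()] = row[1].strip()
    return ids


# In-memory stand-in for 'data/business_ids.csv'; the header names are assumed.
sample = io.StringIO(
    "company,business_id\n"
    "Skat,470bce96000064000501e32d\n"
    "DR,4690598c00006400050003ee\n"
)
id_dict = read_business_ids(sample)
print(id_dict)
```

With a real file, the same function could be called inside `with open('data/business_ids.csv', newline='') as handle:` and the result passed to `GetReviews`.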

No business ids, no problem:

from trustpilotreviews import GetReviews

# Initiate it. Language will be required
t = GetReviews()

# Pass in web-page address as it appears in trustpilot.com
mate_id = t.get_id('www.mate.bike')

# Check if everything is ok
if mate_id.ok:
    print(mate_id.business_id)

# Gather data from that id
data = t.get_reviews()

Having multiple websites, well, no problem:

from trustpilotreviews import GetReviews

t = GetReviews()

# pass multiple web-pages as a list
ids = t.get_ids(['www.ford.dk','www.mate.bike'])

print(ids)  # same as print(t), as ids are added to the queue

# gather data for those ids  
data = t.get_reviews()   
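
The examples above pass bare addresses such as 'www.ford.dk'. If a list of sites arrives as full URLs, they could be reduced to that form first. The helper below is a hypothetical sketch (the name `to_site_address` is an assumption), and whether TrustPilot lists a given site with or without the `www.` prefix still has to be checked by hand.

```python
from urllib.parse import urlsplit


def to_site_address(url: str) -> str:
    """Reduce a URL to its bare, lower-cased host name.

    Hypothetical helper for illustration; not part of trustpilotreviews.
    """
    text = url.strip()
    if "://" not in text and not text.startswith("//"):
        text = "//" + text  # let urlsplit treat scheme-less input as a host
    return (urlsplit(text).hostname or "").lower()


raw = ["https://www.ford.dk/biler", "www.mate.bike", " WWW.Mate.Bike/ "]
# dict.fromkeys keeps the first occurrence of each address and preserves order
sites = list(dict.fromkeys(to_site_address(u) for u in raw))
print(sites)
```

The resulting `sites` list could then be passed to `t.get_ids(sites)`.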

Want to save it to a database instead of Pandas? Done:

from trustpilotreviews import GetReviews

t = GetReviews()
ids = t.get_ids(['www.ford.dk','www.mate.bike'])

# mine data for those ids 
t.get_reviews()

# send them to an in-memory database
t.send_db('../data/','reviews')   
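
How `send_db` stores the data internally is not documented here. As a rough illustration of the general idea only, the sketch below writes review records to a SQLite database with the standard library. The table layout, the field names, and the sample records are all assumptions made for the example.

```python
import sqlite3

# Hypothetical review records; the field names are assumptions for illustration.
reviews = [
    {"company": "Skat", "score": 1, "text": "example review text"},
    {"company": "DR", "score": 5, "text": "another example"},
]

conn = sqlite3.connect(":memory:")  # swap for a file path to persist the data
conn.execute("CREATE TABLE reviews (company TEXT, score INTEGER, text TEXT)")
conn.executemany(
    "INSERT INTO reviews (company, score, text) VALUES (:company, :score, :text)",
    reviews,
)
conn.commit()

# Query the stored rows back, e.g. the average score per company
rows = conn.execute(
    "SELECT company, AVG(score) FROM reviews GROUP BY company ORDER BY company"
).fetchall()
print(rows)
```

Named placeholders let each dictionary be inserted directly, so the records do not have to be converted to tuples first.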

2. Reading Data

import pandas as pd

df = pd.DataFrame(t.dictData)

or from a stored source

df = pd.read_pickle('TrustPilotData.pkl')

A full example:

import numpy as np
from trustpilotreviews import GetReviews


# Dictionary from Data 
lines = np.genfromtxt('companies_ids.csv', delimiter=',',
                      dtype=str, skip_header=1)
csv_dict = {key: item for key, item in lines}

d = GetReviews(csv_dict)  # Select 'no' for Norwegian reviews
d.gather_data()

# Saves as pandas dataframe pickle
d.save_data(file_name='NoTrustPilotData')

TODOs:

  • Allow different saving formats, e.g. df.to_XXX
  • Split the page_review function into connection and data parsing (a better way to handle bad requests)
  • Add more features
  • Write better documentation
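
The split between connection and data parsing mentioned in the TODOs could look roughly like the sketch below. It does not reflect the package's actual page_review code: the function names and the `{"reviews": [{"stars": ..., "text": ...}]}` payload layout are hypothetical, chosen only to show that a failed request can be handled in one place and parsing tested in another.

```python
import json
import urllib.request
from typing import List, Optional


def fetch_page(url: str, timeout: float = 10.0) -> Optional[str]:
    """Connection step: return the response body, or None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8")
    except (OSError, ValueError):
        # URLError, HTTPError and socket timeouts are OSError subclasses;
        # ValueError covers malformed URLs and undecodable bodies.
        return None


def parse_reviews(body: Optional[str]) -> List[dict]:
    """Parsing step: extract review records from a JSON body.

    The payload layout used here is hypothetical.
    """
    if not body:
        return []
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return []
    if not isinstance(payload, dict):
        return []
    return [
        {"score": item.get("stars"), "text": item.get("text", "")}
        for item in payload.get("reviews", [])
    ]
```

Because `parse_reviews` accepts `None`, the two steps compose as `parse_reviews(fetch_page(url))`, and a bad request simply yields an empty list.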

TrustPilot Terms of Use

Download files

Source Distribution

trustpilotreviews-0.1.2.tar.gz (8.0 kB)

Uploaded Source

Built Distribution

trustpilotreviews-0.1.2-py3-none-any.whl (8.6 kB)

Uploaded Python 3

File details

Details for the file trustpilotreviews-0.1.2.tar.gz.

File metadata

  • Download URL: trustpilotreviews-0.1.2.tar.gz
  • Upload date:
  • Size: 8.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/0.12.16 CPython/3.7.3 Linux/4.18.0-21-generic

File hashes

Hashes for trustpilotreviews-0.1.2.tar.gz
Algorithm Hash digest
SHA256 522449da496d0a556a35a10779581532b3abbada525746584be6db761a1bdaef
MD5 1bfbd8de1611b5e5d256d0fdeec203ee
BLAKE2b-256 f4e8486ba44de22a8348939a387274da71d9067dfaf191499a0d40f83db6967e

File details

Details for the file trustpilotreviews-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: trustpilotreviews-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 8.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/0.12.16 CPython/3.7.3 Linux/4.18.0-21-generic

File hashes

Hashes for trustpilotreviews-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 8a831dfc334e1fd2af6473adff32278ddd46366341ce626b637fb665615c7ca6
MD5 c1a851e117fa7e570a1231ef39771aad
BLAKE2b-256 23bb6d158741041a15b22f5ed0e5a4a5087eab23301801ede8852f22aae44615
