Unofficial TrustPilot API to download review scores and contents
Project description
TrustPilotReader
Unofficial TrustPilot reviews collector. For Academic Use Only. READ: TrustPilot Terms of Use
Disclaimer:
You, and you alone, are responsible for following TrustPilot's terms when using this tool to gather their data. Respect their servers and be thoughtful when gathering large amounts of data.
Immature Documentation :)
This code implements basic data scraping of TrustPilot reviews (default: Danish).
It is a prototype to be used for academic purposes only. TrustPilot itself offers APIs for gathering its data.
Get it from PyPI
pip install trustpilotreviews
How to use it:
Import package
from trustpilotreviews import GetReviews
1. Initiate Class
Initiate the class by either (a) passing a dictionary with company names as keys and their TrustPilot business ids as values, or (b) adding them with dictionary syntax.
e.g.
# way a
id_dict = {'Skat':'470bce96000064000501e32d','DR':'4690598c00006400050003ee'}
d = GetReviews(id_dict)
# the ids dictionary can also be loaded from a text file, e.g.
import numpy as np
lines = np.genfromtxt('companies_ids.csv', delimiter=',',
                      dtype=str, skip_header=1)  # skip the header row
csv_dict = {key: item for key, item in lines}
d = GetReviews(csv_dict)
# way b
d = GetReviews()
d['Skat'] = '470bce96000064000501e32d'
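If NumPy is not available, the same name-to-id dictionary can be built with the standard library's csv module. The following is only a minimal sketch: it assumes a two-column file (company name, business id) with one header row, as the companies_ids.csv example above implies, and load_company_ids is a hypothetical helper that is not part of trustpilotreviews.

```python
import csv


def load_company_ids(path):
    """Read a two-column CSV (name, id) into a dict, skipping the header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row
        # Ignore blank or malformed rows rather than failing on them.
        return {row[0].strip(): row[1].strip() for row in reader if len(row) >= 2}
```

The resulting dictionary can then be passed to the class as in way a, e.g. GetReviews(load_company_ids('companies_ids.csv')).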
No business ids, no problem:
from trustpilotreviews import GetReviews
t = GetReviews()
mate_id = t.get_id('www.mate.bike')
if mate_id.ok:
    print(mate_id.business_id)
data = t.get_reviews()
Have multiple websites? No problem:
from trustpilotreviews import GetReviews
t = GetReviews()
ids = t.get_ids(['www.ford.dk','www.mate.bike'])
print(ids) # same as print(t), as the ids are added to the queue
data = t.get_reviews() # mine data for those ids
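When the list of websites is long, the disclaimer's advice about respecting TrustPilot's servers can be followed by pausing between lookups. The sketch below is one possible way to do that and is not part of trustpilotreviews: throttled is a hypothetical helper, and the one-second default delay is an arbitrary assumption.

```python
import time


def throttled(items, delay_seconds=1.0):
    """Yield items one at a time, sleeping between consecutive items."""
    for index, item in enumerate(items):
        if index > 0:
            time.sleep(delay_seconds)  # pause before every item except the first
        yield item
```

It could be combined with the single-site lookup shown earlier, for example by calling t.get_id(domain) for each domain in throttled(['www.ford.dk', 'www.mate.bike'], delay_seconds=2.0).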
Want to save it to a database instead of Pandas? Done:
from trustpilotreviews import GetReviews
t = GetReviews()
ids = t.get_ids(['www.ford.dk','www.mate.bike'])
# mine data for those ids
t.get_reviews()
# send them to an in-memory database
t.send_db('../data/','reviews')
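How send_db stores the reviews is not documented here. As a rough illustration of the general idea only, the sketch below writes review records to an in-memory SQLite database with the standard library. The store_reviews helper, the three-column schema (company, rating, text) and the sample rows are all made-up assumptions and do not reflect the package's actual schema.

```python
import sqlite3


def store_reviews(connection, table, rows):
    """Create the table if needed and insert (company, rating, text) rows."""
    if not table.isidentifier():
        # Table names cannot be bound as parameters, so validate them first.
        raise ValueError(f"unsafe table name: {table!r}")
    with connection:  # commits on success, rolls back on error
        connection.execute(
            f"CREATE TABLE IF NOT EXISTS {table} "
            "(company TEXT, rating INTEGER, text TEXT)"
        )
        connection.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", rows)


connection = sqlite3.connect(":memory:")  # in-memory database
sample_rows = [("Skat", 2, "Slow reply"), ("DR", 5, "Great")]  # invented sample data
store_reviews(connection, "reviews", sample_rows)
print(connection.execute("SELECT COUNT(*) FROM reviews").fetchone()[0])
```

Passing a file path instead of ":memory:" to sqlite3.connect would persist the same table on disk.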
2. Reading Data
import pandas as pd
df = pd.DataFrame(t.dictData)
or from stored source
df = pd.read_pickle('TrustPilotData.pkl')
A full example:
import numpy as np
from trustpilotreviews import GetReviews
# Dictionary from Data
lines = np.genfromtxt('companies_ids.csv', delimiter=',',
dtype=str, skip_header=1)
csv_dict = {key: item for key, item in lines}
d = GetReviews(csv_dict) # Select no for Norwegian Reviews
d.gather_data()
d.save_data(file_name='NoTrustPilotData')
TODOs:
- Allow different saving formats e.g. df.to_XXX
- Split the page_review function into connection and data parsing (a better way to handle bad requests)
- Add more features
- Write better documentation
TrustPilot Terms of Use
Hashes for trustpilotreviews-0.1.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 69e30fce8169bf92ef916bbdd1435409fb6c42f748f3187681cdcd5c2c78e576
MD5 | 577d020e4230d78e3c68370b06bb7ac4
BLAKE2b-256 | d98066c8dbbef0662b64ddcee8ba8990806cddc57a7348c850d2981e8f3de9dd