Unofficial TrustPilot API to download review scores and contents
TrustPilotReader
Unofficial TrustPilot reviews collector. For Academic Use Only. READ: TrustPilot Terms of Use
Disclaimer:
You, and you alone, are responsible for following TrustPilot's terms when using this tool to gather their data. Respect their servers and be thoughtful when gathering large amounts of data.
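One way to go easy on the servers is to enforce a minimum pause between requests. The following is a minimal sketch only: the `Throttle` class is hypothetical and not part of this package, and the interval is kept very short purely so the example runs quickly.

```python
import time


class Throttle:
    """Enforce a minimum delay between successive calls to wait().

    Hypothetical helper, not part of trustpilotreviews.
    """

    def __init__(self, min_interval_s):
        self.min_interval_s = min_interval_s
        self._last = None  # time of the previous call, None before the first

    def wait(self):
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval_s - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


throttle = Throttle(min_interval_s=0.05)  # use a much larger value in practice
start = time.monotonic()
for _ in range(3):
    throttle.wait()  # a request would be issued after each wait
elapsed = time.monotonic() - start
```

The first call returns immediately; each later call sleeps only for whatever part of the interval has not already passed.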
Immature Documentation :)
This code implements basic data scraping of TrustPilot reviews (default language: Danish).
It is a prototype to be used for academic purposes only. TrustPilot offers APIs to gather their data.
Get it from PyPI
pip install trustpilotreviews
How to use it:
Import package
from trustpilotreviews import GetReviews
1. Initiate Class
Initiate the class either by (a) passing a dictionary with company names as keys and their TrustPilot business ids as values, or (b) adding them with dictionary syntax.
e.g.
import numpy as np

# way a: Using a dictionary with business ids
id_dict = {'Skat': '470bce96000064000501e32d', 'DR': '4690598c00006400050003ee'}
d = GetReviews(id_dict)
# the ids dictionary can also be loaded from a text file, e.g.
lines = np.genfromtxt('data/business_ids.csv', delimiter=',',
                      dtype=str, skip_header=1)  # skip the header row
csv_dict = {key: item for key, item in lines}
d = GetReviews(csv_dict)
# way b: Using dictionary assignment
d = GetReviews()
d['Skat'] = '470bce96000064000501e32d'
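If NumPy is not otherwise needed, the same name-to-id dictionary can be built with the standard library. This is a minimal sketch, assuming a two-column CSV with a header row (company name, business id); the helper name `load_business_ids` is hypothetical and not part of the package, and an in-memory string stands in for the file.

```python
import csv
import io


def load_business_ids(csv_file):
    """Return {company_name: business_id} from an open two-column CSV file.

    Hypothetical helper, not part of trustpilotreviews.
    """
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header row
    # Ignore blank or malformed rows that lack two columns
    return {row[0].strip(): row[1].strip() for row in reader if len(row) >= 2}


# In-memory stand-in for a file such as 'data/business_ids.csv'
sample = io.StringIO(
    "company,business_id\n"
    "Skat,470bce96000064000501e32d\n"
    "DR,4690598c00006400050003ee\n"
)
id_dict = load_business_ids(sample)
print(id_dict)
```

With a real file, open it with `open(path, newline='')` and pass the file object to the helper; the resulting dictionary can then be handed to `GetReviews(id_dict)` as in way a.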
No business ids, no problem:
from trustpilotreviews import GetReviews
# Initiate it. Language will be required
t = GetReviews()
# Pass in the web-page address as it appears on trustpilot.com
mate_id = t.get_id('www.mate.bike')
# Check if everything is ok
if mate_id.ok:
    print(mate_id.business_id)
# Gather data from that id
data = t.get_reviews()
Multiple websites? No problem either:
from trustpilotreviews import GetReviews
t = GetReviews()
# pass multiple web-pages as a list
ids = t.get_ids(['www.ford.dk','www.mate.bike'])
print(ids)  # same as print(t), as the ids are added to the queue
# gather data for those ids
data = t.get_reviews()
Want to save the data to a database instead of Pandas? Done:
from trustpilotreviews import GetReviews
t = GetReviews()
ids = t.get_ids(['www.ford.dk','www.mate.bike'])
# mine data for those ids
t.get_reviews()
# send them to an in-memory database
t.send_db('../data/','reviews')
2. Reading Data
import pandas as pd
df = pd.DataFrame(t.dictData)
or from a stored source:
df = pd.read_pickle('TrustPilotData.pkl')
A full example:
import numpy as np
from trustpilotreviews import GetReviews
# Dictionary from Data
lines = np.genfromtxt('companies_ids.csv', delimiter=',',
                      dtype=str, skip_header=1)
csv_dict = {key: item for key, item in lines}
d = GetReviews(csv_dict)  # Select 'no' for Norwegian reviews
d.gather_data()
# Saves as pandas dataframe pickle
d.save_data(file_name='NoTrustPilotData')
TODOs:
- Allow different saving formats, e.g. df.to_XXX
- Split the page_review function into connection and data parsing (a better way to handle bad requests)
- Add more features
- Write better documentation
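The second TODO item could look roughly like the sketch below: one function only deals with the connection and returns `None` on a bad request, the other only parses. Everything here is an assumption for illustration, not the package's code: the function names `fetch` and `parse_reviews` are hypothetical, and so is the JSON layout (a `reviews` list with `score` and `text` fields).

```python
import json
import urllib.request


def fetch(url, timeout=10):
    """Connection step: return the response body as text, or None on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8")
    except (OSError, ValueError) as exc:
        # OSError covers URLError, HTTPError and timeouts;
        # ValueError covers malformed URLs.
        print(f"request failed for {url}: {exc}")
        return None


def parse_reviews(body):
    """Parsing step: extract (score, text) pairs from a JSON body.

    The payload layout used here is hypothetical.
    """
    if body is None:
        return []
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return []
    if not isinstance(payload, dict):
        return []
    return [(r.get("score"), r.get("text", "")) for r in payload.get("reviews", [])]


# The parser can be exercised without any network access
sample_body = '{"reviews": [{"score": 5, "text": "ok"}, {"score": 1}]}'
parsed = parse_reviews(sample_body)
print(parsed)
```

Keeping the two steps apart means a failed request simply yields an empty list downstream, and the parser can be tested against saved responses.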
TrustPilot Terms of Use