Interface for the Google Trends time-series widget
Project description
.. image:: https://travis-ci.com/APirchner/gsvi.svg?branch=master :target: https://travis-ci.com/APirchner/gsvi
.. image:: https://img.shields.io/pypi/dm/gsvi :target: https://pypi.python.org/pypi/gsvi
GSVI - Google Search Volume Index
A interface for the Google Trends <https://trends.google.com/>
_ time-series widget.
Motivation
The Google Search Volume Index (GSVI) is a useful metric for the attention a certain topic (product, event etc.) receives. Fetching data from Google Trends automatically can be a pain. This package makes getting arbitrary-length time-series of monthly, daily and hourly data as easy as::
connection = GoogleConnection()
series = SVSeries.univariate(connection=connection,
query={'key': 'sp500', 'geo': 'US'},
start=start, end=end,
category=CategoryCodes.FINANCE,
granularity='DAY'
)
ts = series.get_data()
Idea
Google Trends' quota limits are a big factor slowing down the collection of data. Packing requests tightly reduced the load on GT and pushes the limits on data one can pull before getting kicked - this package bundles up to queries into one request. Another weird quirks of Google Trends are the data normalization and granularity. GT reports daily data for queries with an interval length between 1 and 269 days, hourly data between 3 and 7 days and monthly data from 1890 days on (NOTE: this could change at any time!). For each request, the volume within one request is normalized to have a maximal value of 100. So the query for the daily/hourly search volume for keyword abc in some interval, is handled by gsvi as follows:
#. Bundle the query into requests of max. 5 269-day (7-day for hourly) fragments each and get the SV from Google Trends. If this is sufficient to cover the whole interval, skip the next steps. #. Look for the query holding the maximum (100) in each request. #. Bundle the fragments holding the maxima into requests of 5 each and get the SV from Google Trends. #. Repeat 2.-3. until only one interval holding the global maximum is left #. Bundle the original query into requests of 4 269-day (7-day) fragments + fragment with global maximum get the SV from Google Trends.
This procedure results in a continuous series that is normalized to [0, 100] over the global maximum by Google Trends itself. For monthly data, one request per key is sufficient.
This idea naturally extends to multivariate series of up to n queries, limited only by what GT allows you to pull.
Usage
For example, we would like to get the daily search volume for 'food' in the U.S. from 2014-09-13 to 2019-06-13::
import datetime as dt
from gsvi.connection import GoogleConnection
from gsvi.timeseries import SVSeries
# series start and end
start = dt.datetime(year=2014, month=9, day=13)
end = dt.datetime(year=2019, month=6, day=13)
# make connection to Google Trends and inject connection into the request structure
connection = GoogleConnection()
series = SVSeries.univariate(connection=connection,
query={'key': 'food', 'geo': 'US'},
start=start, end=end, granularity='DAY'
)
ts = series.get_data()
Or, what about the monthly volume of car brand searches in Germany from 2004 to June 2019::
import datetime as dt
from gsvi.connection import GoogleConnection
from gsvi.timeseries import SVSeries
# series start and end
start = dt.datetime(year=2014, month=9, day=13)
end = dt.datetime(year=2019, month=6, day=13)
# make connection to Google Trends and inject connection into the request structure
connection = GoogleConnection()
query_multi = [{'key': 'mercedes', 'geo': 'DE'},
{'key': 'porsche', 'geo': 'DE'},
{'key': 'bmw', 'geo': 'DE'},
{'key': 'audi', 'geo': 'DE'},
{'key': 'vw', 'geo': 'DE'},
{'key': 'ford', 'geo': 'DE'},
]
start_multi = dt.datetime(year=2004, month=1, day=1)
end_multi = dt.datetime(year=2019, month=6, day=1)
series = SVSeries.multivariate(connection=connection,
queries=query_multi,
start=start, end=end,
granularity='MONTH'
)
ts = series.get_data()
Installation
::
$ pip install gsvi
Credits
- The main idea for the normalization was taken from a paper by
Christopher Fink and Thomas Johann (2014) <https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2139313>
_ - The connection-handling was inspired by the
pytrends package <https://github.com/GeneralMills/pytrends>
_
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
File details
Details for the file gsvi-0.2.2.tar.gz
.
File metadata
- Download URL: gsvi-0.2.2.tar.gz
- Upload date:
- Size: 12.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/54.2.0 requests-toolbelt/0.9.1 tqdm/4.56.2 CPython/3.8.5
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 1e7a95fc471ac119fff9157c6e7ade9dfcd69899817a503ac72539d54877509c |
|
MD5 | dc2c4eb412ffa206a4cfc0459be06029 |
|
BLAKE2b-256 | 98b121f4a26d5f5235799c5196c2fded526130e13ab08a9dbfabbcae7086e1de |