Pseudo API for Google Trends with asyncio support.
Project description
pytrends-async
Introduction
A fork of pytrends with full async/await and retry support.
Table of contents
Installation
pip install pytrends-async
Requirements
- Written for python 3.6+
- Requires httpx==0.9.3, lxml, Pandas
API
Connect to Google
from pytrendsasync.request import TrendReq
pytrends = TrendReq(hl='en-US', tz=360)
or if you want to use proxies as you are blocked due to Google rate limit:
from pytrendsasync.request import TrendReq
pytrends = TrendReq(hl='en-US', tz=360, timeout=10, proxies=['https://34.203.233.13:80',])
-
timeout(connect, read)
- Timezone Offset
- For example US CST is
'360'
-
proxies
- https proxies Google passed ONLY
- list
['https://34.203.233.13:80','https://35.201.123.31:880', ..., ...]
-
retries
- number of retries total/connect/read all represented by one scalar
-
backoff_factor
- A backoff factor to apply between attempts after the second try (most errors are resolved immediately by a second try without a delay). tenacity will sleep for:
{backoff factor} * (2 ^ ({number of total retries} - 1))
seconds. If the backoff_factor is 0.1, then sleep() will sleep for [0.0s, 0.2s, 0.4s, …] between retries. By default, backoff is disabled (set to 0).
- A backoff factor to apply between attempts after the second try (most errors are resolved immediately by a second try without a delay). tenacity will sleep for:
Note: the parameter hl
specifies host language for accessing Google Trends.
Note: only https proxies will work, and you need to add the port number after the proxy ip address
Build Payload
kw_list = ["Blockchain"]
pytrends.build_payload(kw_list, cat=0, timeframe='today 5-y', geo='', gprop='')
Parameters
-
kw_list
- Required
- Keywords to get data for
API Methods
The following API methods are available:
-
Interest Over Time: returns historical, indexed data for when the keyword was searched most as shown on Google Trends' Interest Over Time section.
-
Historical Hourly Interest: returns historical, indexed, hourly data for when the keyword was searched most as shown on Google Trends' Interest Over Time section. It sends multiple requests to Google, each retrieving one week of hourly data. It seems like this would be the only way to get historical, hourly data.
-
Interest by Region: returns data for where the keyword is most searched as shown on Google Trends' Interest by Region section.
-
Related Topics: returns data for the related keywords to a provided keyword shown on Google Trends' Related Topics section.
-
Related Queries: returns data for the related keywords to a provided keyword shown on Google Trends' Related Queries section.
-
Trending Searches: returns data for latest trending searches shown on Google Trends' Trending Searches section.
-
Top Charts: returns the data for a given topic shown in Google Trends' Top Charts section.
-
Suggestions: returns a list of additional suggested keywords that can be used to refine a trend search.
Common API parameters
Many API methods use the following:
-
kw_list
-
keywords to get data for
-
Example
['Pizza']
-
Up to five terms in a list:
['Pizza', 'Italian', 'Spaghetti', 'Breadsticks', 'Sausage']
-
Advanced Keywords
- When using Google Trends dashboard Google may provide suggested narrowed search terms.
- For example
"iron"
will have a drop down of"Iron Chemical Element, Iron Cross, Iron Man, etc"
. - Find the encoded topic by using the get_suggestions() function and choose the most relevant one for you.
- For example:
https://www.google.com/trends/explore#q=%2Fm%2F025rw19&cmpt=q
"%2Fm%2F025rw19"
is the topic "Iron Chemical Element" to use this with pytrends- You can also use
pytrends.suggestions()
to automate this.
-
-
-
cat
- Category to narrow results
- Find available cateogies by inspecting the url when manually using Google Trends. The category starts after
cat=
and ends before the next&
or view this wiki page containing all available categories - For example:
"https://www.google.com/trends/explore#q=pizza&cat=71"
'71'
is the category- Defaults to no category
-
geo
- Two letter country abbreviation
- For example United States is
'US'
- Defaults to World
- More detail available for States/Provinces by specifying additonal abbreviations
- For example: Alabama would be
'US-AL'
- For example: England would be
'GB-ENG'
-
tz
- Timezone Offset (in minutes)
- For more information of Timezone Offset, view this wiki page containing about UCT offset
- For example US CST is
'360'
-
timeframe
-
Date to start from
-
Defaults to last 5yrs,
'today 5-y'
. -
Everything
'all'
-
Specific dates, 'YYYY-MM-DD YYYY-MM-DD' example
'2016-12-14 2017-01-25'
-
Specific datetimes, 'YYYY-MM-DDTHH YYYY-MM-DDTHH' example
'2017-02-06T10 2017-02-12T07'
- Note Time component is based off UTC
-
Current Time Minus Time Pattern:
-
By Month:
'today #-m'
where # is the number of months from that date to pull data for- For example:
'today 3-m'
would get data from today to 3months ago - NOTE Google uses UTC date as 'today'
- Seems to only work for 1, 2, 3 months only
- For example:
-
Daily:
'now #-d'
where # is the number of days from that date to pull data for- For example:
'now 7-d'
would get data from the last week - Seems to only work for 1, 7 days only
- For example:
-
Hourly:
'now #-H'
where # is the number of hours from that date to pull data for- For example:
'now 1-H'
would get data from the last hour - Seems to only work for 1, 4 hours only
- For example:
-
-
-
gprop
- What Google property to filter to
- Example
'images'
- Defaults to web searches
- Can be
images
,news
,youtube
orfroogle
(for Google Shopping results)
Interest Over Time
await pytrends.interest_over_time()
Returns pandas.Dataframe
Historical Hourly Interest
await pytrends.get_historical_interest(kw_list, year_start=2018, month_start=1, day_start=1, hour_start=0, year_end=2018, month_end=2, day_end=1, hour_end=0, cat=0, geo='', gprop='', sleep=0)
Parameters
-
kw_list
- Required
- list of keywords that you would like the historical data
-
year_start, month_start, day_start, hour_start, year_end, month_end, day_end, hour_end
- the time period for which you would like the historical data
-
sleep
- If you are rate-limited by Google, you should set this parameter to something (i.e. 60) to space off each API call.
Returns pandas.Dataframe
Interest by Region
await pytrends.interest_by_region(resolution='COUNTRY', inc_low_vol=True, inc_geo_code=False)
Parameters
-
resolution
- 'CITY' returns city level data
- 'COUNTRY' returns country level data
- 'DMA' returns Metro level data
- 'REGION' returns Region level data
-
inc_low_vol
- True/False (includes google trends data for low volume countries/regions as well)
-
inc_geo_code
- True/False (includes ISO codes of countries along with the names in the data)
Returns pandas.DataFrame
Related Topics
await pytrends.related_topics()
Returns dictionary of pandas.DataFrames
Related Queries
await pytrends.related_queries()
Returns dictionary of pandas.DataFrames
Trending Searches
await pytrends.trending_searches(pn='united_states') # trending searches in real time for United States
await pytrends.trending_searches(pn='japan') # Japan
Returns pandas.DataFrame
Top Charts
await pytrends.top_charts(date, hl='en-US', tz=300, geo='GLOBAL')
Parameters
-
date
- Required
- YYYY or YYYYMM integer
- Example
201611
for November 2016 Top Chart data
Returns pandas.DataFrame
Suggestions
await pytrends.suggestions(keyword)
Parameters
-
keyword
- Required
- keyword to get suggestions for
Returns dictionary
Categories
await pytrends.categories()
Returns dictionary
Caveats
- This is not an official or supported API
- Google may change aggregation level for items with very large or very small search volume
- Rate Limit is not publicly known, let me know if you have a consistent estimate
- One user reports that 1,400 sequential requests of a 4 hours timeframe got them to the limit. (Replicated on 2 networks)
- It has been tested, and 60 seconds of sleep between requests (successful or not) is the correct amount once you reach the limit.
- For certain configurations the dependency lib certifi requires the environment variable REQUESTS_CA_BUNDLE to be explicitly set and exported. This variable must contain the path where the ca-certificates are saved or a SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] error is given at runtime.
Credits
-
Major JSON revision ideas taken from pat310's JavaScript library
-
Connecting to google code heavily based off Stack Overflow post
-
With some ideas pulled from Matt Reid's Google Trends API
ChangeLog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog,
0.3.2 (2019-12-23)
Changed
- Updated underlying HTTPX library to 0.9.5 from 0.9.3
0.3.1 (2019-12-08)
Fixed
- Fixed import of
asyncio.sleep
indailydata.py
0.3.0 (2019-12-08)
Added
- Retry support has been reintroduced (back by tenacity). Retry settings only apply when proxies are not in use.
- Python 3.8 is now offically tested and supported.
Changed
- Reintroduced
retries
andbackoff_factor
toTrendsReq.__init__()
.retries
andbackoff_factor
are disabled by default (set to 0). These parameters will only affect retrying if proxies are not in use. - Proxies that return a 429 (Too Many Requests) will no longer be removed the proxy list. Instead, another proxy (or no proxy if all proxies have been exausted) will be used in the next request.
- Proxies that trigger an error that is not caused by a 429 response code (ConnectionRefusedError, SSLError) will be placed in
TrendReq.blacklisted_proxies
instead of removed from the proxies list. - Underyling httpx library has been updated to version 0.9.3.
Fixed
dailydata.py
now usesasyncio.sleep
instead oftime.sleep
.
0.2.1 (2019-12-04)
Changed
- Fixed importing issue
0.2.0 (2019-12-04)
Added
- This changelog :)
- Proxy support has been introduced but still needs further testing.
Changed
GetNewProxy()
replaced with internal method_iterate_proxy()
- Protocol changed from HTTP/2 to HTTP/1.1. This resolves a KeyError that was occurring with the underlying http2 lib.
- HTTP connections are now properly cleaned up after use.
0.1.0 (2019-12-01)
- Initial release of pytrends-async for testing purposes.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
File details
Details for the file pytrends-async-0.3.2.tar.gz
.
File metadata
- Download URL: pytrends-async-0.3.2.tar.gz
- Upload date:
- Size: 17.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.40.0 CPython/3.7.3
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 47d5e15cd76410c55d2e8a9f85983a9411f5cd8393f3fff3ca6d22e1721964d2 |
|
MD5 | 64b394513d47f27b1c84912cfc529d84 |
|
BLAKE2b-256 | 25db88d1872ac0325c15ab7b63c974b544dbf65a212e66fa698224a78bc3f1f3 |