Simple API wrapper for Google Products
googlewrapper
General Connector Classes for Google Products
Current Wrappers Available
- Google Analytics
- Google Search Console
- Google Calendar
- Google Big Query
- Google PageSpeed API
- Google Sheets
Wrappers In the Pipeline
STEPS
- Acquire Google Credentials from API Console
- Install this package
- Create Connection in Python
- Use wrapper to make API calls
Acquire Google Credentials from API Console
First we will need to get our own Google project set up so we can get our credentials. If you don't have experience doing this, you can set one up at the Google API Console.
After you have your project set up, oAuth configured, and the optional service account (only for Google Big Query connections), you are good to install this package.
Make sure to download your oAuth credentials and save them to your working directory as 'client_secret.json'.
Installation
It is recommended to create a virtual environment using the virtualenv library. Once it is created and activated, install googlewrapper with the following command:
pip install googlewrapper
OR
python -m pip install googlewrapper
Establish your Connection
Option 1 - Assign Connection Variable
Use the Connection class found in connect.py to assign credentials. This approach is recommended if you want to reuse the credentials to authenticate with other libraries.
'client_secret.json' should be in the working directory; if it isn't, declare the path when initializing the class. See the examples below for both versions.
from googlewrapper.connect import Connection
# in working directory
google_connection = Connection()
# declare path in class
google_connection = Connection("file/path/to/client_secret.json")
Once we have our connection object, we need to declare the scope of our access. Do this by calling one of the following methods on the connection object:
Google Service | Module | Authentication Type | Credential File |
---|---|---|---|
Analytics | .ga() | oAuth | client_secret.json |
Search Console | .gsc() | oAuth | client_secret.json |
Calendar | .cal() | oAuth | client_secret.json |
Big Query | .gbq() | Service Account | gbq-sa.json |
PageSpeed | n/a | API Key | n/a |
Gmail | .gmail() | oAuth | client_secret.json |
Sheets | .gs() | oAuth | client_secret.json |
Note: you can change the file path used to authenticate Google Big Query by passing the Service Account JSON path into the .gbq() method:
gbq_connection = Connection().gbq("file/path/to/service_account.json")
Option 2 - Default Connection (One Line Connect)
It is possible to connect in just one line. This is recommended if you will not need your authentication object and will just be using the wrapper class.
This can be done by initializing the wrapper classes without any arguments. By default, each class authenticates with the default method found in the connect class.
IMPORTANT: To do this, 'client_secret.json' must be in your working directory (for GBQ, 'gbq-sa.json' must be in the working directory).
See below
from googlewrapper.gsc import GoogleSearchConsole
gsc = GoogleSearchConsole()
After authentication has taken place (via either option), a folder named credentials will be created in your cwd. The respective authentication tokens are stored there so you don't have to authenticate every time; each token is saved as a .dat file named after the Google property.
Product Specific Methods
Now that we have our credential object, we are ready to call the API. We will walk through examples for the different products.
Google Analytics
Initialize
from googlewrapper.ga import GoogleAnalytics
ga = GoogleAnalytics()
Methods
Assigning Metrics & Dimensions
.set_metrics()
.set_dimensions()
For each of these methods, you will pass in a list of strings. Metric and dimension names can be found on the Google Analytics Dev Tools site. DO NOT include the "ga:" prefix before metric or dimension names; the class adds it automatically at run time.
Filtering Metrics & Dimensions
.set_metric_filters()
.set_dimension_filters()
These filter methods accept a list of tuples. Each tuple is formatted as follows:
# DIMENSION FILTER TUPLE
(dimension name, "not", operator, expression, caseSensitive)
# METRIC FILTER TUPLE
(metric name, "not", operator, comparisonValue)
Filter Name | Description | Example |
---|---|---|
name | Which GA metric/dimension we want to filter on | "channelGrouping" |
"not" | Whether to exclude the named values; False = Include, True = Exclude | False
operator (dimensions) | operator to compare dimension to; possible values below | "EXACT" |
expression (dimensions) | Strings or regular expression to match/compare against | "Organic Search" |
caseSensitive (dimensions) | Whether dimension filters are case sensitive | True
operator (metrics) | operator to compare metric to; possible values below | "GREATER_THAN"
comparisonValue (metrics) | The numeric value (as a string) to compare the metric against | "100"
Dimension Operator Possible Values
["REGEXP", "BEGINS_WITH", "ENDS_WITH", "PARTIAL", "EXACT", "NUMERIC_EQUAL", "NUMERIC_GREATER_THAN", "NUMERIC_LESS_THAN", "IN_LIST"]
Metric Operator Possible Values
["EQUAL", "LESS_THAN", "GREATER_THAN", "IS_MISSING"]
Example Dimension Filter List
dim_filter = [('channelGrouping',False,"EXACT","Organic Search",True),('landingPage',False,"BEGINS_WITH","/blog/")]
ga.set_dimension_filters(dim_filter)
ga.set_dimension_filter_group("AND")
This will filter Organic Search to the Blog
Example Metric Filter List
metric_filter = [('pageviews',False,"GREATER_THAN","1000")]
ga.set_metric_filters(metric_filter)
This will filter to only show pageviews > 1000
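Under the hood, the Reporting API v4 expects each dimension filter as a JSON clause rather than a tuple. As a rough sketch of the mapping the class presumably performs (the helper below is illustrative, not part of googlewrapper):

```python
def dim_filter_clause(filter_tuples, group_operator="OR"):
    """Translate (name, not, operator, expression, caseSensitive) tuples
    into a Reporting API v4 dimensionFilterClauses entry."""
    return {
        "operator": group_operator,  # set via .set_dimension_filter_group()
        "filters": [
            {
                "dimensionName": f"ga:{name}",  # the class adds the "ga:" prefix
                "not": negate,
                "operator": op,
                "expressions": [expression],
                "caseSensitive": case_sensitive,
            }
            for name, negate, op, expression, case_sensitive in filter_tuples
        ],
    }

clause = dim_filter_clause(
    [("channelGrouping", False, "EXACT", "Organic Search", True)], "AND"
)
```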
Setting the Filter Group Operation (OR vs AND)
.set_metric_filter_group()
.set_dimension_filter_group()
These methods accept a string ("OR" or "AND") that sets the logical operator used to combine your filters. The default is "OR". This setting does not matter if you have fewer than two filter tuples in your filter list, but it becomes very important once you have two or more filters applied.
Setting Date Range
.set_start_date()
.set_end_date()
These accept a datetime.date object (year, month, day). Both dates are inclusive.
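For example, to request an inclusive trailing 30-day window (the variable names are just illustrative):

```python
import datetime as dt

# inclusive 30-day window ending today
end = dt.date.today()
start = end - dt.timedelta(days=30)

# then, on an initialized GA object:
# ga.set_start_date(start)
# ga.set_end_date(end)
```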
Pulling Data
.build_request()
Once you have prepared your GA object with dimensions, metrics, filters, and date ranges, call this method to get your data. It returns a pd.DataFrame by default; this can be changed by initializing your GA object with default_view = "dict".
Examples
# Initialize
from googlewrapper.ga import GoogleAnalytics
ga = GoogleAnalytics()
# Assign Metrics
ga_metrics = ['pageviews','sessions']
ga.set_metrics(ga_metrics)
# Assign Dimensions
ga_dims = ['channelGrouping']
ga.set_dimensions(ga_dims)
# Set a Filter (organic only)
organic_filter = [('channelGrouping',False,"EXACT","Organic Search",True)]
ga.set_dimension_filters(organic_filter)
# Assign Start/End Dates
import datetime as dt
ga.set_start_date(dt.date(2021, 1, 1))
ga.set_end_date(dt.date(2021, 1, 31))
# Pull the data
ga_data = ga.build_request()
Google Search Console
Domain properties need the 'sc-domain:' prefix on the URL to pull the domain property.
Initialize
from googlewrapper.gsc import GoogleSearchConsole
gsc = GoogleSearchConsole()
Methods
Assigning Dates
The following methods all accept a datetime date. There are two options for setting the date range.
Option 1 - assign both start and end dates
Use this option if you would like a range of dates
You need to call both these methods to assign start and end dates
.set_start_date(start_date)
.set_end_date(end_date)
Option 2 - assign one date
Use this option if you only want to see one day worth of data
You only need to call the method below to make it work
.set_date(date)
Other GSC API Parameters
.set_filters()
- sets the dimension filters; format them as a list of dictionaries (see Google's Docs)
- metric filters cannot be applied prior to the pull; filter the data after the API call
.set_dimensions()
- Options:
- "page"
- "query"
- "date"
- "country"
- "device"
- "searchAppearance"
- Default values: ["page","date"]
.set_sites(sites_list)
- Assigns the list of sites we want to pull
Pulling Data
.all_sites(site_filter)
- Optional param
- site_filter: type: list of strings
- will filter your sites to those containing the strings in the list
- Returns: list of all verified sites in the GSC profile
.get_data()
- After assigning all the parameters, run this method to make the api request
- Assigns the value gsc.output (required for .ctr())
- Returns: dictionary
- Keys: Site URLs from the site_list
- Values: pd.DataFrame of GSC data
.ctr()
- Calculates custom Click Through Rates based on the GSC data we pulled
- For accurate results, make sure that you:
- have "query" in the dimension list
- have set branded queries using .set_branded()
- Returns: dictionary
- Keys: ["all","branded","non-branded"]
- Values: pd.DataFrame with index as Position and columns ["Clicks","Impressions","CTR"]
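Conceptually, the custom CTR calculation groups clicks and impressions by rounded position and divides. A plain-Python sketch of that idea, with made-up sample numbers:

```python
from collections import defaultdict

# sample (avg_position, clicks, impressions) rows -- illustrative only
rows = [(1.2, 120, 1000), (1.8, 90, 1100), (2.4, 40, 900)]

totals = defaultdict(lambda: [0, 0])
for position, clicks, impressions in rows:
    bucket = round(position)           # group queries by rounded position
    totals[bucket][0] += clicks
    totals[bucket][1] += impressions

ctr_by_position = {pos: c / i for pos, (c, i) in totals.items()}
```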
Examples
Pull one day's worth of data
# initialize our class
from googlewrapper import GoogleSearchConsole
gsc = GoogleSearchConsole()
# assign variables to our GSC object
gsc.set_sites(sites_list)
gsc.set_date(dt.date.today())
gsc.set_dimensions(dim_list)
# call the api to get the data
data = gsc.get_data()
Find Custom CTR for last 12 months
# initialize our class
from googlewrapper import GoogleSearchConsole
gsc = GoogleSearchConsole()
#declare all the parameters
gsc.set_start_date(dt.date.today()-dt.timedelta(days=365))
gsc.set_end_date(dt.date.today())
gsc.set_dimensions(['query'])
gsc.set_branded(branded_dict)
gsc.set_sites(sites_list)
data = gsc.get_data()
ctr = gsc.ctr()
Google Calendar
Initialize
from googlewrapper.cal import GoogleCalendar
cal = GoogleCalendar()
Methods
.set_default(cal_id)
- Assigns which calendar will be used to create and find events
.find_event(str)
- Searches for an event in your calendar by name
- Returns events list response object
.get_event(eventId)
- Returns the event object for the eventId you passed in
.all_events(num_events,min_date)
- Params
- num_events
- the number of events you'd like to return
- defaults to 250
- min_date
- the starting point; the search only moves forward in time from this date
- defaults to the current date and time
- Returns the list of events found
.update_event(new_event,send_updates)
- Params
- new_event
- event-formatted JSON to update
- send_updates: str
- whether to send update notifications to attendees
- see Google's Docs for more info
- defaults to 'all'
- Returns the updated event object
Examples
my_event = cal.find_event('Team Meeting')
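The event objects these methods return and accept follow the Calendar API's event-resource JSON format. A minimal event body of the kind .update_event() expects (the summary and times here are illustrative):

```python
import datetime as dt

start = dt.datetime(2024, 5, 1, 9, 0)
event_body = {
    "summary": "Team Meeting",
    # Calendar API event resources use RFC3339 start/end timestamps
    "start": {"dateTime": start.isoformat(), "timeZone": "UTC"},
    "end": {"dateTime": (start + dt.timedelta(hours=1)).isoformat(),
            "timeZone": "UTC"},
}
```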
Google Big Query
Initialize
from googlewrapper.gbq import GoogleBigQuery
gbq = GoogleBigQuery()
Remember that Google Big Query authenticates with a service account. Be sure to include the gbq-sa.json file in your working directory, or pass the path to the .gbq() method.
Methods
Examples
Google PageSpeed Insights
Initialize
from googlewrapper.pagespeed import PageSpeed
page_speed = PageSpeed(API_KEY)
Methods
Examples
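The wrapper's methods aren't documented here yet, but PageSpeed Insights itself is a simple keyed REST endpoint (v5). For reference, the kind of request URL a pull boils down to, built with the standard library (the wrapper's internals may differ):

```python
from urllib.parse import urlencode

BASE = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"

def pagespeed_url(page, api_key, strategy="mobile"):
    # strategy is "mobile" or "desktop"
    return f"{BASE}?{urlencode({'url': page, 'key': api_key, 'strategy': strategy})}"

request_url = pagespeed_url("https://example.com", "YOUR_API_KEY")
```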
Gmail
Initialize
from googlewrapper.gmail import Gmail
mail = Gmail()
Methods
Examples
Google Sheets
To initialize, pass in the URL of the sheet you wish to use. By default the first sheet (tab) is set as the active sheet; this can be changed by calling
.set_sheet(new_sheet_name)
Initialize
from googlewrapper.gs import GoogleSheets
gs = GoogleSheets(YOUR_URL_HERE)
Methods
.set_sheet(sheet_name)
Assigns which sheet (tab) we will be pulling from. This must be called for the other methods to work properly.
Params
- sheet_name
- the name of the tab we want to pull from
.df(start='a1',index=1)
Pulls the Google Sheet and returns it as a pd.DF
Params
- start (optional)
- where we want the top left (including headers) of our table to start
- defaults to 'a1'
- index (optional)
- which column will be the index when pulled
- defaults to the first column
.save(df,start='a1',index=True,header=True,extend=True)
Saves a pd.DF to our assigned GoogleSheet
Params
- df
- pd.DF of data we want to save
- start (optional)
- type: string
- where we want to start saving the data
- default: 'a1'
- index (optional)
- type: bool
- include the pd.DF index in the sheet
- default: True
- header (optional)
- type: bool
- include the headers in the sheet?
- default: True
- extend (optional)
- type: bool
- if rows/columns would be cut off, create those rows/columns and place the data
- default: True
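To reason about what .save() will write, it helps to see how the start cell plus the frame's shape determine the target range. A hypothetical helper (not part of googlewrapper) that computes the written A1 range, assuming one extra row for headers and one extra column for the index:

```python
import re
from string import ascii_uppercase

def col_letters(n):
    # 1 -> 'A', 26 -> 'Z', 27 -> 'AA'
    letters = ""
    while n:
        n, r = divmod(n - 1, 26)
        letters = ascii_uppercase[r] + letters
    return letters

def save_range(start, n_rows, n_cols, header=True, index=True):
    # parse a start cell like 'a1' into column/row numbers
    m = re.match(r"([A-Za-z]+)(\d+)", start)
    col = sum(
        (ord(c.upper()) - 64) * 26 ** i
        for i, c in enumerate(reversed(m.group(1)))
    )
    row = int(m.group(2))
    end_col = col + n_cols + (1 if index else 0) - 1
    end_row = row + n_rows + (1 if header else 0) - 1
    return f"{m.group(1).upper()}{row}:{col_letters(end_col)}{end_row}"
```

For example, saving a 3-row, 2-column frame at 'a1' with index and header touches A1:C4.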
.clear(start,end)
Removes any data from the selected range of cells. Does not remove formatting.
Params
- start
- type: string formatted as cell reference
- top left cell to start clearing
- end
- type: string formatted as cell reference
- bottom right cell to end clearing
.row(row_number)
Returns an entire row of data.
Params
- row_number
- type: int
- which row number you want to pull
.col(col_number)
Returns an entire column of data.
Params
- col_number
- type: int
- which column number you want to pull
.add_sheet(sheet,data=None)
Creates a new sheet/tab in your spreadsheet and assigns it as the active sheet (via .set_sheet()). If you pass in data (a pd.DF), it will also write that data starting at 'a1' of the new tab.
Params
- sheet
- type: string
- the name of the new tab to be created
- data
- type: pd.DF
- data that you want filled in to 'a1' on the new tab
.delete_sheet(sheet)
Deletes the sheet/tab whose name you pass in.
Params
- sheet
- type: string
- which tab you would like to delete
.share(email_list,role = 'reader')
Shares your spreadsheet with all emails in the list. Can assign different permissions.
Params
- email_list
- type: list
- all the emails you'd like to share this with
- role
- type: string
- which permissions to grant this email list
- options: 'reader','commenter','editor'
Examples
Combining Products Examples
Example 1
Take a list of URLs from Sheets, grab Search Console Data, and import it into Big Query.
from googlewrapper.gs import GoogleSheets
from googlewrapper.gsc import GoogleSearchConsole
from googlewrapper.gbq import GoogleBigQuery
import datetime as dt
# init our objects
sheets = GoogleSheets(YOUR_URL_HERE)
gsc = GoogleSearchConsole()
gbq = GoogleBigQuery()
# get our urls we want to pull
# remember that sheet1 is default
sites = sheets.col(1)
'''
this one is a bit more technical
we can pull our column Branded Words right
from sheets then assign it to a dictionary to use
in our GSC object.
Make sure that your url column is the index for
your df. This will happen by default if the urls
are in the first column in google sheets
'''
branded_list = sheets.df()['Branded Words'].to_dict()
# assign those sheets to GSC
gsc.set_sites(sites)
# assign other GSC variables
gsc.set_date(dt.date(2021,1,1))
gsc.set_dimensions(['page','date','query'])
# get our data
gsc_data = gsc.get_data()
# print the total clicks/impressions and avg position
# for all the sites we just pulled data for
# send them to Big Query
for site in gsc_data:
    print(f"{site}'s Data\n"
          f"Clicks: {gsc_data[site]['Clicks'].sum()}\n"
          f"Impressions: {gsc_data[site]['Impressions'].sum()}\n"
          f"Avg Position: {gsc_data[site]['Position'].mean()}\n\n")
    # now we will send our data into our GBQ tables for storage
    # we will assign the dataset name to be our url
    # we will assign table to be gsc
    gbq.set_dataset(site)
    gbq.set_table('gsc')
    # send the data to GBQ
    gbq.send(gsc_data[site])
Thanks for using my code