Ergonomic wrapper for pandas_gbq that simplifies loading BigQuery data into DataFrames
Project description
bqdf
Usage
Installation
Install latest from the GitHub repository:
$ pip install git+https://github.com/motdam/bqdf.git
or from conda
$ conda install -c motdam bqdf
or from pypi
$ pip install bqdf
Documentation
Documentation can be found hosted on this GitHub repository’s pages. Additionally you can find package manager specific guidelines on conda and pypi respectively.
How to use
This lib provides convenience functions for streamlining the interface of the pandas-gbq library to perform CRUD operations in BigQuery more quickly
import pandas_gbq
import pandas as pd
top_terms_query = """
-- todays top 10 search terms in England
SELECT refresh_date, rank, term, score, percent_gain / 100 as percent_gain, country_name, week
FROM `bigquery-public-data.google_trends.international_top_rising_terms`
WHERE country_name = 'United Kingdom'
and refresh_date = current_date - 1
and region_name = 'England'
order by refresh_date desc, week desc, rank
limit 5
"""
Reading a BigQuery table
df = read(top_terms_query, project_id='bq-sandbox-motdam')
df.head()
Downloading: 0%| |Downloading: 100%|██████████|
Loaded 5 rows × 7 cols (0.0000 GB) from query in 1.31s
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 refresh_date 5 non-null datetime64[ns]
1 rank 5 non-null Int64
2 term 5 non-null object
3 score 5 non-null Int64
4 percent_gain 5 non-null Float64
5 country_name 5 non-null object
6 week 5 non-null dbdate
dtypes: Float64(1), Int64(2), datetime64[ns](1), dbdate(1), object(2)
memory usage: 427.0+ bytes
None
| refresh_date | rank | term | score | percent_gain | country_name | week | |
|---|---|---|---|---|---|---|---|
| 0 | 2025-11-24 | 1 | liverpool vs nottm forest | 15 | 86.0 | United Kingdom | 2025-11-23 |
| 1 | 2025-11-24 | 2 | leeds united vs aston villa | 100 | 63.5 | United Kingdom | 2025-11-23 |
| 2 | 2025-11-24 | 3 | arsenal vs tottenham | 100 | 62.0 | United Kingdom | 2025-11-23 |
| 3 | 2025-11-24 | 4 | newcastle vs man city | 26 | 51.0 | United Kingdom | 2025-11-23 |
| 4 | 2025-11-24 | 5 | chayote | 9 | 35.0 | United Kingdom | 2025-11-23 |
To recreate the above with the original library you would need the below boiler plate to inspect the results and convert columns into pandas friendly dtypes.
df = pandas_gbq.read_gbq(top_terms_query, project_id='bq-sandbox-motdam')
df = df.astype({
'percent_gain':'Float64'
})
df['week'] = pd.to_datetime(df['week'])
df['refresh_date'] = pd.to_datetime(df['refresh_date'])
print(df.info())
df.head()
Downloading: 0%| |Downloading: 100%|██████████|
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 refresh_date 5 non-null datetime64[ns]
1 rank 5 non-null Int64
2 term 5 non-null object
3 score 5 non-null Int64
4 percent_gain 5 non-null Float64
5 country_name 5 non-null object
6 week 5 non-null datetime64[ns]
dtypes: Float64(1), Int64(2), datetime64[ns](2), object(2)
memory usage: 427.0+ bytes
None
| refresh_date | rank | term | score | percent_gain | country_name | week | |
|---|---|---|---|---|---|---|---|
| 0 | 2025-11-24 | 1 | liverpool vs nottm forest | 15 | 86.0 | United Kingdom | 2025-11-23 |
| 1 | 2025-11-24 | 2 | leeds united vs aston villa | 100 | 63.5 | United Kingdom | 2025-11-23 |
| 2 | 2025-11-24 | 3 | arsenal vs tottenham | 100 | 62.0 | United Kingdom | 2025-11-23 |
| 3 | 2025-11-24 | 4 | newcastle vs man city | 26 | 51.0 | United Kingdom | 2025-11-23 |
| 4 | 2025-11-24 | 5 | chayote | 9 | 35.0 | United Kingdom | 2025-11-23 |
Writing a df to BigQuery
The rest to function is
unchanged beyond removing the redundant _gbq suffix. We can write our
df back into BigQuery using hte
to function.
# Write the dataframe to a temporary table
to(df, 'bq-sandbox-motdam.temporary.top_10_eng_search_terms', if_exists='replace')
0%| | 0/1 [00:00<?, ?it/s]100%|██████████| 1/1 [00:00<00:00, 9198.04it/s]
Sent 5 rows × 7 cols (0.0000 GB) to bq-sandbox-motdam.temporary.top_10_eng_search_terms in 3.53s
Executing SQL in BigQuery
The ex fucntion enables
non df based CRUD operations within the same api which can be useful for
creating feature processing pipelines.
project = 'bq-sandbox-motdam'
def create_top_terms(period, days):
return f"""
CREATE OR REPLACE TABLE `{project}.temporary.top_terms_{period}` AS
WITH ranked AS (
SELECT region_name, term, COUNT(*) as appearances, AVG(rank) as avg_rank,
ROW_NUMBER() OVER (PARTITION BY region_name ORDER BY COUNT(*) DESC, AVG(rank)) as rn
FROM `bigquery-public-data.google_trends.international_top_rising_terms`
WHERE country_name = 'United Kingdom'
AND region_name IN ('England', 'Scotland', 'Wales', 'Northern Ireland')
AND refresh_date BETWEEN CURRENT_DATE() - {days} AND CURRENT_DATE()
AND rank <= 100
GROUP BY region_name, term
)
SELECT region_name, term as top_term_{period}
FROM ranked WHERE rn = 1
"""
ex(create_top_terms('today', 1), project_id=project)
ex(create_top_terms('week', 8), project_id=project)
ex(create_top_terms('month', 31), project_id=project)
ex(create_top_terms('year', 366), project_id=project)
final_query = f"""
SELECT t.region_name, t.top_term_today, w.top_term_week, m.top_term_month, y.top_term_year
FROM `{project}.temporary.top_terms_today` as t
JOIN `{project}.temporary.top_terms_week` as w ON t.region_name = w.region_name
JOIN `{project}.temporary.top_terms_month` as m ON t.region_name = m.region_name
JOIN `{project}.temporary.top_terms_year` as y ON t.region_name = y.region_name
ORDER BY t.region_name
"""
read(final_query, project_id=project)
Processed 0.3883 GB, 0 rows affected in 2.21s
Processed 2.9971 GB, 0 rows affected in 2.35s
Processed 11.6400 GB, 0 rows affected in 2.17s
Processed 12.1727 GB, 0 rows affected in 2.54s
Downloading: 0%| |Downloading: 100%|██████████|
Loaded 4 rows × 5 cols (0.0000 GB) from query in 0.63s
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 region_name 4 non-null object
1 top_term_today 4 non-null object
2 top_term_week 4 non-null object
3 top_term_month 4 non-null object
4 top_term_year 4 non-null object
dtypes: object(5)
memory usage: 292.0+ bytes
None
| region_name | top_term_today | top_term_week | top_term_month | top_term_year | |
|---|---|---|---|---|---|
| 0 | England | liverpool vs nottm forest | rugby today | ftse 100 | india vs australia |
| 1 | Northern Ireland | liverpool vs nottm forest | rugby today | ftse 100 | india vs australia |
| 2 | Scotland | liverpool vs nottm forest | rugby today | ftse 100 | india vs australia |
| 3 | Wales | liverpool vs nottm forest | rugby today | ftse 100 | india vs australia |
British search history in a nutshell: ‘Is it raining?’ followed immediately by ‘Can I afford to move somewhere sunny?’
Developer Guide
If you are new to using nbdev here are some useful pointers to get you
started.
Install bqdf in Development mode
# make sure bqdf package is installed in development mode
$ pip install -e .
# make changes under nbs/ directory
# ...
# compile to have changes apply to bqdf
$ nbdev_prepare
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file bqdf-0.0.1.tar.gz.
File metadata
- Download URL: bqdf-0.0.1.tar.gz
- Upload date:
- Size: 13.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
54e9aeb0e73e7b9a1bc8fff36d61fbfd45573f979090005eca6a2489a7594028
|
|
| MD5 |
3ff983f6e2c266aa9a702a879912e94f
|
|
| BLAKE2b-256 |
613d9c2b2260d62a8839341db364786affa3fbeca88e5114be3e166a08422980
|
File details
Details for the file bqdf-0.0.1-py3-none-any.whl.
File metadata
- Download URL: bqdf-0.0.1-py3-none-any.whl
- Upload date:
- Size: 11.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
6f08b885d504f3ab15d9e8b223c1ff3e0666aae3909aba20da414579f078f499
|
|
| MD5 |
ca432f4ede40c885cfbcacc00a452de3
|
|
| BLAKE2b-256 |
119f0844b8d2c9da9da82f123c478b977eff5ba1b32064e3d29eacb511a9d97d
|