Ergonomic wrapper for pandas_gbq that simplifies loading BigQuery data into DataFrames

These details have not been verified by PyPI

Project links

Homepage

Project description

bqdf

Usage

Installation

Install latest from the GitHub repository:

$ pip install git+https://github.com/motdam/bqdf.git

or from conda

$ conda install -c motdam bqdf

or from PyPI

$ pip install bqdf
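
Whichever installer is used, the version that ended up in the environment can be checked from Python with the standard library. This is a small sketch; installed_version is a hypothetical helper name, not part of bqdf.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the installed bqdf version, or None when bqdf is not installed.
print(installed_version("bqdf"))
```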

Documentation

Documentation is hosted on this GitHub repository’s pages. Package-manager-specific guidelines are available on conda and PyPI respectively.

How to use

This library provides convenience functions that streamline the pandas-gbq interface, so that CRUD operations in BigQuery take less code.

import pandas_gbq
import pandas as pd

top_terms_query = """
-- top 5 rising search terms in England as of yesterday's refresh
SELECT refresh_date, rank, term, score, percent_gain / 100 as percent_gain, country_name, week
FROM `bigquery-public-data.google_trends.international_top_rising_terms` 
WHERE country_name = 'United Kingdom'
  and refresh_date = current_date - 1
  and region_name = 'England'
order by refresh_date desc, week desc, rank
limit 5
"""

Reading a BigQuery table

df = read(top_terms_query, project_id='bq-sandbox-motdam')
df.head()

Downloading: 100%|██████████|
Loaded 5 rows × 7 cols (0.0000 GB) from query in 1.31s
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype         
---  ------        --------------  -----         
 0   refresh_date  5 non-null      datetime64[ns]
 1   rank          5 non-null      Int64         
 2   term          5 non-null      object        
 3   score         5 non-null      Int64         
 4   percent_gain  5 non-null      Float64       
 5   country_name  5 non-null      object        
 6   week          5 non-null      dbdate        
dtypes: Float64(1), Int64(2), datetime64[ns](1), dbdate(1), object(2)
memory usage: 427.0+ bytes
None

	refresh_date	rank	term	score	percent_gain	country_name	week
0	2025-11-24	1	liverpool vs nottm forest	15	86.0	United Kingdom	2025-11-23
1	2025-11-24	2	leeds united vs aston villa	100	63.5	United Kingdom	2025-11-23
2	2025-11-24	3	arsenal vs tottenham	100	62.0	United Kingdom	2025-11-23
3	2025-11-24	4	newcastle vs man city	26	51.0	United Kingdom	2025-11-23
4	2025-11-24	5	chayote	9	35.0	United Kingdom	2025-11-23

To recreate the above with the original library, you would need the boilerplate below to inspect the results and convert columns into pandas-friendly dtypes.

df = pandas_gbq.read_gbq(top_terms_query, project_id='bq-sandbox-motdam')
df = df.astype({
    'percent_gain':'Float64'
})
df['week'] = pd.to_datetime(df['week'])
df['refresh_date'] = pd.to_datetime(df['refresh_date'])
print(df.info())
df.head()

Downloading: 100%|██████████|
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype         
---  ------        --------------  -----         
 0   refresh_date  5 non-null      datetime64[ns]
 1   rank          5 non-null      Int64         
 2   term          5 non-null      object        
 3   score         5 non-null      Int64         
 4   percent_gain  5 non-null      Float64       
 5   country_name  5 non-null      object        
 6   week          5 non-null      datetime64[ns]
dtypes: Float64(1), Int64(2), datetime64[ns](2), object(2)
memory usage: 427.0+ bytes
None

	refresh_date	rank	term	score	percent_gain	country_name	week
0	2025-11-24	1	liverpool vs nottm forest	15	86.0	United Kingdom	2025-11-23
1	2025-11-24	2	leeds united vs aston villa	100	63.5	United Kingdom	2025-11-23
2	2025-11-24	3	arsenal vs tottenham	100	62.0	United Kingdom	2025-11-23
3	2025-11-24	4	newcastle vs man city	26	51.0	United Kingdom	2025-11-23
4	2025-11-24	5	chayote	9	35.0	United Kingdom	2025-11-23
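
The manual conversions above can be folded into a small helper. The following is a minimal sketch, not bqdf's actual implementation: normalise_dtypes is a hypothetical name, and the rule it applies (cast every plain float column to the nullable Float64 dtype and parse the named date columns with pd.to_datetime) is an assumption based on the boilerplate shown here.

```python
import pandas as pd


def normalise_dtypes(df, date_cols=()):
    """Hypothetical helper: return a copy of df with pandas-friendly dtypes."""
    out = df.copy()
    # Cast plain float columns to the nullable Float64 extension dtype.
    float_cols = out.select_dtypes(include="float").columns
    out = out.astype({col: "Float64" for col in float_cols})
    # Parse the requested columns into datetime values.
    for col in date_cols:
        out[col] = pd.to_datetime(out[col])
    return out


# Tiny stand-in for a query result, so the sketch runs without BigQuery.
sample = pd.DataFrame({
    "term": ["chayote"],
    "percent_gain": [35.0],
    "week": ["2025-11-23"],
})
clean = normalise_dtypes(sample, date_cols=["week"])
print(clean.dtypes)
```

The helper works on a copy, so the frame returned by the query is left untouched and can still be inspected in its raw form.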

Writing a df to BigQuery

The to function is unchanged from its pandas-gbq counterpart beyond removing the redundant _gbq suffix. We can write our df back into BigQuery using the to function.

# Write the dataframe to a temporary table
to(df, 'bq-sandbox-motdam.temporary.top_10_eng_search_terms', if_exists='replace')

100%|██████████| 1/1 [00:00<00:00, 9198.04it/s]

Sent 5 rows × 7 cols (0.0000 GB) to bq-sandbox-motdam.temporary.top_10_eng_search_terms in 3.53s
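
The destination above is a fully qualified project.dataset.table ID. When the project and the dataset.table parts are needed separately (for example, to pass project_id explicitly), they can be split as follows. This is a minimal sketch; split_table_id is a hypothetical helper, not a bqdf or pandas-gbq function.

```python
def split_table_id(table_id):
    """Hypothetical helper: split 'project.dataset.table' into its parts."""
    parts = table_id.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected 'project.dataset.table', got {table_id!r}")
    project_id, dataset, table = parts
    return project_id, f"{dataset}.{table}"


project_id, destination = split_table_id(
    "bq-sandbox-motdam.temporary.top_10_eng_search_terms"
)
print(project_id)   # bq-sandbox-motdam
print(destination)  # temporary.top_10_eng_search_terms
```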

Executing SQL in BigQuery

The ex function enables non-DataFrame CRUD operations within the same API, which can be useful for creating feature processing pipelines.

project = 'bq-sandbox-motdam'

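def create_top_terms(period, days):
    # Build a CREATE OR REPLACE TABLE statement that stores, for each UK region,
    # the term with the most appearances over the last `days` days
    # (ties are broken by the lowest average rank).
    return f"""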
    CREATE OR REPLACE TABLE `{project}.temporary.top_terms_{period}` AS
    WITH ranked AS (
      SELECT region_name, term, COUNT(*) as appearances, AVG(rank) as avg_rank,
        ROW_NUMBER() OVER (PARTITION BY region_name ORDER BY COUNT(*) DESC, AVG(rank)) as rn
      FROM `bigquery-public-data.google_trends.international_top_rising_terms`
      WHERE country_name = 'United Kingdom'
        AND region_name IN ('England', 'Scotland', 'Wales', 'Northern Ireland')
        AND refresh_date BETWEEN CURRENT_DATE() - {days} AND CURRENT_DATE()
        AND rank <= 100
      GROUP BY region_name, term
    )
    SELECT region_name, term as top_term_{period}
    FROM ranked WHERE rn = 1
    """

ex(create_top_terms('today', 1), project_id=project)
ex(create_top_terms('week', 8), project_id=project)
ex(create_top_terms('month', 31), project_id=project)
ex(create_top_terms('year', 366), project_id=project)

final_query = f"""
SELECT t.region_name, t.top_term_today, w.top_term_week, m.top_term_month, y.top_term_year
FROM `{project}.temporary.top_terms_today` as t
JOIN `{project}.temporary.top_terms_week` as w ON t.region_name = w.region_name
JOIN `{project}.temporary.top_terms_month` as m ON t.region_name = m.region_name
JOIN `{project}.temporary.top_terms_year` as y ON t.region_name = y.region_name
ORDER BY t.region_name
"""

read(final_query, project_id=project)

Processed 0.3883 GB, 0 rows affected in 2.21s
Processed 2.9971 GB, 0 rows affected in 2.35s
Processed 11.6400 GB, 0 rows affected in 2.17s
Processed 12.1727 GB, 0 rows affected in 2.54s
Downloading: 100%|██████████|
Loaded 4 rows × 5 cols (0.0000 GB) from query in 0.63s
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 5 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   region_name     4 non-null      object
 1   top_term_today  4 non-null      object
 2   top_term_week   4 non-null      object
 3   top_term_month  4 non-null      object
 4   top_term_year   4 non-null      object
dtypes: object(5)
memory usage: 292.0+ bytes
None

	region_name	top_term_today	top_term_week	top_term_month	top_term_year
0	England	liverpool vs nottm forest	rugby today	ftse 100	india vs australia
1	Northern Ireland	liverpool vs nottm forest	rugby today	ftse 100	india vs australia
2	Scotland	liverpool vs nottm forest	rugby today	ftse 100	india vs australia
3	Wales	liverpool vs nottm forest	rugby today	ftse 100	india vs australia

British search history in a nutshell: ‘Is it raining?’ followed immediately by ‘Can I afford to move somewhere sunny?’
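
For readers curious how a function like ex could produce the "Processed ... GB, ... rows affected" lines shown above, here is a hedged sketch. It is not bqdf's implementation: run_sql is a hypothetical name, the client is assumed to behave like google.cloud.bigquery.Client (whose query jobs expose result(), total_bytes_processed and num_dml_affected_rows), and reporting decimal gigabytes is an assumption. Stub classes stand in for the real client so that the sketch runs without credentials.

```python
import time


def run_sql(sql, client):
    """Hypothetical stand-in for ex: run a statement and report its cost.

    client is assumed to behave like google.cloud.bigquery.Client.
    """
    start = time.perf_counter()
    job = client.query(sql)
    job.result()  # block until the statement has finished
    elapsed = time.perf_counter() - start
    gb = (job.total_bytes_processed or 0) / 1e9  # assumption: decimal gigabytes
    rows = job.num_dml_affected_rows or 0  # None for non-DML statements
    print(f"Processed {gb:.4f} GB, {rows} rows affected in {elapsed:.2f}s")
    return job


# Stub objects so the sketch runs without BigQuery access.
class FakeJob:
    total_bytes_processed = 388_300_000
    num_dml_affected_rows = None

    def result(self):
        return []


class FakeClient:
    def query(self, sql):
        return FakeJob()


job = run_sql("CREATE OR REPLACE TABLE demo AS SELECT 1", FakeClient())
```

Passing the client in as an argument keeps the function easy to test, since any object with a compatible query method can be substituted.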

Developer Guide

If you are new to using nbdev, here are some useful pointers to get you started.

Install bqdf in Development mode

# make sure bqdf package is installed in development mode
$ pip install -e .

# make changes under nbs/ directory
# ...

# compile to have changes apply to bqdf
$ nbdev_prepare

Release history

0.0.8

Dec 1, 2025

0.0.7

Nov 28, 2025

0.0.6

Nov 28, 2025

0.0.5

Nov 28, 2025

0.0.4

Nov 28, 2025

This version

0.0.3

Nov 25, 2025

0.0.2

Nov 25, 2025

0.0.1

Nov 25, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

bqdf-0.0.3.tar.gz (12.9 kB)

Uploaded Nov 25, 2025 Source

Built Distribution

bqdf-0.0.3-py3-none-any.whl (11.5 kB)

Uploaded Nov 25, 2025 Python 3

File details

Details for the file bqdf-0.0.3.tar.gz.

File metadata

Download URL: bqdf-0.0.3.tar.gz
Upload date: Nov 25, 2025
Size: 12.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for bqdf-0.0.3.tar.gz
Algorithm	Hash digest
SHA256	`3c0ef5c4e40a303d4d68de160711eaded857781c120c6127e6b25d3b551674ab`
MD5	`16d7babe9c61c38e36b9b52c6446cb07`
BLAKE2b-256	`8328bd32698ebcd1f8657636045b4ae8c067cb496bc616cda3be05b6005511a8`

See more details on using hashes here.
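
A downloaded file can be checked against the SHA256 digest listed above using only the standard library. This is a minimal sketch; sha256_of and matches are hypothetical helper names.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at path."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Return True if the file's SHA256 digest equals expected_hex."""
    return sha256_of(path) == expected_hex.lower()
```

For example, calling matches("bqdf-0.0.3.tar.gz", "3c0ef5c4e40a303d4d68de160711eaded857781c120c6127e6b25d3b551674ab") should return True for an intact copy of the source distribution.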

File details

Details for the file bqdf-0.0.3-py3-none-any.whl.

File metadata

Download URL: bqdf-0.0.3-py3-none-any.whl
Upload date: Nov 25, 2025
Size: 11.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for bqdf-0.0.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d3c6747cf54c447bbd82fa27303026afe1baea61a7ee695656037277357c3cbf`
MD5	`dbfde36b0867f7d826ab5d965988fcd9`
BLAKE2b-256	`b06f4fb39a6dc3a5aa08ff951190d0606ff9cd3a0230a439ccbc1025def329eb`

See more details on using hashes here.
