A Python package to query data via Amazon Athena and load it into a pandas DataFrame using aws-wrangler.

Project description

pydbtools

A package for running SQL queries, configured specifically for the Analytical Platform. It uses AWS Wrangler's Athena module but adds functionality (such as Jinja templating and temporary tables) and alters some configuration to our specification.

Installation

Requires a pip release above 20.

## To install from pypi
pip install pydbtools

## Or install from git with a specific release
pip install "pydbtools @ git+https://github.com/moj-analytical-services/pydbtools@v4.0.1"

Quickstart guide

The examples directory contains more detailed notebooks demonstrating the use of this library, many of which are borrowed from the mojap-aws-tools-demo repo.

Read an SQL Athena query into a pandas dataframe

import pydbtools as pydb
df = pydb.read_sql_query("SELECT * from a_database.table LIMIT 10")

Run a query in Athena

response = pydb.start_query_execution_and_wait("CREATE DATABASE IF NOT EXISTS my_test_database")
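The value returned above can be inspected before moving on. The sketch below assumes the response is a dictionary shaped like Athena's QueryExecution structure, with a `Status` entry holding `State` and `StateChangeReason`; that shape is an assumption, not something this page confirms, and the helper names `query_state` and `raise_if_failed` are hypothetical.

```python
from typing import Any, Mapping


def query_state(response: Mapping[str, Any]) -> str:
    """Return the query state, or "UNKNOWN" if the response has none."""
    status = response.get("Status") or {}
    return status.get("State", "UNKNOWN")


def raise_if_failed(response: Mapping[str, Any]) -> None:
    """Raise RuntimeError unless the (assumed) state is SUCCEEDED."""
    state = query_state(response)
    if state != "SUCCEEDED":
        status = response.get("Status") or {}
        reason = status.get("StateChangeReason", "no reason given")
        raise RuntimeError(f"Athena query finished in state {state}: {reason}")
```

With the real library this might be used as `raise_if_failed(response)` straight after the call above. The underlying library may already raise on failed queries, in which case the check is only a defensive extra.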

Create a temporary table to run further SQL queries against later

pydb.create_temp_table("SELECT a_col, count(*) as n FROM a_database.table GROUP BY a_col", table_name="temp_table_1")
df = pydb.read_sql_query("SELECT * FROM __temp__.temp_table_1 WHERE n < 10")

# Or create a temporary table from an existing pandas dataframe (my_dataframe)
pydb.dataframe_to_temp_table(my_dataframe, "my_table")
df = pydb.read_sql_query("select * from __temp__.my_table where year = 2022")
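The two-step pattern above (create a temporary table, then query it through `__temp__`) can be wrapped in a small function. This is a minimal sketch, not part of pydbtools: `temp_table_ref` and `rare_values` are hypothetical names, the identifier check is an added safety assumption, and the two library functions are passed in as arguments so the logic can be exercised without an AWS connection.

```python
import re
from typing import Any, Callable

TEMP_DB = "__temp__"  # temporary database alias used in the queries above

# Hypothetical safety check: accept plain lowercase identifiers only.
_IDENTIFIER = re.compile(r"[a-z_][a-z0-9_]*")


def temp_table_ref(table_name: str) -> str:
    """Return the qualified name used to query a temporary table."""
    if not _IDENTIFIER.fullmatch(table_name):
        raise ValueError(f"unsupported table name: {table_name!r}")
    return f"{TEMP_DB}.{table_name}"


def rare_values(
    create_temp_table: Callable[..., Any],
    read_sql_query: Callable[[str], Any],
    source: str,
    column: str,
    table_name: str,
    max_count: int,
) -> Any:
    """Count values of `column` in `source`, then return the rare ones."""
    ref = temp_table_ref(table_name)  # validate before running anything
    create_temp_table(
        f"SELECT {column}, count(*) AS n FROM {source} GROUP BY {column}",
        table_name=table_name,
    )
    return read_sql_query(f"SELECT * FROM {ref} WHERE n < {int(max_count)}")
```

Called as `rare_values(pydb.create_temp_table, pydb.read_sql_query, "a_database.table", "a_col", "temp_table_1", 10)`, it would issue the same pair of queries as the first example in this section.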

Notes

Amazon Athena uses a flavour of SQL called Trino; see the Trino documentation for details.
To query a date column in Athena you need to specify that your value is a date e.g. SELECT * FROM db.table WHERE date_col > date '2018-12-31'
To query a datetime or timestamp column in Athena you need to specify that your value is a timestamp e.g. SELECT * FROM db.table WHERE datetime_col > timestamp '2018-12-31 23:59:59'
Note the date and datetime formats used above. More specifics on dates and datetimes can be found in the documentation.
To specify a string in a SQL query, always use single quotes ('') and not double quotes (""). Double quotes mean that you are referencing a database, table, column, etc.
If you are working in an environment where you cannot change the default AWS region environment variables, you can set AWS_ATHENA_QUERY_REGION, which will override them.
You can override the bucket that query results are written to with the ATHENA_QUERY_DUMP_BUCKET environment variable. This is mandatory if you set the region to anything other than eu-west-1.
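The notes on literals and environment variables can be sketched with the standard library alone. The helper names below are hypothetical and not part of pydbtools, and calling `configure_athena_env` before importing pydbtools is an assumption made to be safe, since this page does not say when the variables are read.

```python
import os
from datetime import date, datetime


def sql_date(value: date) -> str:
    """Render a date (or the date part of a datetime) as a SQL date literal."""
    return f"date '{value:%Y-%m-%d}'"


def sql_timestamp(value: datetime) -> str:
    """Render a datetime as a SQL timestamp literal, to the second."""
    return f"timestamp '{value:%Y-%m-%d %H:%M:%S}'"


def sql_string(value: str) -> str:
    """Render a string literal in single quotes, doubling embedded quotes."""
    return "'" + value.replace("'", "''") + "'"


def configure_athena_env(region: str, dump_bucket: str) -> None:
    """Set the two override variables described in the notes."""
    os.environ["AWS_ATHENA_QUERY_REGION"] = region
    os.environ["ATHENA_QUERY_DUMP_BUCKET"] = dump_bucket


# Build the date filter from the notes out of a Python date object.
cutoff = sql_date(date(2018, 12, 31))
query = f"SELECT * FROM db.table WHERE date_col > {cutoff}"
```

A call such as `configure_athena_env("eu-west-2", "my-query-results-bucket")` uses placeholder values; substitute a region and bucket that exist in your own account.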

See changelog for release changes.

Project details

Release history

5.8.1: May 8, 2025 (this version)
5.8.0: May 7, 2025
5.7.1: May 7, 2025
5.6.4: Sep 27, 2024
5.6.3: Aug 15, 2024
5.6.2: Aug 12, 2024
5.6.1: Aug 12, 2024
5.6.0: Jul 31, 2024
5.5.20: Jul 23, 2024
5.5.19: Jul 18, 2024
5.5.18: May 13, 2024
5.5.17: Apr 18, 2024
5.5.16: Apr 8, 2024
5.5.15: Dec 1, 2023
5.5.14: Dec 1, 2023
5.5.13: Nov 16, 2023
5.5.12: Nov 14, 2023
5.5.9: Oct 3, 2023
5.5.8: Aug 1, 2023
5.5.7: Jul 25, 2023
5.5.6: May 5, 2023
5.5.5: May 2, 2023
5.5.4: Apr 26, 2023
5.5.3: Mar 6, 2023
5.5.2: Feb 24, 2023
5.5.1: Feb 13, 2023
5.5.0: Feb 7, 2023
5.4.0: Jan 26, 2023
5.3.2: Oct 12, 2022
5.3.1: Jul 22, 2022
5.3.0: Jul 13, 2022
5.2.2: Jun 16, 2022
5.2.1: May 11, 2022
5.2.0: Mar 2, 2022
5.1.0: Feb 15, 2022
5.0.0: Jan 20, 2022
4.0.1: Sep 24, 2021
4.0.0: Jul 16, 2021
3.1.1: Jun 25, 2021
3.1.0: Mar 11, 2021
3.0.1: Feb 15, 2021
3.0.0: Jan 22, 2021
2.0.2: Dec 1, 2020
2.0.1: Sep 28, 2020
2.0.0: Sep 2, 2020
1.0.3: Sep 20, 2019
1.0.2: Sep 17, 2019
1.0.1: Jun 18, 2019
1.0.0: Jun 13, 2019

Download files


Source Distribution

pydbtools-5.8.1.tar.gz (14.7 kB)

Uploaded May 8, 2025 Source

Built Distribution

pydbtools-5.8.1-py3-none-any.whl (12.7 kB)

Uploaded May 8, 2025 Python 3

File details

Details for the file pydbtools-5.8.1.tar.gz.

File metadata

Download URL: pydbtools-5.8.1.tar.gz
Upload date: May 8, 2025
Size: 14.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.7.3

File hashes

Hashes for pydbtools-5.8.1.tar.gz
Algorithm	Hash digest
SHA256	`da1fcf5e4d42f7a58738d97f6bd946e394f3ca2a4986398459ed7ad2fdd1f5cd`
MD5	`c8336e0a8ff21a2fb907fde7a573e4ed`
BLAKE2b-256	`fe98bad2cbcf33b932a1e7d2686c297de16c351fa43dc6e700be6b162a4129b2`

See more details on using hashes here.

File details

Details for the file pydbtools-5.8.1-py3-none-any.whl.

File metadata

Download URL: pydbtools-5.8.1-py3-none-any.whl
Upload date: May 8, 2025
Size: 12.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.7.3

File hashes

Hashes for pydbtools-5.8.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c9ea8970a4b9f52b6444bc79947bd24a55db1f320b605222bd2c2a6f14c0071e`
MD5	`e88546be886e2bcdf0394dffe54849b4`
BLAKE2b-256	`170801535e0cfe7898e39845f7d851fa86a74c279c74f4d315677ef8ba469df7`

