Skip to main content

A python package to query data via amazon athena and bring it into a pandas df using aws-wrangler.

Project description

pydbtools

A package that is used to run SQL queries speficially configured for the Analytical Platform. This packages uses AWS Wrangler's Athena module but adds additional functionality (like Jinja templating, creating temporary tables) and alters some configuration to our specification.

Installation

Requires a pip release above 20.

## To install from pypi
pip install pydbtools

## Or install from git with a specific release
pip install "pydbtools @ git+https://github.com/moj-analytical-services/pydbtools@v4.0.1"

Quickstart guide

The examples directory contains more detailed notebooks demonstrating the use of this library, many of which are borrowed from the mojap-aws-tools-demo repo.

Read an SQL Athena query into a pandas dataframe

import pydbtools as pydb
df = pydb.read_sql_query("SELECT * from a_database.table LIMIT 10")

Run a query in Athena

response = pydb.start_query_execution_and_wait("CREATE DATABASE IF NOT EXISTS my_test_database")

Create a temporary table to do further separate SQL queries on later

pydb.create_temp_table("SELECT a_col, count(*) as n FROM a_database.table GROUP BY a_col", table_name="temp_table_1")
df = pydb.read_sql_query("SELECT * FROM __temp__.temp_table_1 WHERE n < 10")

pydb.dataframe_to_temp_table(my_dataframe, "my_table")
df = pydb.read_sql_query("select * from __temp__.my_table where year = 2022")

Notes

  • Amazon Athena using a flavour of SQL called trino. Docs can be found here
  • To query a date column in Athena you need to specify that your value is a date e.g. SELECT * FROM db.table WHERE date_col > date '2018-12-31'
  • To query a datetime or timestamp column in Athena you need to specify that your value is a timestamp e.g. SELECT * FROM db.table WHERE datetime_col > timestamp '2018-12-31 23:59:59'
  • Note dates and datetimes formatting used above. See more specifics around date and datetimes here
  • To specify a string in the sql query always use '' not "". Using ""'s means that you are referencing a database, table or col, etc.
  • If you are working in an environment where you cannot change the default AWS region environment variables you can set AWS_ATHENA_QUERY_REGION which will override these.
  • You can override the bucket where query results are outputted to with the ATHENA_QUERY_DUMP_BUCKET environment variable. This is mandatory if you set the region to something other than eu-west-1.

See changelog for release changes.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pydbtools-5.8.1.tar.gz (14.7 kB view details)

Uploaded Source

Built Distribution

pydbtools-5.8.1-py3-none-any.whl (12.7 kB view details)

Uploaded Python 3

File details

Details for the file pydbtools-5.8.1.tar.gz.

File metadata

  • Download URL: pydbtools-5.8.1.tar.gz
  • Upload date:
  • Size: 14.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.7.3

File hashes

Hashes for pydbtools-5.8.1.tar.gz
Algorithm Hash digest
SHA256 da1fcf5e4d42f7a58738d97f6bd946e394f3ca2a4986398459ed7ad2fdd1f5cd
MD5 c8336e0a8ff21a2fb907fde7a573e4ed
BLAKE2b-256 fe98bad2cbcf33b932a1e7d2686c297de16c351fa43dc6e700be6b162a4129b2

See more details on using hashes here.

File details

Details for the file pydbtools-5.8.1-py3-none-any.whl.

File metadata

  • Download URL: pydbtools-5.8.1-py3-none-any.whl
  • Upload date:
  • Size: 12.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.7.3

File hashes

Hashes for pydbtools-5.8.1-py3-none-any.whl
Algorithm Hash digest
SHA256 c9ea8970a4b9f52b6444bc79947bd24a55db1f320b605222bd2c2a6f14c0071e
MD5 e88546be886e2bcdf0394dffe54849b4
BLAKE2b-256 170801535e0cfe7898e39845f7d851fa86a74c279c74f4d315677ef8ba469df7

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page