Skip to main content

A python package to query data via amazon athena and bring it into a pandas df using aws-wrangler.

Project description

pydbtools

A package that is used to run SQL queries speficially configured for the Analytical Platform. This packages uses AWS Wrangler's Athena module but adds additional functionality (like Jinja templating, creating temporary tables) and alters some configuration to our specification.

Installation

Requires a pip release above 20.

## To install from pypi
pip install pydbtools

## Or install from git with a specific release
pip install "pydbtools @ git+https://github.com/moj-analytical-services/pydbtools@v4.0.1"

Quickstart guide

The examples directory contains more detailed notebooks demonstrating the use of this library, many of which are borrowed from the mojap-aws-tools-demo repo.

Read an SQL Athena query into a pandas dataframe

import pydbtools as pydb
df = pydb.read_sql_query("SELECT * from a_database.table LIMIT 10")

Run a query in Athena

response = pydb.start_query_execution_and_wait("CREATE DATABASE IF NOT EXISTS my_test_database")

Create a temporary table to do further separate SQL queries on later

pydb.create_temp_table("SELECT a_col, count(*) as n FROM a_database.table GROUP BY a_col", table_name="temp_table_1")
df = pydb.read_sql_query("SELECT * FROM __temp__.temp_table_1 WHERE n < 10")

pydb.dataframe_to_temp_table(my_dataframe, "my_table")
df = pydb.read_sql_query("select * from __temp__.my_table where year = 2022")

Notes

  • Amazon Athena using a flavour of SQL called presto docs can be found here
  • To query a date column in Athena you need to specify that your value is a date e.g. SELECT * FROM db.table WHERE date_col > date '2018-12-31'
  • To query a datetime or timestamp column in Athena you need to specify that your value is a timestamp e.g. SELECT * FROM db.table WHERE datetime_col > timestamp '2018-12-31 23:59:59'
  • Note dates and datetimes formatting used above. See more specifics around date and datetimes here
  • To specify a string in the sql query always use '' not "". Using ""'s means that you are referencing a database, table or col, etc.

See changelog for release changes.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pydbtools-5.5.16.tar.gz (11.5 kB view details)

Uploaded Source

Built Distribution

pydbtools-5.5.16-py3-none-any.whl (12.1 kB view details)

Uploaded Python 3

File details

Details for the file pydbtools-5.5.16.tar.gz.

File metadata

  • Download URL: pydbtools-5.5.16.tar.gz
  • Upload date:
  • Size: 11.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.0.0 CPython/3.12.2

File hashes

Hashes for pydbtools-5.5.16.tar.gz
Algorithm Hash digest
SHA256 aa491dbb709bfdfb3695d77a68285fd8fb3e67c30e1f3f3b36092817c3cc65f2
MD5 c435b6a742442e6ae3ab9387468fcae3
BLAKE2b-256 e9adb603d0da29dacc80cc0c4b0051092c95b9f85e0fc8b57b58fa75c8307592

See more details on using hashes here.

File details

Details for the file pydbtools-5.5.16-py3-none-any.whl.

File metadata

  • Download URL: pydbtools-5.5.16-py3-none-any.whl
  • Upload date:
  • Size: 12.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.0.0 CPython/3.12.2

File hashes

Hashes for pydbtools-5.5.16-py3-none-any.whl
Algorithm Hash digest
SHA256 0bc5b60c9dc09cde29d5841d5b46e90937c2a86be5aab6f77116f4ebe1e36e4d
MD5 a5259b58773f8d0e6b406ee7bb756561
BLAKE2b-256 1a2ef4d57717e194343efc22ccabf1b25e0c5b3f7b97c9728c6a7aecdb3e1682

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page