
A simple Athena wrapper that uses boto3 to execute queries and return results, requiring only a database and a query string.

pythena

This is a simple Python module that lets you query Athena the same way the AWS Athena console does. It requires only a database name and a query string.

Install

pip install pythena

Setup

Be sure to set up your AWS authentication credentials. One way to do so is to install the AWS CLI and run:

pip install awscli
aws configure

For more help configuring the AWS CLI, see https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html

Simple Usage

import pythena

athena_client = pythena.Athena("mydatabase") 

# Returns a tuple of (pandas dataframe, Athena execution id)
df, _ = athena_client.execute("select * from mytable")

print(df.sample(n=2)) # Prints 2 randomly sampled rows from your dataframe

Connect to Database

import boto3
import pythena

# Connect to a database
athena_client = pythena.Athena(database="mydatabase")
# Connect to a database and override default aws region in your aws configuration
athena_client = pythena.Athena(database="mydatabase", region='us-east-1')
# Connect to a database with your own boto3 session, e.g. one created with a non-default profile
athena_client = pythena.Athena(database="mydatabase", session=boto3.session.Session())

athena_client.execute()

execute(
  query='SQL_QUERY',                   # Required
  s3_output_url='FULL_S3_PATH',        # Optional (format example: 's3://mybucket/mydir')
  save_results=True | False,           # Optional. Defaults to True only when 's3_output_url' is provided. If True, the S3 results are not deleted and a tuple that includes the execution_id is returned.
  run_async=True | False,              # Optional. If True, runs the query asynchronously and returns the execution_id; use get_result(execution_id) to fetch the results when the query finishes.
  workgroup='primary'                  # Optional. Defaults to the 'primary' workgroup
)

Note: execute() returns a tuple (dataframe, execution_id) unless run_async=True, in which case it returns only the execution_id.
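
The s3_output_url format shown above ('s3://mybucket/mydir') can be checked before a query is submitted. The following is a minimal sketch using only the standard library. Note that split_s3_url is a hypothetical helper written for illustration and is not part of pythena.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_url(s3_output_url: str) -> Tuple[str, str]:
    """Split 's3://bucket/prefix' into (bucket, prefix).

    Hypothetical helper for illustration; pythena does not provide it.
    """
    parsed = urlparse(s3_output_url)
    # A usable output location needs the s3 scheme and a bucket name.
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(
            f"Expected a URL like 's3://mybucket/mydir', got {s3_output_url!r}"
        )
    # Strip surrounding slashes so the prefix is a plain key prefix.
    return parsed.netloc, parsed.path.strip("/")
```

Calling split_s3_url on a value before passing it as s3_output_url surfaces a malformed path locally instead of as a failed query.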

Full Usage Examples

import boto3
import pythena

# Prints out all databases listed in the glue catalog
pythena.print_databases()
pythena.print_databases(region='us-east-1') # Overrides default region
pythena.print_databases(session=boto3.session.Session()) # Overrides default profile

# Gets all databases and returns as a list
pythena.get_databases()
pythena.get_databases(region='us-east-1') # Overrides default region
pythena.get_databases(session=boto3.session.Session()) # Overrides default profile

# Connect to a database
athena_client = pythena.Athena(database="mydatabase")
athena_client = pythena.Athena(database="mydatabase", region='us-east-1') # Overrides default region
athena_client = pythena.Athena(database="mydatabase", session=boto3.session.Session()) # Overrides default profile

# Prints out all tables in a database
athena_client.print_tables()

# Gets all tables in the database you are connected to and returns as a list
athena_client.get_tables()

# Execute a query, returns tuple with dataframe and athena execution_id
dataframe, _ = athena_client.execute(query="select * from my_table") # Results are returned as a dataframe

# Execute a query and save results to s3
dataframe, execution_id = athena_client.execute(query="select * from my_table", s3_output_url="s3://mybucket/mydir") # Results are returned as a dataframe

# Get Execution Id and save results
dataframe, execution_id = athena_client.execute(query="select * from my_table", save_results=True)

# Execute a query asynchronously
execution_id = athena_client.execute(query="select * from my_table", run_async=True) # Returns just the execution id
dataframe = athena_client.get_result(execution_id) # Will report errors if query failed or let you know if it is still running

# With asynchronous queries, you can check the status, get the error, or cancel the query
pythena.get_query_status(execution_id)
pythena.get_query_error(execution_id)
pythena.cancel_query(execution_id)
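
For asynchronous queries, a small polling loop can wait until the query reaches a final state. The sketch below is illustrative only: wait_for_query is a hypothetical helper, and it assumes the status function you pass in (for example, the get_query_status call shown above) returns the Athena state as a plain string such as 'RUNNING' or 'SUCCEEDED'. That return type is an assumption, so check it against your installed version.

```python
import time
from typing import Callable

# Final query states reported by the Athena service.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def wait_for_query(
    get_status: Callable[[str], str],
    execution_id: str,
    poll_seconds: float = 2.0,
    timeout_seconds: float = 300.0,
) -> str:
    """Poll get_status(execution_id) until a final state or a timeout.

    Hypothetical helper; assumes get_status returns the state as a string.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = get_status(execution_id)
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Query {execution_id} still {state} after {timeout_seconds}s"
            )
        time.sleep(poll_seconds)
```

Passing the status function in as an argument keeps the loop independent of any particular client, which also makes it easy to exercise with a stub.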

Note

When Athena queries are executed, whether through boto3 or the AWS Athena console, the results are saved to an S3 bucket. By default, after a successful execution this module deletes the S3 result file to keep S3 clean. If an s3_output_url is provided, the results are saved to that location and are not deleted.
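
The retention rule described here, together with the documented save_results default, can be summarized as a small decision function. This is a sketch of the documented behavior, not pythena's actual implementation, and resolve_save_results is a hypothetical name.

```python
from typing import Optional


def resolve_save_results(
    s3_output_url: Optional[str], save_results: Optional[bool] = None
) -> bool:
    """Return True when the S3 result file should be kept.

    Illustrative only; mirrors the documented defaults, not pythena's code.
    """
    if save_results is not None:
        # An explicit save_results choice takes precedence.
        return save_results
    # Default: keep results only when an output location was provided.
    return s3_output_url is not None
```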

Download files


Source Distribution

pythena-1.6.0.tar.gz (5.4 kB view details)

Uploaded Source

Built Distribution

pythena-1.6.0-py3-none-any.whl (11.6 kB view details)

Uploaded Python 3

File details

Details for the file pythena-1.6.0.tar.gz.

File metadata

  • Download URL: pythena-1.6.0.tar.gz
  • Size: 5.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.41.1 CPython/3.6.5

File hashes

Hashes for pythena-1.6.0.tar.gz
Algorithm Hash digest
SHA256 007e5a1fcd12e8c9209cce62ec51977a34fc51d2ee430433f00de13254a96321
MD5 cd0d1fd8e1487a46932a43e97e5e816a
BLAKE2b-256 9bc370a1f81672261d101def4cc8e047b9e138f54f576eb55fc3c0c4146294f8


File details

Details for the file pythena-1.6.0-py3-none-any.whl.

File metadata

  • Download URL: pythena-1.6.0-py3-none-any.whl
  • Size: 11.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.41.1 CPython/3.6.5

File hashes

Hashes for pythena-1.6.0-py3-none-any.whl
Algorithm Hash digest
SHA256 fb903e33e7ae8024f035ff5dbef0926985713895dbd5f620e1cb09d0ba86cb40
MD5 577c574dcf6379af9f2cc6924c5d64bc
BLAKE2b-256 fca413aaba4a7ecaa2ba161c6169bc4077b13f78671b2d84ddc41ee4c93dd9c2

