An implementation of tf.data.Dataset for AWS Athena

Project description

TensorFlow Data for AWS Athena

An AWS Athena library for tf.data.Dataset. If you are not familiar with tf.data, take a look at the tf.data documentation and this example.

How to use

Usage is almost as simple as with any other tf.data.Dataset implementation. You just need to create a dataset using the function create_athena_dataset.

No credentials are passed explicitly (the library follows the AWS authentication chain in boto3).

# imports
from tf_data_athena import create_athena_dataset

# connector parameters
s3_output_location = "s3://my-bucket/my-folder/athena-outputs" # Athena output bucket folder
waiting_interval = 0.1 # Time (in seconds) to wait before asking for query state

# query
query = "select * from my_namespace.my_table"

# create dataset
dataset = create_athena_dataset(query, s3_output_location, waiting_interval=waiting_interval)

Now, dataset is an instance of tf.data.Dataset containing query results.

Parameters

The factory function create_athena_dataset has the following parameters:

  • query: The query to run in Athena
  • s3_output_location: An S3 path, writable by the current account, where the query results file will be saved
  • waiting_interval: A float giving the number of seconds to wait between checks of the query status in Athena
  • num_parallel_calls: Argument for tf.data.Dataset.map (see documentation), used while parsing result rows
  • other named arguments: Any other named argument is passed to the tf.data.TextLineDataset constructor (see documentation)
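To make the role of waiting_interval concrete, here is a minimal sketch of a status-polling loop. It is not the library's actual code: wait_for_query and get_state are hypothetical names, and the state source is a fake sequence so that the sketch runs without AWS access. With boto3, get_state could wrap the Athena client's get_query_execution call and read the state from its response.

```python
import time
from typing import Callable

# Terminal states reported by Athena for a query execution.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def wait_for_query(get_state: Callable[[], str],
                   waiting_interval: float = 0.1,
                   sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll get_state() until it returns a terminal state, then return it.

    get_state is a hypothetical callable (an assumption of this sketch).
    With boto3 it could read
    response["QueryExecution"]["Status"]["State"] from
    athena.get_query_execution(QueryExecutionId=...).
    """
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        sleep(waiting_interval)  # pause before asking again


# Fake state sequence standing in for Athena, so no AWS access is needed.
states = iter(["QUEUED", "RUNNING", "RUNNING", "SUCCEEDED"])
naps = []  # records each requested pause instead of really sleeping
final_state = wait_for_query(lambda: next(states), 0.1, sleep=naps.append)
print(final_state, len(naps))
```

A smaller waiting_interval notices a finished query sooner at the cost of more status requests. A larger one is gentler on API rate limits but adds latency before the dataset becomes available.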

AWS Authorization

This library uses boto3 behind the scenes, so it follows the same authentication/authorization chain. The authorized user or service needs permission to create and execute Athena queries, and to create and read S3 objects in the folder defined by s3_output_location.
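As an illustration of the permissions described above, the following sketch builds an IAM policy document from an s3_output_location value. The helper build_policy is hypothetical, and the action list is an assumption covering only what this section names. A real setup will typically also need read access to the queried tables' data and to the data catalog.

```python
import json
from urllib.parse import urlparse


def build_policy(s3_output_location: str) -> dict:
    """Return a minimal IAM policy document for the given output folder.

    Hypothetical helper for illustration; the actions listed are an
    assumption, not a verified minimal set for this library.
    """
    parsed = urlparse(s3_output_location)
    bucket = parsed.netloc
    prefix = parsed.path.strip("/")
    # Restrict object access to the output folder when a prefix is given.
    if prefix:
        object_arn = f"arn:aws:s3:::{bucket}/{prefix}/*"
    else:
        object_arn = f"arn:aws:s3:::{bucket}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Start queries and check their status.
                "Effect": "Allow",
                "Action": [
                    "athena:StartQueryExecution",
                    "athena:GetQueryExecution",
                ],
                "Resource": "*",
            },
            {
                # Write and read the query result files.
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject"],
                "Resource": object_arn,
            },
        ],
    }


policy = build_policy("s3://my-bucket/my-folder/athena-outputs")
print(json.dumps(policy, indent=2))
```

Scoping the S3 statement to the output folder, rather than the whole bucket, keeps the grant close to what the library needs for its result files.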


Download files

Source Distribution

tf-data-athena-1.0.1.tar.gz (5.0 kB)

Uploaded Source

Built Distribution

tf_data_athena-1.0.1-py3-none-any.whl (7.3 kB)

Uploaded Python 3
