
An implementation of tf.data.Dataset for AWS Athena


TensorFlow Data for AWS Athena

An AWS Athena library for tf.data.Dataset. If you are not familiar with tf.data, take a look at its documentation and this example.

How to use

Using it is almost as simple as using any other tf.data.Dataset implementation: you just create a dataset with the function create_athena_dataset.

Credentials do not need to be passed explicitly (the library follows the AWS authentication chain of boto3).

# imports
from tf_data_athena import create_athena_dataset

# connector parameters
s3_output_location = "s3://my-bucket/my-folder/athena-outputs" # Athena output bucket folder
waiting_interval = 0.1 # Time (in seconds) to wait before asking for query state

# query
query = "select * from my_namespace.my_table"

# create dataset
dataset = create_athena_dataset(query, s3_output_location, waiting_interval=waiting_interval)

Now, dataset is an instance of tf.data.Dataset containing query results.
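The waiting_interval parameter implies that the factory submits the query and then repeatedly asks Athena for its state until the query finishes. A minimal sketch of such a polling loop, under stated assumptions: `wait_for_query` and its `timeout` argument are hypothetical names, not part of this library, and the state lookup is injected as a callable so the loop can run without AWS. The state strings are the values Athena reports for a query execution.

```python
import time
from typing import Callable

# Athena query execution states after which polling can stop
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def wait_for_query(
    get_state: Callable[[], str],
    waiting_interval: float = 0.1,
    timeout: float = 60.0,
) -> str:
    """Hypothetical helper: poll get_state() until the query reaches a terminal state.

    get_state is any zero-argument callable returning the current state string
    (e.g. "QUEUED", "RUNNING", "SUCCEEDED"); timeout is an assumed safety limit.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"query still {state} after {timeout} seconds")
        # Wait waiting_interval seconds before asking for the query state again
        time.sleep(waiting_interval)


# Usage with a fake state source that finishes on the third check
states = iter(["QUEUED", "RUNNING", "SUCCEEDED"])
print(wait_for_query(lambda: next(states), waiting_interval=0.01))  # prints: SUCCEEDED
```

With boto3, `get_state` could wrap `boto3.client("athena").get_query_execution(QueryExecutionId=...)` and read `["QueryExecution"]["Status"]["State"]` from the response. A smaller waiting_interval notices completion sooner at the cost of more API calls.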

Parameters

The factory function create_athena_dataset has the following parameters:

  • query: The query to run in Athena
  • s3_output_location: An S3 path, writable by the current account, where the query results file will be saved
  • waiting_interval: A float giving the number of seconds to wait between query status checks on Athena
  • num_parallel_calls: Argument passed to tf.data.Dataset.map (see its documentation) when parsing result rows
  • other named arguments: Any other named argument is passed to the tf.data.TextLineDataset constructor (see its documentation).
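The parameter list indicates that the results file is read line by line with tf.data.TextLineDataset and each row is parsed inside Dataset.map. Athena saves the results of a SELECT query as a CSV file whose first line is the header and whose values are double-quoted. A minimal pure-Python sketch of that per-line parsing, assuming this layout and keeping every value as a string (the library's actual parser and column types may differ); `parse_results` is a hypothetical name, and a thread pool stands in for what num_parallel_calls does in tf.data.Dataset.map:

```python
import csv
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional


def parse_line(line: str) -> List[str]:
    # csv.reader strips the double quotes and keeps commas inside quoted values
    return next(csv.reader([line]))


def parse_results(
    lines: List[str], num_parallel_calls: Optional[int] = None
) -> List[Dict[str, str]]:
    """Hypothetical sketch: turn result-file lines into dicts keyed by column name."""
    header = parse_line(lines[0])  # first line holds the column names

    def to_row(line: str) -> Dict[str, str]:
        return dict(zip(header, parse_line(line)))

    body = lines[1:]
    if not num_parallel_calls:
        return [to_row(line) for line in body]
    # Like Dataset.map(..., num_parallel_calls=n): parse up to n lines at once,
    # while Executor.map still returns the rows in input order
    with ThreadPoolExecutor(max_workers=num_parallel_calls) as pool:
        return list(pool.map(to_row, body))


lines = ['"id","name"', '"1","alice"', '"2","bob, jr"']
print(parse_results(lines, num_parallel_calls=2))
# prints: [{'id': '1', 'name': 'alice'}, {'id': '2', 'name': 'bob, jr'}]
```

Because each line is parsed on its own, a quoted value containing a newline would be split across two records; this limitation applies to any line-based reader, including TextLineDataset.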

AWS Authorization

This library uses boto3 behind the scenes, so it follows the same authentication/authorization chain. The authorized user or service needs permission to create and execute Athena queries, and to create and read S3 objects in the folder defined by s3_output_location.
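As a starting point for those permissions, the sketch below builds an IAM policy document from s3_output_location. The action list is an assumption derived from the sentence above (start and poll Athena queries, write and read result objects), plus s3:GetBucketLocation, which is commonly listed as needed on the Athena output bucket; `split_s3_uri` and `build_policy` are hypothetical helpers. Running a real query usually also requires read access to the AWS Glue Data Catalog and to the table's own S3 data, which this sketch does not grant.

```python
import json
from typing import Any, Dict, Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split "s3://bucket/some/prefix" into ("bucket", "some/prefix")."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.strip("/")


def build_policy(s3_output_location: str) -> Dict[str, Any]:
    """Hypothetical policy for running queries and reading their results (assumed actions)."""
    bucket, prefix = split_s3_uri(s3_output_location)
    objects = f"arn:aws:s3:::{bucket}/{prefix}/*" if prefix else f"arn:aws:s3:::{bucket}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # submit queries and poll their state
                "Effect": "Allow",
                "Action": ["athena:StartQueryExecution", "athena:GetQueryExecution"],
                "Resource": "*",
            },
            {   # write and read result files under s3_output_location
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject"],
                "Resource": objects,
            },
            {   # bucket-level lookup on the output bucket
                "Effect": "Allow",
                "Action": ["s3:GetBucketLocation"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
        ],
    }


print(json.dumps(build_policy("s3://my-bucket/my-folder/athena-outputs"), indent=2))
```

Scoping the Athena statement to a specific workgroup ARN instead of "*" would tighten it further.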



Download files


Source Distribution

tf-data-athena-1.0.1.tar.gz (5.0 kB)

Uploaded Source

Built Distribution

tf_data_athena-1.0.1-py3-none-any.whl (7.3 kB)

Uploaded Python 3

File details

Details for the file tf-data-athena-1.0.1.tar.gz.

File metadata

  • Download URL: tf-data-athena-1.0.1.tar.gz
  • Upload date:
  • Size: 5.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.6.0 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.7.3

File hashes

Hashes for tf-data-athena-1.0.1.tar.gz
  • SHA256: 500e09993a437fcd81486253380745e80953a14c7228431820dfc558c4a088e9
  • MD5: de3f881446156b10700ce690e5d8d771
  • BLAKE2b-256: c30075f619656927b6c0e9b8008f2a43b73b7e6e6b198e1052320f6d31381bf5


File details

Details for the file tf_data_athena-1.0.1-py3-none-any.whl.

File metadata

  • Download URL: tf_data_athena-1.0.1-py3-none-any.whl
  • Upload date:
  • Size: 7.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.6.0 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.7.3

File hashes

Hashes for tf_data_athena-1.0.1-py3-none-any.whl
  • SHA256: b3908cdb8b9a8e39c30f726e1fc618a9b14db85ec700b20e1ee43f5f543be417
  • MD5: 120ef3ccab52a0916f9633ed57dab1c2
  • BLAKE2b-256: 272b29bfe9e64943591fe55c53e44ca46a48217ff88d859c1edcb2150b29fd10

