Unofficial Python SDK for Athena Federation

This is an unofficial Python SDK for Athena Federation.

Overview

The Python SDK makes it easy to create new Amazon Athena Data Source Connectors using Python. It is under active development, so the API may change from version to version.

You can see an example implementation that queries Google Sheets using Athena.

Current Limitations

  • Partitions are not supported, so Athena will not parallelize the query using partitions.
  • Splits are not supported (but coming soon), so Athena will make only one request to query your data.
  • Spill to S3 is not supported, so responses must be under 6 MB.
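
Because there is no spill to S3, it can help to check the size of a response before returning it. The following is a minimal sketch, not part of the SDK: the helper names are hypothetical, it assumes the response is a JSON-serializable dict, and it treats the 6 MB limit as 6 * 1024 * 1024 bytes.

```python
import json

# Assumption: treat "6 MB" as 6 MiB. Check the current Lambda quotas for the exact value.
MAX_RESPONSE_BYTES = 6 * 1024 * 1024


def response_size_bytes(response: dict) -> int:
    """Return the size of the response once serialized as UTF-8 JSON."""
    return len(json.dumps(response).encode("utf-8"))


def fits_in_lambda_response(response: dict, limit: int = MAX_RESPONSE_BYTES) -> bool:
    """Return True when the serialized response is strictly under the limit."""
    return response_size_bytes(response) < limit
```

A connector could call fits_in_lambda_response() on its result and raise a clear error, rather than letting the invocation fail on an oversized payload.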

Local Development

  • Ensure you have the build module and the SDK dependencies installed.
pip install build
pip install -r requirements.txt
  • Now make a wheel.
python -m build

This will create a file in dist/: dist/unoffical_athena_federation_sdk-0.0.0-py3-none-any.whl

Copy that file to your example repo and you can include it in your requirements.txt like so:

unoffical-athena-federation-sdk @ file:///unoffical_athena_federation_sdk-0.0.0-py3-none-any.whl

Validating your connector

You can test your Lambda function locally using Lambda Docker images.

First, build our Docker image and run it.

docker build -t local/athena-python-example .
docker run --rm -p 9000:8080 local/athena-python-example

Then, we can execute a sample PingRequest.

curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{"@type": "PingRequest", "identity": {"id": "UNKNOWN", "principal": "UNKNOWN", "account": "123456789012", "arn": "arn:aws:iam::123456789012:root", "tags": {}, "groups": []}, "catalogName": "athena_python_sdk", "queryId": "1681559a-548b-4771-874c-2aa2ea7c39ab"}'
{"@type": "PingResponse", "catalogName": "athena_python_sdk", "queryId": "1681559a-548b-4771-874c-2aa2ea7c39ab", "sourceType": "athena_python_sdk", "capabilities": 23}

We can also list schemas.

curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{"@type": "ListSchemasRequest", "identity": {"id": "UNKNOWN", "principal": "UNKNOWN", "account": "123456789012", "arn": "arn:aws:iam::123456789012:root", "tags": {}, "groups": []}, "catalogName": "athena_python_sdk", "queryId": "1681559a-548b-4771-874c-2aa2ea7c39ab"}'
{"@type": "ListSchemasResponse", "catalogName": "athena_python_sdk", "schemas": ["sampledb"], "requestType": "LIST_SCHEMAS"}

Creating your Lambda function

💁 Please note these are manual instructions until a serverless application can be built.

  1. First, let's define some variables we need throughout.
export SPILL_BUCKET=<BUCKET_NAME>
export AWS_ACCOUNT_ID=123456789012
export AWS_REGION=us-east-1
export IMAGE_TAG=v0.0.1
  1. Create an S3 bucket that this Lambda function will use for spill data
aws s3 mb s3://${SPILL_BUCKET}
  1. Create an ECR repository for this image
aws ecr create-repository --repository-name athena_example --image-scanning-configuration scanOnPush=true
  1. Tag the image with the repo name and push it up
docker tag local/athena-python-example ${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/athena_example:${IMAGE_TAG}
aws ecr get-login-password | docker login --username AWS --password-stdin ${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com
docker push ${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/athena_example:${IMAGE_TAG}
  1. Create an IAM role that will allow your Lambda function to execute

Note the ARN of the role that's returned.

aws iam create-role \
    --role-name athena-example-execution-role \
    --assume-role-policy-document '{"Version": "2012-10-17","Statement": [{ "Effect": "Allow", "Principal": {"Service": "lambda.amazonaws.com"}, "Action": "sts:AssumeRole"}]}'
aws iam attach-role-policy \
    --role-name athena-example-execution-role \
    --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
  1. Grant the IAM role access to your S3 bucket
aws iam create-policy --policy-name athena-example-s3-access --policy-document '{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": ["arn:aws:s3:::'${SPILL_BUCKET}'"]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:DeleteObject"
      ],
      "Resource": ["arn:aws:s3:::'${SPILL_BUCKET}'/*"]
    }
  ]
}'
aws iam attach-role-policy \
    --role-name athena-example-execution-role \
    --policy-arn arn:aws:iam::${AWS_ACCOUNT_ID}:policy/athena-example-s3-access
  1. Now create your function pointing to the created repository image
aws lambda create-function \
    --function-name athena-python-example \
    --role arn:aws:iam::${AWS_ACCOUNT_ID}:role/athena-example-execution-role \
    --code ImageUri=${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/athena_example:${IMAGE_TAG} \
    --environment "Variables={TARGET_BUCKET=${SPILL_BUCKET}}" \
    --description "Example Python implementation for Athena Federated Queries" \
    --timeout 60 \
    --package-type Image
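
The inline policy document in the steps above relies on shell quoting to splice in the bucket name, which is easy to get wrong. As an alternative sketch, the same document can be generated in Python and written to a file. The function name spill_bucket_policy is hypothetical; the statements mirror the policy shown above.

```python
import json


def spill_bucket_policy(bucket: str) -> str:
    """Return the S3 access policy from the steps above as a JSON string."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # Object operations apply to keys inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }
    return json.dumps(policy, indent=2)
```

Writing the output to policy.json lets you pass it to the CLI with --policy-document file://policy.json instead of an inline string.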

Connect with Athena!

  1. Choose "Data sources" on the top navigation bar in the Athena console and then click "Connect data source"

  1. Choose the Lambda function you just created and click Connect!

Updating the Lambda function

If you update the Lambda function, re-run the build and push steps (updating the IMAGE_TAG variable) and then update the Lambda function:

aws lambda update-function-code \
    --function-name athena-python-example \
    --image-uri ${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/athena_example:${IMAGE_TAG}
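
If you script the update, building the image URI in one place keeps the tag, push, and update steps consistent. This is a sketch with hypothetical helper names. It only assembles the argument list for the command above and does not call AWS itself.

```python
def image_uri(account_id: str, region: str, repository: str, tag: str) -> str:
    """Build the ECR image URI in the same form used by the commands above."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


def update_function_code_args(function_name: str, uri: str) -> list:
    """Return the argv list for the update-function-code command shown above."""
    return [
        "aws", "lambda", "update-function-code",
        "--function-name", function_name,
        "--image-uri", uri,
    ]
```

The resulting list can be passed to subprocess.run(args, check=True) once the new image has been pushed.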
