A set of Python libraries for querying and transforming data from AWS APIs

Reason this release was yanked:

Not ready

Project description

AWS Data Tools

A set of opinionated (but flexible) Python libraries for querying and transforming data from various AWS APIs, along with a CLI.

This is in early development.

Data Types and Sources

The goal of this package is to provide consistent, enriched schemas for data from both raw API calls and logged events. It should also be able to unwrap and parse data delivered through messaging and streaming services like SNS, Kinesis, and EventBridge.
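As an illustration of what "unwrapping" could mean for SNS, here is a minimal sketch using only the standard library. It assumes the common SNS notification envelope (a JSON object with `Type` set to `Notification` and a string `Message` field). The function name `unwrap_sns` is hypothetical and is not part of this package's API.

```python
import json
from typing import Any


def unwrap_sns(body: str) -> Any:
    """Return the inner payload of an SNS notification delivered as JSON text.

    Bodies that are not SNS notification envelopes are returned parsed
    but otherwise untouched.
    """
    doc = json.loads(body)
    # SNS notification envelopes carry Type="Notification" and a string "Message".
    if isinstance(doc, dict) and doc.get("Type") == "Notification" and "Message" in doc:
        inner = doc["Message"]
        try:
            return json.loads(inner)  # the payload was itself JSON
        except (TypeError, ValueError):
            return inner  # the payload was plain text
    return doc


# Hypothetical, hand-written envelope for illustration only.
envelope = json.dumps({
    "Type": "Notification",
    "MessageId": "example-id",
    "Message": json.dumps({"eventName": "CreateAccount"}),
})
print(unwrap_sns(envelope))
```

The same idea would extend to other wrappers (for example, an SQS body that contains an SNS envelope) by applying the unwrapping step repeatedly.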

Here are some examples:

  • Query Organizations APIs to build consistent, denormalized models of organizations
  • Validate and enrich data from CloudTrail log events
  • Parse S3 and ELB access logs into JSON
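To make the first example more concrete, the sketch below shows one way a denormalized account record might be built once the hierarchy has been fetched. It is not this package's implementation. The record shapes (`id`, `name`, `parent_id`), the function names, and the sample data are all assumptions; in practice the inputs would come from Organizations API operations such as ListAccounts and ListParents.

```python
from typing import Dict, List, Optional


def ou_path(node_id: str, parents: Dict[str, Optional[str]], names: Dict[str, str]) -> str:
    """Walk parent links up to the root and return a '/'-joined path of names."""
    parts: List[str] = []
    current: Optional[str] = node_id
    while current is not None:
        parts.append(names[current])
        current = parents.get(current)
    return "/".join(reversed(parts))


def denormalize_accounts(
    accounts: List[dict],
    parents: Dict[str, Optional[str]],
    names: Dict[str, str],
) -> List[dict]:
    """Attach the full OU path to each account record."""
    return [
        {**acct, "ou_path": ou_path(acct["parent_id"], parents, names)}
        for acct in accounts
    ]


# Hypothetical, hand-written data standing in for API responses.
names = {"r-root": "Root", "ou-1": "Engineering", "ou-2": "Platform"}
parents = {"r-root": None, "ou-1": "r-root", "ou-2": "ou-1"}
accounts = [{"id": "111111111111", "name": "build", "parent_id": "ou-2"}]

rows = denormalize_accounts(accounts, parents, names)
print(rows[0]["ou_path"])  # Root/Engineering/Platform
```

Storing the full path on each account means a consumer can filter or group accounts by OU without having to re-walk the hierarchy.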

This initial release only contains support for managing data from AWS Organizations APIs.

The following table shows the libraries that are supported now or may be supported in the future:

| Library Name | Description | Data Type | Data Sources Supported |
|---|---|---|---|
| organizations | Organization and OU hierarchy, policies, and accounts | API | Organizations APIs |
| cloudtrail | Service API calls recorded by CloudTrail | Log | S3 / SNS / SQS / CloudWatch Logs / Kinesis / Kinesis Firehose |
| s3 | Access logs for S3 buckets | Log | S3 / SNS / SQS |
| elb | Access logs from Classic, Application, and Network Load Balancers | Log | S3 / SNS / SQS |
| vpc_flow | Traffic logs from VPCs | Log | S3 / CloudWatch Logs / Kinesis / Kinesis Firehose |
| config | Resource state change events from AWS Config | Log | S3 / SNS / SQS |
| firehose | Audit logs for Firehose delivery streams | Log | CloudWatch Logs / Kinesis / Kinesis Firehose |
| ecs | Container state change events | Log | CloudWatch Events / EventBridge |
| ecr | Repository events for stored images | Log | CloudWatch Events / EventBridge |

Installing


NOTE: None of the following installation methods actually work. This is stubbed out to include possible future installation methods.


Using pip should work on any system with at least Python 3.9:

$ pip install aws-data-tools

macOS

With Homebrew:

$ brew install aws-data-tools-py

Using the pkg installer:

(This isn't how we'll want to do this. We want to bundle the application with all its dependencies, including Python itself. This probably means using PyInstaller to bundle an "app" image.)

$ LATEST=$(gh release list --repo timoguin/aws-data-tools-py | grep 'Latest' | cut -f3)
$ curl -sL https://github.com/timoguin/aws-data-tools-py/releases/download/$LATEST/aws-data-tools-py.pkg --output aws-data-tools-py_$LATEST.pkg
$ sudo installer -pkg aws-data-tools-py_$LATEST.pkg -target /

Windows

With Chocolatey:

$ choco install aws-data-tools-py

Usage

Empty.

Testing

Organizations Data ETL

  • Bring up a LocalStack (Pro) instance running IAM and Organizations (master account)
  • Seed the instance with Organization data (OUs, accounts, policies)
  • Run a script that performs ETL against data from the AWS Organizations APIs
  • Ensure the generated data is the same as the seed data
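The steps above could be sketched roughly as follows. This is an assumption-laden outline, not the project's test suite. The helper names (`list_all`, `seed`, `extract_names`, `round_trip_matches`) and the seed data shape are hypothetical. Only top-level OUs and account names are compared, and policies are left out. The `client` argument is expected to behave like a boto3 Organizations client. Against real AWS, account creation is asynchronous and would need polling, which is omitted here.

```python
from typing import Any, Dict, List

# Assumption: a client pointed at a local endpoint could be created with
#   boto3.client("organizations", endpoint_url="http://localhost:4566")
# It is passed in as a parameter so the helpers stay independent of boto3.


def list_all(method, key: str, **kwargs) -> List[Dict[str, Any]]:
    """Collect every page of a list_* call by following NextToken."""
    items: List[Dict[str, Any]] = []
    while True:
        page = method(**kwargs)
        items.extend(page.get(key, []))
        token = page.get("NextToken")
        if not token:
            return items
        kwargs["NextToken"] = token


def seed(client, data: Dict[str, List[str]]) -> None:
    """Create the organization, then OUs under the root, then accounts."""
    client.create_organization(FeatureSet="ALL")
    root_id = client.list_roots()["Roots"][0]["Id"]
    for name in data["ous"]:
        client.create_organizational_unit(ParentId=root_id, Name=name)
    for name in data["accounts"]:
        client.create_account(Email=f"{name}@example.com", AccountName=name)


def extract_names(client) -> Dict[str, List[str]]:
    """Read back top-level OU names and member account names, sorted."""
    root_id = client.list_roots()["Roots"][0]["Id"]
    master_id = client.describe_organization()["Organization"]["MasterAccountId"]
    ous = list_all(client.list_organizational_units_for_parent,
                   "OrganizationalUnits", ParentId=root_id)
    accounts = list_all(client.list_accounts, "Accounts")
    return {
        "ous": sorted(ou["Name"] for ou in ous),
        # The master account is created implicitly, so it is not seed data.
        "accounts": sorted(a["Name"] for a in accounts if a["Id"] != master_id),
    }


def round_trip_matches(client, data: Dict[str, List[str]]) -> bool:
    """Seed the instance, extract the data again, and compare the two."""
    seed(client, data)
    expected = {key: sorted(values) for key, values in data.items()}
    return extract_names(client) == expected
```

Sorting both sides before comparing keeps the check independent of the order in which the API returns results.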

