aws-analytics-reference-architecture

AWS Analytics Reference Architecture

The AWS Analytics Reference Architecture is a set of analytics solutions put together as end-to-end examples. It brings together AWS best practices for designing, implementing, and operating analytics platforms through purpose-built patterns that handle common requirements and solve customers' challenges.

This project is composed of:

  • Reusable core components exposed in an AWS CDK (Cloud Development Kit) library currently available in TypeScript and Python. This library contains AWS CDK constructs that can be used to quickly provision analytics solutions in demos, prototypes, proofs of concept and end-to-end reference architectures.
  • Reference architectures consuming the reusable components to demonstrate end-to-end examples in a business context. Currently, the AWS native reference architecture is available.

This documentation explains how to get started with the core components of the AWS Analytics Reference Architecture.

Getting started

Prerequisites

  1. Create an AWS account

  2. The core components can be deployed in any AWS region

  3. Install the following components with the specified version on the machine from which the deployment will be executed:

    1. Python [3.8-3.9.2] or TypeScript
    2. AWS CDK v2: Please refer to the Getting started guide.
  4. Bootstrap AWS CDK in your region (here eu-west-1). This provisions the resources required to deploy AWS CDK applications:

export ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
export AWS_REGION=eu-west-1
cdk bootstrap aws://$ACCOUNT_ID/$AWS_REGION
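
Before bootstrapping, it can help to confirm that the local interpreter falls inside the Python range listed in the prerequisites. The following is a minimal sketch, assuming the range [3.8-3.9.2] is inclusive at both ends; the function name `is_supported` and the constants are illustrative and not part of the library.

```python
import sys

# Bounds taken from the prerequisites list; treating both ends as inclusive
# is an assumption.
MIN_VERSION = (3, 8, 0)
MAX_VERSION = (3, 9, 2)


def is_supported(version=None):
    """Return True if (major, minor, micro) lies within the documented range."""
    if version is None:
        version = sys.version_info[:3]
    return MIN_VERSION <= tuple(version[:3]) <= MAX_VERSION


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "within" if is_supported() else "outside"
    print(f"Python {current} is {status} the documented range 3.8-3.9.2")
```

Running the script prints whether the active interpreter matches the documented range; it does not check the AWS CDK installation.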

Initialization (in Python)

  1. Initialize a new AWS CDK application in Python and use a virtual environment to install dependencies
mkdir my_demo
cd my_demo
cdk init app --language python
python3 -m venv .env
source .env/bin/activate
  2. Add the AWS Analytics Reference Architecture library to the dependencies of your project. Update requirements.txt:
aws-cdk-lib==2.51.0
constructs>=10.0.0,<11.0.0
aws_analytics_reference_architecture>=2.0.0
  3. Install the packages with pip
python -m pip install -r requirements.txt
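
The requirements.txt update can also be scripted, which is convenient when the file generated by cdk init already pins some of the same packages. The sketch below is one possible approach, not something the library ships: `package_name`, `merge_requirements` and `update_requirements_file` are hypothetical helpers, and the pins are copied from the step above.

```python
import re
from pathlib import Path

# Pins copied from the requirements.txt step in this guide.
REQUIRED = (
    "aws-cdk-lib==2.51.0",
    "constructs>=10.0.0,<11.0.0",
    "aws_analytics_reference_architecture>=2.0.0",
)


def package_name(line):
    """Return the normalized package name at the start of a requirement line."""
    name = re.split(r"[=<>!~;\[\s]", line.strip(), maxsplit=1)[0]
    return name.lower().replace("_", "-")


def merge_requirements(existing_text, required=REQUIRED):
    """Drop blank lines and old pins for the required packages, then append the new pins."""
    wanted = {package_name(req) for req in required}
    kept = [
        line
        for line in existing_text.splitlines()
        if line.strip() and package_name(line) not in wanted
    ]
    return "\n".join(kept + list(required)) + "\n"


def update_requirements_file(path="requirements.txt"):
    """Rewrite the file at `path` in place (hypothetical helper)."""
    target = Path(path)
    existing = target.read_text() if target.exists() else ""
    target.write_text(merge_requirements(existing))
```

Calling `update_requirements_file()` from the project root rewrites requirements.txt; the merge is idempotent, so running it twice leaves the file unchanged.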

Development

  1. Import the AWS Analytics Reference Architecture in your code in my_demo/my_demo_stack.py
import aws_analytics_reference_architecture as ara
  2. Now you can use all the constructs available from the core components library to quickly provision resources in your AWS CDK stack. For example:
  • The DataLakeStorage to provision a full set of pre-configured Amazon S3 buckets for a data lake
        # Create a new DataLakeStorage with Raw, Clean and Transform buckets configured with data lake best practices
        storage = ara.DataLakeStorage(self, "storage")
  • The DataLakeCatalog to provision a full set of AWS Glue databases for registering tables in your data lake
        # Create a new DataLakeCatalog with Raw, Clean and Transform databases
        catalog = ara.DataLakeCatalog(self, "catalog")
  • The BatchReplayer to generate live data in the data lake from a pre-configured retail dataset
        # Generate the Sales Data
        sales_data = ara.BatchReplayer(
            scope=self,
            id="sale-data",
            dataset=ara.PreparedDataset.RETAIL_1_GB_STORE_SALE,
            sink_object_key="sale",
            sink_bucket=storage.raw_bucket,
        )
        # Generate the Customer Data
        customer_data = ara.BatchReplayer(
            scope=self,
            id="customer-data",
            dataset=ara.PreparedDataset.RETAIL_1_GB_CUSTOMER,
            sink_object_key="customer",
            sink_bucket=storage.raw_bucket,
        )
  • Additionally, the library provides some helpers to quickly run demos:
        # Configure defaults for Athena console
        athena_defaults = ara.AthenaDemoSetup(scope=self, id="demo_setup")
        # Configure a default role for AWS Glue jobs
        ara.GlueDemoRole.get_or_create(self)
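
The snippets above are fragments of a stack definition. As a rough sketch of how they fit together, the helper below groups them into one function that receives the stack as its scope. The name `add_demo_resources` and the returned dictionary are illustrative assumptions; every construct call is taken from the examples above. The library import sits inside the function so the helper can be defined before the package is installed, which is a choice made for this sketch rather than a library requirement.

```python
def add_demo_resources(stack):
    """Attach the constructs shown above to an existing AWS CDK stack.

    Hypothetical helper: it only regroups the calls from the examples above.
    """
    # Local import (sketch-only choice): the library is needed only when called.
    import aws_analytics_reference_architecture as ara

    # Raw, Clean and Transform buckets plus the matching Glue databases
    storage = ara.DataLakeStorage(stack, "storage")
    catalog = ara.DataLakeCatalog(stack, "catalog")

    # Replay the prepared retail datasets into the raw bucket
    sales_data = ara.BatchReplayer(
        scope=stack,
        id="sale-data",
        dataset=ara.PreparedDataset.RETAIL_1_GB_STORE_SALE,
        sink_object_key="sale",
        sink_bucket=storage.raw_bucket,
    )
    customer_data = ara.BatchReplayer(
        scope=stack,
        id="customer-data",
        dataset=ara.PreparedDataset.RETAIL_1_GB_CUSTOMER,
        sink_object_key="customer",
        sink_bucket=storage.raw_bucket,
    )

    # Demo helpers: Athena console defaults and a default role for Glue jobs
    athena_defaults = ara.AthenaDemoSetup(scope=stack, id="demo_setup")
    glue_role = ara.GlueDemoRole.get_or_create(stack)

    return {
        "storage": storage,
        "catalog": catalog,
        "sales_data": sales_data,
        "customer_data": customer_data,
        "athena_defaults": athena_defaults,
        "glue_role": glue_role,
    }
```

To use it, call `add_demo_resources(self)` from the `__init__` method of the stack class in my_demo/my_demo_stack.py, after the call to `super().__init__`.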

Deployment

Deploy the AWS CDK application

cdk deploy

Deployment time depends on the constructs you use.

Cleanup

Delete the AWS CDK application

cdk destroy

API Reference

More constructs, helpers and datasets are available in the AWS Analytics Reference Architecture. See the full API specification for details.

Contributing

Please refer to the contributing guidelines and contributing FAQ for details.

License Summary

The documentation is made available under the Creative Commons Attribution-ShareAlike 4.0 International License. See the LICENSE file.

The sample code within this documentation is made available under the MIT-0 license. See the LICENSE-SAMPLECODE file.

