CDK construct for creating an analysis environment using DuckDB for S3 data

CloudDuck is a CDK construct that provides a simple, easy-to-use analysis environment for S3 data, featuring DuckDB with built-in authentication.

By simply deploying the construct, you can launch a SaaS that provides an analytics dashboard for querying S3 data. User authentication is implemented with Cognito, ensuring that only authorized users can log in.

Use Cases

  • When you want analysts to analyze S3 data with DuckDB but prefer not to issue S3 access credentials to them.
  • When you want to minimize the costs incurred from downloading large amounts of S3 data to local storage.

Architecture

[Figure: CloudDuck architecture diagram]

Installation

npm i cloud-duck

Setup

Deploy

You can deploy CloudDuck by adding the following code to your CDK stack.

import { CloudDuck } from 'cloud-duck';
import { Size } from 'aws-cdk-lib';
import * as cognito from 'aws-cdk-lib/aws-cognito';
import * as s3 from 'aws-cdk-lib/aws-s3';

declare const logBucket: s3.IBucket;

new CloudDuck(this, 'CloudDuck', {
  // The S3 buckets to analyze.
  // By default, CloudDuck can access all of the buckets in the account.
  // To restrict access, list the allowed buckets in the targetBuckets property.
  targetBuckets: [logBucket],
  // The memory size of the Lambda function.
  // Default: 1024 MB
  memory: Size.mebibytes(1024),
  // You can customize the Cognito User Pool.
  // For example, you can require users to use MFA.
  userPoolProps: {
    mfa: cognito.Mfa.REQUIRED,
    mfaSecondFactor: {
      sms: false,
      otp: true,
    },
  },
});

Add a user to the Cognito User Pool

Add a user to the Cognito User Pool with the following command.

aws cognito-idp admin-create-user \
--user-pool-id "us-east-1_XXXXX" \
--username "naonao@example.com" \
--user-attributes Name=email,Value="naonao@example.com" Name=email_verified,Value=true \
--message-action SUPPRESS \
--temporary-password Password1!

You can also add a user via the AWS Management Console.
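If you create users from a script rather than by hand, the same command can be assembled programmatically. The following is a minimal sketch, not part of CloudDuck: `NewUser` and `buildCreateUserArgs` are hypothetical names, and the flags are exactly those used in the command above.

```typescript
// Hypothetical helper (not part of CloudDuck): builds the argument list for
// `aws cognito-idp admin-create-user`, mirroring the command shown above.
interface NewUser {
  userPoolId: string; // e.g. "us-east-1_XXXXX"
  email: string; // used as both the username and the email attribute
  temporaryPassword: string; // the user must change it at first login
}

function buildCreateUserArgs(user: NewUser): string[] {
  return [
    "cognito-idp",
    "admin-create-user",
    "--user-pool-id",
    user.userPoolId,
    "--username",
    user.email,
    "--user-attributes",
    `Name=email,Value=${user.email}`,
    "Name=email_verified,Value=true",
    "--message-action",
    "SUPPRESS",
    "--temporary-password",
    user.temporaryPassword,
  ];
}

// Print the full command line for inspection.
const createUserArgs = buildCreateUserArgs({
  userPoolId: "us-east-1_XXXXX",
  email: "naonao@example.com",
  temporaryPassword: "Password1!",
});
console.log(["aws", ...createUserArgs].join(" "));
```

Keeping the arguments as an array means they can be handed to a process launcher without shell quoting, which matters for passwords that contain characters such as `!`.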

Access

Access CloudDuck at the CloudFront URL printed in the deployment output.

npx cdk deploy
...
AwsStack.CloudDuckDistributionUrl84FC8296 = https://dosjykpv096qr.cloudfront.net
Stack ARN:
arn:aws:cloudformation:us-east-1:123456789012:stack/AwsStack/dd0960c0-b3d5-11ef-bcfc-12cf7722116f

✨  Total time: 73.59s
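When deployment is automated, the URL can be picked out of the captured output instead of being copied by hand. This is a rough sketch under one assumption: the output key contains `DistributionUrl`, as in the sample above. `findDistributionUrl` is a hypothetical helper, not something CloudDuck or the CDK CLI provides.

```typescript
// Hypothetical helper: finds the CloudFront URL in captured `cdk deploy` output.
// Assumes the output key contains "DistributionUrl", as in the sample above.
function findDistributionUrl(deployOutput: string): string | undefined {
  const match = deployOutput.match(
    /^\S*DistributionUrl\S*\s*=\s*(https:\/\/\S+)/m
  );
  return match ? match[1] : undefined;
}

// Sample lines taken from the deployment output shown above.
const sampleOutput = [
  "AwsStack.CloudDuckDistributionUrl84FC8296 = https://dosjykpv096qr.cloudfront.net",
  "Stack ARN:",
  "arn:aws:cloudformation:us-east-1:123456789012:stack/AwsStack/dd0960c0-b3d5-11ef-bcfc-12cf7722116f",
].join("\n");

console.log(findDistributionUrl(sampleOutput));
// prints "https://dosjykpv096qr.cloudfront.net"
```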

Enter the username and password.

When you log in for the first time, you need to change the password.

Play with CloudDuck!

Usage

Query

You can query the S3 data with SQL.

SELECT * FROM read_csv_auto('s3://your-bucket-name/your-file.csv');
SELECT * FROM parquet_scan('s3://your-bucket-name/your-file.parquet');

Of course, you can store the result as a new table.

CREATE TABLE new_table AS SELECT * FROM read_csv_auto('s3://your-bucket-name/your-file.csv');
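The statements above differ only in the reader function and the S3 path, so they are easy to generate when you need the same query for many files. The sketch below is illustrative only: `buildSelect` and `buildCreateTable` are hypothetical helpers, and choosing the reader from the file extension is an assumption of this example, not a CloudDuck feature.

```typescript
// Hypothetical helper: builds a SELECT over one S3 object, choosing the
// DuckDB reader function from the file extension.
function buildSelect(s3Uri: string): string {
  // Double any single quotes so the URI is safe inside a SQL string literal.
  const literal = s3Uri.replace(/'/g, "''");
  if (/\.parquet$/i.test(s3Uri)) {
    return `SELECT * FROM parquet_scan('${literal}');`;
  }
  if (/\.csv$/i.test(s3Uri)) {
    return `SELECT * FROM read_csv_auto('${literal}');`;
  }
  throw new Error(`Unsupported file type: ${s3Uri}`);
}

// Hypothetical helper: wraps the SELECT in CREATE TABLE ... AS.
function buildCreateTable(tableName: string, s3Uri: string): string {
  // Accept only plain identifiers, since the name is interpolated unquoted.
  if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(tableName)) {
    throw new Error(`Invalid table name: ${tableName}`);
  }
  return `CREATE TABLE ${tableName} AS ${buildSelect(s3Uri)}`;
}

console.log(buildSelect("s3://your-bucket-name/your-file.parquet"));
console.log(buildCreateTable("new_table", "s3://your-bucket-name/your-file.csv"));
```

The generated strings can be pasted into the CloudDuck query editor as they are.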

Detailed usage of DuckDB is available in the DuckDB documentation.

Persistence

All query results are persisted in individual DuckDB files for each user. Therefore, you can freely save your query results without worrying about affecting other users.

Note

CloudDuck is still under development. Updates may include breaking changes. If you encounter any bugs, please report them via issues.
