CDK construct for creating an analysis environment using DuckDB for S3 data

CloudDuck is a simple and easy-to-use analysis environment for S3 data, featuring DuckDB with built-in authentication.

Architecture

Installation

npm i cloud-duck

Setup

Deploy

import { Size } from 'aws-cdk-lib';
import * as s3 from 'aws-cdk-lib/aws-s3';
import { CloudDuck } from 'cloud-duck';

declare const logBucket: s3.IBucket;

new CloudDuck(this, 'CloudDuck', {
  // The S3 buckets to analyze.
  // By default, CloudDuck can access every bucket in the account.
  // To restrict access, list the allowed buckets in targetBuckets.
  targetBuckets: [logBucket],
  // The memory size of the Lambda function.
  // Default: 1024 MB
  memory: Size.mebibytes(1024),
});

Add a user to the Cognito User Pool

Add a user to the Cognito User Pool so that they can access CloudDuck.

aws cognito-idp admin-create-user \
--user-pool-id "us-east-1_XXXXX" \
--username "naonao@example.com" \
--user-attributes Name=email,Value="naonao@example.com" Name=email_verified,Value=true \
--message-action SUPPRESS \
--temporary-password Password1!
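
If you need to add several users, the command above can be assembled programmatically. The following is a minimal sketch, not part of CloudDuck: `buildCreateUserArgs` is a hypothetical helper, and it only reproduces the flags already shown in the command above.

```typescript
// Hypothetical helper (not part of cloud-duck): builds the argument list for
// `aws cognito-idp admin-create-user`, mirroring the flags shown above.
function buildCreateUserArgs(
  userPoolId: string,
  email: string,
  temporaryPassword: string,
): string[] {
  return [
    'cognito-idp',
    'admin-create-user',
    '--user-pool-id', userPoolId,
    '--username', email,
    // Each attribute is passed as its own argument.
    '--user-attributes', `Name=email,Value=${email}`, 'Name=email_verified,Value=true',
    '--message-action', 'SUPPRESS',
    '--temporary-password', temporaryPassword,
  ];
}

const args = buildCreateUserArgs('us-east-1_XXXXX', 'naonao@example.com', 'Password1!');

// Joined here for display only; this string is not shell-escaped.
console.log(['aws', ...args].join(' '));
```

Passing the array to Node's `child_process.execFile('aws', args)` instead of joining it into a shell string avoids quoting problems with characters such as `!` in the password.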

Access

Access CloudDuck at the CloudFront URL printed in the deployment output.

npx cdk deploy
...
AwsStack.CloudDuckDistributionUrl84FC8296 = https://dosjykpv096qr.cloudfront.net
Stack ARN:
arn:aws:cloudformation:us-east-1:123456789012:stack/AwsStack/dd0960c0-b3d5-11ef-bcfc-12cf7722116f

✨  Total time: 73.59s
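
If you script the deployment, the URL can be picked out of the output text. The sketch below is illustrative only: `findDistributionUrl` is a hypothetical helper, and it assumes the output key contains `DistributionUrl`, as it does in the sample output above.

```typescript
// Hypothetical helper (not part of cloud-duck): returns the first URL found on
// a line whose output key contains "DistributionUrl", or undefined if none.
function findDistributionUrl(deployOutput: string): string | undefined {
  for (const line of deployOutput.split('\n')) {
    const match = line.match(/DistributionUrl\w*\s*=\s*(https:\/\/\S+)/);
    if (match) {
      return match[1];
    }
  }
  return undefined;
}

// Sample lines copied from the deployment output above.
const sample = [
  'AwsStack.CloudDuckDistributionUrl84FC8296 = https://dosjykpv096qr.cloudfront.net',
  'Stack ARN:',
].join('\n');

console.log(findDistributionUrl(sample));
```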

Enter the username and password.

The first time you log in, you must change the password.

Play with CloudDuck!

Usage

Query

You can query the S3 data with SQL.

SELECT * FROM read_csv_auto('s3://your-bucket-name/your-file.csv');
SELECT * FROM parquet_scan('s3://your-bucket-name/your-file.parquet');

Of course, you can store the result as a new table.

CREATE TABLE new_table AS SELECT * FROM read_csv_auto('s3://your-bucket-name/your-file.csv');
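
If you generate these statements from code, a small builder can choose the reader function from the file extension. This is a sketch under stated assumptions: `readerFor` and `buildQuery` are hypothetical helpers, not part of CloudDuck, and they only cover the two file types and the two statement forms shown above.

```typescript
// Hypothetical helper: picks a DuckDB reader function from the file extension.
// read_csv_auto and parquet_scan are the functions used in the queries above.
function readerFor(s3Uri: string): string {
  if (!/^s3:\/\//.test(s3Uri)) {
    throw new Error(`not an S3 URI: ${s3Uri}`);
  }
  // Double any single quotes so the URI is a valid SQL string literal.
  const escaped = s3Uri.replace(/'/g, "''");
  if (/\.parquet$/i.test(s3Uri)) {
    return `parquet_scan('${escaped}')`;
  }
  if (/\.csv$/i.test(s3Uri)) {
    return `read_csv_auto('${escaped}')`;
  }
  throw new Error(`unsupported file type: ${s3Uri}`);
}

// Hypothetical helper: builds a plain SELECT, or a CREATE TABLE ... AS SELECT
// when a table name is given.
function buildQuery(s3Uri: string, tableName?: string): string {
  const select = `SELECT * FROM ${readerFor(s3Uri)}`;
  if (tableName === undefined) {
    return `${select};`;
  }
  // Accept only simple identifiers to keep the generated SQL predictable.
  if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(tableName)) {
    throw new Error(`invalid table name: ${tableName}`);
  }
  return `CREATE TABLE ${tableName} AS ${select};`;
}

console.log(buildQuery('s3://your-bucket-name/your-file.parquet'));
console.log(buildQuery('s3://your-bucket-name/your-file.csv', 'new_table'));
```

The generated strings can be pasted into the CloudDuck query editor as-is.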

Persistence

All query results are persisted in a separate DuckDB file for each user, so you can save query results freely without affecting other users.

Note

CloudDuck is still under development. Updates may include breaking changes. If you encounter any bugs, please report them via issues.
