Tools for ACDC ETL pipeline

Project description

acdc-aws-etl-pipeline

Infrastructure and code for the ACDC ETL pipeline and data operations in AWS

Ingestion

DBT

Release Management

Deploying the dictionary

For example, to deploy to the testing environment:

# Usage
bash services/dictionary/pull_dict.sh <raw_dictionary_url>
python3 services/dictionary/upload_dictionary.py <local_dictionary_path> <s3_target_uri>

# Implementation: pin the dictionary release tag, then pull and upload it
VERSION=v0.6.1
bash services/dictionary/pull_dict.sh "https://raw.githubusercontent.com/AustralianBioCommons/acdc-schema-json/refs/tags/${VERSION}/dictionary/prod_dict/acdc_schema.json"
python3 services/dictionary/upload_dictionary.py "services/dictionary/schemas/acdc_schema_${VERSION}.json" s3://gen3schema-cad-uat-biocommons.org.au/cad.json
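
The two commands above differ only in the release tag, so the URL and local path can be derived from a single version string. The following is a minimal sketch of that derivation, not part of the package: `dictionary_locations` is a hypothetical helper, the `vMAJOR.MINOR.PATCH` tag check is an assumption based on the `v0.6.1` example, and the local path pattern is assumed from the upload command shown above.

```python
import re

# Templates copied from the commands above; only the release tag varies.
RAW_URL_TEMPLATE = (
    "https://raw.githubusercontent.com/AustralianBioCommons/acdc-schema-json/"
    "refs/tags/{version}/dictionary/prod_dict/acdc_schema.json"
)
LOCAL_PATH_TEMPLATE = "services/dictionary/schemas/acdc_schema_{version}.json"


def dictionary_locations(version: str) -> tuple[str, str]:
    """Return (raw_dictionary_url, local_dictionary_path) for a release tag.

    Hypothetical helper: the vMAJOR.MINOR.PATCH tag format is an assumption.
    """
    if not re.fullmatch(r"v\d+\.\d+\.\d+", version):
        raise ValueError(f"unexpected release tag: {version!r}")
    return (
        RAW_URL_TEMPLATE.format(version=version),
        LOCAL_PATH_TEMPLATE.format(version=version),
    )
```

Calling `dictionary_locations("v0.6.1")` yields the same URL and path that appear in the commands above, which can then be passed to `pull_dict.sh` and `upload_dictionary.py`.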

Generating synthetic metadata

  • Run this script to generate synthetic metadata for the studies in the dictionary
# This generates 30 samples for AusDiab_Simulated and 60 samples for Baker-Biobank_Simulated
bash services/synthetic_data/generate_synth_metadata.sh --studies "AusDiab_Simulated,Baker-Biobank_Simulated" --permute-max-samples "30,60"
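
The comment above implies that the two comma-separated lists are matched by position: the first study gets the first count, the second study the second. The sketch below illustrates that pairing only; `pair_studies_with_counts` is a hypothetical function, and this is an assumption about how the flags correspond, not the script's actual parsing code.

```python
def pair_studies_with_counts(studies: str, max_samples: str) -> dict[str, int]:
    """Pair each study name with its sample count by list position.

    Hypothetical illustration of how --studies and --permute-max-samples
    line up; the real script may parse its arguments differently.
    """
    names = [name.strip() for name in studies.split(",") if name.strip()]
    counts = [int(count) for count in max_samples.split(",") if count.strip()]
    if len(names) != len(counts):
        raise ValueError(
            f"{len(names)} studies but {len(counts)} sample counts were given"
        )
    return dict(zip(names, counts))


pairs = pair_studies_with_counts(
    "AusDiab_Simulated,Baker-Biobank_Simulated", "30,60"
)
```

Checking that both lists have the same length before invoking the script is a cheap way to catch a missing or extra value.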

Uploading synthetic metadata to Sheepdog

  • Run this script to upload synthetic metadata to Sheepdog
python3 services/synthetic_data/upload_synth_metadata_sheepdog.py

Download files

Source Distribution

acdc_aws_etl_pipeline-0.3.1.tar.gz (27.1 kB)

Uploaded Source

Built Distribution

acdc_aws_etl_pipeline-0.3.1-py3-none-any.whl (30.8 kB)

Uploaded Python 3

File details

Details for the file acdc_aws_etl_pipeline-0.3.1.tar.gz.

File metadata

  • Download URL: acdc_aws_etl_pipeline-0.3.1.tar.gz
  • Upload date:
  • Size: 27.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.2.1 CPython/3.9.24 Linux/6.11.0-1018-azure

File hashes

Hashes for acdc_aws_etl_pipeline-0.3.1.tar.gz
Algorithm    Hash digest
SHA256       63f194ef6d3121f8451bebd0b382483939575d837c333bb684e582c9463ef7b5
MD5          060e433fcb542c9ce352a63111ba864a
BLAKE2b-256  cf3196bf36ad70a153f66477ccc43f2f8e71a4afc2167bcc61b079b46f303efd
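
A downloaded file can be checked against the SHA256 digest listed above before it is installed. This is a minimal sketch using only the Python standard library; `sha256_of` and `matches_published_hash` are hypothetical helper names, and the file is assumed to have been saved locally under its published name.

```python
import hashlib

# SHA256 digest listed above for acdc_aws_etl_pipeline-0.3.1.tar.gz.
EXPECTED_SHA256 = "63f194ef6d3121f8451bebd0b382483939575d837c333bb684e582c9463ef7b5"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Compare a file's digest with the expected one, ignoring hex case."""
    return sha256_of(path) == expected.lower()


# Example (assumes the sdist is in the current directory):
# matches_published_hash("acdc_aws_etl_pipeline-0.3.1.tar.gz")
```

Reading in chunks keeps memory use constant regardless of file size; the same functions work for the wheel if its digest is passed as `expected`.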

File details

Details for the file acdc_aws_etl_pipeline-0.3.1-py3-none-any.whl.

File hashes

Hashes for acdc_aws_etl_pipeline-0.3.1-py3-none-any.whl
Algorithm    Hash digest
SHA256       32ee3e9534a86d16f35e550cf8545cf8d67b17c2703929677d8f02a341571674
MD5          0adc3a773e8e38202144f24467f5fbc2
BLAKE2b-256  9bd163dd2d99a7f24938ba4804790f81cb8117a70702a1388aedbbed97788265
