
Infrastructure-as-code for ephemeral AWS ParallelCluster environments for bioinformatics


Daylily Ephemeral Cluster


DayEC is the operator control plane for short-lived AWS ParallelCluster environments that run Daylily analysis workloads on FSx for Lustre. The current data plane is DRA-first, built on FSx for Lustre data repository associations (DRAs) that link file-system directories to S3 prefixes: the cluster starts with reference data mounted at /fsx/references; run folders are attached only when needed, under /fsx/run_dir_mounts/<mount_id>; workflow outputs stay under /fsx/analysis_results/<executing_entity>/<analysis_id>; and completed analysis directories are exported to a chosen S3 analysis bucket through a temporary, direct DRA.
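The path layout above can be summarized as a few string-building shell helpers. This is a minimal sketch assuming bash; `analysis_dir`, `run_mount_dir`, and `analysis_export_uri` are hypothetical names, not dyec subcommands, and they never touch FSx or S3.

```shell
# Hypothetical helpers that spell out the DayEC FSx layout described above.
# They only build strings; nothing here calls AWS or reads the file system.

# Workflow outputs: /fsx/analysis_results/<executing_entity>/<analysis_id>
analysis_dir() {
  local entity="$1" analysis_id="$2"
  printf '/fsx/analysis_results/%s/%s\n' "$entity" "$analysis_id"
}

# Run folders attached on demand: /fsx/run_dir_mounts/<mount_id>
run_mount_dir() {
  printf '/fsx/run_dir_mounts/%s\n' "$1"
}

# Export destination: <bucket>/<prefix>/<executing_entity>/<analysis_id>/
# Trailing slashes on bucket and prefix are trimmed; the result keeps one
# trailing slash, matching the destination URIs used in this README.
analysis_export_uri() {
  local bucket="${1%/}" prefix="${2%/}" entity="$3" analysis_id="$4"
  printf '%s/%s/%s/%s/\n' "$bucket" "$prefix" "$entity" "$analysis_id"
}
```

For example, `analysis_export_uri s3://my-bucket analysis_results ubuntu dayoa` (with a hypothetical bucket name) prints `s3://my-bucket/analysis_results/ubuntu/dayoa/`, the same shape as `EXPORT_S3_URI` in the lifecycle below.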

The cluster is ephemeral; the S3 buckets are durable. Verify the export receipt (fsx_export.yaml) before deleting the cluster.
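One way to make that rule mechanical is a small wrapper that refuses to run a teardown command unless the receipt file exists and is non-empty. This is a hypothetical bash sketch (`delete_if_receipt` is not part of dyec); it does not parse fsx_export.yaml, whose fields are not described here, so reading the receipt is still up to the operator.

```shell
# Hypothetical guard: run the given command only if the export receipt
# exists and is non-empty. It does NOT inspect the receipt's contents.
delete_if_receipt() {
  local receipt="$1"
  shift
  if [ ! -s "$receipt" ]; then
    echo "refusing teardown: missing or empty receipt: $receipt" >&2
    return 1
  fi
  # Receipt present: run the remaining arguments as the teardown command.
  "$@"
}
```

Usage, with only the flags shown in this README: `delete_if_receipt "$EXPORT_DIR/fsx_export.yaml" dyec delete --profile "$AWS_PROFILE" --region "$REGION" --cluster "$CLUSTER_NAME"`.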

Supported Operator Contract

Use the checkout environment and the CLI, not historical helper-script paths:

  1. source ./activate
  2. dyec preflight
  3. dyec create
  4. dyec headnode connect
  5. dyec samples stage for sample-manifest inputs, or dyec mounts create for run-folder inputs
  6. dyec workflow launch
  7. dyec export --source-path /fsx/analysis_results/<executing_entity>/<analysis_id> --destination-s3-uri s3://bucket/prefix/<executing_entity>/<analysis_id>/
  8. inspect fsx_export.yaml
  9. dyec delete --dry-run
  10. dyec delete

daylily-ec and dyec are the same entrypoint. The shorter dyec form is used in examples.

One Copy-Pasteable Lifecycle

source ./activate

export AWS_PROFILE=daylily-service-lsmc
export REGION=us-west-2
export REGION_AZ=us-west-2d
export CLUSTER_NAME=day-demo-$(date +%Y%m%d%H%M%S)
export DAY_EX_CFG="$HOME/.config/daylily/daylily_ephemeral_cluster.yaml"
export REF_S3_URI=s3://lsmc-dayoa-references-usw2
export CONTROL_DATA_S3_URI=s3://lsmc-dayoa-control-data-usw2
export STAGE_S3_URI=s3://lsmc-ssf-sequencing-data/staged_external_data
export ANALYSIS_BUCKET=s3://lsmc-dayoa-analysis-results-us-west-2
export EXECUTING_ENTITY="${USER:-ubuntu}"
export ANALYSIS_ID=dayoa
export ANALYSIS_SAMPLES=etc/analysis_samples_template.tsv
export STAGE_CFG_DIR="$PWD/tmp-stage-config/$CLUSTER_NAME"
export EXPORT_DIR="$PWD/tmp-export/$ANALYSIS_ID"
export EXPORT_S3_URI="$ANALYSIS_BUCKET/analysis_results/$EXECUTING_ENTITY/$ANALYSIS_ID/"

dyec preflight \
  --profile "$AWS_PROFILE" \
  --region-az "$REGION_AZ" \
  --config "$DAY_EX_CFG"

dyec create \
  --profile "$AWS_PROFILE" \
  --region-az "$REGION_AZ" \
  --config "$DAY_EX_CFG"

dyec headnode connect \
  --profile "$AWS_PROFILE" \
  --region "$REGION" \
  --cluster "$CLUSTER_NAME"

dyec samples stage "$ANALYSIS_SAMPLES" \
  --profile "$AWS_PROFILE" \
  --region "$REGION" \
  --reference-s3-uri "$REF_S3_URI" \
  --control-data-s3-uri "$CONTROL_DATA_S3_URI" \
  --stage-s3-uri "$STAGE_S3_URI" \
  --config-dir "$STAGE_CFG_DIR"

dyec workflow launch \
  --profile "$AWS_PROFILE" \
  --region "$REGION" \
  --cluster "$CLUSTER_NAME" \
  --stage-dir "/fsx/staging/staged_external_sequencing_data/remote_stage_<timestamp>" \
  --analysis-id "$ANALYSIS_ID" \
  --executing-entity "$EXECUTING_ENTITY" \
  --git-tag 2.0.5 \
  --export-destination-s3-uri "$EXPORT_S3_URI" \
  --export-trigger on-success

# For run-folder work, attach only the S3 prefix you need.
dyec --json mounts create "s3://sequencer-run-bucket/runs/RUN123/" \
  --profile "$AWS_PROFILE" \
  --region "$REGION" \
  --cluster "$CLUSTER_NAME" \
  --platform ILMN \
  --read-only \
  --wait

dyec --json mounts verify \
  --profile "$AWS_PROFILE" \
  --region "$REGION" \
  --cluster "$CLUSTER_NAME" \
  --mount-id RUN123

dyec workflow launch \
  --profile "$AWS_PROFILE" \
  --region "$REGION" \
  --cluster "$CLUSTER_NAME" \
  --run-context-file ./runs.tsv \
  --analysis-id "<run-analysis-id>" \
  --executing-entity "$EXECUTING_ENTITY" \
  --git-tag 2.0.5 \
  --dy-command "bin/day_run produce_illumina_run_qc --config run_context_file=config/runs.tsv -p -j 5 -k"

dyec export \
  --profile "$AWS_PROFILE" \
  --region "$REGION" \
  --cluster "$CLUSTER_NAME" \
  --source-path "/fsx/analysis_results/$EXECUTING_ENTITY/$ANALYSIS_ID" \
  --destination-s3-uri "$EXPORT_S3_URI" \
  --output-dir "$EXPORT_DIR"

cat "$EXPORT_DIR/fsx_export.yaml"

dyec delete --dry-run \
  --profile "$AWS_PROFILE" \
  --region "$REGION" \
  --cluster "$CLUSTER_NAME"

dyec delete \
  --profile "$AWS_PROFILE" \
  --region "$REGION" \
  --cluster "$CLUSTER_NAME"
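A local sanity check on the variables exported at the top of this lifecycle can catch typos before any AWS call is made. This is a hypothetical bash helper (`check_lifecycle_env` is not a dyec command) that only checks that values are set and well-shaped; it complements, and does not replace, `dyec preflight`.

```shell
# Hypothetical local check of the lifecycle variables; no AWS calls.
# Uses bash indirect expansion (${!var}).
check_lifecycle_env() {
  local var missing=0
  for var in AWS_PROFILE REGION REGION_AZ CLUSTER_NAME DAY_EX_CFG \
             REF_S3_URI CONTROL_DATA_S3_URI STAGE_S3_URI ANALYSIS_BUCKET \
             EXECUTING_ENTITY ANALYSIS_ID EXPORT_S3_URI; do
    if [ -z "${!var:-}" ]; then
      echo "unset: $var" >&2
      missing=1
    fi
  done
  [ "$missing" -eq 0 ] || return 1

  # The availability zone must belong to the region (us-west-2d -> us-west-2).
  case "$REGION_AZ" in
    "$REGION"[a-z]) ;;
    *) echo "REGION_AZ=$REGION_AZ is not in REGION=$REGION" >&2; return 1 ;;
  esac

  # Every bucket or prefix variable must be an s3:// URI.
  for var in REF_S3_URI CONTROL_DATA_S3_URI STAGE_S3_URI ANALYSIS_BUCKET EXPORT_S3_URI; do
    case "${!var}" in
      s3://?*) ;;
      *) echo "$var is not an s3:// URI: ${!var}" >&2; return 1 ;;
    esac
  done

  # The export destination should end with <executing_entity>/<analysis_id>/.
  case "$EXPORT_S3_URI" in
    */"$EXECUTING_ENTITY"/"$ANALYSIS_ID"/) ;;
    *) echo "EXPORT_S3_URI does not end in $EXECUTING_ENTITY/$ANALYSIS_ID/" >&2; return 1 ;;
  esac
}
```

Running `check_lifecycle_env && echo ok` right after the `export` lines stops a paste early if, for example, `REGION_AZ` was edited without `REGION`.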

Architecture At A Glance

flowchart LR
  Ref["S3 reference bucket /data/"] -->|reference-data DRA| Data["/fsx/references"]
  Run["S3 run prefix"] -->|ephemeral run DRA| Mount["/fsx/run_dir_mounts/<mount_id>"]
  Data --> Workflow["DayOA workflow"]
  Mount --> Workflow
  Workflow --> Results["/fsx/analysis_results/..."]
  Results --> Export["temporary direct export DRA on /analysis_results/<executing_entity>/<analysis_id>/"]
  Export -->|EXPORT_TO_REPOSITORY| Analysis["S3 analysis bucket prefix /<executing_entity>/<analysis_id>/"]

Key rules:

  • /fsx/references is the reference-data DRA created with the cluster.
  • /fsx/run_dir_mounts/<mount_id> is for read-oriented run inputs and is not an export source.
  • /fsx/analysis_results/... is where workflow checkouts and outputs live.
  • dyec export creates a temporary DRA on the exact completed analysis directory, runs EXPORT_TO_REPOSITORY, and detaches it with DeleteDataInFileSystem=false.
  • fsx_export.yaml is the export receipt (format v3); keep it and verify it before teardown.
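The export-source rules can be expressed as a path check run before `dyec export`. This is a hypothetical bash sketch (`is_export_source` is not part of dyec), assuming exports always target the analysis directory itself, i.e. exactly `<executing_entity>/<analysis_id>` below /fsx/analysis_results.

```shell
# Hypothetical check mirroring the key rules: references and run-folder
# mounts are never export sources; only an analysis directory is.
is_export_source() {
  local path="${1%/}"
  case "$path" in
    /fsx/references|/fsx/references/*)
      echo "refused: $path is the reference-data DRA" >&2; return 1 ;;
    /fsx/run_dir_mounts|/fsx/run_dir_mounts/*)
      echo "refused: $path is a run-folder mount" >&2; return 1 ;;
  esac
  # Strip the analysis_results prefix; unchanged means it was not there.
  local rest="${path#/fsx/analysis_results/}"
  if [ "$rest" = "$path" ] || [ -z "$rest" ]; then
    echo "refused: $path is not under /fsx/analysis_results/" >&2; return 1
  fi
  # Require exactly two components: <executing_entity>/<analysis_id>.
  local entity="${rest%%/*}" analysis_id="${rest#*/}"
  if [ "$entity" = "$rest" ] || [ -z "$entity" ] || [ -z "$analysis_id" ] \
     || [ "$analysis_id" != "${analysis_id%%/*}" ]; then
    echo "refused: expected /fsx/analysis_results/<executing_entity>/<analysis_id>" >&2
    return 1
  fi
}
```

For example: `is_export_source "/fsx/analysis_results/$EXECUTING_ENTITY/$ANALYSIS_ID" && dyec export ...` with the flags from the lifecycle above.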

Pipeline Catalog

config/daylily_available_repositories.yaml is the source of truth for repositories and blessed launch profiles. The packaged copy under daylily_ec/resources/payload/config/ must match it.
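A byte-for-byte comparison is the simplest way to confirm the two catalog copies agree. This is a hypothetical sketch (`check_catalog_copy` is not part of the repository) that assumes the packaged copy keeps the file name daylily_available_repositories.yaml; the shipped tests already guard the catalog, so treat it only as a quick local check.

```shell
# Hypothetical drift check; pass the repository root (defaults to ".").
check_catalog_copy() {
  local root="${1:-.}"
  local src="$root/config/daylily_available_repositories.yaml"
  local pkg="$root/daylily_ec/resources/payload/config/daylily_available_repositories.yaml"
  # cmp -s is silent and exits non-zero on any difference or missing file.
  if cmp -s "$src" "$pkg"; then
    echo "catalog copies match"
  else
    echo "catalog drift: $src and $pkg differ (or one is missing)" >&2
    diff -u "$src" "$pkg" >&2 || true
    return 1
  fi
}
```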

DayOA is currently pinned to 2.0.5, both as the repository default and in every DayOA command. Catalog v2 separates two analysis types:

  • sample_analysis: uses analysis_samples.tsv, stages inputs, and writes samples.tsv / units.tsv.
  • run_analysis: uses runs.tsv, requires a run DRA, and launches run-folder workflows such as Illumina run QC and BCL Convert.
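The split decides which preparation step precedes `dyec workflow launch`. This hypothetical bash helper (`next_step_for_manifest` is not part of dyec) infers the kind from the manifest file name; matching on `analysis_samples*.tsv` and `runs*.tsv` is an assumption based on the file names used in this README, not a rule dyec enforces.

```shell
# Hypothetical mapping from input manifest to catalog-v2 analysis kind
# and the dyec step that prepares inputs for it.
next_step_for_manifest() {
  case "$(basename "$1")" in
    analysis_samples*.tsv) echo "sample_analysis: dyec samples stage" ;;
    runs*.tsv)             echo "run_analysis: dyec mounts create (run DRA required)" ;;
    *) echo "unknown manifest: $1" >&2; return 1 ;;
  esac
}
```

With the lifecycle's values, `next_step_for_manifest "$ANALYSIS_SAMPLES"` points at staging, while `next_step_for_manifest ./runs.tsv` points at attaching a run DRA first.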

What This Repo Ships

  • source ./activate: creates or repairs the DAY-EC environment and installs the checkout in editable mode
  • dyec / daylily-ec: preflight, create, headnode, sample, workflow, mount, export, delete, state, repository, pricing, and AWS validation commands
  • DRA-backed ParallelCluster templates under config/day_cluster/
  • packaged resources under daylily_ec/resources/payload/
  • day-clone for headnode repository checkouts
  • tests that guard the catalog, packaged resources, SSM behavior, DRA mounts, export receipts, and environment contract




Download files


Source Distribution

daylily_ephemeral_cluster-5.0.24.tar.gz (50.2 MB)

Built Distribution


daylily_ephemeral_cluster-5.0.24-py3-none-any.whl (1.5 MB)

File details

Hashes for daylily_ephemeral_cluster-5.0.24.tar.gz:

  Algorithm     Hash digest
  SHA256        952573a1ad7546133a0399f8815aa4b730c21de9ad4297726a3ba4a3c11d17cd
  MD5           54bccf42b01c31e6697f8f1b85fee896
  BLAKE2b-256   6a0cb4fda635c309e9a92fff3a56a163257a613e602e5b5c7fae00fb5ac4739d


Hashes for daylily_ephemeral_cluster-5.0.24-py3-none-any.whl:

  Algorithm     Hash digest
  SHA256        deb9fb9e65d9fcdcb69d3276ecdbdf386d384d0be6382aadfa74586dd818a127
  MD5           935985ab80357acfd02328d9b0b2dead
  BLAKE2b-256   9d548f0f571a58635a6eb0a6434da5079e5add2774069a45c48aa5a24bdaf335
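To check a download against the SHA256 digests listed above, GNU coreutils `sha256sum -c` can read a checksum list from standard input. A minimal sketch (`verify_dist` is a hypothetical name); run it in the directory that holds both files. On macOS, `shasum -a 256 -c` accepts the same list format.

```shell
# Verify both 5.0.24 distributions against their published SHA256 digests.
# sha256sum -c exits non-zero if a file is missing or a digest differs.
verify_dist() {
  sha256sum -c - <<'EOF'
952573a1ad7546133a0399f8815aa4b730c21de9ad4297726a3ba4a3c11d17cd  daylily_ephemeral_cluster-5.0.24.tar.gz
deb9fb9e65d9fcdcb69d3276ecdbdf386d384d0be6382aadfa74586dd818a127  daylily_ephemeral_cluster-5.0.24-py3-none-any.whl
EOF
}
```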

