Infrastructure-as-code for ephemeral AWS ParallelCluster environments for bioinformatics

Daylily Ephemeral Cluster

DayEC is the operator control plane for short-lived AWS ParallelCluster environments that run Daylily analysis workloads on FSx for Lustre. The current data plane is DRA-first, where a DRA is an FSx for Lustre data repository association that links a file system path to an S3 prefix:

  • The cluster starts with reference data mounted at /fsx/references.
  • Run folders are attached only when needed, under /fsx/run_dir_mounts/<mount_id>.
  • Workflow outputs stay under /fsx/analysis_results/<executing_entity>/<analysis_id>.
  • Completed analysis directories are exported through a temporary direct DRA to a chosen S3 analysis bucket.

The cluster is ephemeral. S3 buckets are durable. Verify the export receipt before deleting the cluster.
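
As a minimal sketch of that last rule, the helper below refuses to proceed unless a receipt file exists and is non-empty. The function name receipt_present is hypothetical and not part of dyec, and the check deliberately does not parse the receipt, because the fields inside fsx_export.yaml are not described here.

```shell
# Hypothetical helper (not part of dyec): succeed only if the receipt file
# exists and is non-empty. It does not inspect the receipt's contents.
receipt_present() {
  receipt="$1"
  if [ -s "$receipt" ]; then
    echo "receipt found: $receipt"
    return 0
  fi
  echo "no export receipt at $receipt; keep the cluster" >&2
  return 1
}
```

One way to use it is to gate teardown on the check, for example by running receipt_present "$EXPORT_DIR/fsx_export.yaml" and only continuing to dyec delete --dry-run when it succeeds. Reading the receipt by eye is still advisable, since a non-empty file is not proof of a successful export.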

Supported Operator Contract

Use the checkout environment and the CLI, not historical helper-script paths:

  1. source ./activate
  2. dyec preflight
  3. dyec create
  4. dyec headnode connect
  5. dyec samples stage for sample-manifest inputs, or dyec mounts create for run-folder inputs
  6. dyec workflow launch
  7. dyec export --source-path /fsx/analysis_results/<executing_entity>/<analysis_id> --destination-s3-uri s3://bucket/prefix/<executing_entity>/<analysis_id>/
  8. inspect fsx_export.yaml
  9. dyec delete --dry-run
  10. dyec delete

daylily-ec and dyec are the same entrypoint. The shorter dyec form is used in examples.
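
After source ./activate, it can be useful to confirm that both entrypoint names resolve before starting the sequence above. The sketch below uses only the POSIX command -v builtin; the function name missing_commands is hypothetical and not part of this repo.

```shell
# Hypothetical helper: print each named command that is not on PATH and
# return non-zero if any is missing.
missing_commands() {
  status=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "$cmd"
      status=1
    fi
  done
  return "$status"
}
```

Calling missing_commands dyec daylily-ec prints nothing and succeeds when both names are available in the activated environment.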

One Copy-Pasteable Lifecycle

source ./activate

export AWS_PROFILE=daylily-service-lsmc
export REGION=us-west-2
export REGION_AZ=us-west-2d
export CLUSTER_NAME=day-demo-$(date +%Y%m%d%H%M%S)
export DAY_EX_CFG="$HOME/.config/daylily/daylily_ephemeral_cluster.yaml"
export REF_S3_URI=s3://lsmc-dayoa-references-usw2
export CONTROL_DATA_S3_URI=s3://lsmc-dayoa-control-data-usw2
export STAGE_S3_URI=s3://lsmc-ssf-sequencing-data/staged_external_data
export ANALYSIS_BUCKET=s3://lsmc-dayoa-analysis-results-us-west-2
export EXECUTING_ENTITY="${USER:-ubuntu}"
export ANALYSIS_ID=dayoa
export ANALYSIS_SAMPLES=etc/analysis_samples_template.tsv
export STAGE_CFG_DIR="$PWD/tmp-stage-config/$CLUSTER_NAME"
export EXPORT_DIR="$PWD/tmp-export/$ANALYSIS_ID"
export EXPORT_S3_URI="$ANALYSIS_BUCKET/analysis_results/$EXECUTING_ENTITY/$ANALYSIS_ID/"

dyec preflight \
  --profile "$AWS_PROFILE" \
  --region-az "$REGION_AZ" \
  --config "$DAY_EX_CFG"

dyec create \
  --profile "$AWS_PROFILE" \
  --region-az "$REGION_AZ" \
  --config "$DAY_EX_CFG"

dyec headnode connect \
  --profile "$AWS_PROFILE" \
  --region "$REGION" \
  --cluster "$CLUSTER_NAME"

dyec samples stage "$ANALYSIS_SAMPLES" \
  --profile "$AWS_PROFILE" \
  --region "$REGION" \
  --reference-s3-uri "$REF_S3_URI" \
  --control-data-s3-uri "$CONTROL_DATA_S3_URI" \
  --stage-s3-uri "$STAGE_S3_URI" \
  --config-dir "$STAGE_CFG_DIR"

# Replace remote_stage_<timestamp> below with the actual stage directory name before running.
dyec workflow launch \
  --profile "$AWS_PROFILE" \
  --region "$REGION" \
  --cluster "$CLUSTER_NAME" \
  --stage-dir "/fsx/staging/staged_external_sequencing_data/remote_stage_<timestamp>" \
  --analysis-id "$ANALYSIS_ID" \
  --executing-entity "$EXECUTING_ENTITY" \
  --git-tag 2.0.5 \
  --export-destination-s3-uri "$EXPORT_S3_URI" \
  --export-trigger on-success

# For run-folder work, attach only the S3 prefix you need.
dyec --json mounts create "s3://sequencer-run-bucket/runs/RUN123/" \
  --profile "$AWS_PROFILE" \
  --region "$REGION" \
  --cluster "$CLUSTER_NAME" \
  --platform ILMN \
  --read-only \
  --wait

dyec --json mounts verify \
  --profile "$AWS_PROFILE" \
  --region "$REGION" \
  --cluster "$CLUSTER_NAME" \
  --mount-id RUN123

# Replace <run-analysis-id> below with the analysis ID for this run before running.
dyec workflow launch \
  --profile "$AWS_PROFILE" \
  --region "$REGION" \
  --cluster "$CLUSTER_NAME" \
  --run-context-file ./runs.tsv \
  --analysis-id "<run-analysis-id>" \
  --executing-entity "$EXECUTING_ENTITY" \
  --git-tag 2.0.5 \
  --dy-command "bin/day_run produce_illumina_run_qc --config run_context_file=config/runs.tsv -p -j 5 -k"

dyec export \
  --profile "$AWS_PROFILE" \
  --region "$REGION" \
  --cluster "$CLUSTER_NAME" \
  --source-path "/fsx/analysis_results/$EXECUTING_ENTITY/$ANALYSIS_ID" \
  --destination-s3-uri "$EXPORT_S3_URI" \
  --output-dir "$EXPORT_DIR"

cat "$EXPORT_DIR/fsx_export.yaml"

dyec delete --dry-run \
  --profile "$AWS_PROFILE" \
  --region "$REGION" \
  --cluster "$CLUSTER_NAME"

dyec delete \
  --profile "$AWS_PROFILE" \
  --region "$REGION" \
  --cluster "$CLUSTER_NAME"
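
The lifecycle above depends on a number of exported variables, and an empty one would be passed to dyec as an empty argument. The sketch below is one way to check them up front; the function name require_vars is hypothetical, and it assumes the names passed in are plain shell identifiers.

```shell
# Hypothetical helper: report which of the named environment variables are
# unset or empty. Uses eval for POSIX-portable indirect expansion, so the
# arguments must be plain variable names.
require_vars() {
  missing=0
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "unset: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, require_vars AWS_PROFILE REGION REGION_AZ CLUSTER_NAME DAY_EX_CFG EXPORT_S3_URI returns non-zero and names the gaps if any of those variables were not exported.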

Architecture At A Glance

flowchart LR
  Ref["S3 reference bucket /data/"] -->|reference-data DRA| Data["/fsx/references"]
  Run["S3 run prefix"] -->|ephemeral run DRA| Mount["/fsx/run_dir_mounts/<mount_id>"]
  Data --> Workflow["DayOA workflow"]
  Mount --> Workflow
  Workflow --> Results["/fsx/analysis_results/..."]
  Results --> Export["temporary direct export DRA on /analysis_results/<executing_entity>/<analysis_id>/"]
  Export -->|EXPORT_TO_REPOSITORY| Analysis["S3 analysis bucket prefix /<executing_entity>/<analysis_id>/"]
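
The path layout in the diagram can be sketched as two small helpers that build the export source path and the matching S3 destination, mirroring the EXPORT_S3_URI pattern from the lifecycle example. The function names are hypothetical, and the analysis_results prefix inside the bucket is taken from that example rather than being a requirement.

```shell
# Hypothetical helpers that mirror the layout used in the lifecycle example.

# $1 = executing entity, $2 = analysis id
analysis_source_path() {
  printf '/fsx/analysis_results/%s/%s\n' "$1" "$2"
}

# $1 = bucket URI (for example s3://bucket), $2 = executing entity, $3 = analysis id
# A trailing slash on the bucket URI is stripped before joining.
analysis_export_uri() {
  printf '%s/analysis_results/%s/%s/\n' "${1%/}" "$2" "$3"
}
```

Deriving both values from the same executing entity and analysis ID keeps the exported S3 prefix aligned with the directory that was actually analyzed.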

Key rules:

  • /fsx/references is the reference-data DRA created with the cluster.
  • /fsx/run_dir_mounts/<mount_id> is for read-oriented run inputs and is not an export source.
  • /fsx/analysis_results/... is where workflow checkouts and outputs live.
  • dyec export creates a temporary DRA on the exact completed analysis directory, runs EXPORT_TO_REPOSITORY, and detaches it with DeleteDataInFileSystem=false.
  • fsx_export.yaml is the v3 export receipt to keep before teardown.
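
For orientation only, the three export steps in the rules above correspond roughly to three AWS CLI calls on FSx. The sketch below prints those calls instead of running them. It is not how dyec export is implemented, and the function name, file system ID, and association ID are placeholders. A real sequence would also have to wait for the association and the task to finish between steps.

```shell
# Illustrative only: print (do not run) AWS CLI calls that correspond to the
# create-DRA, export, and detach steps. All IDs are caller-supplied placeholders.
# $1 = file system id, $2 = file system path (leading slash),
# $3 = S3 destination URI, $4 = association id returned by the first call
print_export_steps() {
  fs_id="$1"; fs_path="$2"; s3_uri="$3"; assoc_id="$4"
  echo "aws fsx create-data-repository-association --file-system-id $fs_id --file-system-path $fs_path --data-repository-path $s3_uri"
  # Task paths are given relative to the file system root, so drop the leading slash.
  echo "aws fsx create-data-repository-task --file-system-id $fs_id --type EXPORT_TO_REPOSITORY --paths ${fs_path#/} --report Enabled=false"
  echo "aws fsx delete-data-repository-association --association-id $assoc_id --no-delete-data-in-file-system"
}
```

The final call is the one that matters for safety: detaching without deleting data in the file system leaves the analysis directory on FSx intact until the cluster itself is deleted.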

Pipeline Catalog

config/daylily_available_repositories.yaml is the source of truth for repositories and blessed launch profiles. The packaged copy under daylily_ec/resources/payload/config/ must match it.
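
A byte-for-byte comparison is one simple way to check that the two copies agree. The sketch below uses cmp -s; the function name catalogs_match is hypothetical, and it assumes the packaged copy keeps the same file name, daylily_available_repositories.yaml, under the payload config directory.

```shell
# Hypothetical helper: succeed only if both files exist and are byte-identical.
catalogs_match() {
  if cmp -s "$1" "$2"; then
    echo "catalog copies match"
  else
    echo "catalog copies differ or are missing: $1 vs $2" >&2
    return 1
  fi
}
```

From the repository root this could be called as catalogs_match config/daylily_available_repositories.yaml daylily_ec/resources/payload/config/daylily_available_repositories.yaml, with the second path adjusted if the packaged file is named differently.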

The current DayOA pin is 2.0.5 for the repository default and every DayOA command. Catalog v2 separates two kinds of analysis:

  • sample_analysis: uses analysis_samples.tsv, stages inputs, and writes samples.tsv / units.tsv.
  • run_analysis: uses runs.tsv, requires a run DRA, and launches run-folder workflows such as Illumina run QC and BCL Convert.
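
The split above can be summarized as a lookup from analysis kind to the manifest file it expects. The sketch below is only a restatement of the two bullets as shell; the function name manifest_for_kind is hypothetical and does not read the catalog file.

```shell
# Hypothetical helper: map a catalog analysis kind to its input manifest name.
manifest_for_kind() {
  case "$1" in
    sample_analysis) echo "analysis_samples.tsv" ;;
    run_analysis)    echo "runs.tsv" ;;
    *) echo "unknown catalog kind: $1" >&2; return 1 ;;
  esac
}
```

A wrapper script could use this to fail early when the manifest on hand does not match the kind of workflow being launched.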

What This Repo Ships

  • source ./activate: creates or repairs the DAY-EC environment and installs the checkout in editable mode
  • dyec / daylily-ec: preflight, create, headnode, sample, workflow, mount, export, delete, state, repository, pricing, and AWS validation commands
  • DRA-backed ParallelCluster templates under config/day_cluster/
  • packaged resources under daylily_ec/resources/payload/
  • day-clone for headnode repository checkouts
  • tests that guard the catalog, packaged resources, SSM behavior, DRA mounts, export receipts, and environment contract
