Infrastructure-as-code for ephemeral AWS ParallelCluster environments for bioinformatics
Daylily Ephemeral Cluster
Daylily Ephemeral Cluster, usually called DYEC or DayEC, is the Daylily control plane for short-lived AWS ParallelCluster environments. It renders cluster configuration, validates AWS prerequisites, creates FSx for Lustre storage, connects to headnodes through AWS Systems Manager, stages inputs, launches workflow repositories, exports completed analysis directories, and optionally registers exported evidence with Dewey for downstream QEO ingestion.
The cluster is disposable. The S3 inputs, reference bucket, analysis-export bucket, command catalog, and evidence receipts are durable. Do not delete a cluster until the export receipt and expected S3 outputs are verified.
Philosophy
DYEC is deliberately not a dogma-locked workflow manager. It provisions and exports the execution environment. The checked-out repository owns its workflow engine, command syntax, containers, profile, and final file layout below the analysis root. DayOA/Snakemake is the first-class Daylily workflow repository, and nf-core/Nextflow repositories such as daylily-sarek can also run on the same cluster when they honor the same FSx analysis-root and export contract.
The operating contract is strict. Missing config, credentials, references, run mounts, licenses, runtime assets, invalid sample identity, unsafe path segments, non-empty export destinations, and malformed command catalog rows should fail hard. DYEC should not guess a bucket, invent a credential, choose a replacement reference, or silently fall back to a legacy launch path.
Architecture
```mermaid
flowchart LR
Operator["operator or service<br/>dyec CLI"] --> Config["explicit config<br/>AWS profile, region, buckets"]
Config --> Pcluster["AWS ParallelCluster"]
Pcluster --> Headnode["headnode<br/>ubuntu via SSM"]
Pcluster --> FSx["FSx for Lustre"]
RefBucket["reference S3 bucket"] -->|reference DRA| References["/fsx/references"]
RunBucket["run S3 prefix"] -->|optional run DRA| RunMount["/fsx/run_dir_mounts/<mount_id>"]
Headnode --> Repo["workflow repository checkout"]
References --> Repo
RunMount --> Repo
Repo --> Results["/fsx/analysis_results/<entity>/<analysis_id>"]
Results -->|temporary export DRA| AnalysisBucket["analysis S3 bucket"]
AnalysisBucket --> Receipt["fsx_export.yaml"]
Receipt --> Dewey["Dewey registration"]
Dewey --> QEO["QEO ingestion"]
```
Filesystem Contract
| Path | Owner | Purpose |
|---|---|---|
| `/fsx/references` | DYEC cluster config | Reference and runtime assets mounted from the configured reference bucket. |
| `/fsx/control_data` | optional cluster config | Repeated-test or control assets when configured. |
| `/fsx/run_dir_mounts/<mount_id>` | `dyec mounts` | Read-oriented S3 run-folder Data Repository Associations. |
| `/fsx/analysis_results/<executing_entity>/<analysis_id>` | workflow repository | Repository checkout, logs, work state, outputs, reports, and benchmarks. |
| `s3://<analysis-bucket>/<prefix>/<executing_entity>/<analysis_id>/` | `dyec export` | Durable export destination for one completed analysis directory. |
Run mounts and references are inputs. They are not export sources. The export source is exactly one completed analysis directory under /fsx/analysis_results/<executing_entity>/<analysis_id>.
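A minimal Python sketch of how a caller might build that single export source path while rejecting unsafe path segments, in the spirit of the fail-hard contract above. The allowed-character rule (`[A-Za-z0-9._-]`) and the function names are assumptions for illustration, not DYEC's actual validation.

```python
import re
from pathlib import PurePosixPath

# Assumed rule for one path segment: letters, digits, dot, underscore, hyphen.
# DYEC's real validation may be stricter or different.
_SAFE_SEGMENT = re.compile(r"[A-Za-z0-9._-]+")

ANALYSIS_ROOT = PurePosixPath("/fsx/analysis_results")


def safe_segment(value: str) -> str:
    """Return value unchanged if it is one safe path segment; otherwise raise."""
    if value in (".", "..") or not _SAFE_SEGMENT.fullmatch(value):
        raise ValueError(f"unsafe path segment: {value!r}")
    return value


def export_source_path(executing_entity: str, analysis_id: str) -> PurePosixPath:
    """Build the one analysis directory that may be exported."""
    return ANALYSIS_ROOT / safe_segment(executing_entity) / safe_segment(analysis_id)
```

For example, `export_source_path("ubuntu", "run-0042")` yields `/fsx/analysis_results/ubuntu/run-0042`, while an `analysis_id` of `../etc` raises `ValueError` instead of being rewritten.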
Setup
Prerequisites:
- AWS credentials for a non-default profile with ParallelCluster, EC2, IAM, CloudFormation, S3, FSx, SSM, CloudWatch, and related read/write permissions.
- AWS region and availability zone selected for the cluster.
- AWS Session Manager plugin installed locally.
- AWS ParallelCluster CLI available through this repo environment.
- Configured S3 buckets for references, optional control data, staging, and analysis exports.
- A Daylily config file, normally ~/.config/daylily/daylily_ephemeral_cluster.yaml, with explicit bucket and cluster settings.
Activate the checkout and inspect the live CLI:
```bash
cd /path/to/daylily-ephemeral-cluster
source ./activate
dyec --json version
dyec --help
dyec runtime status
dyec --json repositories commands
```
Use placeholders in examples until your environment has real values:
```bash
export AWS_PROFILE=<non-default-profile>
export REGION=us-west-2
export REGION_AZ=us-west-2d
export CLUSTER_NAME=<cluster-name>
export DAY_EX_CFG="$HOME/.config/daylily/daylily_ephemeral_cluster.yaml"
export REF_S3_URI=s3://<reference-bucket>
export CONTROL_DATA_S3_URI=s3://<control-data-bucket>
export STAGE_S3_URI=s3://<staging-bucket>/<prefix>
export ANALYSIS_RESULTS_S3_URI=s3://<analysis-results-bucket>/<prefix>
export EXECUTING_ENTITY=ubuntu
export ANALYSIS_ID=<analysis-id>
export EXPORT_S3_URI="$ANALYSIS_RESULTS_S3_URI/$EXECUTING_ENTITY/$ANALYSIS_ID/"
```
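Because DYEC should not guess values, a wrapper script could refuse to continue while any of these variables is unset or still holds a `<placeholder>`. This is a hypothetical Python helper: the variable names come from the exports above, and `CONTROL_DATA_S3_URI` is left out on the assumption that control data is optional for your cluster.

```python
from __future__ import annotations

import re
from typing import Mapping

# Variables from the setup exports. CONTROL_DATA_S3_URI is omitted on the
# assumption that control data is optional for your cluster.
REQUIRED = (
    "AWS_PROFILE", "REGION", "REGION_AZ", "CLUSTER_NAME", "DAY_EX_CFG",
    "REF_S3_URI", "STAGE_S3_URI", "ANALYSIS_RESULTS_S3_URI",
    "EXECUTING_ENTITY", "ANALYSIS_ID", "EXPORT_S3_URI",
)
_PLACEHOLDER = re.compile(r"<[^<>]+>")


def unresolved(env: Mapping[str, str]) -> list[str]:
    """Names that are unset, empty, or still contain a <placeholder>."""
    return [
        name for name in REQUIRED
        if not env.get(name) or _PLACEHOLDER.search(env[name])
    ]


def require_env(env: Mapping[str, str]) -> None:
    """Stop instead of guessing when any required value is missing."""
    bad = unresolved(env)
    if bad:
        raise SystemExit("set real values for: " + ", ".join(bad))
```

Calling `require_env(os.environ)` at the top of a wrapper, before any `dyec` invocation, turns a forgotten placeholder into an immediate, named failure.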
Lifecycle
```bash
dyec preflight \
  --profile "$AWS_PROFILE" \
  --region-az "$REGION_AZ" \
  --config "$DAY_EX_CFG"

dyec create \
  --profile "$AWS_PROFILE" \
  --region-az "$REGION_AZ" \
  --config "$DAY_EX_CFG"

dyec headnode connect \
  --profile "$AWS_PROFILE" \
  --region "$REGION" \
  --cluster "$CLUSTER_NAME"
```
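The lifecycle is ordered: preflight should pass before create, and create should finish before connecting. Below is a small sketch of a Python wrapper that issues the same three commands and stops at the first non-zero exit. The argument lists copy the flags shown above; the helper names and the injectable `run` parameter are assumptions that make the ordering testable without AWS.

```python
import subprocess


def lifecycle_commands(profile, region, region_az, cluster, config):
    """preflight, create, and headnode connect, in order, with the documented flags."""
    return [
        ["dyec", "preflight", "--profile", profile, "--region-az", region_az, "--config", config],
        ["dyec", "create", "--profile", profile, "--region-az", region_az, "--config", config],
        ["dyec", "headnode", "connect", "--profile", profile, "--region", region,
         "--cluster", cluster],
    ]


def run_in_order(commands, run=subprocess.run):
    """Run each argv; check=True raises CalledProcessError and stops at the first failure."""
    for argv in commands:
        run(argv, check=True)
```

`headnode connect` opens an interactive session, so it is deliberately the last step.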
After connection, the supported headnode user is ubuntu in an interactive bash login shell. Manual DayOA workflow work belongs in a persistent tmux session and uses separate commands:
```bash
source dyoainit
dy-a slurm hg38_broad
dy-r help -p -k -j 1 -n
```
For catalog-backed sample analysis, prefer dyec samples run:
```bash
dyec samples run ./analysis_samples.tsv \
  --command-id illumina_snv_alignstats \
  --profile "$AWS_PROFILE" \
  --region "$REGION" \
  --cluster "$CLUSTER_NAME" \
  --reference-s3-uri "$REF_S3_URI" \
  --control-data-s3-uri "$CONTROL_DATA_S3_URI" \
  --stage-s3-uri "$STAGE_S3_URI" \
  --analysis-id "$ANALYSIS_ID" \
  --executing-entity "$EXECUTING_ENTITY" \
  --export-destination-s3-uri "$EXPORT_S3_URI" \
  --export-trigger on-success \
  --dry-run
```
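`--analysis-id`, `--executing-entity`, and `--export-destination-s3-uri` are passed separately, so they can drift apart in a hand-edited script. A hypothetical pre-launch guard in Python can check that the destination matches the `$ANALYSIS_RESULTS_S3_URI/$EXECUTING_ENTITY/$ANALYSIS_ID/` layout from the setup exports; whether `dyec samples run` performs an equivalent check itself is not stated here.

```python
def check_export_destination(destination: str, analysis_results_uri: str,
                             executing_entity: str, analysis_id: str) -> None:
    """Raise unless destination is <results-uri>/<entity>/<analysis_id>/ exactly."""
    # Mirrors the EXPORT_S3_URI composition in the setup exports; no normalization,
    # so a missing trailing slash or a different analysis id is reported, not fixed.
    expected = f"{analysis_results_uri}/{executing_entity}/{analysis_id}/"
    if destination != expected:
        raise ValueError(f"export destination {destination!r} does not match {expected!r}")
```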
For run-folder analysis, attach a read-only run mount before launching a run-context command:
```bash
dyec --json mounts create "s3://<sequencing-run-bucket>/<run-prefix>/" \
  --profile "$AWS_PROFILE" \
  --region "$REGION" \
  --cluster "$CLUSTER_NAME" \
  --platform ILMN \
  --read-only \
  --wait \
  --timeout-seconds 3600

dyec --json mounts verify \
  --profile "$AWS_PROFILE" \
  --region "$REGION" \
  --cluster "$CLUSTER_NAME" \
  --mount-id <mount_id>
```
Export exactly one completed analysis directory:
```bash
dyec export \
  --profile "$AWS_PROFILE" \
  --region "$REGION" \
  --cluster "$CLUSTER_NAME" \
  --source-path "/fsx/analysis_results/$EXECUTING_ENTITY/$ANALYSIS_ID" \
  --destination-s3-uri "$EXPORT_S3_URI" \
  --output-dir "./tmp-export/$ANALYSIS_ID"
```
Inspect fsx_export.yaml before cleanup. Delete is destructive; run dyec delete --dry-run first and perform live deletion only after the intended effect is approved and understood.
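Non-empty export destinations should fail hard. An operator could also confirm the destination prefix is empty before running `dyec export` with a sketch like the following, which calls S3 `ListObjectsV2` through a boto3-style client supplied by the caller. The helper names are hypothetical, and this check is an extra operator-side step, not a replacement for DYEC's own validation.

```python
from __future__ import annotations

from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split s3://bucket/prefix/ into (bucket, prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def destination_is_empty(s3_client, uri: str) -> bool:
    """True if no object exists under the destination prefix."""
    bucket, prefix = split_s3_uri(uri)
    # One key is enough to prove the prefix is non-empty.
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)
    return response.get("KeyCount", 0) == 0
```

With boto3 installed, `boto3.Session(profile_name=os.environ["AWS_PROFILE"]).client("s3")` can be passed as `s3_client`. Keeping the trailing slash on the destination matters: a prefix ending in `run-1/` does not match objects under `run-10/`.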
CLI Surface
Use dyec --help for the current root command list. Current major groups include:
`preflight`, `create`, `delete`, `drift`, `cluster-info`, `cluster`, `headnode`, `samples`, `workflow`, `repositories`, `mounts`, `mount`, `export`, `exports`, `slurm-accounting`, `aws`, `pricing`, `runtime`, `env`, `state`, `resources-dir`
Important inspection commands:
```bash
dyec --json version
dyec --json cluster describe --profile "$AWS_PROFILE" --region "$REGION" --cluster "$CLUSTER_NAME"
dyec --json repositories commands
dyec repositories commands --command-id illumina_snv_alignstats
dyec workflow status --profile "$AWS_PROFILE" --region "$REGION" --cluster "$CLUSTER_NAME" --session <session>
dyec workflow logs --profile "$AWS_PROFILE" --region "$REGION" --cluster "$CLUSTER_NAME" --session <session> --lines 100
```
The Slurm accounting helper manages external accounting infrastructure when configured. A running cluster can have the sacct binary installed while accounting storage is disabled; in that state sacct cannot provide job accounting records even though the command exists.
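One way to tell those two states apart on the headnode is to read the scheduler configuration, which is a read-only query. The sketch below parses `scontrol show config` output for `AccountingStorageType`. It assumes accounting is enabled only when that value is `accounting_storage/slurmdbd`, and treats anything else (for example `accounting_storage/none` on older Slurm releases, or an unset value) as disabled. Function names are illustrative.

```python
import subprocess
from typing import Optional


def accounting_storage_type(show_config: str) -> Optional[str]:
    """Value of AccountingStorageType in `scontrol show config` output, if present."""
    for line in show_config.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "AccountingStorageType":
            return value.strip()
    return None


def accounting_enabled(show_config: str) -> bool:
    """Assumes slurmdbd is the backend that provides sacct job records."""
    return accounting_storage_type(show_config) == "accounting_storage/slurmdbd"


def read_show_config() -> str:
    """Read-only query of the running scheduler configuration (run on the headnode)."""
    result = subprocess.run(["scontrol", "show", "config"],
                            check=True, capture_output=True, text=True)
    return result.stdout
```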
Repository Catalog
config/daylily_pipeline_command_catalog.yaml is the source of truth for blessed repositories and commands. The packaged copy under daylily_ec/resources/payload/config/ must match it. The current catalog default for DayOA is 2.0.41; daylily-sarek is also present as a Nextflow/nf-core Sarek repository entry.
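A drift check between the source catalog and its packaged copy fits naturally in a test. This Python sketch assumes the packaged file keeps the same file name under `daylily_ec/resources/payload/config/` and that it runs from the repository root; it returns unified-diff lines so a failing check shows what changed.

```python
from __future__ import annotations

import difflib
from pathlib import Path

SOURCE = Path("config/daylily_pipeline_command_catalog.yaml")
# Assumption: the packaged copy keeps the same file name.
PACKAGED = Path("daylily_ec/resources/payload/config/daylily_pipeline_command_catalog.yaml")


def catalog_drift(source: Path = SOURCE, packaged: Path = PACKAGED) -> list[str]:
    """Unified-diff lines between the two catalogs; an empty list means they match."""
    src = source.read_text().splitlines(keepends=True)
    pkg = packaged.read_text().splitlines(keepends=True)
    return list(difflib.unified_diff(src, pkg, fromfile=str(source), tofile=str(packaged)))
```

In a test suite, `assert catalog_drift() == []` would fail with the offending diff lines in the assertion message.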
Catalog command classes:
- `utility`: no sample or run inputs, usually used for smoke tests.
- `sample_analysis`: consumes `analysis_samples.tsv`, stages sample/unit manifests, and launches a repository command.
- `run_analysis`: consumes `runs.tsv` and requires a matching `/fsx/run_dir_mounts/<mount_id>` input mount.
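The input rules for the three classes can be sketched as a single validator. The `LaunchInputs` fields below are hypothetical stand-ins, not the catalog's real schema, and the sketch only enforces what the list states: utilities take no inputs, sample analysis needs `analysis_samples.tsv`, and run analysis needs `runs.tsv` plus a run mount.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LaunchInputs:
    # Hypothetical fields; the real catalog schema may differ.
    samples_tsv: Optional[str] = None  # analysis_samples.tsv
    runs_tsv: Optional[str] = None     # runs.tsv
    mount_id: Optional[str] = None     # /fsx/run_dir_mounts/<mount_id>


def check_inputs(command_class: str, inputs: LaunchInputs) -> None:
    """Raise ValueError when the inputs do not fit the command class."""
    if command_class == "utility":
        if inputs.samples_tsv or inputs.runs_tsv or inputs.mount_id:
            raise ValueError("utility commands take no sample or run inputs")
    elif command_class == "sample_analysis":
        if not inputs.samples_tsv:
            raise ValueError("sample_analysis requires analysis_samples.tsv")
    elif command_class == "run_analysis":
        if not (inputs.runs_tsv and inputs.mount_id):
            raise ValueError("run_analysis requires runs.tsv and a run mount")
    else:
        # Unknown classes fail hard instead of falling back to a default.
        raise ValueError(f"unknown command class: {command_class!r}")
```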
Reference Bucket Contract
The reference bucket is mounted to /fsx/references at cluster creation. It should contain:
- organism references and indexes for supported genome builds
- GIAB truth resources and high-confidence BEDs where concordance targets need them
- slim sample read fixtures used by catalog validation
- runtime assets that must be present before workflow activation, such as pinned tool installs, container caches, and licensed commercial tool assets
- tool-specific resource directories for annotation, STR, contamination, metagenomics, or other optional targets
DYEC does not choose alternate references at runtime. If a command catalog row points to a missing path, the launch should fail during staging, profile activation, or workflow execution with a clear missing-asset error.
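A pre-launch check in the same spirit might list catalog-referenced paths that are absent under `/fsx/references` and stop, rather than substituting anything. In this hypothetical Python sketch the relative paths would come from a command catalog row; paths that are absolute or contain `..` are rejected so a row cannot point outside the reference root.

```python
from __future__ import annotations

from pathlib import Path, PurePosixPath

REFERENCE_ROOT = Path("/fsx/references")


def missing_references(relative_paths, root: Path = REFERENCE_ROOT) -> list[str]:
    """Catalog-referenced paths that do not exist below the reference root."""
    missing = []
    for rel in relative_paths:
        rel_path = PurePosixPath(rel)
        if rel_path.is_absolute() or ".." in rel_path.parts:
            raise ValueError(f"reference path must stay below {root}: {rel!r}")
        if not (root / rel).exists():
            missing.append(rel)
    return missing


def require_references(relative_paths, root: Path = REFERENCE_ROOT) -> None:
    """Fail hard on any missing asset; never substitute another reference."""
    missing = missing_references(relative_paths, root)
    if missing:
        raise FileNotFoundError("missing reference assets: " + ", ".join(missing))
```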
Supporting Services
- Dewey: DYEC can register exported DayOA evidence after a successful export when the command catalog declares an explicit `artifact_registration` policy.
- QEO: QEO loading is requested through Dewey/outbox events. DayOA emits local evidence; DYEC maps that evidence to exported S3 artifacts.
- Ursa: Ursa can own operator worksets and launch UX above DYEC. DYEC remains the cluster and export control plane.
- PCUI: PCUI-style interfaces should call the same catalog and CLI/API surfaces rather than duplicating launch policy.
- Slurm: Slurm is cluster infrastructure. Monitoring with `squeue`, `sacct` (when configured), logs, and DYEC status commands is allowed. Scheduler, node, job, drain/resume, requeue, cancel, or service interventions require explicit operator approval.
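The Dewey gate reads as two explicit conditions with no default policy. A minimal Python sketch, assuming the command catalog row is available as a mapping and the policy lives under an `artifact_registration` key (the key name follows the Dewey entry above; its position in the catalog schema is an assumption):

```python
from typing import Any, Mapping


def should_register(command_row: Mapping[str, Any], export_succeeded: bool) -> bool:
    """Register exported evidence only after a successful export and only when the
    row declares an explicit artifact_registration policy; there is no default."""
    return export_succeeded and bool(command_row.get("artifact_registration"))
```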
Contributing
When adding a runnable pipeline repository:
- add a repository row to the command catalog with a pinned `default_ref`
- add explicit command rows for supported launch profiles
- declare input contract, required columns, genome build, targets, jobs, and runtime parameters
- make the repository write all durable outputs below `/fsx/analysis_results/<executing_entity>/<analysis_id>`
- document export-relevant reports, logs, benchmarks, and manifests
- add tests for catalog rendering, command validation, and dry-run behavior where possible
- avoid compatibility aliases, inferred defaults, or fallback command paths
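A contributor test for the output rule could assert that every declared durable output lies below the analysis root. This Python sketch is hypothetical; it works on absolute POSIX path strings and rejects `..` segments rather than trying to resolve them.

```python
from pathlib import PurePosixPath


def is_below_analysis_root(path: str, executing_entity: str, analysis_id: str) -> bool:
    """True if an absolute output path lies inside this analysis's root directory."""
    root = PurePosixPath("/fsx/analysis_results", executing_entity, analysis_id)
    candidate = PurePosixPath(path)
    if not candidate.is_absolute() or ".." in candidate.parts:
        return False
    # Comparing whole path components means "a10" never matches root "a1".
    return candidate == root or root in candidate.parents
```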
Historical plans and terminal working docs live under docs/jem_working_docs/. Active ledgers remain in docs/plans/.
Further Reading
File details
Details for the file daylily_ephemeral_cluster-5.1.30.tar.gz.
File metadata
- Download URL: daylily_ephemeral_cluster-5.1.30.tar.gz
- Upload date:
- Size: 53.6 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 444e32f8a05b27435d3b33f8adb8fb9d682c77a6b14468e92e869fdfc21225f4 |
| MD5 | fe820bde06933e341956e99c53223d01 |
| BLAKE2b-256 | b6091caf2be1814b4e7ed4de38c3a9f6c29fb3273bfd42496172258382cc3ba1 |
File details
Details for the file daylily_ephemeral_cluster-5.1.30-py3-none-any.whl.
File metadata
- Download URL: daylily_ephemeral_cluster-5.1.30-py3-none-any.whl
- Upload date:
- Size: 1.6 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b6f736a2050e18ead8a2ef562a266b76e4bec54204a16c05f01d3ca3935988ef |
| MD5 | 4ea7ee7a022901495de5783f87b676dd |
| BLAKE2b-256 | cbeab39d627239f7f6c12971fa0119f12c94b741aae8cf155922ffd84c382b7c |