# Daylily Ephemeral Cluster

Infrastructure-as-code for ephemeral AWS ParallelCluster environments for bioinformatics.

Daylily provisions ephemeral AWS ParallelCluster environments for bioinformatics workloads. It gives operators a Python control plane, shared reference-data plumbing, head-node bootstrap, workflow launch helpers, and a clean delete path.
## Highlights

### Single-Command Cluster Creation

Once your local environment, AWS profile, and cluster config are in place:

```shell
daylily-ec create --region-az "$REGION_AZ" --profile "$AWS_PROFILE" --config "$DAY_EX_CFG"
```
## Architecture & Features

- Repeatable cluster bring-up with preflight checks, rendered cluster YAML, and automated head-node bootstrap.
- Cost-aware infrastructure with pricing snapshots, budgets, heartbeat notifications, and a clear export/delete lifecycle.
- Shared FSx for Lustre storage backed by a region-scoped S3 bucket for references, staged inputs, and results.
- Pluggable workflow catalog controlled by YAML and cloned via `day-clone`.
- Laptop-to-cluster staging and launch helpers for a practical remote operator flow.
- Operator artifacts including preflight reports, state snapshots, rendered YAML, and reusable config in `~/.config/daylily/`.
## Architecture at a Glance

Daylily composes a few AWS building blocks into a usable HPC environment:

- `daylily_ec` is the control plane that validates prerequisites, renders cluster YAML, applies live spot pricing, creates the cluster, and records state snapshots.
- AWS ParallelCluster + Slurm provide the compute fabric.
- FSx for Lustre is mounted at `/fsx` and linked to a region-specific S3 bucket whose name includes `omics-analysis`.
- Auto-scaling EC2 workers let the compute fleet expand and contract around real workload demand.
- VPC, subnets, and security groups are resolved or created as part of the operator flow.
- Head-node Daylily utilities handle staging, cloning, validation, and launch workflows.
- Workflow registry metadata in `config/daylily_available_repositories.yaml` defines approved repos and default refs.
- Artifacts under `~/.config/daylily/` preserve the exact config, state, and rendered templates that created a cluster.
## Workflow Catalog & Launch Path

The head node ships with a registry of vetted repositories defined in `config/daylily_available_repositories.yaml`. Operators can clone approved pipelines with `day-clone`, override refs for development work, and launch through the bundled staging and tmux-based helpers.

Current bundled examples include `daylily-omics-analysis`, `rna-seq-star-deseq2`, and `daylily-sarek`. Because the compute layer is plain Slurm, any orchestrator that speaks Slurm can run once the repo is on the head node.
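The registry is plain YAML, so you can also peek at it without extra tooling. A minimal sketch, assuming the registry lists each repository under a `name:` key (the actual schema may differ; `day-clone --list` is the supported view):

```shell
# Hypothetical helper: print the value after each "name:" key in a registry
# YAML file. This assumes a flat "name:" layout, which may not match the
# real config/daylily_available_repositories.yaml schema.
list_registry_repos() {
  grep -E '^[[:space:]]*-?[[:space:]]*name:' "$1" | sed -E 's/.*name:[[:space:]]*//'
}
```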
## Reference Data & Canned Controls

High-throughput analyses rely on predictable reference-data access. Daylily expects a region-scoped reference bucket and uses FSx for Lustre so the whole cluster sees the same durable inputs.

Verify your reference bucket exists:

```shell
daylily-omics-references --profile "$AWS_PROFILE" --region "$REGION" \
    verify --bucket "${BUCKET_PREFIX}-daylily-omics-analysis-${REGION}" --exclude-b37
```

Clone reference bundles into a region-scoped S3 bucket using the installed `daylily-omics-references` CLI:

```shell
export AWS_PROFILE=daylily-service
export REGION=us-west-2
export BUCKET_PREFIX=myorg
REF_VERSION_FILE="$(find config/day_cluster -maxdepth 1 -name 'daylily_reference_version_*.info' | head -n 1)"
REF_VERSION="$(basename "$REF_VERSION_FILE" .info)"
REF_VERSION="${REF_VERSION#daylily_reference_version_}"
daylily-omics-references --profile "$AWS_PROFILE" --region "$REGION" \
    clone --bucket-prefix "$BUCKET_PREFIX" --version "$REF_VERSION" --execute
```

Automatic mounting via FSx means all compute nodes see shared directories such as `/fsx/data`, `/fsx/resources`, and `/fsx/analysis_results`. Staging and controls are handled by the bundled staging helpers and the workflow repos cloned onto the head node.

The reference bucket is durable. The cluster is not. That is the point.
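The naming conventions above compose mechanically, which makes them easy to sanity-check offline. A small sketch of how the bucket name and reference version string are derived (the `.info` filename used here is illustrative, not a real release):

```shell
# Compose the region-scoped reference bucket name from operator inputs,
# following the "<prefix>-daylily-omics-analysis-<region>" convention above.
BUCKET_PREFIX=myorg
REGION=us-west-2
REF_BUCKET="${BUCKET_PREFIX}-daylily-omics-analysis-${REGION}"
echo "$REF_BUCKET"   # myorg-daylily-omics-analysis-us-west-2

# Derive the version string from the shipped .info filename.
# The "v1" filename below is a placeholder for illustration only.
REF_VERSION_FILE="config/day_cluster/daylily_reference_version_v1.info"
REF_VERSION="$(basename "$REF_VERSION_FILE" .info)"      # strip path and .info
REF_VERSION="${REF_VERSION#daylily_reference_version_}"  # strip the fixed prefix
echo "$REF_VERSION"   # v1
```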
## Remote Data Staging & Pipeline Execution

Use the bundled helpers to stage on the head node, stage from a laptop through the FSx-backed S3 path, and launch workflows remotely in tmux. The full operator flow is documented later in this README and in `docs/operations.md`.
## Cost Monitoring & Budget Enforcement

Daylily uses AWS Budgets, cost-allocation tags, pricing helpers, and heartbeat notifications to give operators per-cluster and per-project cost visibility:

- preflight before mutation so obvious IAM/quota/bucket mistakes fail early
- pricing snapshots through `daylily-ec pricing snapshot`
- budget-aware lifecycle hooks in the cluster config flow
- heartbeat notifications so stale FSx/EBS resources are harder to forget
- tagged-resource reporting for finding orphaned spend and drift

For ephemeral infrastructure, cost is part of operations, not an appendix.
## Installation -- Quickest Start

Only useful if you already have AWS account configuration, a target region, and a reference bucket in place.

The shortest supported operator path from a fresh checkout is:

```shell
./bin/check_prereq_sw.sh
./bin/install_miniconda   # only if conda is not already installed
./bin/init_dayec
source ./activate

export AWS_PROFILE=daylily-service
export REGION=us-west-2
export REGION_AZ=us-west-2c

mkdir -p ~/.config/daylily
cp config/daylily_ephemeral_cluster_template.yaml \
   ~/.config/daylily/daylily_ephemeral_cluster.yaml
export DAY_EX_CFG="$HOME/.config/daylily/daylily_ephemeral_cluster.yaml"

daylily-ec preflight --region-az "$REGION_AZ" --profile "$AWS_PROFILE" --config "$DAY_EX_CFG"
daylily-ec create --region-az "$REGION_AZ" --profile "$AWS_PROFILE" --config "$DAY_EX_CFG"
```
See `docs/quickest_start.md` for the shortest canonical runbook and `docs/operations.md` for day-2 operations.
## Installation -- Detailed

### AWS

#### Operator Identity

Create or reuse an AWS operator identity, typically `daylily-service`, and make sure it can:

- create and inspect ParallelCluster resources in the target account
- manage the reference bucket used for FSx-backed data access
- create or inspect network resources needed for the cluster lifecycle

The repo ships the cluster policy template at `config/aws/daylily-service-cluster-policy.json`.
#### Create Service-Linked Role (VERY IMPORTANT)

If this role is missing, spot capacity can fail in annoying ways even when the head node looks fine. Check for the role and create it if absent:

```shell
aws iam list-roles --query "Roles[?RoleName=='AWSServiceRoleForEC2Spot'].RoleName"
aws iam create-service-linked-role --aws-service-name spot.amazonaws.com
```
#### Quotas, Cost Tags, And Other AWS Considerations
Daylily preflight checks the common blockers, but you should still be aware of them up front:
- EC2 on-demand quota for the head node
- EC2 spot quota for the target fleet shape
- FSx for Lustre quota in the target region
- VPC/subnet/networking limits if operating across many clusters or regions
- budget and cost-allocation-tag permissions if you want cost reporting and enforcement
Historically, under-provisioned spot quotas and missing FSx quota were the easiest ways to waste time before the first successful cluster build.
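You can inspect the spot quota ahead of time with the Service Quotas API. A hedged sketch; the quota code `L-34B43A08` ("All Standard Spot Instance Requests") is an assumption worth confirming with `aws service-quotas list-service-quotas --service-code ec2`:

```shell
# Hypothetical helper: print the current spot-instance vCPU quota value.
# The quota code below is assumed, not taken from Daylily itself.
spot_quota() {
  aws service-quotas get-service-quota \
      --service-code ec2 \
      --quota-code L-34B43A08 \
      --query 'Quota.Value' \
      --output text
}
```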
#### AWS CLI Profile

Minimal profile example:

```ini
[daylily-service]
region = us-west-2
output = json
```

Be explicit with `AWS_PROFILE`; avoid leaning on `default`.
#### SSH Key Pair(s)

- keep the PEM in `~/.ssh/`
- use the same region as your target cluster
- naming it with `-omics-` remains the easiest convention to recognize later

```shell
mkdir -p ~/.ssh
chmod 700 ~/.ssh
chmod 400 ~/.ssh/<your-key>.pem
```
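`ssh` silently refuses keys with loose permissions, so it can be worth asserting the mode before connecting. A small portable sketch; the `check_pem_perms` helper is hypothetical, not part of Daylily:

```shell
# Hypothetical helper: succeed only if the key file mode is 400 or 600.
# GNU stat uses -c; BSD/macOS stat uses -f, hence the fallback.
check_pem_perms() {
  perms="$(stat -c '%a' "$1" 2>/dev/null || stat -f '%Lp' "$1")"
  [ "$perms" = "400" ] || [ "$perms" = "600" ]
}
```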
### Prerequisites (On Your Local Machine)

From the repo root:

```shell
./bin/check_prereq_sw.sh
./bin/install_miniconda   # only if needed
./bin/init_dayec
source ./activate
daylily-ec info
daylily-ec version
```

`./bin/init_dayec` creates or updates the `DAY-EC` conda environment and installs this repo into it. `source ./activate` activates `DAY-EC` when present, adds `bin/` to `PATH`, and exposes `daylily-ec` in the current shell.
### Clone Reference Bucket

This only needs to be done once per region, or whenever the bucket is missing. `daylily-ec preflight` and `daylily-ec create` will fail if the expected reference bucket is not detected in the region you run in.

```shell
export AWS_PROFILE=daylily-service
export REGION=us-west-2
export BUCKET_PREFIX=myorg
REF_VERSION_FILE="$(find config/day_cluster -maxdepth 1 -name 'daylily_reference_version_*.info' | head -n 1)"
REF_VERSION="$(basename "$REF_VERSION_FILE" .info)"
REF_VERSION="${REF_VERSION#daylily_reference_version_}"
daylily-omics-references --profile "$AWS_PROFILE" --region "$REGION" \
    clone --bucket-prefix "$BUCKET_PREFIX" --version "$REF_VERSION" --execute
```

Optional manual verification:

```shell
daylily-omics-references --profile "$AWS_PROFILE" --region "$REGION" \
    verify --bucket "${BUCKET_PREFIX}-daylily-omics-analysis-${REGION}" --exclude-b37
```
### Prepare The Cluster Config

Configuration for the create flow lives in a user-managed YAML file. Copy the template to a writable location and set `DAY_EX_CFG` for convenience:

```shell
mkdir -p ~/.config/daylily
cp config/daylily_ephemeral_cluster_template.yaml \
   ~/.config/daylily/daylily_ephemeral_cluster.yaml
export DAY_EX_CFG="$HOME/.config/daylily/daylily_ephemeral_cluster.yaml"
```

Recommended keys to fill in before the first run: `cluster_name`, `s3_bucket_name`, `budget_email`, `allowed_budget_users`, `global_allowed_budget_users`, `heartbeat_email`.

Optional if you want fewer prompts: `ssh_key_name`, `public_subnet_id`, `private_subnet_id`, `iam_policy_arn`.

Leave any value set to `PROMPTUSER` if you want the CLI to query AWS and prompt interactively.

`DAY_EX_CFG` is a shell convenience variable. The current Python CLI does not consume it implicitly, so always pass `--config "$DAY_EX_CFG"`.
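As an illustration only, a filled-in config might look like the fragment below. The key names come from the lists above; every value is a placeholder to replace with your own, and the template in the repo remains the authoritative source of the full schema:

```yaml
# Illustrative values only -- replace with your own.
cluster_name: myorg-omics-prod
s3_bucket_name: myorg-daylily-omics-analysis-us-west-2
budget_email: ops@example.org
allowed_budget_users: PROMPTUSER          # leave as PROMPTUSER to be prompted
global_allowed_budget_users: PROMPTUSER
heartbeat_email: ops@example.org
ssh_key_name: myorg-omics-uswest2         # optional; reduces prompting
```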
### Generate Analysis Cost Estimates Per Availability Zone

If you want a raw pricing snapshot before choosing an AZ:

```shell
daylily-ec pricing snapshot \
    --region "$REGION" \
    --config config/day_cluster/prod_cluster.yaml \
    --profile "$AWS_PROFILE"
```

This is the current supported path for spot-pricing inspection.
### Create An Ephemeral Cluster

From your local machine, in the Daylily repo root:

```shell
export REGION_AZ=us-west-2c

daylily-ec preflight \
    --region-az "$REGION_AZ" \
    --profile "$AWS_PROFILE" \
    --config "$DAY_EX_CFG"

daylily-ec create \
    --region-az "$REGION_AZ" \
    --profile "$AWS_PROFILE" \
    --config "$DAY_EX_CFG"
```

Add `--pass-on-warn` to preflight only when you have reviewed the warnings and intend to continue.

What the flow does:

- your AWS credentials are used to query required resources
- the CLI validates the selected reference bucket before provisioning
- baseline network resources are resolved or created as needed
- cluster YAML is rendered and spot pricing is applied
- the cluster is created through ParallelCluster
- the head node is bootstrapped with `DAY-EC`, `day-clone`, and bundled helpers
- preflight reports, state snapshots, and rendered artifacts are written to `~/.config/daylily/`

Cluster creation is not instant. Expect a real wait while ParallelCluster, FSx, and head-node bootstrap finish. If you need to debug a failure, the CloudFormation console still gives the best low-level event trail.
### What Success Looks Like

After a successful run:

- `~/.config/daylily/` contains the preflight report, state snapshot, and rendered cluster YAML
- the head node has `DAY-EC`, `day-clone`, and the Daylily helper scripts ready to use
## Costs

### Monitoring, Tags, And Budgets

Daylily-created resources are intended to be tagged well enough for operators to monitor costs, find stale infrastructure, and reason about run economics per cluster or per project.

Budget-aware lifecycle hooks still exist conceptually in the Daylily flow, but treat hard enforcement as something to test in your own account before trusting it.

### Typical Cost Drivers
These are still the right cost buckets to watch:
- head node: steady on-demand cost while the cluster exists
- FSx for Lustre: the biggest idle-cluster cost driver in many runs
- spot fleet: burst cost during real workload execution
- reference bucket: durable monthly storage cost
- retained EBS / retained FSx: easy-to-forget stale-resource cost
Historically, the most expensive mistake was not running the cluster — it was forgetting what you chose to retain after deleting it.
### Operator Advice

- run `daylily-ec pricing snapshot` before choosing an AZ if price sensitivity matters
- keep an eye on retained FSx and root volumes
- export results promptly and delete the cluster promptly
- do not treat the ephemeral cluster as long-term storage
## Working With The Ephemeral Clusters

### Connect To A Cluster

List clusters in a region:

```shell
daylily-ec cluster-info --region "$REGION" --profile "$AWS_PROFILE"
```

SSH to the head node:

```shell
ssh -i ~/.ssh/<your-key>.pem ubuntu@<headnode-ip>
```

Use `daylily-ec cluster-info` or `pcluster describe-cluster` to find the head-node public IP.
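If you script this step, the head-node IP can be pulled from the ParallelCluster describe output. A sketch assuming the ParallelCluster v3 JSON shape (`headNode.publicIpAddress`); the `headnode_ip` helper itself is hypothetical, not a bundled command:

```shell
# Hypothetical helper: print the head-node public IP for a cluster.
# Assumes ParallelCluster v3 "describe-cluster" JSON output.
headnode_ip() {
  pcluster describe-cluster --cluster-name "$1" --region "$2" \
    | python3 -c 'import json, sys; print(json.load(sys.stdin)["headNode"]["publicIpAddress"])'
}
```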
### Validate The Head Node

Once connected to the head node:

```shell
cd ~/projects/daylily-ephemeral-cluster
conda activate DAY-EC
daylily-ec info
day-clone --list
ls -lth /fsx/
```

You want to see the Daylily CLI available, the workflow registry available, and the expected shared filesystem directories present.
### Stage Sample Data & Build `config/samples.tsv` And `config/units.tsv`

#### Run Directly On The Head Node

```shell
cd ~/projects/daylily-ephemeral-cluster
./bin/daylily-stage-analysis-samples-headnode /path/to/analysis_samples.tsv

# optionally override the stage target
./bin/daylily-stage-analysis-samples-headnode \
    /path/to/analysis_samples.tsv \
    /fsx/custom_stage_dir
```
#### Launch Staging From Your Laptop

```shell
./bin/daylily-stage-samples-from-local-to-headnode \
    --profile "$AWS_PROFILE" \
    --region "$REGION" \
    --reference-bucket "s3://myorg-daylily-omics-analysis-${REGION}" \
    --config-dir ./generated-config \
    ./analysis_samples.tsv
```

Important details:

- this flow stages through the S3-backed FSx repository, not by copying large files over SSH
- its default staging base is `/data/staged_sample_data`, which appears on the cluster as `/fsx/data/staged_sample_data/remote_stage_<timestamp>/`
- local `samples.tsv` and `units.tsv` copies are written into `--config-dir`, or next to the source TSV if omitted
### Clone & Launch The Workflow From Your Laptop

```shell
./bin/daylily-run-omics-analysis-headnode \
    --profile "$AWS_PROFILE" \
    --region "$REGION" \
    --cluster "$CLUSTER_NAME" \
    --pem ~/.ssh/<your-key>.pem \
    --stage-base /fsx/data/staged_sample_data
```

Useful launch flags include `--stage-dir`, `--repository`, `--project`, `--target`, `--jobs`, and `--dry-run`.

The launcher clones the workflow via `day-clone`, copies staged config files into the repo, and starts the run inside a tmux session on the head node.
### Slurm Monitoring

On the head node:

```shell
sinfo
squeue -o "%.18i %.8u %.8T %.10M %.30N %.50j"
tmux ls
tmux attach -t <session-name>
```

Use `watch squeue` or your preferred terminal workflow if you want a live scheduler view.
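Before exporting results it can help to block until the scheduler is idle. A minimal sketch using standard `squeue` flags; the `wait_for_drain` helper is hypothetical, not a bundled Daylily command:

```shell
# Hypothetical helper: poll until squeue reports zero jobs for this user.
# -h suppresses the header so each remaining line is one job.
wait_for_drain() {
  while [ "$(squeue -h -u "$USER" | wc -l)" -gt 0 ]; do
    sleep 60
  done
}
```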
### Export `/fsx` Analysis Results Back To S3

Use `daylily-ec export` after `source ./activate`:

```shell
daylily-ec export \
    --cluster-name "$CLUSTER_NAME" \
    --region "$REGION" \
    --target-uri analysis_results \
    --output-dir .
```

This starts an FSx data-repository export task, waits for completion, and writes `./fsx_export.yaml` with the result. Use `analysis_results` or a subdirectory under it for `--target-uri`; if you already know the exact S3 destination under the filesystem export root, you may pass that `s3://...` URI instead.

The legacy helpers `bin/daylily-export-fsx-to-s3-from-local` and `bin/daylily-export-fsx-to-s3` still exist, but they now delegate to the same `daylily-ec export` workflow.

Be sure you export results from `/fsx/analysis_results` before deleting the cluster. FSx is scratch-like high-performance working storage, not your final archive.
### Delete The Cluster

When the workload is complete and results have been exported, use the Daylily control plane to tear the cluster down:

```shell
daylily-ec delete --cluster-name "$CLUSTER_NAME" --region "$REGION"
```

If you have the state file from the create run, you can let Daylily resolve the cluster metadata and heartbeat resources automatically:

```shell
daylily-ec delete --state-file "$STATE_FILE"
```

The command preserves the existing "please delete" confirmation when FSx filesystems are still attached unless you explicitly pass `--yes`. The legacy helper `bin/daylily-delete-ephemeral-cluster` remains available as a compatibility wrapper around the same CLI flow.

If you retained FSx or root volumes by choice, go confirm their fate explicitly. Those are the resources most likely to keep costing money after you think you are done.
## PCUI (Technically Optional, But You Will Be Missing Out)

PCUI is not required for the supported Daylily operator flow, but it is still useful if you want a browser-driven interface for cluster visibility and interactive shell access.

- use the AWS ParallelCluster PCUI docs for installation
- use the VPC and subnet associated with a working Daylily cluster in the region
- enable SSM if you want browser-shell access; from there you will usually want `sudo su - ubuntu`

The CLI/SSH path remains the canonical Daylily workflow, but PCUI is still a legitimate companion tool.
## Other Monitoring Tools

### AWS CloudWatch

CloudWatch remains useful for low-level cluster and node inspection, especially when debugging failures or inspecting logs outside the terminal that launched the cluster.

### Drift And Tagged Resources

Useful operator commands:

```shell
daylily-ec drift --state-file ~/.config/daylily/state_<cluster>_<timestamp>.json --profile "$AWS_PROFILE"
daylily-ec cluster-info --region "$REGION" --profile "$AWS_PROFILE"
```

### SNS Notifications (Cluster Heartbeat, Experimental)

Heartbeat notifications are intended to keep noisy, pricey stale resources from quietly hanging around forever.
## Documentation

- `docs/quickest_start.md`: the shortest canonical install and cluster-creation runbook
- `docs/operations.md`: validation, staging, launch, monitoring, export, and delete
- `docs/overview.md`: architecture, cost context, benchmark links, and system model
- `docs/pip_install.md`: pip-based usage and packaged resources
- `docs/DAY_EC_ENVIRONMENT.md`: local environment and CLI diagnostics
- `CONTRIBUTING.md`: development and docs contribution guide

### Historical Material

Older long-form docs and retired notes live under `docs/archive/`. They remain useful for background and screenshots, but the supported workflows are described by this README and the live docs above.
## Known Issues

The old README was correct about one thing: known issues belong in operator docs.

- very large or poorly prepared reference buckets can still make FSx-backed cluster creation painful
- low spot or FSx quotas are still among the easiest ways to fail before a first successful build
- CloudFormation remains the best place to inspect low-level failure events during provisioning

## Contributing

See `CONTRIBUTING.md`.

## Versioning

Daylily uses tagged releases. For currently available versions, see the tags on this repository.
## File Details

### Source Distribution: `daylily_ephemeral_cluster-0.7.614.tar.gz`

- Size: 46.1 MB
- Uploaded via: twine/6.2.0 CPython/3.14.2
- Uploaded using Trusted Publishing? No

| Algorithm | Hash digest |
|---|---|
| SHA256 | `e8f0d328d877a3897ff66069a0aad3d6f4b7aa609d18457cd68f908f89892d65` |
| MD5 | `57a1769d7a04ce3a4d3b10eb56cbe428` |
| BLAKE2b-256 | `38ee485b5e16155061ae5fcd1c5862d32ac97622c26ed58d240b7f8bac31a3c6` |
### Built Distribution: `daylily_ephemeral_cluster-0.7.614-py3-none-any.whl`

- Size: 1.5 MB
- Tags: Python 3
- Uploaded via: twine/6.2.0 CPython/3.14.2
- Uploaded using Trusted Publishing? No

| Algorithm | Hash digest |
|---|---|
| SHA256 | `de902f77e30b3df5d57bf39f18325eaacc881b93cd418504cde3c8f58ee8bb71` |
| MD5 | `6b8970a83ce83c38eabb58c03fd92f1c` |
| BLAKE2b-256 | `95953c66ce9cc34b738037b3e4edc19cbd03eed23ca94f3666ad68842b814537` |