Dagster pipeline for computing PRS reference distributions from the 1000G panel
prs-pipeline
Dagster pipeline for computing PRS reference distributions from the 1000 Genomes reference panel.
Overview
This pipeline downloads the PGS Catalog 1000G reference panel (~7 GB), computes polygenic risk scores
for all 2,504 reference individuals across all PGS Catalog scores, aggregates per-superpopulation
distribution statistics, and pushes reference_distributions.parquet to HuggingFace
(just-dna-seq/prs-percentiles).
End users of just-prs pull this small parquet file automatically via PRSCatalog.reference_distributions().
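The aggregation step described above can be illustrated with a minimal standard-library sketch. It is not the pipeline's own code: the sample scores, the `summarize` and `percentile` helpers, and the choice of mean and sample standard deviation as the stored statistics are all assumptions made for illustration, and the actual columns of reference_distributions.parquet may differ. The percentile lookup additionally assumes scores within a superpopulation are roughly normally distributed.

```python
from collections import defaultdict
from statistics import NormalDist, mean, stdev

# Hypothetical (superpopulation, score) pairs for a single PGS ID.
# Real values would come from scoring the 2,504 reference individuals.
scores = [
    ("EUR", 0.10), ("EUR", 0.30), ("EUR", 0.20),
    ("AFR", -0.20), ("AFR", 0.00), ("AFR", 0.20),
]


def summarize(rows):
    """Group scores by superpopulation and compute mean, sample std, and n."""
    groups = defaultdict(list)
    for pop, score in rows:
        groups[pop].append(score)
    return {
        pop: {"mean": mean(vals), "std": stdev(vals), "n": len(vals)}
        for pop, vals in groups.items()
    }


def percentile(score, stats):
    """Approximate percentile of a score under a normal model of the group."""
    return 100.0 * NormalDist(stats["mean"], stats["std"]).cdf(score)


distributions = summarize(scores)
# A score equal to the group mean sits at the 50th percentile under this model.
eur_mean_pct = percentile(distributions["EUR"]["mean"], distributions["EUR"])
```

Storing only a few summary values per score and superpopulation is what keeps the published file small compared with the ~7 GB panel it is derived from.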
Running
    cd prs-pipeline
    uv run dagster dev -m prs_pipeline.definitions
Then open http://localhost:3000 in your browser.
Assets
| Asset | Group | Description |
|---|---|---|
| ebi_reference_panel_fingerprint | download | HTTP fingerprint for freshness tracking of the remote reference panel |
| ebi_scoring_files_fingerprint | download | HTTP fingerprint for the remote scoring file manifest |
| scoring_files | download | Bulk-download all harmonized PGS scoring .txt.gz files from EBI FTP |
| scoring_files_parquet | compute | Convert all .txt.gz scoring files to spec-driven parquet caches (zstd-9, embedded headers). Deletes .txt.gz after verified conversion to save ~5.5 GB disk space. Tracks per-file failures in conversion_failures.parquet |
| reference_panel | download | Download + extract reference panel binary files (.pgen/.pvar/.psam) |
| reference_scores | compute | Score all PGS IDs against the reference panel via compute_reference_prs_batch() |
| hf_prs_percentiles | upload | Enrich distributions with metadata and push to HuggingFace |
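The two fingerprint assets track whether the remote files have changed. One common way to do this, sketched below, is to hash a few response headers from a HEAD request and compare the digest with the one stored from the previous run. This is a hedged illustration, not the pipeline's implementation: the header selection (ETag, Last-Modified, Content-Length) and the helper names `fetch_headers`, `http_fingerprint`, and `is_stale` are assumptions.

```python
import hashlib
import urllib.request
from typing import Mapping, Optional

# Headers assumed (not confirmed by the project) to change when the file changes.
FINGERPRINT_HEADERS = ("etag", "last-modified", "content-length")


def fetch_headers(url: str) -> dict:
    """Issue a HEAD request and return the response headers as a dict."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        return dict(response.headers.items())


def http_fingerprint(headers: Mapping[str, str]) -> str:
    """Hash the selected headers into a stable hex digest."""
    # Header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    parts = [f"{name}={lowered.get(name, '')}" for name in FINGERPRINT_HEADERS]
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()


def is_stale(previous: Optional[str], current: str) -> bool:
    """A missing or different stored fingerprint means a re-download is needed."""
    return previous != current


# Example with literal headers, so no network access is required.
old = http_fingerprint({"ETag": '"abc"', "Content-Length": "7"})
same = http_fingerprint({"etag": '"abc"', "content-length": "7"})
new = http_fingerprint({"ETag": '"abd"', "Content-Length": "7"})
```

Because only headers are fetched, a check like this costs one small request rather than a download of the multi-gigabyte panel.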