Biobanking data processing, annotation, and association workflows
Biobanking
Systematic collection, processing, storage, and analysis of biological samples and associated health records for medical research.
Supported pipelines
Preprocess
Contains biobank-specific modules for EHR data collection, cleaning, and processing.
QC (Under construction)
Will contain biobank-specific modules for variant quality control and filtering.
Annotation (Under construction)
Will contain biobank-specific modules for variant annotation.
Association
Contains biobank-specific modules for genotype-phenotype association tests.
Supported biobanks
All of Us
The All of Us biobank pairs whole-genome sequencing with electronic health record data for more than 400k individuals and continues to expand.
UK Biobank (Under construction)
The UK Biobank pairs whole-genome sequencing with electronic health record data for ~500k participants.
AoU REGENIE workflow
The All of Us association utilities currently support a packaged regenie workflow with three Step 2 modes:
- Burden association testing
- Mask-only runs for writing burden-mask PLINK datasets
- Interaction testing using the same burden inputs and optional interaction flags
The workflow implementation lives in src/biobanking/workflows/regenie.wdl, and the Python utilities live in src/biobanking/association/aou.py.
The tracking model is phenotype-centered:
- Step 1 is tracked once per phenotype prefix
- Step 2 runs are tracked separately by mode
- Workflow metadata is written locally and synced to the workspace bucket
This keeps LOCO and prediction reuse aligned with the phenotype definition rather than with any specific burden or interaction run.
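The phenotype-centered model above can be illustrated with a small standalone sketch. Every name here (`Tracker`, `PhenotypeRecord`, `STEP2_MODES`, the mode strings, the JSON layout) is hypothetical and chosen for illustration; it is not the API of src/biobanking/association/aou.py, and syncing to the workspace bucket is left out because the mechanism is not described here.

```python
import json
from dataclasses import asdict, dataclass, field

# Hypothetical mode labels for the three Step 2 modes described above.
STEP2_MODES = ("burden", "mask", "interaction")


@dataclass
class PhenotypeRecord:
    """Tracking record for a single phenotype prefix."""

    prefix: str
    step1_done: bool = False  # Step 1 is tracked once per prefix
    step2_runs: dict = field(  # Step 2 runs are tracked separately by mode
        default_factory=lambda: {mode: [] for mode in STEP2_MODES}
    )


class Tracker:
    """Keeps one record per phenotype prefix (illustrative only)."""

    def __init__(self):
        self.records = {}

    def record(self, prefix):
        # Create the record on first use, then reuse it.
        if prefix not in self.records:
            self.records[prefix] = PhenotypeRecord(prefix)
        return self.records[prefix]

    def mark_step1(self, prefix):
        self.record(prefix).step1_done = True

    def add_step2(self, prefix, mode, run_name):
        if mode not in STEP2_MODES:
            raise ValueError(f"unknown Step 2 mode: {mode!r}")
        self.record(prefix).step2_runs[mode].append(run_name)

    def to_json(self):
        # Local metadata as a JSON string; writing it to disk is up to the caller.
        payload = {p: asdict(r) for p, r in self.records.items()}
        return json.dumps(payload, indent=2)
```

Because Step 1 state lives on the prefix record and not on any Step 2 run, adding a new burden or interaction run never touches the Step 1 entry, which is the reuse property the tracking model is after.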
Recommended usage pattern
- Run or reuse Step 1 once per phenotype prefix.
- Use burden runs for standard gene-based tests.
- Use mask runs to materialize chromosome-wide or gene-specific burden-mask PLINK files.
- Use interaction runs only after Step 1 exists for the phenotype prefix you are testing.
More detailed usage examples are in docs/workflows.md.
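As a rough illustration of the recommended ordering, the sketch below builds a run plan in which Step 1 is scheduled only when it does not yet exist for the phenotype prefix, and every Step 2 run comes after it. The function name `plan_runs`, the tuple layout, and the mode strings are assumptions made for this example and do not come from the package.

```python
def plan_runs(prefix, requested_modes, step1_prefixes):
    """Return an ordered list of (step, prefix, mode) tuples.

    prefix: phenotype prefix to analyze.
    requested_modes: Step 2 modes to run, e.g. ["burden", "interaction"].
    step1_prefixes: set of prefixes for which Step 1 already exists.
    """
    plan = []
    if prefix not in step1_prefixes:
        # Step 1 runs once per prefix; skip it when it can be reused.
        plan.append(("step1", prefix, None))
    for mode in requested_modes:
        # Step 2 runs always follow Step 1, so an interaction run
        # never starts before Step 1 exists for this prefix.
        plan.append(("step2", prefix, mode))
    return plan


# First request for a prefix schedules Step 1, later requests reuse it.
first = plan_runs("ldl", ["burden", "interaction"], step1_prefixes=set())
later = plan_runs("ldl", ["mask"], step1_prefixes={"ldl"})
```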
Internal use
python -m pip install -U pip build
pip install twine
# linux
rm -rf dist build *.egg-info src/*.egg-info
# windows (PowerShell)
Remove-Item -Recurse -Force dist, build, *.egg-info, src\*.egg-info
python -m build
pip install dist/biobanking-0.0.12-py3-none-any.whl
python -c "from biobanking.association.aou import REGENIE; regenie = REGENIE(); from biobanking.preprocess.aou.measurements import save_measurements_in_wide_format; print('import ok')"
twine upload dist/*
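The separate Linux and Windows cleanup commands above could be replaced by one cross-platform helper. This is a minimal sketch using only the standard library; the function name `clean_build_artifacts` is hypothetical and is not part of the package.

```python
import shutil
from pathlib import Path


def clean_build_artifacts(root="."):
    """Remove dist/, build/, and *.egg-info entries under root and root/src.

    Returns the list of paths that were actually removed.
    """
    root = Path(root)
    targets = [root / "dist", root / "build"]
    # Materialize the glob results before deleting anything.
    targets += list(root.glob("*.egg-info"))
    targets += list((root / "src").glob("*.egg-info"))

    removed = []
    for path in targets:
        if path.is_dir():
            shutil.rmtree(path)
            removed.append(path)
        elif path.exists():
            path.unlink()
            removed.append(path)
    return removed
```

Calling `clean_build_artifacts()` from the repository root before `python -m build` has the same effect as the `rm -rf` line, and missing directories are skipped silently.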
Download files
File details
Details for the file biobanking-0.0.12.tar.gz.
File metadata
- Download URL: biobanking-0.0.12.tar.gz
- Upload date:
- Size: 283.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d1deef5bd140f1e9091a104e2cbb2ed07bb3dd287d91b925987951445fcedccf |
| MD5 | f702ffabe34d4b992d4b1a42cfc76fc6 |
| BLAKE2b-256 | 258f2c20ffcfddd85120efc5aed188a4de04ee20b811ea9217dfc3577a220a0c |
File details
Details for the file biobanking-0.0.12-py3-none-any.whl.
File metadata
- Download URL: biobanking-0.0.12-py3-none-any.whl
- Upload date:
- Size: 328.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a2aebe5049d339468c3b14ce1d2539e826fd61fdb1a59a579f09bc5ec51dbae7 |
| MD5 | 8ca271b11d9c06b42c927080f9ec78b7 |
| BLAKE2b-256 | 971b775ed6ac90715e67162fe9cceedc36d84762072af07e87643e06ff99860b |
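A downloaded file can be checked against the SHA256 digests listed above with the standard library. The sketch below is one way to do it; the helper names `sha256_of` and `verify_download` and the `PUBLISHED_SHA256` mapping are hypothetical, while the digest values are copied from the file details above.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file details above.
PUBLISHED_SHA256 = {
    "biobanking-0.0.12.tar.gz":
        "d1deef5bd140f1e9091a104e2cbb2ed07bb3dd287d91b925987951445fcedccf",
    "biobanking-0.0.12-py3-none-any.whl":
        "a2aebe5049d339468c3b14ce1d2539e826fd61fdb1a59a579f09bc5ec51dbae7",
}


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path):
    """True only if the file name is known and its digest matches."""
    expected = PUBLISHED_SHA256.get(Path(path).name)
    return expected is not None and sha256_of(path) == expected
```

For example, `verify_download("dist/biobanking-0.0.12-py3-none-any.whl")` returns `False` for a file whose contents differ from the published wheel, or whose name is not in the mapping.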