Passphrase-based encryption for genomic files. Wraps the GA4GH Crypt4GH standard with deterministic keypair derivation so users never manage key files.
genomevault
Passphrase-based encryption for genomic files. No key files to manage, no PKI to set up — just a passphrase and your data. Output is the GA4GH-standard Crypt4GH file format wrapped with a small header for passphrase-driven decryption.
[!IMPORTANT] Your passphrase is your only key. Lose the passphrase, lose the data. There is no recovery path — by design. This is the trade for not having to manage key files.
Why this exists
The genomics community has a solid encryption standard — Crypt4GH — but its UX assumes you can manage X25519 keypair files. Most people can't, don't want to, and shouldn't have to. genomevault closes that gap: a passphrase alone encrypts and decrypts your .vcf, .fastq, .bam, or .gvcf files, and the output remains interoperable with every other Crypt4GH tool.
Install
pip install genomevault
Requires Python 3.10 or newer.
Quickstart
# Encrypt a VCF
genomevault encrypt my-genome.vcf
# → produces my-genome.vcf.gvf
# Decrypt it
genomevault decrypt my-genome.vcf.gvf
# → restores my-genome.vcf
# Check a file without decrypting
genomevault verify my-genome.vcf.gvf
genomevault info my-genome.vcf.gvf
# Extract to standard Crypt4GH + key (for interop with other Crypt4GH tools)
genomevault extract my-genome.vcf.gvf
Passphrase advice
Length beats complexity. A four-word phrase like correct horse battery staple is stronger than a short complex password like P@ssw0rd! — and easier to remember. The tool will warn you if your passphrase is short; follow its guidance.
Good passphrases:
- the glass wall opens twice each visit
- my grandmother kept her recipes in a tin box
- sequence once destroy twice keys with me always
Bad passphrases:
- password
- genome123
- Spring2026!
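To put rough numbers on "length beats complexity", the sketch below estimates entropy for passphrases whose words or characters are chosen uniformly at random. The 7776-word list size (the Diceware convention) and the 94-character printable ASCII alphabet are assumptions for illustration, not values used by genomevault. The figures are upper bounds: a natural-language sentence or a human-chosen password such as P@ssw0rd! is far more predictable than a random draw.

```python
import math


def random_words_bits(words: int, wordlist_size: int = 7776) -> float:
    """Entropy in bits of `words` words drawn uniformly from a word list.

    7776 is the Diceware list size; it is an assumption here.
    """
    return words * math.log2(wordlist_size)


def random_chars_bits(length: int, alphabet_size: int = 94) -> float:
    """Entropy in bits of `length` characters drawn uniformly at random.

    94 is the number of printable ASCII characters excluding space.
    """
    return length * math.log2(alphabet_size)


if __name__ == "__main__":
    for n in (4, 6, 8):
        print(f"{n} random words: {random_words_bits(n):.1f} bits")
    print(f"8 random characters: {random_chars_bits(8):.1f} bits")
```

Each extra random word adds about 12.9 bits under these assumptions, while each extra random character adds about 6.6 bits, which is why adding words to a phrase pays off faster than adding symbols to a short password.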
How it works (2-minute version)
- You supply a passphrase.
- genomevault runs scrypt(passphrase, random-salt, N=2^20, r=8, p=1) to derive a 32-byte seed.
- The seed becomes a deterministic X25519 keypair (via libsodium's crypto_sign_seed_keypair conversion).
- Your file is encrypted to the derived public key using standard Crypt4GH (ChaCha20-Poly1305, random session key per file).
- The salt + scrypt parameters are stored in a small header prepended to the Crypt4GH output; this is the .gvf format.
- To decrypt, the passphrase + salt + same scrypt parameters reproduce the private key, which unlocks the Crypt4GH header.
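The seed-derivation step can be sketched with Python's standard library alone. This is a minimal illustration under stated assumptions, not genomevault's actual code: the function name derive_seed, the 16-byte salt, and the UTF-8 passphrase encoding are all hypothetical. The keypair conversion and the Crypt4GH encryption are not shown because they need libraries outside the standard library.

```python
import hashlib
import os


def derive_seed(passphrase: str, salt: bytes, log2_n: int = 20,
                r: int = 8, p: int = 1) -> bytes:
    """Derive a 32-byte seed with scrypt (hypothetical helper).

    The defaults mirror the parameters quoted above (N=2^20, r=8, p=1).
    """
    n = 1 << log2_n
    # scrypt needs roughly 128 * r * (n + p + 2) bytes of working memory,
    # about 1 GiB at the defaults. OpenSSL refuses to run above maxmem,
    # so allow that amount plus 1 MiB of slack.
    maxmem = 128 * r * (n + p + 2) + (1 << 20)
    return hashlib.scrypt(
        passphrase.encode("utf-8"),  # assumption: UTF-8, no normalization
        salt=salt,
        n=n,
        r=r,
        p=p,
        maxmem=maxmem,
        dklen=32,
    )


if __name__ == "__main__":
    salt = os.urandom(16)  # assumption: 16-byte random salt
    # A reduced cost (N=2^14) keeps this demo fast; it is not a safe setting.
    seed = derive_seed("the glass wall opens twice each visit", salt, log2_n=14)
    print(seed.hex())
```

Because the same passphrase, salt, and parameters always give the same seed, storing the salt and parameters next to the ciphertext is enough to rebuild the private key at decryption time.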
File formats
- .gvf: GenomeVault-native. Contains the salt + KDF parameters + standard Crypt4GH payload. Use this for routine storage.
- .c4gh: Standard Crypt4GH. Use genomevault extract to produce this plus the derived keypair if you need to share the file with someone using a different Crypt4GH tool.
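To make the "small header plus unmodified Crypt4GH payload" idea concrete, here is a sketch that packs and splits such a container. The byte layout, the GVF0 magic value, and the function names are invented for illustration; the real .gvf layout is whatever docs/DESIGN.md specifies. The only external fact relied on is that a Crypt4GH stream begins with the 8-byte magic string crypt4gh.

```python
import struct

# Hypothetical layout, for illustration only (NOT the real .gvf spec):
#   4 bytes  magic b"GVF0"
#   1 byte   log2(N)
#   4 bytes  r (little-endian uint32)
#   4 bytes  p (little-endian uint32)
#   1 byte   salt length
#   n bytes  salt
# followed immediately by the unmodified Crypt4GH stream.
MAGIC = b"GVF0"
FIXED = struct.Struct("<4sBIIB")


def pack_header(salt: bytes, log2_n: int = 20, r: int = 8, p: int = 1) -> bytes:
    """Serialize the KDF parameters and salt into the hypothetical header."""
    if not 0 < len(salt) < 256:
        raise ValueError("salt must be 1..255 bytes")
    return FIXED.pack(MAGIC, log2_n, r, p, len(salt)) + salt


def split_container(blob: bytes) -> tuple[dict, bytes]:
    """Return (kdf_params, crypt4gh_payload) from a hypothetical container."""
    if len(blob) < FIXED.size:
        raise ValueError("truncated header")
    magic, log2_n, r, p, salt_len = FIXED.unpack_from(blob)
    if magic != MAGIC:
        raise ValueError("not a container in the hypothetical format")
    end = FIXED.size + salt_len
    if len(blob) < end:
        raise ValueError("truncated salt")
    params = {"n": 1 << log2_n, "r": r, "p": p, "salt": blob[FIXED.size:end]}
    return params, blob[end:]
```

Keeping the payload byte-for-byte identical to a normal Crypt4GH stream is what would let an extract step hand it to other tools: strip the header, write the remainder as a .c4gh file, and supply the derived key separately.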
Security notes
- Authenticated encryption — ChaCha20-Poly1305 AEAD (via Crypt4GH). Any tampering with the ciphertext is detected and decryption refuses.
- Passphrase strength is your ceiling — the tool uses scrypt with N=2^20 (≈1 second per guess on modern hardware), so brute force requires enormous resources, but a weak passphrase is still weak. Use long phrases.
- No revocation — if your passphrase is compromised, you cannot "revoke" the derived keypair. You must re-encrypt all files with a new passphrase. This differs from random keypair workflows (where you can rotate keys).
- No recovery — if you forget the passphrase, there is no backdoor. Consider a password manager entry or a physically-secure paper backup for life-critical data.
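The "passphrase strength is your ceiling" point can be turned into a back-of-the-envelope calculation. The sketch below assumes the attacker pays the same cost per guess quoted above (about one second) and expresses any parallelism as a single guesses-per-second rate; the function name and the example rates are hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def average_crack_years(entropy_bits: float,
                        guesses_per_second: float = 1.0) -> float:
    """Expected years to find a passphrase by exhaustive search.

    On average an attacker succeeds after trying half of the
    2**entropy_bits candidates.
    """
    expected_guesses = 2 ** (entropy_bits - 1)
    return expected_guesses / guesses_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    # Compare a weak and a strong passphrase at two attacker scales.
    for bits in (30, 50, 80):
        for rate in (1.0, 1e6):
            years = average_crack_years(bits, rate)
            print(f"{bits} bits at {rate:g} guesses/s: {years:.3g} years")
```

The slow KDF only multiplies the attacker's cost by a constant factor, while every additional bit of passphrase entropy doubles it, so a longer phrase matters far more than any realistic parameter tuning.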
Full details in docs/DESIGN.md. Disclosure policy in SECURITY.md.
Related tools
- Crypt4GH — the underlying standard, for anyone who wants to manage keypair files directly.
- crypt4gh-gui — GUI wrapper on top of Crypt4GH, also keyfile-based.
- age — general-purpose modern file encryption; inspired our UX but is not genomic-data-aware.
License
MIT — see LICENSE.
Contributing
Issues and pull requests welcome. Before submitting a PR, please run:
pip install -e ".[dev]"
pytest
ruff check .
mypy src
File details
Details for the file genomevault-0.1.0.tar.gz.
File metadata
- Download URL: genomevault-0.1.0.tar.gz
- Upload date:
- Size: 27.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ed4c1e053358e43efee74591d0d5833ff3c2f1ebe04500eaca2b4ef09db01752 |
| MD5 | 66ae76edda04808e7582e9dcb0485328 |
| BLAKE2b-256 | a44f16f6cfb5aa3ea41e3d9cafbc105a2908bf047f528fb0a526d524c24ef324 |
File details
Details for the file genomevault-0.1.0-py3-none-any.whl.
File metadata
- Download URL: genomevault-0.1.0-py3-none-any.whl
- Upload date:
- Size: 21.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 709c7b7988badf532d7d3fcc927317c049ccab49b6267b5a0cd17d29b04f32d6 |
| MD5 | 792000749c6d9ea11fd1d645e65b64cc |
| BLAKE2b-256 | eacb692c03e2444dd1d01aa126ff8a741f9778ea901e414c5a0888787d7de727 |