Passphrase-based encryption for genomic files. Wraps the GA4GH Crypt4GH standard with deterministic keypair derivation so users never manage key files.

genomevault

Passphrase-based encryption for genomic files. No key files to manage, no PKI to set up — just a passphrase and your data. Output is the GA4GH-standard Crypt4GH file format wrapped with a small header for passphrase-driven decryption.

[!IMPORTANT] Your passphrase is your only key. Lose the passphrase, lose the data. There is no recovery path — by design. This is the trade for not having to manage key files.

Why this exists

The genomics community has a solid encryption standard, Crypt4GH, but its UX assumes you can manage X25519 keypair files. Most people can't, don't want to, and shouldn't have to. genomevault closes that gap: a passphrase alone encrypts and decrypts your .vcf, .fastq, .bam, or .gvcf files. The payload inside each .gvf file is standard Crypt4GH, so genomevault extract makes it usable with any other Crypt4GH tool.

Install

pip install genomevault

Requires Python 3.10 or newer.

Quickstart

# Encrypt a VCF
genomevault encrypt my-genome.vcf
# → produces my-genome.vcf.gvf

# Decrypt it
genomevault decrypt my-genome.vcf.gvf
# → restores my-genome.vcf

# Check a file without decrypting
genomevault verify my-genome.vcf.gvf
genomevault info my-genome.vcf.gvf

# Extract to standard Crypt4GH + key (for interop with other Crypt4GH tools)
genomevault extract my-genome.vcf.gvf

Passphrase advice

Length beats complexity. A four-word phrase like correct horse battery staple is stronger than a short complex password like P@ssw0rd! — and easier to remember. The tool will warn you if your passphrase is short; follow its guidance.

Good passphrases:

  • the glass wall opens twice each visit
  • my grandmother kept her recipes in a tin box
  • sequence once destroy twice keys with me always

Bad passphrases:

  • password
  • genome123
  • Spring2026!
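
As a rough illustration of why length wins: if each word is drawn uniformly at random from a fixed list, entropy grows linearly with the word count. The sketch below assumes a 7,776-word Diceware-style list; the list size and the function name are assumptions for illustration, not part of genomevault. Sentences a person makes up carry less entropy than this estimate, so treat the numbers as an upper bound.

```python
import math


def random_phrase_entropy_bits(word_count: int, wordlist_size: int = 7776) -> float:
    """Entropy in bits of a phrase whose words are chosen uniformly at random.

    wordlist_size=7776 (6**5, the Diceware list size) is an assumption.
    """
    return word_count * math.log2(wordlist_size)


for words in (4, 6, 8):
    bits = random_phrase_entropy_bits(words)
    print(f"{words} random words: {bits:.1f} bits")
```

Each extra random word adds about 12.9 bits, which doubles the attacker's work almost 13 times over.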

How it works (2-minute version)

  1. You supply a passphrase.
  2. genomevault runs scrypt(passphrase, random-salt, N=2^20, r=8, p=1) to derive a 32-byte seed.
  3. The seed becomes a deterministic X25519 keypair: libsodium's crypto_sign_seed_keypair derives an Ed25519 keypair from the seed, which is then converted to X25519.
  4. Your file is encrypted to the derived public key using standard Crypt4GH (ChaCha20-Poly1305, random session key per file).
  5. The salt + scrypt parameters are stored in a small header prepended to the Crypt4GH output — this is the .gvf format.
  6. To decrypt, the passphrase + salt + same scrypt parameters reproduce the private key, which unlocks the Crypt4GH header.
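
Steps 1, 2, and 6 can be sketched with Python's standard library alone. This is a minimal sketch, not genomevault's actual code: the function name derive_seed, the 16-byte salt, and UTF-8 encoding of the passphrase are all assumptions. The keypair step (3) needs a libsodium binding and is left out.

```python
import hashlib
import os


def derive_seed(passphrase: str, salt: bytes, n: int = 2**20, r: int = 8, p: int = 1) -> bytes:
    """Derive a 32-byte seed from a passphrase with scrypt.

    Defaults mirror the parameters quoted above. UTF-8 encoding of the
    passphrase is an assumption.
    """
    # OpenSSL's scrypt needs about 128 * r * (n + p + 2) bytes of memory.
    # The default cap is 32 MiB, so raise it with a 1 MiB margin.
    maxmem = 128 * r * (n + p + 2) + 2**20
    return hashlib.scrypt(
        passphrase.encode("utf-8"),
        salt=salt,
        n=n,
        r=r,
        p=p,
        maxmem=maxmem,
        dklen=32,
    )


# Demo with a reduced cost (n=2**14) so it runs quickly; do not use this n in practice.
demo_n = 2**14
salt = os.urandom(16)  # salt length is an assumption
seed_encrypt = derive_seed("the glass wall opens twice each visit", salt, n=demo_n)
seed_decrypt = derive_seed("the glass wall opens twice each visit", salt, n=demo_n)
print(seed_encrypt == seed_decrypt)  # same passphrase + salt + parameters give the same seed
```

With the full parameters (N=2^20, r=8) each derivation needs about 1 GiB of RAM, which is part of what makes guessing expensive.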

File formats

  • .gvf — GenomeVault-native. Contains the salt + KDF parameters + standard Crypt4GH payload. Use this for routine storage.
  • .c4gh — Standard Crypt4GH. Use genomevault extract to produce this plus the derived keypair if you need to share the file with someone using a different Crypt4GH tool.
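
To make the relationship between the two formats concrete, here is a hedged sketch of a wrapper header. The layout is hypothetical: the GVF0 marker, the field order, and the 16-byte salt are invented for illustration, and the real byte layout is the one described in docs/DESIGN.md. The only fact taken from the Crypt4GH standard is that a Crypt4GH file starts with the 8-byte magic string crypt4gh.

```python
import struct
from typing import NamedTuple

GVF_MAGIC = b"GVF0"                      # hypothetical marker, not the real one
GVF_HEADER = struct.Struct("<4sBII16s")  # hypothetical: magic, log2(N), r, p, salt
C4GH_MAGIC = b"crypt4gh"                 # first 8 bytes of a Crypt4GH file


class KdfParams(NamedTuple):
    log2_n: int
    r: int
    p: int
    salt: bytes


def wrap(params: KdfParams, c4gh_payload: bytes) -> bytes:
    """Prepend the (hypothetical) KDF header to a Crypt4GH payload."""
    header = GVF_HEADER.pack(GVF_MAGIC, params.log2_n, params.r, params.p, params.salt)
    return header + c4gh_payload


def unwrap(blob: bytes) -> tuple[KdfParams, bytes]:
    """Split a wrapped blob into KDF parameters and the untouched Crypt4GH payload."""
    if len(blob) < GVF_HEADER.size:
        raise ValueError("too short to contain a header")
    magic, log2_n, r, p, salt = GVF_HEADER.unpack_from(blob)
    if magic != GVF_MAGIC:
        raise ValueError("missing wrapper marker")
    payload = blob[GVF_HEADER.size:]
    if not payload.startswith(C4GH_MAGIC):
        raise ValueError("payload is not Crypt4GH")
    return KdfParams(log2_n, r, p, salt), payload


params = KdfParams(log2_n=20, r=8, p=1, salt=bytes(range(16)))
blob = wrap(params, C4GH_MAGIC + b"\x01\x00\x00\x00")
recovered, payload = unwrap(blob)
print(recovered == params, payload[:8])
```

Because the payload is carried unmodified, stripping the header is all an extract-style operation would need to do under this layout.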

Security notes

  • Authenticated encryption: ChaCha20-Poly1305 AEAD (via Crypt4GH). Any tampering with the ciphertext is detected, and decryption fails rather than returning altered data.
  • Passphrase strength is your ceiling: the tool uses scrypt with N=2^20 and r=8 (≈1 second and about 1 GiB of memory per guess on modern hardware), so brute force requires enormous resources, but a weak passphrase is still weak. Use long phrases.
  • No revocation: if your passphrase is compromised, you cannot "revoke" the derived keypair. You must re-encrypt all files with a new passphrase. This differs from random keypair workflows, where you can rotate keys.
  • No recovery: if you forget the passphrase, there is no backdoor. Consider a password manager entry or a physically secure paper backup for life-critical data.
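
The cost figures in these notes can be checked with a little arithmetic. The sketch below is an estimate under stated assumptions: scrypt's working memory is roughly 128 * N * r bytes, and the attacker's rate of one guess per second per machine is taken from the note above. The function names are made up for this example.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def scrypt_memory_bytes(n: int, r: int) -> int:
    """Approximate working memory for one scrypt evaluation."""
    return 128 * n * r


def expected_years(entropy_bits: float, guesses_per_second: float = 1.0) -> float:
    """Expected time to find a passphrase: on average half the space is searched."""
    expected_guesses = 2 ** (entropy_bits - 1)
    return expected_guesses / guesses_per_second / SECONDS_PER_YEAR


print(scrypt_memory_bytes(2**20, 8) / 2**30)  # memory per guess in GiB → 1.0

for bits in (30, 50, 70):
    # 1000 guesses/second models an attacker running about a thousand machines (assumption)
    print(f"{bits} bits: {expected_years(bits, guesses_per_second=1000):.3g} years")
```

A weak passphrase with roughly 30 bits of entropy falls in days against such an attacker, which is why the slow KDF cannot substitute for a long phrase.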

Full details in docs/DESIGN.md. Disclosure policy in SECURITY.md.

Related tools

  • Crypt4GH — the underlying standard, for anyone who wants to manage keypair files directly.
  • crypt4gh-gui — GUI wrapper on top of Crypt4GH, also keyfile-based.
  • age — general-purpose modern file encryption; inspired our UX but is not genomic-data-aware.

License

MIT — see LICENSE.

Contributing

Issues and pull requests welcome. Before submitting a PR, please run:

pip install -e ".[dev]"
pytest
ruff check .
mypy src
