GermlineFilter is a Python program used in the ICGC-TCGA DREAM Mutation Calling Challenge as part of the security measures for real data. It takes as input a preprocessed and encrypted germline calls file, as generated by GATK, and a somatic SNV vcf file, and it returns the number of germline calls found in the somatic vcf.
- The most important feature of GermlineFilter is that it runs in an encrypted fashion, making it safe to run on any server. All the filtering steps described in the flowchart are done at runtime, and at no point is data written to disk. It has three options:
- encrypt_germline_vcf - Encrypt a truth germline vcf (preprocessing steps in the workflow above)
- filter - Filter a somatic vcf against an encrypted truth germline vcf. This step is done in an encrypted fashion.
- get_germline_positions - Get the actual germline positions called in a somatic vcf. This step is done in an unencrypted fashion, against the original truth germline vcf. It should only be run locally or on an encrypted server. The output is written to a tab-delimited file.
- Multiple germline vcf files can be preprocessed at the same time, with a common salt file and key file.
- Multiple somatic vcf files corresponding to the same encrypted truth germline file can be filtered simultaneously. This is considerably faster than running them individually.
- The user can choose the encryption protocol (AES or Blowfish); the default is AES.
- The user can choose the hashing protocol (md5 or sha512); the default is sha512.
- The actual germline positions found in a vcf can be retrieved, for plotting and further analysis.
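
The idea behind filtering against hashed germline calls can be illustrated with a minimal Python sketch. This is not the GermlineFilter implementation: the function names, the `chrom:pos` key format, and the example salt are all assumptions made for illustration, and the AES/Blowfish encryption layer is omitted entirely. It only shows how a shared salt and a hashing protocol allow somatic calls to be compared with germline calls without exposing the germline positions themselves.

```python
import hashlib


def hash_position(chrom, pos, salt, algorithm="sha512"):
    """Return a salted hex digest for one variant position.

    The "chrom:pos" key format is an assumption for this sketch.
    """
    h = hashlib.new(algorithm)
    h.update(salt)
    h.update(f"{chrom}:{pos}".encode("ascii"))
    return h.hexdigest()


def vcf_positions(lines):
    """Yield (chrom, pos) from vcf text lines, skipping header and blank lines."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos = line.split("\t")[:2]
        yield chrom, int(pos)


def count_germline_calls(somatic_lines, germline_hashes, salt, algorithm="sha512"):
    """Count somatic calls whose salted hash appears in the germline hash set."""
    return sum(
        hash_position(chrom, pos, salt, algorithm) in germline_hashes
        for chrom, pos in vcf_positions(somatic_lines)
    )


# Hypothetical example data; a real run would read these from files.
salt = b"example-salt"
germline_vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT",
    "1\t1000\t.\tA\tG",
    "2\t5000\t.\tC\tT",
]
somatic_vcf = [
    "#CHROM\tPOS\tID\tREF\tALT",
    "1\t1000\t.\tA\tG",
    "3\t42\t.\tG\tA",
]

# Preprocessing: only the salted hashes of the germline positions are kept.
germline_hashes = {hash_position(c, p, salt) for c, p in vcf_positions(germline_vcf)}

# Filtering: hash each somatic position with the same salt and look it up.
n_germline = count_germline_calls(somatic_vcf, germline_hashes, salt)
print(n_germline)
```

Because the same salt is used for every file, one preprocessed germline hash set can be reused across several somatic vcf files, which is consistent with the batch filtering described above.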
After installation, to find out how to use the Germline filter, run:
For more examples, please take a look at the user manual, located in <path-to-dir>/GermlineFilter-1.2/doc