Various tools and scripts used in the GHFC lab
ghfc-utils
A set of small tools designed to help automate simple tasks locally or on Pasteur's cluster.
- ghfc-reannotate for the postprocessing of slivar files including filtering and geneset reannotation.
Installation
pip install ghfc-utils
Note: on maestro, do not forget to load Python first (this is no longer needed once the package is installed):
module load Python/3.9.16
slivar reannotator
A tool to filter and reannotate slivar files according to various parameters and genesets. The goal is to produce more generic slivar files and let users apply their own filtering to them. Given a config file, ghfc-reannotate reads a slivar file and returns an annotated and filtered slivar file.
usage: ghfc-reannotate [-h] configuration slivar output

positional arguments:
  configuration         config file
  slivar                slivar file to reannotate
  output                annotated slivar file

optional arguments:
  -h, --help            show this help message and exit
  --chunksize CHUNKSIZE
                        size of the chunks read from the input (default 100000)
  -k, --keep-all-transcripts
                        to keep all impacted transcript instead of the first
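
For readers who want to wrap or test the command line, the interface above can be approximated with a small argparse parser. This is only a sketch reconstructed from the help text, not the tool's actual source: the `build_parser` name is hypothetical, and parsing `--chunksize` as an integer is an assumption.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented ghfc-reannotate interface."""
    parser = argparse.ArgumentParser(prog="ghfc-reannotate")
    parser.add_argument("configuration", help="config file")
    parser.add_argument("slivar", help="slivar file to reannotate")
    parser.add_argument("output", help="annotated slivar file")
    # Assumption: the chunk size is an integer number of rows.
    parser.add_argument(
        "--chunksize",
        type=int,
        default=100000,
        help="size of the chunks read from the input (default 100000)",
    )
    parser.add_argument(
        "-k",
        "--keep-all-transcripts",
        action="store_true",
        help="to keep all impacted transcript instead of the first",
    )
    return parser


# Parse the same arguments as in a typical invocation.
args = build_parser().parse_args(
    ["config.yaml", "input_slivar.tsv", "output_slivar.tsv"]
)
print(args.chunksize, args.keep_all_transcripts)
```

With no options given, the chunk size falls back to 100000 and only the first transcript is kept.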
Example
You need to prepare a config file or take the one provided here. Then, you need a slivar file to run the tool on. For instance, you can use the 570MB file at /pasteur/zeus/projets/p02/ghfc_wgs_zeus/WGS/Paris-AIMS2/slivar/Paris-AIMS2.slivar-full.lof_miss.tsv on zeus.
ghfc-reannotate config.yaml input_slivar.tsv output_slivar.tsv
The provided config file has a lot of comments to help build a new one.
How does it work?
- The tool reads the slivar file, then decomposes the impacts by transcript (one line per transcript).
- As slivar files can be very large in some projects, the input is read in chunks of 100k rows (the default value of --chunksize).
- It then filters all lines according to the config file parameters, in this order:
  - the geneset (based on the ENSG; mind the differences in ENSG between GRCh37 and GRCh38)
  - the impact / impact-categories
  - if missense variants are kept, their impact (using scores such as the mpc or the cadd)
  - the gnomad frequency
  - the pext
- The variant/transcript pairs are then sorted according to the criteria given by the user in the config file, from the most important to the least important.
- For each sample, variant and gene (ENSG), the first transcript (the most important according to the config criteria) is kept, unless the --keep-all-transcripts option is used.
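
The steps above can be illustrated with a short standard-library sketch. It is not the tool's implementation: the column names (`sample`, `variant`, `ensg`, `gnomad_af`, `impacts`), the `ENST:impact;...` layout of the impacts column, the `IMPACT_RANK` ordering and all function names are hypothetical, and the missense-score and pext filters are left out for brevity.

```python
import csv
import io
from itertools import islice

# Hypothetical ranking; the real ordering criteria come from the config file.
IMPACT_RANK = {"stop_gained": 0, "missense": 1, "synonymous": 2}


def read_chunks(handle, chunksize=100_000):
    """Yield lists of row dicts holding at most `chunksize` rows each."""
    reader = csv.DictReader(handle, delimiter="\t")
    while True:
        chunk = list(islice(reader, chunksize))
        if not chunk:
            return
        yield chunk


def split_transcripts(row):
    """Turn one variant row into one row per transcript (assumed layout)."""
    for item in row["impacts"].split(";"):
        transcript, impact = item.split(":")
        yield {**row, "transcript": transcript, "impact": impact}


def passes_filters(row, geneset, kept_impacts, max_af):
    """Apply the geneset, impact and frequency filters, in that order."""
    if row["ensg"] not in geneset:
        return False
    if row["impact"] not in kept_impacts:
        return False
    return float(row["gnomad_af"]) <= max_af


def reannotate(handle, geneset, kept_impacts, max_af,
               chunksize=100_000, keep_all=False):
    kept = []
    for chunk in read_chunks(handle, chunksize):
        rows = [t for row in chunk for t in split_transcripts(row)
                if passes_filters(t, geneset, kept_impacts, max_af)]
        # Most important transcript first; list.sort() is stable.
        rows.sort(key=lambda r: IMPACT_RANK.get(r["impact"], len(IMPACT_RANK)))
        if keep_all:
            kept.extend(rows)
            continue
        seen = set()
        for row in rows:
            # Keep only the first transcript per sample, variant and gene.
            key = (row["sample"], row["variant"], row["ensg"])
            if key not in seen:
                seen.add(key)
                kept.append(row)
    return kept


demo = io.StringIO(
    "sample\tvariant\tensg\tgnomad_af\timpacts\n"
    "S1\t1:100:A:T\tENSG01\t0.0001\tENST1:synonymous;ENST2:stop_gained\n"
    "S1\t2:200:G:C\tENSG02\t0.0001\tENST3:missense\n"
    "S2\t1:100:A:T\tENSG01\t0.2\tENST4:missense\n"
)
result = reannotate(demo, geneset={"ENSG01"},
                    kept_impacts={"stop_gained", "missense", "synonymous"},
                    max_af=0.01, chunksize=2)
print([(r["sample"], r["transcript"], r["impact"]) for r in result])
# prints [('S1', 'ENST2', 'stop_gained')]
```

In this toy input, the second row is dropped by the geneset filter, the third by the frequency filter, and of the two transcripts left for the first row only the highest-ranked one survives. Because each input row carries all transcripts of one sample and variant in this assumed layout, deduplicating within a chunk is sufficient.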
pext file
The pext file is a bed file with the following columns (the order matters, and a header line is required):
chr start end max_brain ensg symbol
The genome version must match the data (GRCh37 or GRCh38, with or without the chr prefix in the chromosome names).
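
As an illustration of this layout, the sketch below loads such a file, checks the chromosome naming style against the data, and looks up a score. The function names are hypothetical, the tab delimiter and the standard BED convention (0-based start, end exclusive) are assumptions, and a genome build mismatch cannot be detected this way.

```python
import csv
import io

PEXT_COLUMNS = ["chr", "start", "end", "max_brain", "ensg", "symbol"]


def load_pext(handle):
    """Read the pext bed file; the columns are taken by position."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the mandatory header line
    intervals = []
    for fields in reader:
        rec = dict(zip(PEXT_COLUMNS, fields))
        intervals.append((rec["chr"], int(rec["start"]), int(rec["end"]),
                          float(rec["max_brain"]), rec["ensg"]))
    return intervals


def naming_matches(pext_chroms, data_chroms):
    """True if both sides agree on using (or not using) the chr prefix."""
    def has_prefix(names):
        return any(n.startswith("chr") for n in names)
    return has_prefix(pext_chroms) == has_prefix(data_chroms)


def pext_at(intervals, chrom, pos, ensg):
    """Return the score covering a 1-based position, or None.

    Assumes BED coordinates: start is 0-based, end is exclusive.
    A linear scan is used for clarity, not speed.
    """
    for c, start, end, score, gene in intervals:
        if c == chrom and gene == ensg and start < pos <= end:
            return score
    return None


demo = io.StringIO(
    "chr\tstart\tend\tmax_brain\tensg\tsymbol\n"
    "chr1\t99\t150\t0.85\tENSG01\tGENE1\n"
)
intervals = load_pext(demo)
print(pext_at(intervals, "chr1", 100, "ENSG01"))  # prints 0.85
print(naming_matches([i[0] for i in intervals], ["1", "2"]))  # prints False
```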