# ghfc-utils

Various tools and scripts used in the GHFC lab: a set of small tools designed to help automate simple tasks, locally or on Pasteur's cluster.

- `ghfc-reannotate`: post-processing of slivar files, including filtering and geneset reannotation.
## Installation

```shell
pip install git+https://{token_username}:{generated_token}@gitlab.pasteur.fr/ghfc/ghfc-utils.git
```

When on maestro:

```shell
module load Python/3.9.18
pip install --user git+https://maestro_tok_FC:rSUDdAJWZutsZTJseJ3V@gitlab.pasteur.fr/ghfc/ghfc-utils.git
```
## slivar reannotator

A tool to filter and reannotate slivar files according to various parameters and genesets. The goal is to produce a more generic kind of slivar file on which users can then run their own filtering.
```
usage: ghfc-reannotate [-h] [--chunksize CHUNKSIZE] configuration slivar output

positional arguments:
  configuration  config file
  slivar         slivar file to reannotate
  output         annotated slivar file

optional arguments:
  -h, --help     show this help message and exit
  --chunksize CHUNKSIZE
                 size of the chunks read from the input (default: 100000)
```
- This tool reads the slivar file and decomposes the impacts by transcript.
- It then filters all lines using, in this order, the config file parameters:
  - the geneset (based on the ENSG; mind the GRCh37/GRCh38 differences in ENSG)
  - the impact / impact categories
  - if missenses are kept, filtering them on their impact scores (such as the mpc or the cadd)
  - the gnomAD frequency
- The variants/transcripts are then sorted according to criteria given by the user in the config file, from the most important to the least important.
- For each sample, variant and gene (ENSG), the first transcript (the most important given the config criteria) is kept.
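The final sort-and-deduplicate step above can be sketched as follows. This is an illustrative outline, not the tool's actual code: the row dictionaries, column names and criterion values are assumptions.

```python
# Rank decomposed transcript rows by user-defined criteria and keep the
# best-ranked transcript per (sample, variant, ENSG). Hypothetical sketch.
def keep_best_transcript(rows, ordering_priority):
    # Sort descending on each criterion, in priority order (most important first).
    ranked = sorted(rows, key=lambda r: tuple(r[c] for c in ordering_priority), reverse=True)
    best = {}
    for r in ranked:
        key = (r["sample"], r["variant"], r["ENSG"])
        best.setdefault(key, r)  # first row seen for a key = highest ranked
    return list(best.values())

rows = [
    {"sample": "S1", "variant": "1:12345:A:G", "ENSG": "ENSG00000001",
     "canonical": 0, "transcript": "ENST2"},
    {"sample": "S1", "variant": "1:12345:A:G", "ENSG": "ENSG00000001",
     "canonical": 1, "transcript": "ENST1"},  # VEP canonical transcript
]
best = keep_best_transcript(rows, ["canonical"])
print([r["transcript"] for r in best])  # → ['ENST1']
```

With `canonical` as the only criterion, the canonical transcript wins; adding more criteria to the list breaks ties in order of importance.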
## yaml config file

This section lists the accepted options in the config file; example files are also provided.
- (optional) `geneset-file`: path to the file containing the list of ENSG for our geneset of interest
- `ordering-priority`: the ordered list of criteria used to rank the importance of the transcripts to output
- `impact-categories-filter`: the impact categories to keep during the filtering. The categories are defined in this package in impacts.yaml.
- `impact-filter`: the impacts to keep during the filtering process. The names of the impacts are visible in impacts.yaml.
- (optional) `missense-filter`: this section defines how the missense variants are further filtered
  - first, a subcategory is used for each score, e.g. mpc, cadd
  - for each subcategory, 3 fields are expected:
    - `field`: the name of the slivar column containing the value
    - `min`: the minimal value to keep (included)
    - `max`: the maximal value to keep (excluded)
  - in addition to the subcategories, a `condition` field is expected to specify how the subcategories are combined. Possible values are:
    - `cadd_if_no_mpc`: use the mpc and, when it is not available (-1), use the cadd
    - `cadd_and_mpc`
    - `cadd_or_mpc`
    - `mpc_only`
    - `cadd_only`
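The `condition` values can be read as boolean combinations of the two subcategory checks. The sketch below shows one plausible interpretation; the function names and the threshold values are illustrative, only the condition names and the -1 "missing mpc" sentinel come from the description above.

```python
# Hypothetical combination of the mpc and cadd subcategory checks.
def passes(value, lo, hi):
    # min is included, max is excluded, per the subcategory definition.
    return lo <= value < hi

def keep_missense(condition, mpc, cadd, mpc_rng=(2.0, 100.0), cadd_rng=(25.0, 100.0)):
    mpc_ok = passes(mpc, *mpc_rng)
    cadd_ok = passes(cadd, *cadd_rng)
    if condition == "cadd_if_no_mpc":
        return cadd_ok if mpc == -1 else mpc_ok  # fall back to cadd when mpc is missing
    return {
        "cadd_and_mpc": cadd_ok and mpc_ok,
        "cadd_or_mpc": cadd_ok or mpc_ok,
        "mpc_only": mpc_ok,
        "cadd_only": cadd_ok,
    }[condition]

print(keep_missense("cadd_if_no_mpc", mpc=-1, cadd=28.5))  # → True
```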
- (optional) `gnomad-filter`: to filter further on an included gnomAD column. 3 fields are expected here:
  - `field`: the name of the slivar column containing the gnomAD value to filter on
  - `min`: the minimal value to keep (included)
  - `max`: the maximal value to keep (included)
- (optional) `pext-filter`: to annotate each transcript and filter it using a pext file:
  - `file`: path to the pext file to use
  - `field`: name of the output column
  - `min`: the minimal value to keep (included); can be set to -1 to annotate without filtering
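Putting the options above together, a config file could look like the following. This is a hypothetical example: the slivar column names (`mpc`, `cadd_phred`, `gnomad_popmax_af`), paths and threshold values are placeholders, not defaults; refer to the example files shipped with the package for real ones.

```yaml
geneset-file: /path/to/geneset.ensg.txt   # optional

ordering-priority:
  - canonical
  - impact

impact-categories-filter:
  - lof
  - missense
impact-filter:
  - missense_variant

missense-filter:                          # optional
  condition: cadd_if_no_mpc
  mpc:
    field: mpc
    min: 2        # included
    max: 100      # excluded
  cadd:
    field: cadd_phred
    min: 25
    max: 100

gnomad-filter:                            # optional
  field: gnomad_popmax_af
  min: 0          # included
  max: 0.001      # included

pext-filter:                              # optional
  file: /path/to/pext.bed
  field: pext_max_brain
  min: 0.1        # -1 to annotate without filtering
```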
Finally, some more global slivar parameters that are not likely to change much:
- `slivar-field-name`: the name of the slivar column that contains the list of all vep impacts per transcript
- `slivar-field-decomposed`: the list of the fields once decomposed. Some of those fields are expected with the following names:
  - `impact`
  - `ENSG`
  - `canonical`: the vep column containing "YES" for the canonical transcripts
  - `loftee`: the loftee "LoF" column
## pext file

The pext file is a bed file with the following columns (the order is important, and there must be a header):

```
chr start end max_brain ensg symbol
```

The genome version must match the data (GRCh37/38, and whether the "chr" prefix is used in the chromosome names).
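A reader for this format could look like the sketch below. It assumes the columns are tab-separated and named exactly as in the header above; the function name and record layout are illustrative, not the tool's actual code.

```python
# Illustrative reader for the pext bed file (tab-separated, with a header
# line: chr start end max_brain ensg symbol).
import csv
import io

def read_pext(handle):
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        yield {
            "chr": row["chr"],
            "start": int(row["start"]),
            "end": int(row["end"]),      # bed intervals: start included, end excluded
            "max_brain": float(row["max_brain"]),
            "ensg": row["ensg"],
            "symbol": row["symbol"],
        }

sample = (
    "chr\tstart\tend\tmax_brain\tensg\tsymbol\n"
    "1\t1000\t1100\t0.85\tENSG00000001\tGENE1\n"
)
records = list(read_pext(io.StringIO(sample)))
print(records[0]["max_brain"])  # → 0.85
```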
## TODO

- offer to prefix geneset columns?
- offer to keep some of the original columns (impact/transcript)
- possibility to run on stdin / stdout?
- refining DP and AB
- need for automated submission on the cluster? (means the user has permission to use it)
- possibility to automate splitting in chunks and merging back?
## slivar de novo ML

Moving the machine learning validator for de novo variants to this tool.