The eQTac method.

eQTac

eQTac is a method for predicting potential regulatory elements (PREs) and their target genes from eQTL datasets. The only additional data it requires is ATAC-seq or ChIP-seq peak data.

Dependencies

Python packages

numpy >= 1.21.6
pandas >= 1.2.3
pybedtools >= 0.8.1
pysam >= 0.15.3
rpy2 >= 3.5.11
scipy >= 1.7.3

Other software (requires manual installation)

plink >= v1.90b6.24 (not plink2; plink must be in $PATH)
bedtools >= v2.30.0 (bedtools must be in $PATH)
R >= 3.6.1
    r-gkmSVM >= 0.8.0
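Before running anything, it can help to confirm that the dependencies above are visible. The following is a minimal sketch (not part of eQTac) that checks the minimum Python package versions listed above through importlib.metadata and looks up plink, bedtools and R on $PATH with shutil.which. Its version comparison only looks at the leading numeric part (pre-release tags are ignored), and it does not verify that plink is v1.9 rather than plink2 or that r-gkmSVM is installed in R; those still need a manual check.

```python
"""Sketch: check the dependencies listed above (not part of eQTac)."""
import re
import shutil
from importlib import metadata

# Minimum Python package versions, copied from the list above.
PY_REQUIREMENTS = {
    "numpy": "1.21.6",
    "pandas": "1.2.3",
    "pybedtools": "0.8.1",
    "pysam": "0.15.3",
    "rpy2": "3.5.11",
    "scipy": "1.7.3",
}

# External programs that must be reachable through $PATH.
EXECUTABLES = ["plink", "bedtools", "R"]


def version_tuple(version):
    """'1.21.6rc1' -> (1, 21, 6); keeps only the leading numeric release part."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    return tuple(int(p) for p in match.group().split(".")) if match else ()


def check_python_packages(requirements=PY_REQUIREMENTS):
    """Return {package: problem} for packages that are missing or too old."""
    problems = {}
    for name, minimum in requirements.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if version_tuple(installed) < version_tuple(minimum):
            problems[name] = f"{installed} < {minimum}"
    return problems


def check_executables(names=EXECUTABLES):
    """Return the programs that shutil.which cannot find on $PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    for pkg, problem in check_python_packages().items():
        print(f"[python] {pkg}: {problem}")
    for exe in check_executables():
        print(f"[PATH] {exe}: not found")
```

An empty report means every listed package and program was found.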

Installation & test example

# installation
pip install eQTac 

# test examples
git clone https://github.com/JFF1594032292/eQTac.git # just for test
cd eQTac/Utilities_pipeline
nohup sh example_All_pipeline.sh &

This generates an output_eQTac folder containing the result file test.geno.vcf.gz.PRE_score.eQTac_result.FDR.txt (the example takes about 3 to 5 minutes).

Input data

Data used in model training:
1. Positive set in BED format. This is usually peak data from ATAC-seq or ChIP-seq; we recommend trimming peaks to their core regions (e.g. summit ± 100 bp). See test_data/test.positive.bed.
2. Excluded set in BED format. This is usually peak data from ATAC-seq or ChIP-seq called with a more relaxed threshold (e.g. p = 0.2). These regions are removed from the generated negative regions so that potential positive sequences do not end up in the negative set. See test_data/test.exclude.bed.
3. FASTA file with a .fai index. This is usually the human genome sequence in FASTA format. See test_data/test.hg19.chr17.fa.
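The trimming and exclusion steps described above can be sketched in plain Python. This assumes MACS2 narrowPeak input for the positive set (column 10 holds the summit offset from the peak start, or -1 if none was called) and falls back to the peak midpoint otherwise. trim_to_summit and drop_overlapping are hypothetical helpers, not eQTac functions, and the overlap filter only illustrates the idea of removing excluded regions from negatives; for genome-scale files, bedtools subtract -A (or pybedtools' BedTool.subtract(..., A=True)) is the usual tool.

```python
"""Sketch: trim peaks to summit +/- flank and filter negatives (hypothetical helpers)."""


def trim_to_summit(line, flank=100):
    """Return a 3-column BED line centred on the peak summit.

    Assumes narrowPeak input (column 10 = summit offset from start, -1 if
    missing); plain 3-column BED lines fall back to the peak midpoint.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    offset = int(fields[9]) if len(fields) >= 10 else -1
    summit = start + offset if offset >= 0 else (start + end) // 2
    # Clamp at 0 so regions near the chromosome start stay valid BED.
    return f"{chrom}\t{max(0, summit - flank)}\t{summit + flank}"


def overlaps(a, b):
    """True if half-open BED intervals (chrom, start, end) share any base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def drop_overlapping(candidates, excluded):
    """Keep only candidate negative regions that touch no excluded region."""
    by_chrom = {}
    for region in excluded:
        by_chrom.setdefault(region[0], []).append(region)
    return [c for c in candidates
            if not any(overlaps(c, e) for e in by_chrom.get(c[0], []))]
```

For example, a narrowPeak line spanning chr17:1000-1600 with summit offset 250 becomes chr17:1150-1350.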

Data used in the eQTac calculation:
1. PRE.bed. The candidate regions whose chromatin accessibility scores are assessed across individuals and then correlated with target gene expression. See test_data/test.pre.bed.
2. Genotype data in PLINK binary format. Individual genotypes from the eQTL dataset. See test_data/test.geno.bed, test_data/test.geno.bim, test_data/test.geno.fam.
3. Expression file. Normalized expression values (see GTEx) that have already been corrected for covariates. See test_data/test.exp_residual.
4. SNP list file. The list of SNPs used in the eQTac analysis. Note: only single-nucleotide variants are supported. See test_data/test.geno.snplist.
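If you are preparing your own expression and SNP list files, the two requirements above can be sketched as follows. residualize removes covariates (for example genotype principal components or sex; the choice is yours) from each gene by ordinary least squares with an intercept. The exact normalization used for the example data is not described here, so treat this as illustrative. snv_ids_from_bim keeps variants whose two alleles in a PLINK .bim file are each a single A, C, G or T. Both helper names are hypothetical; PLINK 1.9's --snps-only just-acgt together with --write-snplist may do a similar job to the second helper directly.

```python
"""Sketch: covariate residuals and SNV-only SNP lists (hypothetical helpers)."""
import numpy as np

ACGT = set("ACGT")


def residualize(expression, covariates):
    """Regress covariates out of each gene's expression by OLS.

    expression: (n_samples, n_genes); covariates: (n_samples, n_covariates).
    An intercept column is added, so residuals are also mean-centred.
    """
    X = np.column_stack([np.ones(len(covariates)), covariates])
    beta, *_ = np.linalg.lstsq(X, expression, rcond=None)
    return expression - X @ beta


def snv_ids_from_bim(lines):
    """Return IDs of variants whose two .bim alleles are single A/C/G/T bases.

    .bim columns: chrom, ID, cM position, bp position, allele 1, allele 2.
    """
    ids = []
    for line in lines:
        _chrom, snp_id, _cm, _pos, a1, a2 = line.split()
        if a1 in ACGT and a2 in ACGT:  # "AT" or "0" are not in the set
            ids.append(snp_id)
    return ids
```

Writing the returned IDs one per line gives a file in the same one-ID-per-line shape as a PLINK .snplist.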

Usage patterns

We provide usage patterns at three levels: (1) pipeline level, (2) part level, and (3) function level.

Pipeline-level pattern

For the pipeline-level pattern, we provide a single script, Part-All-eQTac_pipeline.py. It can be used as in Utilities_pipeline/example_All_pipeline.sh:

python Part-All-eQTac_pipeline.py \
	-p test_data/test.positive.bed \
	-ex test_data/test.exclude.bed \
	-pre test_data/test.pre.bed \
	--geno test_data/test.geno \
	--snp test_data/test.geno.snplist \
	-fa test_data/test.hg19.chr17.fa \
	-exp test_data/test.exp_residual \
	-n 100 \
	-o output_eQTac \
	-t 3 -l 10 -k 6 -c 10 -g 2 -e 0.01
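If you prefer to launch the pipeline from Python (for example, to loop over several datasets), a minimal sketch with subprocess might look like the following. The flags and values are copied verbatim from the command above; their meanings are not documented in this README, so the sketch passes them through unchanged. It assumes it is run from the directory containing Part-All-eQTac_pipeline.py and uses sys.executable so the script runs under the same interpreter in which eQTac is installed. PIPELINE_ARGS, build_command and run_pipeline are hypothetical names.

```python
"""Sketch: drive the pipeline-level script from Python."""
import subprocess
import sys

# Flags and values copied from the example command; edit paths for your data.
PIPELINE_ARGS = {
    "-p": "test_data/test.positive.bed",
    "-ex": "test_data/test.exclude.bed",
    "-pre": "test_data/test.pre.bed",
    "--geno": "test_data/test.geno",
    "--snp": "test_data/test.geno.snplist",
    "-fa": "test_data/test.hg19.chr17.fa",
    "-exp": "test_data/test.exp_residual",
    "-n": 100,
    "-o": "output_eQTac",
    "-t": 3, "-l": 10, "-k": 6, "-c": 10, "-g": 2, "-e": 0.01,
}


def build_command(script="Part-All-eQTac_pipeline.py", args=PIPELINE_ARGS):
    """Flatten the flag -> value mapping into an argv list."""
    cmd = [sys.executable, script]
    for flag, value in args.items():
        cmd += [flag, str(value)]
    return cmd


def run_pipeline(overrides=None):
    """Run the pipeline; raises CalledProcessError on a non-zero exit."""
    args = {**PIPELINE_ARGS, **(overrides or {})}
    subprocess.run(build_command(args=args), check=True)
```

For example, run_pipeline({"-o": "output_run2"}) repeats the example run with a different output folder.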

Part-level pattern

For the part-level pattern, we provide four scripts:

Part-1-Train_model.py
Part-2-Generate_PRE_fa.py
Part-3-Predict_PRE_score.py
Part-4-Calculate_eQTac_correlation.py

They can be used as in Utilities_pipeline/example_Part_pipeline.sh:

python Part-1-Train_model.py \
	-p test_data/test.positive.bed \
	-ex test_data/test.exclude.bed \
	-o output_eQTac_part \
	-t 3 -l 10 -k 6 -c 10 -g 2 -e 0.01

python Part-2-Generate_PRE_fa.py \
	-pre test_data/test.pre.bed \
	--geno test_data/test.geno \
	--snp test_data/test.geno.snplist \
	-fa test_data/test.hg19.chr17.fa \
	-o output_eQTac_part

python Part-3-Predict_PRE_score.py \
	-m output_eQTac_part/test.positive.pos.svmmodel.3_10_6_0.01.model.txt \
	-l output_eQTac_part/test.geno.snplist.bed--test.pre.bed.pre_snplist.ld_info \
	-mfa output_eQTac_part/test.geno.snplist.bed--test.pre.bed.pre_snplist.ld_info.snplist.bed.mutate.fa \
	-geno test_data/test.geno \
	-snp output_eQTac_part/test.geno.snplist.bed--test.pre.bed.pre_snplist \
	-T 1 \
	-o output_eQTac_part

python Part-4-Calculate_eQTac_correlation.py \
	-pre output_eQTac_part/test.geno.vcf.gz.PRE_score \
	-exp test_data/test.exp_residual \
	-n 50 \
	-o output_eQTac_part
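Because later parts read files written by earlier ones, a small driver that runs the four commands in order and stops at the first failure can save some debugging. This sketch reuses the exact arguments above and, before Part 3, checks that the model, LD-info and mutated-FASTA files it reads already exist. It assumes the scripts are in the current directory; STEPS, missing_inputs and run_parts are hypothetical names.

```python
"""Sketch: run the four parts in order, failing fast (hypothetical driver)."""
import subprocess
import sys
from pathlib import Path

OUT = "output_eQTac_part"
PREFIX = f"{OUT}/test.geno.snplist.bed--test.pre.bed.pre_snplist"
MODEL = f"{OUT}/test.positive.pos.svmmodel.3_10_6_0.01.model.txt"

# (script, arguments, files from earlier parts that must exist first)
STEPS = [
    ("Part-1-Train_model.py",
     ["-p", "test_data/test.positive.bed", "-ex", "test_data/test.exclude.bed",
      "-o", OUT, "-t", "3", "-l", "10", "-k", "6", "-c", "10", "-g", "2", "-e", "0.01"],
     []),
    ("Part-2-Generate_PRE_fa.py",
     ["-pre", "test_data/test.pre.bed", "--geno", "test_data/test.geno",
      "--snp", "test_data/test.geno.snplist", "-fa", "test_data/test.hg19.chr17.fa",
      "-o", OUT],
     []),
    ("Part-3-Predict_PRE_score.py",
     ["-m", MODEL, "-l", f"{PREFIX}.ld_info",
      "-mfa", f"{PREFIX}.ld_info.snplist.bed.mutate.fa",
      "-geno", "test_data/test.geno", "-snp", PREFIX, "-T", "1", "-o", OUT],
     [MODEL, f"{PREFIX}.ld_info", f"{PREFIX}.ld_info.snplist.bed.mutate.fa"]),
    ("Part-4-Calculate_eQTac_correlation.py",
     ["-pre", f"{OUT}/test.geno.vcf.gz.PRE_score", "-exp", "test_data/test.exp_residual",
      "-n", "50", "-o", OUT],
     []),
]


def missing_inputs(paths):
    """Return the paths that do not exist yet."""
    return [p for p in paths if not Path(p).exists()]


def run_parts(steps=STEPS):
    """Run each part in turn; stop on a missing input or a non-zero exit."""
    for script, args, needs in steps:
        missing = missing_inputs(needs)
        if missing:
            raise FileNotFoundError(f"{script}: missing {missing}")
        subprocess.run([sys.executable, script, *args], check=True)
```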

Function-level pattern

For the function level pattern, we provide a series of functions:

from eQTac.get_nullseq import get_nullseq
from eQTac.filter_bkg import filter_bkg
from eQTac.generate_snp_dict import generate_snp_dict
from eQTac.generate_PRE import generate_PRE
from eQTac.generate_mut_fa import generate_mut_fa
from eQTac.geno2score import geno2score
from eQTac.eQTac_correlation import eQTac_correlation
from eQTac.eQTac_permutation import eQTac_permutation
from eQTac.control_FDR import control_FDR

These functions can be used to construct the whole pipeline.
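The names eQTac_correlation, eQTac_permutation and control_FDR suggest a correlation, permutation and FDR-control workflow. Their signatures are not documented here, so rather than guess at them, the following is a generic, self-contained sketch of those three statistical steps using a Pearson correlation, a label-shuffling permutation test and the Benjamini-Hochberg procedure. It is not eQTac's implementation and may differ from it (for example in the correlation measure or how permutations are drawn); all function names below are hypothetical.

```python
"""Generic sketch: correlation, permutation p-value, BH FDR (not eQTac's code)."""
import numpy as np


def pearson_r(x, y):
    """Pearson correlation between two 1-D arrays."""
    x = x - x.mean()
    y = y - y.mean()
    return float(x @ y / np.sqrt((x @ x) * (y @ y)))


def permutation_pvalue(score, expr, n_perm=100, seed=0):
    """Two-sided permutation p-value for corr(score, expr).

    Shuffling expression across individuals breaks the score-expression
    link while keeping both marginal distributions unchanged.
    """
    rng = np.random.default_rng(seed)
    observed = abs(pearson_r(score, expr))
    null = np.array([abs(pearson_r(score, rng.permutation(expr)))
                     for _ in range(n_perm)])
    # +1 in numerator and denominator so the p-value is never exactly 0.
    return (1 + np.sum(null >= observed)) / (1 + n_perm)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    p = np.asarray(pvalues, dtype=float)
    n = len(p)
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, walking from the largest p-value down.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.minimum(ranked, 1.0)
    return q
```

For example, benjamini_hochberg([0.01, 0.04, 0.03, 0.20]) gives approximately [0.04, 0.0533, 0.0533, 0.2].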

Recommendations

We recommend using the pipeline-level pattern first to make sure that all input files are in valid formats.

Then use the part-level pattern to tune parameters (e.g. to train the best-performing model). Because the first step (model training) is the most time-consuming, we recommend running the parts separately so that the trained SVM model (xxx.svmmodel.3_10_6_0.01.model.txt) is saved and can be reused.
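To reuse a saved model, you need its path for Part 3's -m option. Judging only from the part-level example, the file name appears to combine the positive BED name with the -t, -l, -k and -e values (-c and -g do not appear in it). The sketch below encodes that inferred pattern; it is an assumption rather than documented behaviour, so check the actual file name in your output folder. expected_model_path and needs_training are hypothetical helpers.

```python
"""Sketch: locate a previously trained model (file-name pattern is inferred)."""
from pathlib import Path


def expected_model_path(outdir, positive_bed, t=3, l=10, k=6, e=0.01):
    """Guess the model path Part 1 writes.

    Inferred from the README example only:
    <outdir>/<positive BED name minus .bed>.pos.svmmodel.<t>_<l>_<k>_<e>.model.txt
    """
    stem = Path(positive_bed).name
    if stem.endswith(".bed"):
        stem = stem[: -len(".bed")]
    return Path(outdir) / f"{stem}.pos.svmmodel.{t}_{l}_{k}_{e:g}.model.txt"


def needs_training(outdir, positive_bed, **params):
    """True if no saved model for these parameters exists yet."""
    return not expected_model_path(outdir, positive_bed, **params).exists()
```

With the defaults, expected_model_path("output_eQTac_part", "test_data/test.positive.bed") reproduces the -m path used in the Part 3 example.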

If you are familiar with this pipeline, you can directly use the function-level pattern to construct your own pipeline.

Notes

The test results are highly variable because the test dataset is small (only ~6 Mb of sequence). Results become stable when tens of thousands of peaks or more are used as the positive set.
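A quick way to see whether your positive set is in the small-data regime described above is to count its regions and total length. In this sketch, min_peaks=10_000 is an illustrative threshold reflecting the "tens of thousands of peaks" guidance, not a limit eQTac enforces; summarize_bed is a hypothetical helper.

```python
"""Sketch: summarize a positive-set BED file (hypothetical helper)."""


def summarize_bed(lines, min_peaks=10_000):
    """Count regions and total bp; flag sets below an illustrative size."""
    n, total_bp = 0, 0
    for line in lines:
        # Skip blank lines and BED header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        n += 1
        total_bp += int(fields[2]) - int(fields[1])
    return {"peaks": n, "total_bp": total_bp, "small": n < min_peaks}
```

Usage: with open("test_data/test.positive.bed") as fh: print(summarize_bed(fh)).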

Release history

1.0.17: Jul 29, 2024
1.0.16: Feb 21, 2024
1.0.15: Feb 21, 2024
1.0.14: Feb 20, 2024
1.0.13: Feb 20, 2024
1.0.12: Sep 24, 2023
1.0.11: Sep 22, 2023 (this version)
1.0.10: Jul 28, 2023
1.0.9: Jul 28, 2023
1.0.8: Apr 20, 2023
1.0.7: Apr 20, 2023
1.0.6: Apr 20, 2023
1.0.5: Apr 19, 2023
1.0.4: Apr 19, 2023
1.0.3: Apr 19, 2023
1.0.2: Apr 19, 2023

Download files

Source Distribution

eQTac-1.0.11.tar.gz (13.7 kB)

Uploaded Sep 22, 2023 Source

Built Distribution

eQTac-1.0.11-py3-none-any.whl (15.7 kB)

Uploaded Sep 22, 2023 Python 3

File details

Details for the file eQTac-1.0.11.tar.gz.

File metadata

Download URL: eQTac-1.0.11.tar.gz
Upload date: Sep 22, 2023
Size: 13.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.10.9

File hashes

Hashes for eQTac-1.0.11.tar.gz
Algorithm	Hash digest
SHA256	`44865213470d135b58f82392ec61e9a5f208267428d87220df95f1f2d279fba4`
MD5	`ba803e14c9715f273281399d527c16cb`
BLAKE2b-256	`f7e9e20b8bbc2b355bdc80a67996c591074b973331a8a59cfc7718d637e7875a`

File details

Details for the file eQTac-1.0.11-py3-none-any.whl.

File metadata

Download URL: eQTac-1.0.11-py3-none-any.whl
Upload date: Sep 22, 2023
Size: 15.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.10.9

File hashes

Hashes for eQTac-1.0.11-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c52a322627f4deb6048ace87349720031ba53acc133453ab03c559b2886c1b96`
MD5	`d504e7616fe9e6e5851a6be6b75ec354`
BLAKE2b-256	`cbc44a0bcde2c11678b62b20725d2189d94fde5b0e897b53d8d045c45e517d18`