Associate outliers with rare variation
Project description
Cursory use of ORE (outlier-RV enrichment) is described here; visit the latest ORE documentation for more details. Confirm the following are installed:
Then, on the command line, install with
pip install ore
Example run
ore --vcf test.vcf.gz \
    --bed test.bed.gz \
    --output ore_results \
    --distribution normal \
    --threshold 2 3 4 \
    --max_outliers_per_id 500 \
    --af_rare 0.05 0.01 1e-3 \
    --tss_dist 5000
Variants and gene expression are specified with --vcf (line 1) and --bed (line 2), respectively. The output prefix is provided with --output (line 3). In this example, the outlier specifications --distribution (line 4), --threshold (line 5), and --max_outliers_per_id (line 6) indicate that outliers are defined using a normal distribution with a z-score more extreme than each threshold (2, 3, and 4), and that samples with more than 500 outliers are excluded. Variant information is specified with --af_rare (line 7) and --tss_dist (line 8): variants are defined as rare using an intra-cohort allele frequency at varying thresholds (≤ 0.05, 0.01, and 0.001), and only variants within 5 kb of the TSS are used.
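The outlier definition described above can be illustrated with a minimal Python sketch. This is not ORE's implementation: the function names (`zscores`, `call_outliers`), the nested-dictionary input layout, the use of the sample standard deviation, and the choice to apply the per-sample cap after calling outliers are all assumptions made for illustration.

```python
from collections import Counter
from statistics import mean, stdev


def zscores(by_sample):
    """Z-score one gene's expression values across samples (hypothetical helper)."""
    values = list(by_sample.values())
    mu, sd = mean(values), stdev(values)  # assumption: sample standard deviation
    if sd == 0:
        return {sample: 0.0 for sample in by_sample}
    return {sample: (value - mu) / sd for sample, value in by_sample.items()}


def call_outliers(expr, threshold, max_outliers_per_id):
    """Return (kept, excluded) for one z-score threshold.

    expr maps gene -> {sample_id: expression}. A (gene, sample) pair is an
    outlier when |z| > threshold; samples with more than max_outliers_per_id
    outliers are dropped (assumed behavior, mirroring the flag's description).
    """
    hits = set()
    for gene, by_sample in expr.items():
        for sample, z in zscores(by_sample).items():
            if abs(z) > threshold:
                hits.add((gene, sample))
    counts = Counter(sample for _, sample in hits)
    excluded = {s for s, n in counts.items() if n > max_outliers_per_id}
    kept = {(g, s) for g, s in hits if s not in excluded}
    return kept, excluded


# Toy data: ten samples, one with clearly elevated expression.
samples = [f"s{i}" for i in range(10)]
expr = {"GENE_A": dict(zip(samples, [1, 2, 3, 2, 1, 2, 3, 2, 1, 20]))}

# One pass per threshold, as with --threshold 2 3 4.
for threshold in (2, 3, 4):
    kept, excluded = call_outliers(expr, threshold, max_outliers_per_id=500)
    print(threshold, sorted(kept), sorted(excluded))
```

With only ten samples a z-score cannot exceed roughly 2.85, so stricter thresholds need larger cohorts before they call any outliers.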
Usage (visit the latest ORE documentation for more)
ore [-h] [--version] -v VCF -b BED [-o OUTPUT]
[--outlier_output OUTLIER_OUTPUT] [--enrich_file ENRICH_FILE]
[--extrema] [--distribution {normal,rank,custom}]
[--threshold [THRESHOLD [THRESHOLD ...]]]
[--max_outliers_per_id MAX_OUTLIERS_PER_ID]
[--af_rare [AF_RARE [AF_RARE ...]]] [--af_vcf]
[--intracohort_rare_ac INTRACOHORT_RARE_AC] [--gq GQ] [--dp DP]
[--aar AAR AAR] [--tss_dist [TSS_DIST [TSS_DIST ...]]] [--upstream]
[--downstream] [--annovar]
[--variant_class {intronic,intergenic,exonic,UTR5,UTR3,splicing,upstream,ncRNA,ncRNA_exonic}]
[--exon_class {nonsynonymous,intergenic,nonframeshift,frameshift,stopgain,stoploss}]
[--refgene] [--ensgene] [--annovar_dir ANNOVAR_DIR]
[--humandb_dir HUMANDB_DIR] [--processes PROCESSES] [--clean_run]
- Required arguments:
- -v VCF, --vcf VCF
Location of VCF file. Must be tabixed!
- -b BED, --bed BED
Gene expression file location. Must be tabixed!
- Optional file locations:
- -o OUTPUT, --output OUTPUT
Output prefix (default is VCF prefix)
- --outlier_output OUTLIER_OUTPUT
Outlier filename (default is VCF prefix)
- --enrich_file ENRICH_FILE
Output file for enrichment odds ratios and p-values (default is VCF prefix)
- Optional outlier arguments:
- --extrema
Only the most extreme value is an outlier
- --distribution DISTRIBUTION
Outlier distribution. Options: {normal,rank,custom}
- --threshold THRESHOLD
Expression threshold for defining outliers. Must be greater than 0 with normal, or within the open interval (0, 0.5) with rank. Ignored with custom.
- --max_outliers_per_id MAX_OUTLIERS_PER_ID
Maximum number of outliers per ID
- Optional variant-related arguments:
- --af_rare AF_RARE
AF cut-off below which a variant is considered rare (space separated list e.g., 0.1 0.05)
- --af_vcf
Use the VCF AF field to define an allele as rare.
- --intracohort_rare_ac INTRACOHORT_RARE_AC
Allele COUNT to be used instead of intra-cohort allele frequency (--af_rare is still used for the population-level AF cut-off).
- --af_min AF_MIN
Lower bound on AF cut-offs for --af_rare; must be the same length as --af_rare (e.g., with --af_rare 0.01 0.5 and --af_min 0 0.05, ORE will compare variants within [0,0.01] and [0.05,0.5] to other variants).
- --gq GQ
Minimum genotype quality for each variant in each individual
- --dp DP
Minimum depth per variant in each individual
- --aar AAR
Alternate allelic ratio for heterozygous variants (provide two space-separated numbers between 0 and 1, e.g., 0.2 0.8)
- --tss_dist TSS_DIST
Variants within this distance of the TSS are considered
- --upstream
Only variants UPstream of TSS
- --downstream
Only variants DOWNstream of TSS
- Optional arguments for using ANNOVAR:
- --annovar
Use ANNOVAR to specify allele frequencies and functional class
- --variant_class
Only variants in these classes will be considered. Options: {intronic,intergenic,exonic,UTR5,UTR3,splicing,upstream,ncRNA,ncRNA_exonic}
- --exon_class
Only variants with these exonic impacts will be considered. Options: {nonsynonymous,intergenic,nonframeshift,frameshift,stopgain,stoploss}
- --refgene
Filter on RefGene function.
- --ensgene
Filter on ENSEMBL function.
- --annovar_dir ANNOVAR_DIR
Directory of the table_annovar.pl script
- --humandb_dir HUMANDB_DIR
Directory of ANNOVAR data (refGene, ensGene, and gnomad_genome)
- optional arguments:
- -h, --help
show this help message and exit
- --version
show program’s version number and exit
- --processes PROCESSES
Number of CPU processes
- --clean_run
Delete temporary files from the previous run
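As a rough illustration of how the --af_rare, --af_min, and --tss_dist descriptions above fit together, the following Python sketch pairs each upper AF cut-off with its lower bound and keeps variants that fall in the bin and near the TSS. It is a sketch under stated assumptions, not ORE's code: the helper names (`af_bins`, `select_variants`), the dictionary layout for variants, the default lower bound of 0, and the closed intervals (taken from the [0,0.01] notation in the --af_min description) are all assumptions.

```python
def af_bins(af_rare, af_min=None):
    """Pair each --af_rare cut-off with its --af_min lower bound.

    Hypothetical helper: a missing af_min is assumed to mean a lower bound of 0.
    """
    if af_min is None:
        af_min = [0.0] * len(af_rare)
    if len(af_min) != len(af_rare):
        raise ValueError("af_min must be the same length as af_rare")
    return list(zip(af_min, af_rare))


def select_variants(variants, tss_pos, lower, upper, tss_dist):
    """Keep IDs of variants with lower <= AF <= upper within tss_dist bp of the TSS."""
    return [
        v["id"]
        for v in variants
        if lower <= v["af"] <= upper and abs(v["pos"] - tss_pos) <= tss_dist
    ]


# Toy variants around a TSS at position 10,000 (made-up values).
variants = [
    {"id": "v1", "pos": 10_500, "af": 0.005},
    {"id": "v2", "pos": 10_500, "af": 0.03},
    {"id": "v3", "pos": 20_000, "af": 0.005},
    {"id": "v4", "pos": 9_000, "af": 0.2},
]

# Mirrors the documented example: --af_rare 0.01 0.5 --af_min 0 0.05
bins = af_bins([0.01, 0.5], [0, 0.05])
for lower, upper in bins:
    print((lower, upper), select_variants(variants, 10_000, lower, upper, 5000))
```

Here v2 falls between the two bins and v3 lies outside the 5 kb window, so neither is selected for either bin.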
Felix Richter <felix.richter@icahn.mssm.edu>
File details
Details for the file ore-0.2.1.tar.gz.
File metadata
- Download URL: ore-0.2.1.tar.gz
- Upload date:
- Size: 236.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.18.4 setuptools/39.0.1 requests-toolbelt/0.8.0 tqdm/4.19.8 CPython/3.6.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 4c4a6682f0a2ce15605e21a3d8da5a50ffd160701bff28a54e620fb8dae876f5
MD5 | 58cdc87b27f556c0e56e80fde0668657
BLAKE2b-256 | c9be7c23284c391a724d0f8514f59e9dddc6c9e9c200fb2b11a3de9c7d3245f4
File details
Details for the file ore-0.2.1-py3-none-any.whl.
File metadata
- Download URL: ore-0.2.1-py3-none-any.whl
- Upload date:
- Size: 11.8 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.18.4 setuptools/39.0.1 requests-toolbelt/0.8.0 tqdm/4.19.8 CPython/3.6.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | fb710e028e907a93e11bdd7f8068c7f4564b2ef4703fd6e5d17b8f12e4386ff9
MD5 | 498d8beb44026179b72d283831e19cdc
BLAKE2b-256 | fdde66a1c8267ac53bbd1a31c71fb992c0bd8fc6784f7c9f03954d20dea47536