A utility to annotate genomic intervals.
***************
region_analysis
***************
Dependency:
###########
bedtools: https://code.google.com/p/bedtools/
pybedtools: https://github.com/daler/pybedtools
If easy_install or pip is available, then::

    easy_install pybedtools

or::

    pip install pybedtools
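Before running the tool, it can help to confirm that both dependencies are visible from the current environment. The following is a minimal sketch using only the Python standard library; the helper name ``check_dependencies`` is hypothetical and is not part of region_analysis.

```python
import importlib.util
import shutil


def check_dependencies():
    """Return a dict mapping each dependency name to whether it was found.

    Hypothetical helper for illustration; not part of region_analysis.
    """
    return {
        # bedtools is a command-line program, so look for it on PATH.
        "bedtools": shutil.which("bedtools") is not None,
        # pybedtools is a Python package, so ask the import system.
        "pybedtools": importlib.util.find_spec("pybedtools") is not None,
    }


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If either entry is reported as missing, install it first using the links and commands above.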
Usage:
######
region\_analysis.py [options]
Options:
########
-h, --help  show this help message and exit
-i INPUT\_FILE\_NAME, --input=INPUT\_FILE\_NAME
::

    The first three columns of the input region file must be (chr, start, end).

-d ANNO\_DB, --database=ANNO\_DB
::

    Choose database: refseq (default) or ensembl

-r, --rhead  Set this flag if the input file contains a column header line
-g GENOME, --genome=GENOME
::

    Choose genome: mm10 (default)
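To illustrate the input layout the options describe, here is a small sketch that reads (chr, start, end) from the first three columns and optionally skips a header line, mirroring what ``-r`` indicates. It assumes a tab-delimited file, as in BED; the function ``read_regions`` is hypothetical and is not the parser region_analysis.py itself uses.

```python
import csv
import io


def read_regions(handle, has_header=False):
    """Yield (chrom, start, end) tuples from a tab-delimited region file.

    Hypothetical helper; assumes tab-separated columns as in BED files.
    """
    reader = csv.reader(handle, delimiter="\t")
    if has_header:
        next(reader, None)  # skip the column header line (the -r case)
    for row in reader:
        if len(row) < 3:
            continue  # skip blank or malformed lines
        # Only the first three columns are interpreted; the rest are ignored.
        yield row[0], int(row[1]), int(row[2])


example = io.StringIO("chr\tstart\tend\tname\nchr1\t3000000\t3000500\tpeak1\n")
print(list(read_regions(example, has_header=True)))
# prints [('chr1', 3000000, 3000500)]
```

Any additional columns after the third are left untouched by this sketch.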
Output:
#######
* -.annotated: the one-to-one output list; only the annotation entry whose TSS is nearest to the query interval is kept.
* -.full.annotated: all hit entries are kept.
* -.full.annotated.json: the JSON-format output of -.full.annotated.
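The difference between the one-to-one and the full output can be sketched as a nearest-TSS selection over all hits for a query interval. This is an illustration only: the hit records with ``gene`` and ``tss`` keys, and the choice to measure distance to the nearest edge of the interval, are assumptions and do not reflect the tool's actual data structures or tie-breaking.

```python
def distance_to_interval(position, start, end):
    """Distance from a position to the closed interval [start, end]; 0 if inside."""
    if position < start:
        return start - position
    if position > end:
        return position - end
    return 0


def nearest_hit(hits, start, end):
    """Return the hit whose TSS is nearest to the query interval.

    `hits` is a list of dicts with hypothetical keys "gene" and "tss".
    """
    return min(hits, key=lambda hit: distance_to_interval(hit["tss"], start, end))


# All hits would go to the "full" output; only the nearest goes to the one-to-one output.
hits = [{"gene": "A", "tss": 100}, {"gene": "B", "tss": 950}]
print(nearest_hit(hits, 1000, 1200)["gene"])
# prints B
```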
Features:
#########
* ProximalPromoter: +/- 250bp of TSS
* Promoter1k: +/- 1kbp of TSS
* Promoter3k: +/- 3kbp of TSS
* Genebody: Anywhere between a gene's promoter and up to 1kbp downstream of the TES.
* Genedeserts: Genomic regions that are depleted of genes and are at least 1Mbp long.
* Pericentromere: Between the boundary of a centromere and the closest gene minus 10kbp of that gene's regulatory region.
* Subtelomere: Defined similarly to pericentromere.
* OtherIntergenic: Any region that does not belong to the above categories.
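The three promoter windows above are nested, so an interval near a TSS can be assigned to the narrowest window it reaches. The sketch below shows one way to do that. It assumes that overlapping a window is enough to qualify, that the narrowest window wins, and that strand is ignored; the function name ``promoter_class`` is hypothetical and this is not necessarily how region_analysis.py assigns features.

```python
# Window half-widths in bp, taken from the feature definitions above.
PROMOTER_WINDOWS = (
    ("ProximalPromoter", 250),
    ("Promoter1k", 1000),
    ("Promoter3k", 3000),
)


def promoter_class(start, end, tss):
    """Return the narrowest promoter window the interval reaches, or None.

    Hypothetical helper; treats [start, end] as a closed interval.
    """
    # Distance from the TSS to the nearest base of the interval (0 if inside).
    if start <= tss <= end:
        distance = 0
    else:
        distance = min(abs(start - tss), abs(end - tss))
    for name, half_width in PROMOTER_WINDOWS:
        if distance <= half_width:
            return name
    return None


print(promoter_class(1000, 1100, 400))
# prints Promoter1k
```

Intervals for which this returns ``None`` would fall into one of the remaining categories (Genebody, Genedeserts, and so on), which need gene and chromosome-level annotation to resolve.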
Testing with examples:
######################
region\_analysis.py -i example/test\_without\_header.bed -g mm10 -d ensembl
region\_analysis.py -i example/test\_with\_header.bed -g mm10 -d ensembl -r