Skip to main content

EpiOut: outlier detection for DNA accesibility data.

Project description

EpiOut

pypi tests codecov Documentation Status

method

Install

Install epiout and its companion packages with:

pip install epiout

hic-straw is optional dependency to annotate chromatin interactions with EpiAnnot:

conda install -c bioconda hic-straw

or

conda install -c conda-forge curl
pip install hic-straw

and another optional dependency is onnxruntime to predict aberrant gene expression from aberrant chromatin accessibility:

pip install onnxruntime

Usage

Counting chromatin accessibility from ATAC-seq data with EpiCount:

epicount --bed {bed} --alignments {alignments.tsv} --output_prefix {output_prefix} --cores {threads}

where bed is a bed file of genomic regions to count accessibility, alignments.tsv is a tab-delimited file of ATAC-seq alignments, output_prefix is the prefix of output files, and threads is the number of threads to use. See epicount --help for more details.

alignments.tsv lists bam files of ATAC-seq alignments, one file per line, with the following columns:

path/a.bam
path/b.bam
path/c.bam

File names are used as sample names in the output files. Alternatively, you can use a tab-delimited file with the following columns to specify sample names:

path/a.bam	sample_a
path/b.bam	sample_b
path/c.bam	sample_c

EpiCount will generate three files: prefix.counts.parquet, prefix.raw_counts.parquet, prefix.bed. The parquet files containing the count matrix. The raw_counts file is not filtered for replication and counts file is filtered. bed file containing replicated the genomic regions across samples. The parquet file can be loaded with pandas and looking like:

df = pd.read_parquet('output_prefix.parquet')
df

EpiOut

To call outliers with EpiOut, run:

epiout --count_table {prefix.counts.parquet} --output_prefix {output_prefix} --cores {threads}

where count_table is the output of EpiCount, output_prefix is the prefix of output files, and threads is the number of threads to use. See epiout --help for more details. You can pass ordinary csv file of count matrix to --count_table argument where rows are genomic regions and columns are samples.

Output of EpiOut is prefix.h5ad file and prefix.results.csv. h5ad file contains statistics about outliers

from epiout import EpiOutResult

result = EpiOutResult.load('result.h5ad')

# outliers as dataframe
result.outlier

# log adjusted p-values as dataframe
result.log_padj

# results as dataframe alternatively read results.csv file
df_results = result.results()

# Visualize outliers or accessibile regions
result.qq_plot('chr1:100-200')
result.plot_counts('chr1:100-200')
result.plot_volcona('chr1:100-200')

See the documentation of EpiOutResult for more details.

EpiOut performs hyperparameter optimization to tune optimal bottleneck size of autoencoder. To specifiy the bottleneck size, use --bottleneck_size arguments.

EpiAnnot

epiannot_create --tissue {tissue or cell line name} --output_prefix {output_prefix}

where tissue is the name of tissue or cell line avaliable on ENCODE to fetch, output_prefix is the prefix of output files where config.yaml will be created and contains metadata and related files will be downloaded. See epiannot_create --help for more details:

Also you can check avaliable tissues or cell lines:

epiannot_list

To annotate accesible regions and chromatin interactions with EpiAnnot, run:

epiannot --bed {bed} --gtf {gtf} --counts {prefix.h5ad} --chrom_sizes {chrom_sizes} --output_prefix {output_prefix}

where bed is a bed file of genomic regions to annotate, gtf is a gtf file of gene annotations, counts is the output of EpiOut in h5ad file format or counts obtained with EpiCount, chrom_sizes is a file of chromosome sizes can be generated with pyfaidx from fasta file, and output_prefix is the prefix of output files. See epiannot --help for more details.

Output contains prefix.annotation.csv annotation of genomic regions based on histone marks provided in the config file, prefix.gtf.csv annotation on regions based on the proximity to genes, prefix.interaction.csv annotation of chromatin interactions between regions and prefix.genes.csv indicating that poteintial effected by the aberrant chromatin accessibility.

You can create annotation with your custom config file:

config.yaml

H3K27ac:
- ENCFF817IVB.bed.gz
- ENCFF916FML.bed.gz

H3K4me1:
- ENCFF456GWH.bed.gz

H3K4me3:
- ENCFF867WVM.bed.gz

your_custom_mark:
- a.bed

hic:
- ENCFF311CLH.hic
- ENCFF787ZVA.hic

They keys in the config file are the names of histone marks and the values are the list of bed files of histone marks. The config file can also contain a list of hic files to annotate chromatin interactions. Hic data is optional. The config file can be created with epiannot_create command or you can use your config file. To call promoter, active and poised enhancers please make sure that you name your histone marks as H3K4me3, H3K27ac, and H3K4me1 respectively. An other histone mark or bed files can be used annotate regions. Output prefix.annotation.csv will have a column for each key in the config file and will incidate if accesible region overlaps with the annotation source.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

epiout-0.0.1.tar.gz (37.0 kB view details)

Uploaded Source

Built Distribution

epiout-0.0.1-py3-none-any.whl (33.6 kB view details)

Uploaded Python 3

File details

Details for the file epiout-0.0.1.tar.gz.

File metadata

  • Download URL: epiout-0.0.1.tar.gz
  • Upload date:
  • Size: 37.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.16

File hashes

Hashes for epiout-0.0.1.tar.gz
Algorithm Hash digest
SHA256 203c77982deb679453981a0f57ec978e6e3d90771f02760d54313c242e54dc56
MD5 044808ddee1d009764c7ee2ab1be6406
BLAKE2b-256 34519f58cd7cee3b5553d7951158f10f0f762b91c45fecda36b51d64cf957d86

See more details on using hashes here.

File details

Details for the file epiout-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: epiout-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 33.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.16

File hashes

Hashes for epiout-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 67057d0ac63c570e6de10b5fc98692354672534052b2625ebbf991cee585f99c
MD5 82eb90409ed7ab55ef12f9fe232aa375
BLAKE2b-256 b7175dec38e405a1352cbb5943203e649dc898698cee438a28a5e6b6e9687d1a

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page