System for turnkey analysis of semi-automated genome annotations

These details have not been verified by PyPI

Project links

Homepage

Project description

Segzoo

Introduction

Segzoo is a tool designed to automate various genomic analyses on segmentations obtained using Segway. It provides detailed results for each analysis and a comprehensive visualization summarizing the outcomes.

Segzoo depends on segtools, bedtools, and several Python packages. All of these dependencies are installed automatically during the installation process.

Installation and Quick Start

We recommend installing Segzoo using the mamba package manager.

To install Segzoo in a separate environment, open a terminal and execute mamba create -c bioconda -n segzooenv segzoo -y.
Once the installation is complete, activate the Segzoo environment by running mamba activate segzooenv.
To test Segzoo, download the segmentation file and the GMTK parameters, and place them in a directory named, for example, segzoo.
After the files are in place, execute segzoo segway.bed.gz --parameters params.params.
After approximately 30 minutes, the resulting visualization will be stored in the outdir/plots folder within the current directory.
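
The quick-start command above can also be launched from a script. The following is a minimal sketch, assuming the file names segway.bed.gz and params.params from the steps above and that the segzoo executable is on PATH in the activated environment. The helper names build_quickstart_command and run_quickstart are hypothetical and not part of Segzoo.

```python
import shutil
import subprocess


def build_quickstart_command(segmentation="segway.bed.gz", parameters="params.params"):
    """Return the quick-start command as an argument list (hypothetical helper)."""
    return ["segzoo", segmentation, "--parameters", parameters]


def run_quickstart():
    """Run the quick-start command, failing early if segzoo is not installed."""
    command = build_quickstart_command()
    if shutil.which(command[0]) is None:
        # segzoo is only on PATH after `mamba activate segzooenv`
        raise FileNotFoundError("segzoo not found on PATH; activate segzooenv first")
    # check=True raises CalledProcessError if the pipeline exits with an error
    subprocess.run(command, check=True)
```

Calling run_quickstart() from the directory that holds both files is equivalent to typing the command in the terminal.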

Usage

To access the help and learn how to run Segzoo, execute segzoo -h or segzoo --help. Here are the available command-line arguments:

--version: Check the currently installed version of Segzoo.
--parameters: Specify a params.params file generated from Segway's training to obtain GMTK parameters in the final visualization. If not specified, GMTK parameters will not be displayed.
--prefix: Specify the location where all necessary data, such as genome assembly, should be downloaded (default: the installation environment's directory).
-o or --outdir: Specify the folder where all the results and the final visualization will be created (default: outdir).
-j: Specify the number of cores to utilize (default: 1).
--species and --build: Specify the species and build for which the segmentation was created (default: Homo_sapiens and hg38).
--download-only: This option is designed to support cluster use. Running Segzoo with this argument will only execute the downloading rules of the pipeline and store the data using the specified prefix. Subsequently, runs on nodes without internet access can be performed by specifying the same prefix.
--mne: Specify an mne file to translate segment labels and track names shown on the figure. Refer to the 'Using mne files' section for details.
--normalize-gmtk: Normalize the GMTK parameters table row-wise.
--dendrogram: Perform hierarchical clustering of GMTK parameters row-wise.
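
To illustrate what row-wise normalization means for a parameters table, here is a minimal sketch. It assumes min-max scaling of each row to the range 0 to 1; the scaling Segzoo actually applies is not stated here, so treat the formula and the function name normalize_rows as assumptions.

```python
def normalize_rows(table):
    """Min-max scale each row of a list-of-lists to the range [0, 1].

    Assumption: min-max scaling stands in for whatever Segzoo does internally.
    """
    normalized = []
    for row in table:
        low, high = min(row), max(row)
        span = high - low
        # A constant row has no spread; map it to zeros to avoid dividing by zero.
        normalized.append([(value - low) / span if span else 0.0 for value in row])
    return normalized
```

After this step every row uses the full color range, which makes rows with very different magnitudes comparable in one heatmap.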

If you are interested in obtaining information on gene biotypes other than protein coding and lincRNA, which are the default, modify the gene_biotypes.py file in the installation folder of Segzoo accordingly. Similarly, the final visualization can be customized by modifying specific variables in visualization.py.

Once you run the segzoo command with the segmentation file and any optional arguments, the pipeline starts. It downloads all necessary data, runs the analyses, and generates the final visualization. The run can take a while; the quick-start example above needs approximately 30 minutes.
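
For cluster use, the --download-only and --prefix options described above suggest a two-step workflow: download on a node with internet access, then analyze on a node without it. The sketch below only builds the two argument lists. It assumes the segmentation file is still passed as the positional argument in the download step; the path /shared/segzoo_data and the helper name build_segzoo_command are hypothetical.

```python
def build_segzoo_command(segmentation, prefix, download_only=False, cores=1, outdir="outdir"):
    """Build a segzoo argument list from options documented above (hypothetical helper)."""
    command = ["segzoo", segmentation, "--prefix", prefix, "-o", outdir, "-j", str(cores)]
    if download_only:
        # Only the downloading rules of the pipeline are executed.
        command.append("--download-only")
    return command


# Step 1: run on a node with internet access to fetch the data into the prefix.
download_step = build_segzoo_command("segway.bed.gz", "/shared/segzoo_data", download_only=True)

# Step 2: run on a compute node, reusing the same prefix so nothing is downloaded.
analysis_step = build_segzoo_command("segway.bed.gz", "/shared/segzoo_data", cores=8)
```

Either list can be passed to subprocess.run(..., check=True) or joined into a line for a job script.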

Results

Upon completion of the execution, a new directory will be created (default name: outdir). The following folders will be available:

data: Contains the results for all the tools' analyses.
results: Contains the processed result tables used in the visualization.
plots: Contains the final visualization, which will resemble the example below:

[Figure: example Segzoo visualization]

In the visualization:

The y-axis represents the labels of the segmentation for all the heatmaps.
The x-axis shows the results obtained for each label.
The left section shows the parameters learned during Segway training.
Next to it is a heatmap in which each column is normalized to the limits of the color map.
The aggregation tables are presented in the specified order from gene_biotypes.py, potentially containing duplicates.
The aggregation results for each label represent the percentage of counts in one component compared to all the idealized genes. Each row's values sum up to 100.
The number of genes found for each biotype is provided after the biotype's name.
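
Two of the points above involve simple arithmetic: scaling each heatmap column to the color map's limits, and expressing aggregation counts as percentages that sum to 100 per row. The following is a minimal sketch of both, assuming min-max scaling for the columns; the function names are hypothetical and the code does not reproduce Segzoo's internals.

```python
def normalize_columns(table):
    """Min-max scale each column of a list-of-lists so its values span 0 to 1."""
    scaled_columns = []
    for column in zip(*table):
        low, high = min(column), max(column)
        span = high - low
        # A constant column has no spread; map it to zeros.
        scaled_columns.append([(v - low) / span if span else 0.0 for v in column])
    # Transpose back so the result has the same row/column layout as the input.
    return [list(row) for row in zip(*scaled_columns)]


def row_percentages(counts):
    """Convert the per-component counts of one label into percentages of the row total."""
    total = sum(counts)
    return [100.0 * c / total if total else 0.0 for c in counts]
```

For example, a label with counts of 1, 1, and 2 across three gene components yields 25, 25, and 50 percent.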

Using MNE Files

The mne file can be utilized to translate segment labels and track names in the final figure. The file is tab-delimited and should contain three columns in any order:

old: The original label or track name, as displayed when running segzoo with default parameters. Values in this column act as lookup keys.
new: The name that replaces the corresponding old value in the figure.
type: Either label or track, indicating which kind of name the row translates. This is especially useful when a track and a label share the same old name.

The file header is mandatory and should include the three fields: old, new, and type.

Only the tracks and labels defined in the mne file are renamed. Tracks and labels not listed in the file remain unchanged. Here is an example of an mne file:

old    new    type
0      Quiescent    label
1      TSS    label
H3K4me3_robust_peaks    H3K4me3    track
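
The lookup behavior described in this section can be sketched with Python's csv module. This is an illustration of the file format only, not Segzoo's own parser; read_mne is a hypothetical name. Reading by header name means the three columns may appear in any order.

```python
import csv
import io


def read_mne(handle):
    """Read a tab-delimited mne file into separate label and track lookup tables."""
    translations = {"label": {}, "track": {}}
    # DictReader keys each row by the header, so column order does not matter.
    for row in csv.DictReader(handle, delimiter="\t"):
        translations[row["type"]][row["old"]] = row["new"]
    return translations


example = (
    "old\tnew\ttype\n"
    "0\tQuiescent\tlabel\n"
    "1\tTSS\tlabel\n"
    "H3K4me3_robust_peaks\tH3K4me3\ttrack\n"
)
mne = read_mne(io.StringIO(example))

# Names missing from the file fall back to themselves, so they stay unchanged.
print(mne["label"].get("0", "0"))  # prints Quiescent
print(mne["label"].get("2", "2"))  # prints 2
```

Keeping labels and tracks in separate tables is what lets a track and a label share the same old name without colliding.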

Release history

1.0.13 (this version): Apr 17, 2024
1.0.12: Feb 26, 2024
1.0.11: Feb 3, 2023
1.0.10 (yanked): Feb 2, 2023. Reason this release was yanked: extra file tmp.py contains non python code and can cause errors when installing.
1.0.9: Sep 19, 2022
1.0.7: Apr 28, 2022
1.0.4: Jun 20, 2018
1.0.3: Jun 15, 2018
1.0.2: May 25, 2018
1.0.1: May 24, 2018
1.0.0: May 23, 2018
1.0.0.dev11 (pre-release): May 11, 2018
1.0.0.dev10 (pre-release): May 10, 2018
1.0.0.dev8 (pre-release): Apr 16, 2018
1.0.0.dev7 (pre-release): Apr 12, 2018
1.0.0.dev6 (pre-release): Apr 11, 2018
1.0.0.dev5 (pre-release): Apr 11, 2018
1.0.0.dev4 (pre-release): Apr 11, 2018
1.0.0.dev3 (pre-release): Apr 10, 2018

Download files

Source Distribution

segzoo-1.0.13.tar.gz (28.1 kB)

Uploaded Apr 17, 2024 Source

Built Distribution

segzoo-1.0.13-py3-none-any.whl (29.6 kB)

Uploaded Apr 17, 2024 Python 3

File details

Details for the file segzoo-1.0.13.tar.gz.

File metadata

Download URL: segzoo-1.0.13.tar.gz
Upload date: Apr 17, 2024
Size: 28.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.0.0 CPython/3.8.18

File hashes

Hashes for segzoo-1.0.13.tar.gz
Algorithm	Hash digest
SHA256	`03903414891982e4d296f253119cf404a525844abdc84b02285923866926a708`
MD5	`5eeed695986b5de2485a74c773f10992`
BLAKE2b-256	`644d3bd8a8aae41c84095fc0796668d3896b64ecc9e62f06e3d294760fc372ea`
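
A downloaded file can be checked against the published digest with Python's hashlib. The sketch below is one way to do this, assuming the archive has been saved locally; the function names and the local path passed in are hypothetical, while the SHA256 value is the one listed above for segzoo-1.0.13.tar.gz.

```python
import hashlib

# SHA256 digest listed above for segzoo-1.0.13.tar.gz
EXPECTED_SHA256 = "03903414891982e4d296f253119cf404a525844abdc84b02285923866926a708"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """Return True when the file's SHA256 digest equals the expected digest."""
    return sha256_of(path) == expected
```

The same functions work for the wheel by passing its SHA256 digest as the expected value.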

File details

Details for the file segzoo-1.0.13-py3-none-any.whl.

File metadata

Download URL: segzoo-1.0.13-py3-none-any.whl
Upload date: Apr 17, 2024
Size: 29.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.0.0 CPython/3.8.18

File hashes

Hashes for segzoo-1.0.13-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9b47a0e05aabb38523dd77617ba55ca67022e788cdb127aa8d42a382160b5232`
MD5	`6992a839b4e3314cb7a261f06a6b7955`
BLAKE2b-256	`56da8ee14afd2cfc2a8cfa16fea298d847a78057038f2a7540e0dacefe3bb838`
