
PhenGO: A tool to build WEKA-ready ARFF files from model organism phenotype and Gene Ontology (GO) annotations.

PhenGO

Overview

This project provides a unified Python-based tool that generates ready-to-use WEKA ARFF files for machine learning applications involving gene essentiality prediction. It integrates phenotype data and Gene Ontology (GO) annotations for genes from selected model organisms, streamlining data preparation.

Purpose

The main goal of this project is to simplify and standardise the creation of ARFF files that combine phenotype information with GO-mapped gene data. This enables researchers to efficiently apply machine learning techniques (using WEKA or similar platforms) to analyse gene essentiality and related biological questions across various model organisms.

Features

  • Unified Workflow: Handles data collection, integration, and formatting in a single pipeline.
  • Model Organism Support: Designed for commonly studied organisms (e.g., Saccharomyces cerevisiae, Mus musculus).
  • GO Annotation Integration: Maps genes to their GO terms and traverses the ontology's OBO file to add each term's parent terms, giving a fuller feature representation.
  • Phenotype Data Inclusion: Incorporates phenotype labels for supervised learning tasks.
  • WEKA ARFF Output: Produces files in the ARFF format, ready for immediate use in WEKA.
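
The parent-term tracing in the feature list can be pictured with a short Python sketch. This is not PhenGO's own code: it assumes that only `is_a` links are followed, the function names `parse_is_a` and `ancestors` are hypothetical, and the three-term hierarchy in `OBO_TEXT` is invented purely for illustration.

```python
from collections import defaultdict

# Hand-written OBO fragment; the hierarchy is invented for illustration.
OBO_TEXT = """\
[Term]
id: GO:0000001
name: root

[Term]
id: GO:0000002
name: middle
is_a: GO:0000001 ! root

[Term]
id: GO:0000003
name: leaf
is_a: GO:0000002 ! middle
"""


def parse_is_a(obo_text):
    """Return {term_id: set of direct is_a parents} from OBO-formatted text."""
    parents = defaultdict(set)
    current = None
    in_term = False
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are read.
            in_term = line == "[Term]"
            current = None
        elif in_term and line.startswith("id: "):
            current = line[len("id: "):]
            parents[current]  # register the term even if it has no parents
        elif in_term and current and line.startswith("is_a: "):
            # Drop the trailing "! name" comment and keep the parent ID.
            parents[current].add(line[len("is_a: "):].split("!")[0].strip())
    return dict(parents)


def ancestors(term, parents):
    """Collect every term reachable from `term` by following is_a links."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


PARENTS = parse_is_a(OBO_TEXT)
print(sorted(ancestors("GO:0000003", PARENTS)))
# prints ['GO:0000001', 'GO:0000002']
```

Under this assumption, a gene annotated only with the leaf term would also receive the two ancestor terms as features.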

Installation

To install the PhenGO package, you can use pip:

pip install phengo

Usage

PhenGO Package:

Example (fly data; the trailing backslashes continue the command onto the next line):

PhenGO -species fly \
  -phenotype_file data/fly/phenotype_data/2017/allele_phenotypic_data_fb_2017_05.tsv.gz \
  -gene_association_file data/fly/gene_association/2017/gene_association_2017_05.fb.gz \
  -go_obo_file data/go/2017/go_2017-05-01.obo.gz \
  -output_dir Documents/PhenGO/fly_2017

The output will be saved in the specified output directory, which will contain the ARFF file and other relevant data files.
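
For orientation, here is a minimal sketch of how a gene-by-GO-term ARFF file could be laid out. The exact layout PhenGO writes is not described here, so this is an assumption: one binary `{0,1}` attribute per GO term plus a nominal `class` attribute with the values `lethal` and `viable`. The function name `build_arff` and the sample genes are hypothetical, and the sketch omits gene identifiers.

```python
def build_arff(relation, gene_terms, labels):
    """Build ARFF text with one binary attribute per GO term (assumed layout)."""
    # Every GO term seen for any gene becomes one attribute column.
    terms = sorted({term for ts in gene_terms.values() for term in ts})
    lines = [f"@relation {relation}", ""]
    for term in terms:
        # Quote the name, since GO IDs contain a colon.
        lines.append(f"@attribute '{term}' {{0,1}}")
    lines.append("@attribute class {lethal,viable}")
    lines += ["", "@data"]
    for gene in sorted(gene_terms):
        row = ["1" if term in gene_terms[gene] else "0" for term in terms]
        row.append(labels[gene])
        lines.append(",".join(row))
    return "\n".join(lines) + "\n"


# Illustrative input only: two made-up genes and their annotations.
GENE_TERMS = {
    "GeneA": {"GO:0008150"},
    "GeneB": {"GO:0008150", "GO:0003674"},
}
LABELS = {"GeneA": "lethal", "GeneB": "viable"}

ARFF_TEXT = build_arff("essentiality", GENE_TERMS, LABELS)
print(ARFF_TEXT)
```

Each data row then holds one 0/1 value per GO term followed by the phenotype label, which is the shape WEKA classifiers expect for a nominal class.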

Menu:

usage: PhenGO.py [-h] -species SPECIES -phenotype_file PHENOTYPE_FILE
                 -gene_association_file GENE_ASSOCIATION_FILE -go_obo_file
                 GO_OBO_FILE -output_dir OUTPUT_DIR [-filter_unused_gos]
                 [-filter_mixed_terms] [-gene_go_pheno]
                 [-fly_assignments FLY_ASSIGNMENTS]
                 [-driver_lines DRIVER_LINES] [-filt_with]
                 [-worm_phenotypes WORM_PHENOTYPES]
                 [-mouse_phenotypes MOUSE_PHENOTYPES] [-v]

PhenGO v0.1.2 - Convert phenotype and GO data to ARFF format

Required Options:
  -species SPECIES      Species tag (e.g., fly, yeast)
  -phenotype_file PHENOTYPE_FILE
                        Path to the phenotype data file (.gz)
  -gene_association_file GENE_ASSOCIATION_FILE
                        Path to the gene association file (.gz)
  -go_obo_file GO_OBO_FILE
                        Path to the go.obo file
  -output_dir OUTPUT_DIR
                        Output directory

Optional parameters:
  -filter_unused_gos    Filter out unused GO terms from the FUNC and ARFF
                        output (default: True)
  -filter_mixed_terms   Filter out genes which have both lethal and viable
                        phenotypes - Terms not specifically lethal/viable are
                        not counted in this (default: False)
  -gene_go_pheno        Output "Gene-GO-Phenotype" (Rbbp5 GO:0003674 0) file
                        for overrepresentation analysis with tools such as
                        FUNC (default: False)

Fly specific parameters:
  -fly_assignments FLY_ASSIGNMENTS
                        Provide TSV file of fly assignments (file confirming
                        genes are assignment to drosophila melanogaster
                        (default: "data/fly/FlyBase_Fields_2017.txt.gz")
  -driver_lines DRIVER_LINES
                        Provide TSV file of fly driver lines (file containing
                        the name of driver lines (RNAi) to ignore when present
                        with the "with" tag (default: "data/fly/FlyBase_Driver
                        Line_Fields_2025_08_05.txt.gz")
  -filt_with            Filter out phenotype with "with" tag (default: DO NOT
                        FILTER)

Worm specific parameters:
  -worm_phenotypes WORM_PHENOTYPES
                        Provide TSV file of worm phenotypes (default:
                        "data/worm/WS297_lethal_terms.tsv.gz")

Mouse specific parameters:
  -mouse_phenotypes MOUSE_PHENOTYPES
                        Provide TSV file of mouse phenotypes (default:
                        "data/mouse/mouse_lethal_terms.txt.gz")

Misc:
  -v, --version         show program's version number and exit
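
When PhenGO is run from a script, for instance once per species or per annotation release, the options above can be assembled into an argument list. The sketch below is only an illustration: `build_phengo_command` is a hypothetical helper, not part of the package, and it assumes the `PhenGO` executable is on the PATH as in the example command earlier.

```python
import shlex


def build_phengo_command(species, phenotype_file, gene_association_file,
                         go_obo_file, output_dir,
                         filter_mixed_terms=False, gene_go_pheno=False):
    """Assemble a PhenGO argument list from the documented options."""
    cmd = [
        "PhenGO",
        "-species", species,
        "-phenotype_file", phenotype_file,
        "-gene_association_file", gene_association_file,
        "-go_obo_file", go_obo_file,
        "-output_dir", output_dir,
    ]
    # Both optional switches are plain flags without a value.
    if filter_mixed_terms:
        cmd.append("-filter_mixed_terms")
    if gene_go_pheno:
        cmd.append("-gene_go_pheno")
    return cmd


CMD = build_phengo_command(
    species="fly",
    phenotype_file="data/fly/phenotype_data/2017/allele_phenotypic_data_fb_2017_05.tsv.gz",
    gene_association_file="data/fly/gene_association/2017/gene_association_2017_05.fb.gz",
    go_obo_file="data/go/2017/go_2017-05-01.obo.gz",
    output_dir="Documents/PhenGO/fly_2017",
    filter_mixed_terms=True,
)
# Print the shell-quoted command; subprocess.run(CMD, check=True) would execute it.
print(shlex.join(CMD))
```

Passing the list to `subprocess.run` rather than a single string avoids shell quoting problems with paths that contain spaces.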

Compare-ARFF:

usage: compare_arff_genes.py [-h] -arff_a ARFF_A -arff_b ARFF_B -o OUTPUT

PhenoGO v0.1.2 - Compare-ARFF: Compare two ARFF files.

options:
  -h, --help      show this help message and exit
  -arff_a ARFF_A  Master ARFF file (reference)
  -arff_b ARFF_B  Comparison ARFF file
  -o OUTPUT       Output CSV file

Output: Compare-ARFF writes a CSV file that summarises, gene by gene, how the two ARFF files differ. For example:

Gene,Label A,Label B,GO Terms Differ,Status
GeneA,lethal,,,"MISSING_IN_B"
GeneB,lethal,viable,,"LABEL_MISMATCH"
GeneC,viable,viable,GO:0008150;GO:0003674,"GO_TERM_MISMATCH"
GeneD,viable,viable,,"EXACT_MATCH"
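
A comparison file of this shape can be summarised with Python's standard `csv` module. The sketch below reuses the sample rows above; the helper names `count_statuses` and `genes_with_status` are hypothetical, and it assumes the column headers match the sample exactly.

```python
import csv
import io
from collections import Counter

# The sample rows from the example output above.
SAMPLE = '''Gene,Label A,Label B,GO Terms Differ,Status
GeneA,lethal,,,"MISSING_IN_B"
GeneB,lethal,viable,,"LABEL_MISMATCH"
GeneC,viable,viable,GO:0008150;GO:0003674,"GO_TERM_MISMATCH"
GeneD,viable,viable,,"EXACT_MATCH"
'''


def count_statuses(csv_text):
    """Count how many genes fall under each Status value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["Status"] for row in reader)


def genes_with_status(csv_text, status):
    """List the genes whose Status column equals `status`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["Gene"] for row in reader if row["Status"] == status]


print(count_statuses(SAMPLE))
print(genes_with_status(SAMPLE, "LABEL_MISMATCH"))  # prints ['GeneB']
```

For a real comparison file, open the path given to `-o` and pass its contents to the same functions.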

