A Python package for identifying essential genes in cancer.

These details have been verified by PyPI

Project links

Homepage

GitHub Statistics

Maintainers

amirhossein_haerian

These details have not been verified by PyPI

Development Status
- 3 - Alpha
Intended Audience
- Science/Research
Topic
- Scientific/Engineering :: Bio-Informatics

Project description

GenePioneer: A Comprehensive Python Package for Identification of Essential Genes and Modules in Cancer

Description

GenePioneer was developed as a fast and straightforward way to integrate gene ranking and module detection into a practical, Python-based tool for cancer researchers. It requires minimal input, delivers clear output, and runs within a standard Python environment, which keeps it accessible to researchers of all skill levels, including non-expert programmers, while still supporting large-scale dataset analysis. By evaluating gene importance and identifying gene interactions within cancer networks, GenePioneer provides insight into the genetic drivers of cancer. Key features include ranking genes by their network significance and identifying the modules they belong to, which helps explore cancer-related pathways and can aid in developing targeted therapies. By combining data integration, network-based analysis, and statistical evaluation, GenePioneer serves as a versatile resource for research across multiple cancer types.

Features

Gene Ranking: Determines gene importance based on network significance.
Module Detection: Identifies gene clusters within cancer pathways.
Statistical Analysis: Evaluates detected gene modules and their association with known pathways.
User-Friendly API: Allows researchers of all skill levels to analyze cancer-related genetic data efficiently.
Full Reproducibility: Option to either use precomputed data or regenerate all components from raw datasets.

Installation

GenePioneer is available via PyPI. Install it using:

pip install genepioneer

Two Usage Modes

You can either:

Use Preprocessed Data: Run analysis using prebuilt datasets in Data/cancer-gene-data/ and Data/module-data/.
Reproduce Everything: Build networks, generate rankings, and detect modules from raw cancer data stored in GenesData/.

Option 1: Using Preprocessed Data

This is the simplest approach. You only need to provide a list of genes and specify the cancer type.

Step 1: Prepare a Gene List

Create a .txt file containing one gene name per line in the OFFICIAL_GENE_SYMBOL format.

Example (`gene_list.txt`):

BRCA1
TP53
PTEN
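A gene list in this shape can also be produced from Python. The sketch below is only an illustration: the helper name `write_gene_list` is hypothetical and not part of GenePioneer, and it assumes the package accepts a UTF-8 file with one symbol per line and a trailing newline.

```python
from pathlib import Path
import tempfile


def write_gene_list(genes, path):
    """Write one gene symbol per line, dropping blanks and duplicates.

    Hypothetical helper, not part of GenePioneer.
    """
    unique = []
    for gene in genes:
        symbol = gene.strip()
        if not symbol:
            continue  # ignore empty entries
        if any(ch.isspace() for ch in symbol):
            raise ValueError(f"Gene symbol contains whitespace: {gene!r}")
        if symbol not in unique:
            unique.append(symbol)  # keep first-seen order
    Path(path).write_text("\n".join(unique) + "\n", encoding="utf-8")
    return unique


# Demo: write the example list to a throwaway directory.
demo_path = Path(tempfile.mkdtemp()) / "gene_list.txt"
print(write_gene_list(["BRCA1", "TP53", "PTEN", "BRCA1"], demo_path))
```

Symbols are written exactly as given (no case changes), since some official symbols contain lower-case letters.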

Step 2: Run Gene Analysis

from genepioneer import GeneAnalysis

gene_analysis = GeneAnalysis("Ovary", "./Data/benchmark-data/gene_list.txt")
gene_analysis.analyze_genes()
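The first argument to GeneAnalysis is the cancer type. A small check before the call can catch typos early; the sketch below uses the names listed under Supported Cancer Types. The helper `normalize_cancer_type` is hypothetical, and the case-insensitive matching is an assumption: the package itself may validate names differently or require exact spelling.

```python
# Names copied from the "Supported Cancer Types" list in this README.
SUPPORTED_CANCER_TYPES = (
    "Adrenal", "Bladder", "Brain", "Cervix", "Colon", "Corpus uteri",
    "Kidney", "Liver", "Ovary", "Prostate", "Skin", "Thyroid",
)


def normalize_cancer_type(name):
    """Return the listed spelling of `name`, ignoring case and outer spaces.

    Hypothetical helper, not part of GenePioneer.
    """
    lookup = {listed.lower(): listed for listed in SUPPORTED_CANCER_TYPES}
    key = name.strip().lower()
    if key not in lookup:
        raise ValueError(
            f"Unsupported cancer type {name!r}; expected one of: "
            + ", ".join(SUPPORTED_CANCER_TYPES)
        )
    return lookup[key]


print(normalize_cancer_type(" ovary "))
```

The returned string can then be passed as the first argument to GeneAnalysis.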

Step 3: Output

This will generate an output.json file with:

Gene Rankings: Sorted based on importance in the network.
Modules: Groups of genes functionally related in cancer.
Statistical Significance: Evaluation of identified modules.

Supported Cancer Types

"Adrenal", "Bladder", "Brain", "Cervix", "Colon", "Corpus uteri", "Kidney", "Liver", "Ovary", "Prostate", "Skin", "Thyroid"

Option 2: Reproducing Everything (Building Data from Scratch)

If you want full control over data generation, follow these steps to build your own cancer-specific datasets.

Step 1: Add Required Data

You need:

Raw TCGA Cancer Data (GenesData/): Cancer-specific gene expression data.
IBP Gene Ontology (GenesData/IBP_GO_Terms.xlsx): Gene-to-biological process mappings.

Example Directory Structure

GenesData/
│-- IBP_GO_Terms.xlsx
│-- Adrenal/
│   │-- ABL1/
│   │   │-- ABL1.tsv
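Before building a network, it can help to confirm that the raw data follows this layout. The following is a minimal sketch, assuming each gene folder holds a table named after the folder, as in the ABL1 example; `find_gene_tables` is a hypothetical helper and not part of GenePioneer.

```python
from pathlib import Path
import tempfile


def find_gene_tables(genes_data_dir, cancer_type):
    """Return ({gene: path to <gene>/<gene>.tsv}, [genes missing their table])."""
    cancer_dir = Path(genes_data_dir) / cancer_type
    found, missing = {}, []
    for gene_dir in sorted(p for p in cancer_dir.iterdir() if p.is_dir()):
        table = gene_dir / f"{gene_dir.name}.tsv"
        if table.is_file():
            found[gene_dir.name] = table
        else:
            missing.append(gene_dir.name)
    return found, missing


# Demo on a throwaway directory that mirrors the structure above.
root = Path(tempfile.mkdtemp()) / "GenesData"
(root / "Adrenal" / "ABL1").mkdir(parents=True)
(root / "Adrenal" / "ABL1" / "ABL1.tsv").write_text(
    "Case ID\tExpression\n", encoding="utf-8"
)
(root / "Adrenal" / "TP53").mkdir()  # gene folder without its table
found, missing = find_gene_tables(root, "Adrenal")
print(sorted(found), missing)
```

A non-empty `missing` list points at gene folders that would need attention before running Step 2.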

Step 2: Build Network and Compute Features

from genepioneer import NetworkBuilder

network_builder = NetworkBuilder("Adrenal", "./GenesData")
graph = network_builder.build_network()
features = network_builder.calculate_all_features()
network_builder.save_features_to_csv(features, "./Data/cancer-gene-data/Adrenal")

This step:

Builds a gene interaction network.
Computes network-based features (e.g., centrality, entropy, Laplacian scores).
Saves features to a CSV file.
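To give a feel for what a network-based feature is, the sketch below computes degree centrality on a toy graph in plain Python. It is not GenePioneer's implementation, the edges are made up for illustration, and the package's own feature set (including entropy and Laplacian scores) may be computed quite differently.

```python
def degree_centrality(adjacency):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(adjacency)
    if n < 2:
        return {node: 0.0 for node in adjacency}
    return {node: len(neighbors) / (n - 1) for node, neighbors in adjacency.items()}


# Toy undirected network; these edges are illustrative, not real interactions.
toy_network = {
    "ABL1": {"TP53", "BRCA1"},
    "TP53": {"ABL1", "BRCA1", "PTEN"},
    "BRCA1": {"ABL1", "TP53"},
    "PTEN": {"TP53"},
}

scores = degree_centrality(toy_network)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked[0], scores[ranked[0]])
```

In this toy graph the node connected to every other node receives the maximum score of 1.0, which is the intuition behind ranking genes by network significance.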

Step 3: Detect Modules

from genepioneer import NetworkAnalysis

network_analysis = NetworkAnalysis("Adrenal", features)
modules = network_analysis.module_detection()

This step:

Identifies gene modules based on connectivity and functional relevance.
Saves results as Data/module-data/Adrenal.json.

Step 4: Run Full Gene Analysis

Once networks and modules are generated, you can proceed with standard gene analysis:

from genepioneer import GeneAnalysis

gene_analysis = GeneAnalysis("Adrenal", "./Data/benchmark-data/gene_list.txt", 
                             cancer_gene_path="./Data/cancer-gene-data", 
                             module_data_path="./Data/module-data")
gene_analysis.analyze_genes()

Dataset Structure and Format

**1. TCGA Data (`GenesData/*/*.tsv`)**

Contains gene expression and associated cases.

Example (`GenesData/Adrenal/ABL1/ABL1.tsv`):

Case ID	Expression
TCGA-01	2.5
TCGA-02	1.8
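A table in this shape can be read with the standard library alone. This is a minimal sketch, assuming the two column names shown in the example ("Case ID" and "Expression") and a tab delimiter; real TCGA exports may carry additional columns.

```python
import csv
import io


def read_expression_table(handle):
    """Return {case_id: expression} from a tab-separated table with a header row."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["Case ID"]: float(row["Expression"]) for row in reader}


# The example rows from above, held in memory instead of a file.
sample = "Case ID\tExpression\nTCGA-01\t2.5\nTCGA-02\t1.8\n"
values = read_expression_table(io.StringIO(sample))
mean_expression = sum(values.values()) / len(values)
print(values, mean_expression)
```

For a file on disk, pass `open(path, newline="", encoding="utf-8")` in place of the `io.StringIO` object.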

**2. IBP Gene Ontology (`GenesData/IBP_GO_Terms.xlsx`)**

Links genes to biological processes.

Process	Gene1	Gene2
Cell Cycle	BRCA1	TP53

**3. Network Features (`Data/cancer-gene-data/*.csv`)**

Stores computed network importance scores.

Example (`Data/cancer-gene-data/Adrenal_network_features.csv`):

node,ls_score
ABL1,0.85
TP53,0.92
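The feature file can be inspected directly to see how genes order by score. The sketch below assumes the `node` and `ls_score` columns shown in the example; `rank_by_score` is a hypothetical helper, and whether a higher or lower `ls_score` means a more important gene is not stated here, so the sort direction is left as a parameter.

```python
import csv
import io


def rank_by_score(handle, column="ls_score", descending=True):
    """Return (node, score) pairs sorted by the chosen column."""
    rows = [(row["node"], float(row[column])) for row in csv.DictReader(handle)]
    return sorted(rows, key=lambda item: item[1], reverse=descending)


# The example rows from above, held in memory instead of a file.
sample = "node,ls_score\nABL1,0.85\nTP53,0.92\n"
ranking = rank_by_score(io.StringIO(sample))
print(ranking)
```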

**4. Module Data (`Data/module-data/*.json`)**

Contains detected gene modules.

Example (`Data/module-data/Adrenal.json`):

{
  "module_1": [
    ["ABL1", "TP53"],
    3.5,
    1.2
  ]
}
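Given this structure, looking up which modules contain a gene takes only a few lines. The sketch assumes the first element of each entry is the member list, as in the example; the meaning of the two trailing numbers is not described here, so the code leaves them untouched. `modules_containing` is a hypothetical helper, not part of GenePioneer.

```python
import json


def modules_containing(modules, gene):
    """Return the names of modules whose member list includes `gene`."""
    hits = []
    for name, entry in modules.items():
        members = entry[0]  # first element: list of gene symbols
        if gene in members:
            hits.append(name)
    return hits


# The example module file from above, parsed from a string.
sample = '{"module_1": [["ABL1", "TP53"], 3.5, 1.2]}'
modules = json.loads(sample)
print(modules_containing(modules, "TP53"))
```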

Reproducibility Steps

Clone the Repository

git clone https://github.com/amirhossein-haerian/GenePioneer.git
cd GenePioneer

Install Dependencies

pip install -r requirements.txt

Add or Generate Data

Use prebuilt data (Option 1), or
Generate data from raw sources (Option 2).

Run Gene Analysis

python -m genepioneer.gene_analysis "Adrenal" "./Data/benchmark-data/gene_list.txt"

Verify Output

output.json contains ranked genes and detected modules.

Questions about the implementation:

Amirhossein Haerianardakani, haerian.amirhossein[at]gmail.com

If you encounter a bug, a failing function, or have a feature request, please open an issue on GitHub or contact Amirhossein.

License

This project is licensed under the MIT License.

Release history

1.1.0 (this version)

Mar 9, 2025

1.0.3

Dec 8, 2024

1.0.2

Sep 18, 2024

1.0.1

Sep 18, 2024

1.0.0

Sep 18, 2024

0.1.3

Aug 18, 2024

0.1.2

Aug 18, 2024

0.1.1

Aug 18, 2024

0.1.0

Aug 18, 2024

Download files


Source Distribution

genepioneer-1.1.0.tar.gz (4.9 MB view details)

Uploaded Mar 9, 2025 Source

Built Distribution


genepioneer-1.1.0-py3-none-any.whl (4.9 MB view details)

Uploaded Mar 9, 2025 Python 3

File details

Details for the file genepioneer-1.1.0.tar.gz.

File metadata

Download URL: genepioneer-1.1.0.tar.gz
Upload date: Mar 9, 2025
Size: 4.9 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for genepioneer-1.1.0.tar.gz
Algorithm	Hash digest
SHA256	`509b99fb75cc4ef0c94352e508e94a6e1e45d32443112185eb01d8e2398755c5`
MD5	`ffb6c1860c34a118e6ccf3e583497e5b`
BLAKE2b-256	`9275335c665e616eebcb1129f7a455cb3984b8d4021110be3e0c0586b1a100db`
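A downloaded archive can be checked against the SHA256 digest above with Python's `hashlib`. This is a minimal sketch: the function names are hypothetical, and the local file path is whatever location the archive was saved to.

```python
import hashlib

# SHA256 digest listed above for genepioneer-1.1.0.tar.gz.
EXPECTED_SHA256 = "509b99fb75cc4ef0c94352e508e94a6e1e45d32443112185eb01d8e2398755c5"


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """True when the file's SHA256 digest equals the expected hex string."""
    return sha256_of_file(path) == expected.lower()
```

For example, `matches_published_hash("genepioneer-1.1.0.tar.gz")` should return True for an intact copy of the source distribution saved under that name.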


Provenance

The following attestation bundles were made for genepioneer-1.1.0.tar.gz:

Publisher: publish.yml on amirhossein-haerian/GenePioneer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: genepioneer-1.1.0.tar.gz
- Subject digest: 509b99fb75cc4ef0c94352e508e94a6e1e45d32443112185eb01d8e2398755c5
- Sigstore transparency entry: 179423483
- Sigstore integration time: Mar 9, 2025
Source repository:
- Permalink: amirhossein-haerian/GenePioneer@abcbdaef3342496f101f1ecaf7070532bc603f59
- Branch / Tag: refs/tags/1.1.0
- Owner: https://github.com/amirhossein-haerian
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@abcbdaef3342496f101f1ecaf7070532bc603f59
- Trigger Event: release

File details

Details for the file genepioneer-1.1.0-py3-none-any.whl.

File metadata

Download URL: genepioneer-1.1.0-py3-none-any.whl
Upload date: Mar 9, 2025
Size: 4.9 MB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for genepioneer-1.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`57471b2673054e8a206dc4f7de997853666492118975716b4ced8fda9ec58301`
MD5	`7f9321d7b5358ca5f46311966fd22478`
BLAKE2b-256	`90ab73c5ae382b5ebe5e149b87a1a968327e4fed695a6b5d94d56b022efd2448`


Provenance

The following attestation bundles were made for genepioneer-1.1.0-py3-none-any.whl:

Publisher: publish.yml on amirhossein-haerian/GenePioneer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: genepioneer-1.1.0-py3-none-any.whl
- Subject digest: 57471b2673054e8a206dc4f7de997853666492118975716b4ced8fda9ec58301
- Sigstore transparency entry: 179423484
- Sigstore integration time: Mar 9, 2025
Source repository:
- Permalink: amirhossein-haerian/GenePioneer@abcbdaef3342496f101f1ecaf7070532bc603f59
- Branch / Tag: refs/tags/1.1.0
- Owner: https://github.com/amirhossein-haerian
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@abcbdaef3342496f101f1ecaf7070532bc603f59
- Trigger Event: release
