
A method to improve the performance of TCGA pan-cancer classifiers


PanClassif: A machine learning classifier pipeline for TCGA pancancer classification

This is a complete machine learning pipeline package to work with TCGA cancer RNA-seq gene count data.

Github

Data prerequisites

Functions

featSelect(homepath, cancerpath, normalpath, k)

Params

  • homepath : (str) Path where you want to save all the generated files and folders.
  • cancerpath : (str) Path where the cancer-sample gene expression matrices for all cancer types are located.
  • normalpath : (str) Path where the normal-sample gene expression matrices for all cancer types are located.
  • k : (int) The number of top genes to select per cancer type. (default: k=5) k cannot be less than 5.
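
The constraints above can be checked before calling featSelect. The following is a minimal sketch, not part of panclassif: the helper name check_featselect_args is hypothetical, and it assumes that cancerpath and normalpath must already exist as directories while homepath may be created on demand.

```python
import os

MIN_K = 5  # the parameter description states that k cannot be less than 5


def check_featselect_args(homepath, cancerpath, normalpath, k=5):
    """Hypothetical pre-flight check; not a panclassif function."""
    # Reject non-integers (and booleans, which are ints in Python) and k < 5.
    if not isinstance(k, int) or isinstance(k, bool) or k < MIN_K:
        raise ValueError(f"k must be an integer >= {MIN_K}, got {k!r}")
    # Assumption: both input locations are directories that already exist.
    for label, path in (("cancerpath", cancerpath), ("normalpath", normalpath)):
        if not os.path.isdir(path):
            raise FileNotFoundError(f"{label} is not an existing directory: {path}")
    # Assumption: creating the output directory up front is harmless.
    os.makedirs(homepath, exist_ok=True)
    return homepath, cancerpath, normalpath, k
```

The returned tuple can be unpacked straight into the call, for example pc.featSelect(*check_featselect_args(homepath, cancerpath, normalpath, k=5)).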

dataProcess(homepath,names,cancerpath,smoothed_cancer,smoothed_normal,scale_mode)

Params

  • homepath : (str) Path where you want to save all the generated files and folders.
  • cancerpath : (str) Path where the cancer-sample gene expression matrices for all cancer types are located.
  • names : (list) List of the cancer names returned by the featSelect function.
  • smoothed_cancer : (str) Path where the smoothed cancer-sample gene expression matrices for all cancer types are located.
  • smoothed_normal : (str) Path where the smoothed normal-sample gene expression matrices for all cancer types are located.
  • scale_mode : (int) Data scaling mode: 0 for standardization, 1 for normalization.
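
To illustrate the difference between the two scale_mode options, here is a small standard-library sketch. It does not reproduce the package's internals: it assumes that "standardization" means z-score scaling and that "normalization" means min-max scaling to the range [0, 1], which is the usual reading but is not confirmed by this documentation. The function names are hypothetical.

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score scaling: subtract the mean, divide by the population std."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def min_max_normalize(values):
    """Min-max scaling: map the smallest value to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def scale(values, scale_mode):
    """Dispatch on scale_mode as described above (0 or 1)."""
    if scale_mode == 0:
        return standardize(values)
    if scale_mode == 1:
        return min_max_normalize(values)
    raise ValueError(f"scale_mode must be 0 or 1, got {scale_mode!r}")
```

For one gene's counts [2, 4, 6], mode 1 in this sketch yields values between 0 and 1, while mode 0 yields values centered on 0.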

upsampled(names, homepath)

binary_merge(names, homepath)

multi_merge(names, homepath)

Params

  • names : (list) List of the cancer names found from featSelect function.
  • homepath : (str)
    Path where you want to save all the generated files and folders.

classification(homepath, classifier, mode, save_model)

Params

  • homepath : (str) Path where you want to save all the generated files and folders.
  • classifier : (sklearn classification model) An instance of the classification model you want to use. For example: RandomForestClassifier(n_estimators=100).
  • Or, classifier : (str) To use a neural network, pass the string "NN". For example: classifier = "NN"
  • mode : (str) There are two modes: 1) binary 2) multi. Use "binary" for binary classification and "multi" for multiclass classification. (default: mode = "binary")
  • save_model : (str) Optional parameter. Use it only if you want to save the model. For example: save_model = "your_model_name"
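
Because the classifier argument accepts either a model instance or the string "NN", a small check can catch mistakes before a long run starts. The sketch below is an illustration only: check_classification_args is a hypothetical helper, not a panclassif function, and it assumes a usable model instance exposes fit and predict methods, as scikit-learn classifiers do.

```python
VALID_MODES = ("binary", "multi")


def check_classification_args(classifier, mode="binary", save_model=None):
    """Hypothetical argument check; returns keyword arguments for the call."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    if isinstance(classifier, str):
        # The only documented string value is "NN" (neural network).
        if classifier != "NN":
            raise ValueError(f'the only supported string is "NN", got {classifier!r}')
    elif not (hasattr(classifier, "fit") and hasattr(classifier, "predict")):
        # Assumption: a model instance follows the scikit-learn estimator interface.
        raise TypeError("classifier must be a model instance or the string 'NN'")
    kwargs = {"mode": mode}
    if save_model is not None:
        # save_model is optional, so it is only passed when a name is given.
        kwargs["save_model"] = save_model
    return kwargs
```

The result can be forwarded with keyword unpacking, for example pc.classification(homepath, clf, **check_classification_args(clf, mode="multi", save_model="RF")).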

gsea(homepath)

Params

  • homepath : (str) Path where you want to save all the generated files and folders.

Example


homepath = '/home'
cancerpath = '/home/cancer/'
normalpath = '/home/normal/'

smoothed_cancer = '/home/smoothed_cancer'
smoothed_normal = '/home/smoothed_normal'

Data Load and Process Phase

import panclassif as pc
# The functions below must be called in this order for the pipeline to work
names = pc.featSelect(homepath,cancerpath,normalpath, k=5)
pc.dataProcess(homepath,names,cancerpath,smoothed_cancer,smoothed_normal)
pc.upsampled(names, homepath)
pc.binary_merge(names, homepath)
pc.multi_merge(names, homepath)

Classification Phase

from sklearn.ensemble import RandomForestClassifier
pc.classification(homepath, RandomForestClassifier(n_estimators=100), mode="multi", save_model="RF")

Gene enrichment check

pc.gsea(homepath)
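
Since the stages must run in a fixed order, the calls shown in this example can be gathered into one function. This is a convenience sketch, not part of the package: run_pipeline is a hypothetical name, the imported panclassif module is passed in as pc, and the call signatures are copied from the example rather than verified against the library.

```python
def run_pipeline(pc, homepath, cancerpath, normalpath,
                 smoothed_cancer, smoothed_normal,
                 classifier, mode="binary", k=5):
    """Run the documented stages in the documented order.

    `pc` is expected to be the imported panclassif module.
    """
    # Data load and process phase.
    names = pc.featSelect(homepath, cancerpath, normalpath, k=k)
    pc.dataProcess(homepath, names, cancerpath, smoothed_cancer, smoothed_normal)
    pc.upsampled(names, homepath)
    pc.binary_merge(names, homepath)
    pc.multi_merge(names, homepath)
    # Classification phase.
    pc.classification(homepath, classifier, mode=mode)
    # Gene enrichment check.
    pc.gsea(homepath)
    return names
```

Passing the module as an argument keeps the function importable without panclassif installed and makes the call order easy to check with a stand-in object.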

