Skip to main content

PineBioML is an easy-to-use ML toolkit.

Project description

Overview

In today’s data-driven world, making informed decisions requires more than just raw data—it demands intelligent insights. PineBioML is designed to provide a comprehensive workflow that guides users through every step of data analysis, from preprocessing to visualization. Whether you are a data scientist, researcher, or biomedical data analyst, this software tool empowers you with state-of-the-art machine learning algorithms, advanced feature selection techniques, and dynamic data visualization tools to extract valuable insights effortlessly.

image


Website

To make PineBioML effortless to use, we’ve built an intuitive, seamless, and powerful web platform. Whether you’re a new researcher or a professional analyst seeking high efficiency, you can complete even the most complex analyses with just a few simple steps. Clear interface design, intelligent workflow guidance, and interactive visualizations help you uncover answers faster, spot trends sooner, and make decisions with confidence.

http://pinebioml.icmol.ncu.edu.tw/


Downloadable package

To give users even more flexibility, PineBioML is also available as a Python package that you can easily download and use within your own working environment. In today’s data-driven era, PineBioML empowers you not only to access data but to uncover deep, meaningful insights with ease. From preprocessing and feature selection to dynamic visualization, you can rely on advanced machine learning algorithms and an intuitive analysis workflow to extract truly valuable information effortlessly.

System requirements

Compatible with Python 3.9, 3.10, and 3.11.

Installation

PineBioML is available on PyPI. You can access it through:

pip install PineBioML

For those who do not know how to use python, you can follow our step by step Installation tutorials.

Examples

After installation, you can download examples from release.

https://github.com/ICMOL/PineBioML/releases/download/example/examples126.zip

Chosse one of the following examples, double click it in jupyter interface:

ID Name Description
1 example_BasicUsage multi class.ipynb Demonstrate the basic features of PineBioML on a multi-class classification task.
2 example_BasicUsage regression.ipynb Demonstrate the basic features of PineBioML on a regression classification task.
3 example_Proteomics.ipynb An example on proteomics data analysis.
4 example_PipeLine.ipynb Demonstrate how to use the pipeline to store the whole data processing flow.
5 example_Pine.ipynb Demonstrate how to use Pine ml to finding the best data processing flow in an efficient way.

Execute the scripts

Click the buttom and the script should start. image


Features

In addition to the convenient Python package download, we also provide comprehensive API documentation to help users quickly get started and fully leverage every feature of PineBioML. Whether you are a data scientist, researcher, or biomedical data analyst, our package and documentation allow you to effortlessly perform the entire workflow—from preprocessing and feature selection to dynamic visualization—and uncover the most valuable insights.

API

1. Missing value preprocess

ID Option Definition
1 Deletion Remove the features that are too empty.
2 Imputation with a constant value Impute missing values with a constant value, such as 0 or the feature mean.
3 Imputation using K-NN algorithm Impute missing values with the mean or median of the k nearest samples.
4 Imputation by iterative regression Impute missing values by random forest regressor interatively.

2. Data transformation

ID Option Definition
1 PCA Principal component transform.
2 Power transform To make data more Gaussian-like, you can use either Box-Cox transform or Yeo-Johnson transform.
3 Feature clustering Group similar features into a cluster.
4 Feature expansion Generating new features by add/product/ratio in random pair of existing features.

3. Feature selection

ID Option Definition
1 Volcano plot Selecting by group p-value and fold change.
2 Lasso regression Selecting by Linear models with L1 penalty.
3 Decision stump Selecting by 1-layer decision tree.
4 Random Forest Selecting by Gini impurity or permutation importance over a Random Forest.
5 AdaBoost Selecting by Gini impurity over a AdaBoost model.
6 Gradient boosting Selecting by Gini impurity over a gradient boosting, such as XGboost or LightGBM.
7 Linear SVM Selecting by support vector from support vector machine.

4. Model building

The following models are available in PineBioML’s hyperparameter tuner:

ID Model Definition
1 ElasticNet Tuner will search the power and the ratio between L1 and L2 regularization on a logistic regression.
2 SVM Tuner will search the proper svm kernel function from one of linear, polynomial, rbf and sigmoid, and also the power of regulization.
3 Decision Tree The tuner will search through several decision-tree splitting constraints. Including tree depth, leaf size and leaf pruning threshold.
4 Random Forest Beyond standard decision-tree hyperparameters, the tuner also explores the subsampling proportion for bootstrap sampling and the number of trees in the forest.
5 AdaBoost Tuner will tune the step size and the number of boosting steps.
6 XGBoost In addtion to the tree hyperparameters, the tuner will search on gradient boosting steps, step size, L1 & L2 penalty and subsampling ratio.
7 LightGBM Similar to XGBoost.
8 CatBoost Similar to XGBoost.

PineBioML’s tuner not only optimizes model hyperparameters, but also handles class imbalance (through sample weighting), provides a unified model API and prediction formatting, and manages the definition of the hyperparameter search space.

5. Report and visualization

ID Option Definition
1 data_overview Giving a glance to input data.
2 classification_summary Summarizing a classification task.

Data for Test

The test data is downloaded from LinkedOmicsKB

A proteogenomics data-driven knowledge base of human cancer, Yuxing Liao, Sara R. Savage, Yongchao Dou, Zhiao Shi, Xinpei Yi, Wen Jiang, Jonathan T. Lei, Bing Zhang, Cell Systems, 2023.


Questions

If you encounter any issues or have any questions while using this software or website, you are welcome to ask or discuss them in GitHub Issues. We will do our best to provide assistance and answers there.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pinebioml-1.3.0.6.tar.gz (50.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pinebioml-1.3.0.6-py3-none-any.whl (61.1 kB view details)

Uploaded Python 3

File details

Details for the file pinebioml-1.3.0.6.tar.gz.

File metadata

  • Download URL: pinebioml-1.3.0.6.tar.gz
  • Upload date:
  • Size: 50.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for pinebioml-1.3.0.6.tar.gz
Algorithm Hash digest
SHA256 fb11aacd69b8dfab7c03e83f86dae455752fabcb9337cc4a41ac87052f8ead8f
MD5 5b9be245a5e237334e224e70d2c4ca3d
BLAKE2b-256 d5095038f2ee268f8e5489e4f1c2d838fb97660b64f8ef9ef0c690d24b2e9558

See more details on using hashes here.

File details

Details for the file pinebioml-1.3.0.6-py3-none-any.whl.

File metadata

  • Download URL: pinebioml-1.3.0.6-py3-none-any.whl
  • Upload date:
  • Size: 61.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for pinebioml-1.3.0.6-py3-none-any.whl
Algorithm Hash digest
SHA256 883e709996e73c3f9fae0fd208e70adb11b2f7dd0970fc9ff7e6e82b80abcbdc
MD5 a17ee9a923ada87977687b701cb598f5
BLAKE2b-256 217791b867dfcdb68b0dda67c3e9c051b3ef9b5e34698e3fb5f737f3dca0b3aa

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page