PineBioML is an easy-to-use ML toolkit.

Overview

In today’s data-driven world, making informed decisions requires more than raw data; it demands intelligent insights. PineBioML provides a comprehensive workflow that guides users through every step of data analysis, from preprocessing to visualization. Whether you are a data scientist, researcher, or biomedical data analyst, it gives you machine learning algorithms, feature selection techniques, and data visualization tools for extracting insights from your data.

System requirements

Python 3.9, 3.10, 3.11
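
A quick way to confirm that the running interpreter is one of the versions listed above. This is a small illustrative check, not part of PineBioML; the names `SUPPORTED` and `is_supported` are made up for this sketch.

```python
import sys

# Versions listed under "System requirements" (hypothetical constant name).
SUPPORTED = {(3, 9), (3, 10), (3, 11)}


def is_supported(version_info=None):
    """Return True if the (major, minor) pair is one of the listed versions."""
    info = sys.version_info if version_info is None else version_info
    return (info[0], info[1]) in SUPPORTED


status = "listed" if is_supported() else "not listed"
print(f"Python {sys.version_info[0]}.{sys.version_info[1]} is {status} as supported")
```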

Installation

PineBioML is available on PyPI. Install it with:

pip install PineBioML

If you are new to Python, you can follow our step-by-step installation tutorials.

Examples

After installation, you can download the examples from the GitHub release:

https://github.com/ICMOL/PineBioML/releases/download/example/examples126.zip

Choose one of the following examples and double-click it in the Jupyter interface:

| ID | Name | Description |
|----|------|-------------|
| 1 | example_BasicUsage multi class.ipynb | Demonstrates the basic features of PineBioML on a multi-class classification task. |
| 2 | example_BasicUsage regression.ipynb | Demonstrates the basic features of PineBioML on a regression task. |
| 3 | example_Proteomics.ipynb | An example of proteomics data analysis. |
| 4 | example_PipeLine.ipynb | Demonstrates how to use the pipeline to store the whole data processing flow. |
| 5 | example_Pine.ipynb | Demonstrates how to use Pine ml to find the best data processing flow efficiently. |

Execute the scripts

Click the button and the script should start.

Features

0. Documentation

API

1. Missing value preprocessing

| ID | Option | Definition |
|----|--------|------------|
| 1 | Deletion | Remove features with too many missing values. |
| 2 | Imputation with a constant value | Impute missing values with a constant, such as 0 or the feature mean. |
| 3 | Imputation using the K-NN algorithm | Impute missing values with the mean or median of the k nearest samples. |
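
To make the three options concrete, here is a minimal pure-Python sketch of each idea. It is not PineBioML's API: the function names, the `None`-as-missing convention, the `max_missing_ratio` threshold, and the distance used for neighbours are all assumptions chosen for illustration.

```python
import math


def drop_sparse_features(rows, max_missing_ratio=0.5):
    """Deletion: keep columns whose share of None values is <= max_missing_ratio."""
    n_rows, n_cols = len(rows), len(rows[0])
    keep = [
        j for j in range(n_cols)
        if sum(row[j] is None for row in rows) / n_rows <= max_missing_ratio
    ]
    return [[row[j] for j in keep] for row in rows], keep


def impute_constant(rows, strategy="mean", fill_value=0.0):
    """Constant imputation: fill None with the column mean or with fill_value."""
    fills = []
    for j in range(len(rows[0])):
        observed = [row[j] for row in rows if row[j] is not None]
        if strategy == "mean" and observed:
            fills.append(sum(observed) / len(observed))
        else:
            fills.append(fill_value)
    return [[fills[j] if v is None else v for j, v in enumerate(row)] for row in rows]


def impute_knn(rows, k=2):
    """K-NN imputation: fill None with the feature mean over the k nearest samples.

    Distance is the root mean squared difference over the features that both
    samples have observed; samples lacking the target feature are skipped.
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    result = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = sorted(
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )
            nearest = [v for d, v in candidates[:k] if d != math.inf]
            if nearest:
                result[i][j] = sum(nearest) / len(nearest)
    return result
```

A typical order would be to drop the sparse features first and then impute what remains, so that nearly empty columns do not influence the neighbour distances.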

2. Data transformation

| ID | Option | Definition |
|----|--------|------------|
| 1 | PCA | Principal component transform. |
| 2 | Power transform | Make the data more Gaussian-like with either the Box-Cox or the Yeo-Johnson transform. |
| 3 | Feature clustering | Group similar features into clusters. |
| 4 | Feature expansion | Generate new features by adding, multiplying, or taking the ratio of random pairs of existing features. |
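
For readers unfamiliar with the two power transforms, the sketch below writes out their standard textbook definitions for a single value and a fixed exponent `lmbda`. It is illustrative only and does not show how PineBioML applies them; in practice the exponent is usually estimated from the data (for example by maximum likelihood) rather than set by hand.

```python
import math


def box_cox(x, lmbda):
    """Box-Cox transform; defined only for strictly positive x."""
    if x <= 0:
        raise ValueError("Box-Cox requires x > 0")
    if lmbda == 0:
        return math.log(x)
    return (x ** lmbda - 1.0) / lmbda


def yeo_johnson(x, lmbda):
    """Yeo-Johnson transform; unlike Box-Cox it accepts zero and negative x."""
    if x >= 0:
        if lmbda == 0:
            return math.log1p(x)
        return ((x + 1.0) ** lmbda - 1.0) / lmbda
    if lmbda == 2:
        return -math.log1p(-x)
    return -((1.0 - x) ** (2.0 - lmbda) - 1.0) / (2.0 - lmbda)
```

The practical difference is the input domain: Box-Cox needs strictly positive data, while Yeo-Johnson also handles zeros and negative values.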

3. Feature selection

| ID | Option | Definition |
|----|--------|------------|
| 1 | Volcano plot | Select by group p-value and fold change. |
| 2 | Lasso regression | Select by linear models with an L1 penalty. |
| 3 | Decision stump | Select by a 1-layer decision tree. |
| 4 | Random Forest | Select by Gini impurity or permutation importance over a Random Forest. |
| 5 | AdaBoost | Select by Gini impurity over an AdaBoost model. |
| 6 | Gradient boosting | Select by Gini impurity over a gradient boosting model, such as XGBoost or LightGBM. |
| 7 | Linear SVM | Select by the support vectors of a support vector machine. |
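
As an illustration of the simplest tree-based option above, the following sketch scores each feature by the best Gini impurity decrease a single threshold (a decision stump) can achieve, then keeps the top-ranked features. It is a from-scratch approximation of the idea, not PineBioML's implementation; `gini`, `stump_score`, and `rank_features` are hypothetical names, and the feature names in the usage note are invented.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def stump_score(values, labels):
    """Best Gini impurity decrease from one threshold split on one feature."""
    parent = gini(labels)
    n = len(labels)
    best = 0.0
    candidates = sorted(set(values))
    # Try a threshold halfway between each pair of adjacent distinct values.
    for low, high in zip(candidates, candidates[1:]):
        threshold = (low + high) / 2.0
        left = [y for v, y in zip(values, labels) if v <= threshold]
        right = [y for v, y in zip(values, labels) if v > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - weighted)
    return best


def rank_features(columns, labels, top_k):
    """Return the names of the top_k features by stump score, best first."""
    scores = {name: stump_score(vals, labels) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

For example, with `labels = [0, 0, 1, 1]`, a feature `[0.1, 0.2, 0.9, 1.0]` separates the classes perfectly and scores 0.5, while `[0.5, 0.4, 0.5, 0.4]` carries no class information and scores 0.0, so `rank_features` places the first one ahead.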

4. Model building

| ID | Option | Definition |
|----|--------|------------|
| 1 | ElasticNet | Uses Optuna to search for reasonably good hyperparameters on the given dataset. |
| 2 | SVM | Uses Optuna to search for reasonably good hyperparameters on the given dataset. |
| 3 | Decision Tree | Uses Optuna to search for reasonably good hyperparameters on the given dataset. |
| 4 | Random Forest | Uses Optuna to search for reasonably good hyperparameters on the given dataset. |
| 5 | AdaBoost | Uses Optuna to search for reasonably good hyperparameters on the given dataset. |
| 6 | XGBoost | Uses Optuna to search for reasonably good hyperparameters on the given dataset. |
| 7 | LightGBM | Uses Optuna to search for reasonably good hyperparameters on the given dataset. |
| 8 | CatBoost | Uses Optuna to search for reasonably good hyperparameters on the given dataset. |

5. Report and visualization

| ID | Option | Definition |
|----|--------|------------|
| 1 | data_overview | Gives a quick overview of the input data. |
| 2 | classification_summary | Summarizes a classification task. |
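
This page does not say which metrics `classification_summary` reports. As a rough idea of what a classification summary commonly contains, the sketch below computes accuracy plus per-class precision and recall from true and predicted labels. The function `summarize_classification` and its output layout are assumptions for illustration, not PineBioML's API.

```python
from collections import Counter


def summarize_classification(y_true, y_pred):
    """Return overall accuracy and per-class precision and recall."""
    pairs = Counter(zip(y_true, y_pred))  # (true label, predicted label) -> count
    classes = sorted(set(y_true) | set(y_pred))
    per_class = {}
    for c in classes:
        true_positive = pairs[(c, c)]
        predicted = sum(pairs[(t, c)] for t in classes)  # times c was predicted
        actual = sum(pairs[(c, p)] for p in classes)     # times c was the truth
        per_class[c] = {
            "precision": true_positive / predicted if predicted else 0.0,
            "recall": true_positive / actual if actual else 0.0,
        }
    accuracy = sum(pairs[(c, c)] for c in classes) / len(y_true)
    return {"accuracy": accuracy, "per_class": per_class}
```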

Contact us

112826006@cc.ncu.edu.tw

Citation

The example data is from LinkedOmicsKB:

Yuxing Liao, Sara R. Savage, Yongchao Dou, Zhiao Shi, Xinpei Yi, Wen Jiang, Jonathan T. Lei, Bing Zhang. A proteogenomics data-driven knowledge base of human cancer. Cell Systems, 2023.
