PineBioML is an easy-to-use ML toolkit.
Overview
This package aims to help analyze biomedical data using ML methods in Python.
System requirements
Python 3.10+
Installation
PineBioML is available on PyPI. Install it with:
pip install PineBioML
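After installing, a quick check can confirm that the interpreter meets the Python 3.10+ requirement and that the package is visible to it. The snippet below is a minimal sketch using only the standard library; the helper name `check_environment` is hypothetical and is not part of PineBioML.

```python
import sys
from importlib import metadata


def check_environment(package="PineBioML", min_python=(3, 10)):
    """Return (python_ok, installed_version or None) for the given package."""
    # Tuple comparison: True when the running interpreter is new enough.
    python_ok = sys.version_info >= min_python
    try:
        installed_version = metadata.version(package)
    except metadata.PackageNotFoundError:
        # The package is not installed in this environment.
        installed_version = None
    return python_ok, installed_version


python_ok, installed_version = check_environment()
print("Python 3.10+:", python_ok)
print("PineBioML version:", installed_version or "not installed")
```

If the version prints as "not installed", the `pip install` command was probably run with a different interpreter than the one executing this check.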
If you are new to Python, you can follow our step-by-step installation tutorials:
- Windows 11/10
- macOS
Examples
After installation, you can download the examples from the release:
https://github.com/user-attachments/files/17568138/examples.zip
Choose one of the following examples and double-click it in the Jupyter interface:
| ID | Name | Description |
|---|---|---|
| 1 | example_BasicUsage multi class.ipynb | Demonstrates the basic features of PineBioML on a multi-class classification task. |
| 2 | example_BasicUsage regression.ipynb | Demonstrates the basic features of PineBioML on a regression task. |
| 3 | example_Proteomics.ipynb | An example of proteomics data analysis. |
| 4 | example_PipeLine.ipynb | Demonstrates how to use the pipeline to store the whole data processing flow. |
| 5 | example_Pine.ipynb | Demonstrates how to use Pine ml to find the best data processing flow efficiently. |
Execute the scripts
Click the button and the script should start.
Features
0. Documentation
1. Missing value preprocessing
| ID | Option | Definition |
|---|---|---|
| 1 | Deletion | Remove features that have too many missing values. |
| 2 | Imputation with a constant value | Impute missing values with a constant value, such as 0 or the feature mean. |
| 3 | Imputation using K-NN algorithm | Impute missing values with the mean or median of the k nearest samples. |
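The three options in the table can be illustrated with a short sketch. It uses NumPy and scikit-learn rather than PineBioML's own API, which this README does not show; the toy data and the 0.5 missing-ratio threshold are assumptions made for the example.

```python
import numpy as np
from sklearn.impute import KNNImputer, SimpleImputer

# Toy data: 4 samples x 3 features; the last feature is mostly missing.
X = np.array([
    [1.0, 2.0, np.nan],
    [2.0, np.nan, np.nan],
    [3.0, 6.0, np.nan],
    [4.0, 8.0, 1.0],
])

# 1. Deletion: drop features whose missing ratio exceeds a threshold.
max_missing_ratio = 0.5  # assumed threshold, for illustration only
missing_ratio = np.isnan(X).mean(axis=0)
X_kept = X[:, missing_ratio <= max_missing_ratio]

# 2. Constant imputation: fill with 0 or with the feature mean.
X_zero = SimpleImputer(strategy="constant", fill_value=0.0).fit_transform(X_kept)
X_mean = SimpleImputer(strategy="mean").fit_transform(X_kept)

# 3. K-NN imputation: fill with the mean of the k nearest samples.
X_knn = KNNImputer(n_neighbors=2).fit_transform(X_kept)

print(X_kept.shape)
print(X_knn)
```

Deletion is applied first here so that the imputers only see features with enough observed values to estimate from.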
2. Data transformation
| ID | Option | Definition |
|---|---|---|
| 1 | PCA | Principal component transform. |
| 2 | Power transform | To make data more Gaussian-like, you can use either Box-Cox transform or Yeo-Johnson transform. |
| 3 | Feature clustering | Group similar features into a cluster. |
| 4 | Feature expansion | Generate new features by taking the sum, product, or ratio of random pairs of existing features. |
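As a rough illustration of the first two options, the sketch below applies a Yeo-Johnson power transform followed by PCA. It relies on scikit-learn's `PowerTransformer` and `PCA`, not on PineBioML's interface, and the synthetic log-normal data is an assumption chosen to be visibly skewed.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(0)
# Skewed, strictly positive toy data: 100 samples x 5 features.
X = rng.lognormal(mean=0.0, sigma=1.0, size=(100, 5))

# Yeo-Johnson accepts any real values; Box-Cox requires strictly positive data.
# By default the output is also standardized to zero mean and unit variance.
X_power = PowerTransformer(method="yeo-johnson").fit_transform(X)

# Project the transformed data onto its first two principal components.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_power)

print(X_pca.shape)
print(pca.explained_variance_ratio_)
```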
3. Feature selection
| ID | Option | Definition |
|---|---|---|
| 1 | Volcano plot | Select by group p-value and fold change. |
| 2 | Lasso regression | Select by linear models with an L1 penalty. |
| 3 | Decision stump | Select by a 1-layer decision tree. |
| 4 | Random Forest | Select by Gini impurity or permutation importance over a Random Forest. |
| 5 | AdaBoost | Select by Gini impurity over an AdaBoost model. |
| 6 | Gradient boosting | Select by Gini impurity over a gradient boosting model, such as XGBoost or LightGBM. |
| 7 | Linear SVM | Select by the support vectors of a support vector machine. |
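To make the volcano-plot criterion concrete, here is a minimal sketch of selecting features by fold change and p-value between two groups. It uses NumPy and SciPy's `ttest_ind` (Welch's t-test); the cut-offs (|log2 fold change| >= 1, p < 0.05) and the synthetic data are assumptions, and PineBioML's own test and thresholds may differ.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n_per_group, n_features = 30, 20

# Toy positive intensities for two groups; feature 0 is shifted up in group B.
group_a = rng.lognormal(mean=2.0, sigma=0.3, size=(n_per_group, n_features))
group_b = rng.lognormal(mean=2.0, sigma=0.3, size=(n_per_group, n_features))
group_b[:, 0] *= 4.0

# x-axis of a volcano plot: log2 fold change of the group means.
log2_fc = np.log2(group_b.mean(axis=0) / group_a.mean(axis=0))

# y-axis: p-value from Welch's t-test on log-transformed values.
_, p_values = ttest_ind(np.log2(group_b), np.log2(group_a), axis=0, equal_var=False)

# Assumed cut-offs for illustration: |log2 FC| >= 1 and p < 0.05.
selected = np.flatnonzero((np.abs(log2_fc) >= 1.0) & (p_values < 0.05))
print(selected)
```

Requiring both conditions keeps features that differ by a large margin and whose difference is unlikely to be noise, which is the upper-left and upper-right region of a volcano plot.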
4. Model building
| ID | Option | Definition |
|---|---|---|
| 1 | ElasticNet | Uses Optuna to search for reasonably good hyperparameters on the given dataset. |
| 2 | SVM | Uses Optuna to search for reasonably good hyperparameters on the given dataset. |
| 3 | Decision Tree | Uses Optuna to search for reasonably good hyperparameters on the given dataset. |
| 4 | Random Forest | Uses Optuna to search for reasonably good hyperparameters on the given dataset. |
| 5 | AdaBoost | Uses Optuna to search for reasonably good hyperparameters on the given dataset. |
| 6 | XGBoost | Uses Optuna to search for reasonably good hyperparameters on the given dataset. |
| 7 | LightGBM | Uses Optuna to search for reasonably good hyperparameters on the given dataset. |
| 8 | CatBoost | Uses Optuna to search for reasonably good hyperparameters on the given dataset. |
5. Report and visualization
| ID | Option | Definition |
|---|---|---|
| 1 | data_overview | Gives a quick overview of the input data. |
| 2 | classification_summary | Summarizes the results of a classification task. |
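The signatures of `data_overview` and `classification_summary` are not given in this README, so the sketch below does not call them. It only shows, with scikit-learn metrics and made-up labels, the kind of information a classification summary typically contains.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Made-up labels for a three-class task.
y_true = ["A", "A", "B", "B", "C", "C"]
y_pred = ["A", "B", "B", "B", "C", "A"]
labels = ["A", "B", "C"]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred, labels=labels)
accuracy = accuracy_score(y_true, y_pred)

print(cm)
print(f"accuracy = {accuracy:.3f}")
# Per-class precision, recall, and F1 score.
print(classification_report(y_true, y_pred, labels=labels, zero_division=0))
```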
Contact us
Citation
The example data is from LinkedOmicsKB:
Yuxing Liao, Sara R. Savage, Yongchao Dou, Zhiao Shi, Xinpei Yi, Wen Jiang, Jonathan T. Lei, Bing Zhang. A proteogenomics data-driven knowledge base of human cancer. Cell Systems, 2023.