Calvin's Data Science Toolbox
CDST (Calvin's Data Science Toolbox)
CDST is a collection of data science Python libraries developed by Calvin Chan at DSAA, Bayer Pharmaceutical. It contains various data science toolsets, mostly based on deep learning techniques:
- General Scalable Deep Learning Fully Connected Network (DNN)
- Calvin's Scalable Parallel Downsampler (CSPD)
- Ordinal Hyperplane Loss Classifier (OHPL)
The above algorithms are written to deal with positive output data. Updates to accommodate real-valued outputs will be made in the future upon request.
This package lets users sample the network architecture from a sampling parameter; the architecture sampling function is included in the package. The architecture sampling parameter is treated as a hyperparameter, and the architecture can be sampled based on (1) a given number of neurons or (2) a given number of model parameters. When a number of model parameters is given, the sample is computed with a Mixed-Integer Nonlinear Programming model using the GEKKO package. The accuracy/error of a given set of hyperparameters is estimated using k-fold cross-validation, and the accuracy/error of each of the k folds is returned for statistical analysis.
All deep learning modules in this package are designed around the Ray Tune hyperparameter tuning package. Users can sample the multi-layer network neuron distribution using the provided architecture sampling function, together with ranges for other hyperparameters, including learning rate, batch size, and dropout probability.
Design examples are provided in the "example" folder, with the detailed structure and a graphical illustration of each module. Users can follow these examples and adjust them to suit their own use case and to better understand the mechanics behind the package.
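The k-fold evaluation described above can be illustrated with a minimal, standard-library sketch. It is not the CDST implementation: the names `k_fold_indices`, `cross_validate`, and `fit_and_score` are hypothetical and only show how one score per fold can be collected for later statistical analysis.

```python
import random
import statistics


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k roughly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Every k-th shuffled index goes into the same fold.
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, val


def cross_validate(fit_and_score, n_samples, k=5, seed=0):
    """Return one score per fold; fit_and_score is a user-supplied callable (assumption)."""
    return [fit_and_score(tr, va) for tr, va in k_fold_indices(n_samples, k, seed)]


# Toy scorer standing in for "train a model and report its validation error".
scores = cross_validate(lambda tr, va: len(va) / (len(tr) + len(va)), n_samples=10, k=5)
print(scores, statistics.mean(scores))
```

Returning the per-fold list rather than a single average keeps the spread across folds available, which is what makes statistical comparison between hyperparameter sets possible.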
Hyperparameter Tuning
DNN
- Use the custom sampling function to describe the hierarchical neuron distribution between:
  - total neurons
  - neurons per layer
CSPD
- Use the custom sampling function to describe the hierarchical neuron distribution between:
  - total neurons
  - neurons per subgroup
  - neurons per layer
OHPL
- Use the custom sampling function to describe the hierarchical neuron distribution between:
  - total neurons
  - neurons per layer
Custom Sampling Function
split_sampling(num_ele, num_layers=None, n_min=1, n_max=None, n_samples=1, prepend=[], postpend=[], single_sample=False)
num_ele: Total number of elements to be distributed
n_min: Minimum number of elements per output dimension
n_max: Maximum number of elements per output dimension
num_layers: Number of layers over which to distribute the elements; if None, a random number of layers is used
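To make the idea of distributing a fixed number of elements across layers concrete, here is a minimal sketch under stated assumptions. It is not the body of `split_sampling`: the helper `random_split` and its `seed` argument are hypothetical, and the sampling is not claimed to be uniform over all valid splits.

```python
import random


def random_split(num_ele, num_layers, n_min=1, n_max=None, seed=None):
    """Split num_ele into num_layers integers, each within [n_min, n_max]."""
    if n_max is None:
        n_max = num_ele
    if not (n_min * num_layers <= num_ele <= n_max * num_layers):
        raise ValueError("no split satisfies the given bounds")
    rng = random.Random(seed)
    # Start every layer at the minimum, then hand out the rest one element at a time.
    sizes = [n_min] * num_layers
    for _ in range(num_ele - n_min * num_layers):
        open_layers = [i for i, s in enumerate(sizes) if s < n_max]
        sizes[rng.choice(open_layers)] += 1
    return sizes


# Distribute 64 neurons over 4 layers, with 4 to 32 neurons per layer.
print(random_split(64, 4, n_min=4, n_max=32, seed=0))
```

The feasibility check up front guarantees that at least one layer still has spare capacity whenever elements remain to be assigned.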
parameters_sampling(num_params, num_layers, in_dim, out_dim=1, n_min=1, n_max=None, n_samples=1, include_inout=True, single_sample=False, max_trials=1000)
num_params: Total number of parameters to be distributed
num_layers: Total number of layers
n_min: Minimum number of neurons per layer
n_max: Maximum number of neurons per layer
in_dim: Number of neurons at the input layer
out_dim: Number of neurons at the output layer
n_samples: Number of architecture samples to return (if fewer samples are found than requested, all samples found are returned)
include_inout: Flag indicating whether to include the input and output layer neurons in the samples
max_trials: Maximum number of randomized trials for solution sampling if not enough samples are found
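The relationship between layer widths and a parameter budget can be sketched as follows. This is an illustration only: it replaces the GEKKO-based Mixed-Integer Nonlinear Programming model with plain random search, assumes every layer has a bias term, and assumes `num_layers` counts hidden layers. The names `count_parameters`, `sample_by_budget`, and `tol` are hypothetical and are not part of the CDST API.

```python
import random


def count_parameters(widths):
    """Weights plus biases of a fully connected network with the given layer widths."""
    return sum(a * b + b for a, b in zip(widths, widths[1:]))


def sample_by_budget(num_params, num_layers, in_dim, out_dim=1,
                     n_min=1, n_max=64, tol=0.05, max_trials=1000, seed=None):
    """Random search for hidden widths whose parameter count is within tol of num_params."""
    rng = random.Random(seed)
    found = []
    for _ in range(max_trials):
        hidden = [rng.randint(n_min, n_max) for _ in range(num_layers)]
        total = count_parameters([in_dim, *hidden, out_dim])
        if abs(total - num_params) <= tol * num_params:
            found.append(hidden)
    return found


# 10 inputs, one hidden layer of 5 neurons, 1 output: 10*5 + 5 + 5*1 + 1 = 61
print(count_parameters([10, 5, 1]))

# Hidden-layer widths for a budget of about 2000 parameters (the list may be empty).
print(sample_by_budget(2000, num_layers=3, in_dim=10, seed=0)[:3])
```

Random search may find no architecture within the tolerance, which is one reason an exact integer programming formulation is attractive for this problem.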
Installation
Use the package manager pip to install CDST.
pip install git+https://github.com/Bayer-Group/cdst.git
Usage
import cdst
Contributing
For major changes, please open an issue first to discuss what you would like to change. For collaborative development, please create a development branch in the git repository and submit it for approval prior to merging into the master branch.
Please make sure to update tests as appropriate.
License
Written by Calvin W.Y. Chan calvin.chan@bayer.com, March 2022 (Github: https://github.com/calvinwy, Linkedin: https://www.linkedin.com/in/calchan/)