Predictive Bi-Clustering Trees in Python.
Predictive Bi-Clustering Trees (PBCT)
This code implements PBCTs based on their original proposal by Pliakos, Geurts and Vens in 2018 [1]. Functionality has been extended to n-dimensional interaction tensors, in which each entry records whether n instances, one from each of n different classes, interact or not.
[1] Pliakos, Konstantinos, Pierre Geurts, and Celine Vens. "Global multi-output decision trees for interaction prediction." Machine Learning 107.8 (2018): 1257-1281.
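To make the data layout concrete, here is a minimal sketch in plain Python; it is not code from the package. It assumes the two-dimensional case: one attribute table per axis and an interaction matrix whose shape follows from the number of instances on each axis. The names `X_rows`, `X_cols`, `Y`, `expected_shape` and `shape_of`, as well as all values, are made up for illustration.

```python
# Toy two-axis interaction dataset (all values invented for illustration).
# Each axis has its own attribute table, with one row per instance.
X_rows = [  # 3 instances of the first class, 2 attributes each
    [0.1, 1.2],
    [0.4, 0.7],
    [0.9, 0.3],
]
X_cols = [  # 4 instances of the second class, 3 attributes each
    [1.0, 0.0, 0.5],
    [0.2, 0.8, 0.1],
    [0.6, 0.6, 0.9],
    [0.3, 0.1, 0.4],
]
# Y[i][j] is 1 if row instance i interacts with column instance j, else 0.
Y = [
    [1, 0, 0, 1],
    [0, 0, 1, 0],
    [0, 1, 0, 0],
]


def expected_shape(attribute_tables):
    """Shape of the interaction tensor: one dimension per axis."""
    return tuple(len(table) for table in attribute_tables)


def shape_of(nested):
    """Shape of a regular (non-ragged) nested list."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0] if nested else None
    return tuple(shape)


# The interaction matrix must have one entry per (row, column) pair.
assert shape_of(Y) == expected_shape([X_rows, X_cols])
```

With three axes, the same check would apply to a three-level nested list `Y` and a list of three attribute tables.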
Installation
The package is available on PyPI and can be installed with the usual pip command:
$ pip install pbct
A local installation can be done either by passing the --user flag to the command above or by cloning this repository and running pip afterwards, for example:
$ git clone https://github.com/pedroilidio/PCT
$ cd PCT
$ pip install -e .
The -e option installs the package as symbolic links to the locally cloned repository, so that changes made there are reflected directly in the installation.
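One way to confirm that either installation route worked is to ask Python for the installed distribution's version. The snippet below is a small sketch using only the standard library (`importlib.metadata`, Python 3.8+); the helper name `installed_version` is hypothetical, and the distribution name `pbct` is taken from the pip command above.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("pbct")
    print(f"pbct {version}" if version else "pbct is not installed")
```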
Usage
Usage and input/output examples are provided in the examples folder.
We provide a command-line utility for working with PBCT models. It shows the following information when the --help option is used:
$ PBCT --help
usage: PBCT [-h] [--config CONFIG] [--XX XX [XX ...]]
[--XX_names XX_NAMES [XX_NAMES ...]]
[--XX_col_names XX_COL_NAMES [XX_COL_NAMES ...]] [--Y Y]
[--path_model PATH_MODEL] [--max_depth MAX_DEPTH]
[--min_samples_leaf MIN_SAMPLES_LEAF] [--verbose]
[--outdir OUTDIR] [--k K [K ...]] [--diag]
[--test_size TEST_SIZE [TEST_SIZE ...]]
[--train_size TRAIN_SIZE [TRAIN_SIZE ...]] [--njobs NJOBS]
[--random_state RANDOM_STATE]
[{fit,predict,train_test,xval}]
Fit a PBCT model to data or use a trained model to predict new results. Input
files and options may be provided either with command-line options or by a
json config file (see --config option).
positional arguments:
{fit,predict,train_test,xval}
fit: Use input data to train a PBCT. predict: Predict
interaction between input instances. train_test: Split
data between the 4 train/test sets, train and test a
PBCT. xval: run a 2D k-fold cross validation with the
given data. (default: None)
optional arguments:
-h, --help show this help message and exit
--config CONFIG Load options from json file. File example: {
"path_model": "/path/to/save/model.json", "fit":
"true", "XX": ["/path/to/X1.csv", "/path/to/X2.csv"],
"Y": "/path/to/Y.csv", }. Multiple dicts in a list
will cause this script to run multiple times.
(default: None)
--XX XX [XX ...] Paths to .csv files containing rows of numerical
attributes for each axis' instance. (default: None)
--XX_names XX_NAMES [XX_NAMES ...]
Paths to files containing string identifiers for each
instance for each axis, being one file for each axis.
(default: None)
--XX_col_names XX_COL_NAMES [XX_COL_NAMES ...]
Paths to files containing string identifiers for each
attributecolumn, being one file for each axis.
(default: None)
--Y Y If fitting the model to data, it represents the path
to a .csv file containing the known interaction matrix
between rows and columns data.If predicting, Y is the
destination path for the predicted values, formatted
as an interaction matrix in the same way described.
(default: None)
--path_model PATH_MODEL
When fitting: path to the location where the model
will be saved. When predicting: the saved model to be
used. (default: model.dict.pickle.gz)
--max_depth MAX_DEPTH
Maximum PBCT depth allowed. -1 will disable this
stopping criterion. (default: -1)
--min_samples_leaf MIN_SAMPLES_LEAF
Minimum number of sample pairs in the training set
required to arrive at each leaf. (default: 20)
--verbose, -v Show more detailed output (default: False)
--outdir OUTDIR Where to save results. (default: PBCT_results)
--k K [K ...], -k K [K ...]
Number of folds for cross-validation. (default: [3])
--diag Use independent TrTc sets for cross-validation, i.e.
with no overlapping rows or columns. (default: False)
--test_size TEST_SIZE [TEST_SIZE ...]
If between 0.0 and 1.0, represents the proportion of
the dataset to include in the TrTc split for each
axis, e.g.: .3 .5 means 30% of the rows and 50% of the
columns will be used as the TrTc set. If >= 1,
represents the absolute number of test samples in each
axis. If None, the values are set to the complements
of train_size. If a single value v is given, it will
be interpreted as (v, v). If train_size is also None,
it will be set to 0.25. (default: None)
--train_size TRAIN_SIZE [TRAIN_SIZE ...]
Same as test_size, but refers to the LrLc set
dimensions. (default: None)
--njobs NJOBS How many processes to spawn when cross-validating.
(default: None)
--random_state RANDOM_STATE
Random seed to use. (default: None)
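The --config option described above takes a JSON file. As a hedged sketch, such a file could be generated from Python with the standard `json` module. The keys `path_model`, `fit`, `XX` and `Y` are copied from the example in the help text; the paths are placeholders, and `second_run` and `dump_config` are hypothetical names used only here. Note that the example printed in the help text ends with a trailing comma, which strict JSON parsers such as Python's reject; whether the tool itself tolerates it is not stated.

```python
import json

# Keys copied from the --config example in the help text; paths are placeholders.
single_run = {
    "path_model": "/path/to/save/model.json",
    "fit": "true",
    "XX": ["/path/to/X1.csv", "/path/to/X2.csv"],
    "Y": "/path/to/Y.csv",
}

# Hypothetical second run, illustrating "multiple dicts in a list".
second_run = dict(single_run, path_model="/path/to/save/other_model.json")


def dump_config(runs):
    """Serialize one run (a dict) or several runs (a list of dicts) as JSON."""
    return json.dumps(runs, indent=4)


text = dump_config([single_run, second_run])

# Round-trip to confirm that the generated text is valid JSON.
assert json.loads(text) == [single_run, second_run]

# A trailing comma before the closing brace, as in the help text example,
# is rejected by Python's json module.
try:
    json.loads('{"Y": "/path/to/Y.csv", }')
except json.JSONDecodeError:
    trailing_comma_ok = False
else:
    trailing_comma_ok = True
```

The resulting text could be saved to a file and passed to the utility through the --config option.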