Skip to main content

Predictive Bi-Clustering Trees in Python.

Project description

Predictive Bi-Clustering Trees (PBCT)

This code implements PBCTs based on its original proposal by Pliakos, Geurts and Vens in 2018. Functionality has been extended to n-dimensional interaction tensors, where n instances of n different classes would interact or not for each database instance.

Installation

The package is available at PyPI and can be installed by the usual pip command:

$ pip install pbct

Local installation can be done either by providing the --user flag to the above command or by cloning this repo and issuing pip afterwards, for example:

$ git clone https://github.com/pedroilidio/PCT
$ cd PCT
$ pip install -e .

Where the -e option installs it as symbolic links to the local cloned repository, so that changes in it will reflect on the installation directly.

Usage

Usage and input/output examples are provided in the examples folder. We provide a command-line utility to use PBCT models, that shows the following information when the --help option is used.

$ PBCT --help

usage: PBCT [-h] [--fit | --predict] [--config CONFIG] [--XX XX [XX ...]]
            [--XX_names XX_NAMES [XX_NAMES ...]]
            [--XX_col_names XX_COL_NAMES [XX_COL_NAMES ...]] [--Y Y]
            [--path_model PATH_MODEL] [--max_depth MAX_DEPTH]
            [--min_samples_leaf MIN_SAMPLES_LEAF] [--simple_mean] [--verbose]

Fit a PBCT model to data or use a trained model to predict new results. Input
files and options may be provided either with command-line options or by a
json config file (see --config option).

optional arguments:
  -h, --help            show this help message and exit
  --fit                 Use input data to train a PBCT. (default: False)
  --predict             Predict interaction between input instances. (default:
                        False)
  --config CONFIG       Load options from json file. File example:
                        {
                            "path_model": "/path/to/save/model.json",
                            "fit": "true",
                            "XX": ["/path/to/X1.csv", "/path/to/X2.csv"],
                            "Y": "/path/to/Y.csv"
                        }.
                        Multiple dicts in a list
                        will cause this script to run multiple times.
                        (default: None)
  --XX XX [XX ...]      Paths to .csv files containing rows of numerical
                        attributes for each axis' instance. (default: None)
  --XX_names XX_NAMES [XX_NAMES ...]
                        Paths to files containing string identifiers for each
                        instance for each axis, being one file for each axis.
                        (default: None)
  --XX_col_names XX_COL_NAMES [XX_COL_NAMES ...]
                        Paths to files containing string identifiers for each
                        attributecolumn, being one file for each axis.
                        (default: None)
  --Y Y                 If fitting the model to data, it represents the path
                        to a .csv file containing the known interaction matrix
                        between rows and columns data.If predicting, Y is the
                        destination path for the predicted values, formatted
                        as an interaction matrix in the same way described.
                        (default: None)
  --path_model PATH_MODEL
                        When fitting: path to the location where the model
                        will be saved. When predicting: the saved model to be
                        used. (default: trained_model.json)
  --max_depth MAX_DEPTH
                        Maximum PBCT depth allowed. -1 will disable this
                        stopping criterion. (default: -1)
  --min_samples_leaf MIN_SAMPLES_LEAF
                        Minimum number of sample pairs in the training set
                        required to arrive at each leaf. (default: 20)
  --simple_mean         If provided, the prototype function will always return
                        the mean value over the entire sub interaction matrix
                        of the leaf, not considering possible known instances.
                        (default: False)
  --verbose, -v         Show more detailed output (default: False)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pbct-0.1.1.tar.gz (12.2 kB view hashes)

Uploaded Source

Built Distribution

pbct-0.1.1-py3-none-any.whl (25.6 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page