A set of Python modules for machine learning and data mining
Project description
Installation
Dependencies
forest-gis requires:
Python (>= 3.6)
NumPy (>= 1.15.0)
SciPy (>= 0.19.1)
joblib (>= 0.14)
scikit-learn (>=0.19.0)
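The minimum versions listed above can be checked before installing. The following is a minimal sketch, not part of forest-gis: the helper names `version_tuple` and `check_requirements` are hypothetical, and it assumes Python 3.8 or newer so that `importlib.metadata` is available.

```python
import re
from importlib import metadata  # standard library from Python 3.8 onwards

# Minimum versions copied from the dependency list above.
MINIMUMS = {
    "numpy": "1.15.0",
    "scipy": "0.19.1",
    "joblib": "0.14",
    "scikit-learn": "0.19.0",
}


def version_tuple(text, width=3):
    """Leading numeric parts of a version string, padded with zeros."""
    numbers = [int(n) for n in re.findall(r"\d+", text)[:width]]
    return tuple(numbers + [0] * (width - len(numbers)))


def check_requirements(minimums=MINIMUMS):
    """Map each package to (installed version or None, requirement met?)."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        ok = version_tuple(installed) >= version_tuple(minimum)
        report[name] = (installed, ok)
    return report


for package, (installed, ok) in check_requirements().items():
    print(package, installed, "OK" if ok else "missing or too old")
```

The comparison only looks at the first three numeric components, so pre-release suffixes are ignored; that is a simplification rather than full version-specifier handling.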
For Windows
If you already have a working installation of NumPy and SciPy and your platform is 32-bit or 64-bit Windows, the easiest way to install forest-gis is with pip
pip install -U forest-gis
or conda
conda install -c conda-forge forest-gis
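After either command finishes, a quick way to confirm that the package is visible to the current interpreter is to look up its import name. This is a small sketch; the helper `is_installed` is hypothetical, and the import name `forest` is taken from the usage examples in the User Guide.

```python
import importlib.util


def is_installed(module_name="forest"):
    """Return True when the top-level module can be located for import."""
    return importlib.util.find_spec(module_name) is not None


print("forest importable:", is_installed())
```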
For Linux
At present, PyPI provides wheel files for Python 3.6, 3.7, and 3.8 on 32-bit and 64-bit Windows. Wheel files for 64-bit Linux are also provided, but you may encounter problems if your Linux system has an older version of glibc than Ubuntu 18.x, because those wheel files were compiled on Ubuntu 18.x. If pip fails to install forest-gis, try installing forest-gis from source.
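To see whether the glibc caveat applies to a given machine, the C library version can be read from Python. This is a minimal sketch: the helper names are hypothetical, and the default threshold of 2.27 assumes that "Ubuntu 18.x" means Ubuntu 18.04, which ships glibc 2.27.

```python
import platform


def parse_version(text):
    """Turn '2.27' into (2, 27); non-numeric parts are skipped."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def glibc_is_at_least(minimum="2.27"):
    """True or False on glibc systems, None when glibc is not detected."""
    lib, version = platform.libc_ver()
    if lib != "glibc" or not version:
        return None
    return parse_version(version) >= parse_version(minimum)


print(platform.libc_ver(), glibc_is_at_least())
```

A result of `False` suggests that building from source is the safer route; `None` means the interpreter is not linked against glibc (for example on Windows or macOS).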
For macOS
At present, wheel files for forest-gis are not provided for macOS.
Build forest-gis from source
For Windows and Linux
Before you install forest-gis from source, first install or update Cython and NumPy to the newest version, and then run
pip install cython
pip install numpy
pip install --verbose .
For macOS, first install the macOS command line tools, then install the OpenMP library (libomp) with Homebrew
brew install libomp
Set the following environment variables
export CC=/usr/bin/clang
export CXX=/usr/bin/clang++
export CPPFLAGS="$CPPFLAGS -Xpreprocessor -fopenmp"
export CFLAGS="$CFLAGS -I/usr/local/opt/libomp/include"
export CXXFLAGS="$CXXFLAGS -I/usr/local/opt/libomp/include"
export LDFLAGS="$LDFLAGS -Wl,-rpath,/usr/local/opt/libomp/lib -L/usr/local/opt/libomp/lib -lomp"
Finally, build forest-gis
pip install --verbose .
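The macOS steps above hard-code /usr/local/opt/libomp, which is where Homebrew places libomp on Intel Macs; on Apple Silicon, Homebrew installs under /opt/homebrew, and `brew --prefix libomp` prints the actual location. The sketch below builds the same environment variables for an arbitrary prefix and runs the build. The function names `libomp_build_env` and `build_from_source` are hypothetical helpers, not part of forest-gis.

```python
import os
import subprocess
import sys


def libomp_build_env(prefix="/usr/local/opt/libomp", base=None):
    """Return a copy of the environment with the OpenMP build flags appended."""
    env = dict(os.environ if base is None else base)

    def extend(name, extra):
        # Append to an existing flag variable instead of overwriting it.
        return (env.get(name, "") + " " + extra).strip()

    env["CC"] = "/usr/bin/clang"
    env["CXX"] = "/usr/bin/clang++"
    env["CPPFLAGS"] = extend("CPPFLAGS", "-Xpreprocessor -fopenmp")
    env["CFLAGS"] = extend("CFLAGS", f"-I{prefix}/include")
    env["CXXFLAGS"] = extend("CXXFLAGS", f"-I{prefix}/include")
    env["LDFLAGS"] = extend(
        "LDFLAGS", f"-Wl,-rpath,{prefix}/lib -L{prefix}/lib -lomp"
    )
    return env


def build_from_source(source_dir=".", prefix="/usr/local/opt/libomp"):
    """Run 'pip install --verbose .' in source_dir with the flags above."""
    command = [sys.executable, "-m", "pip", "install", "--verbose", "."]
    return subprocess.run(
        command, cwd=source_dir, env=libomp_build_env(prefix), check=True
    )
```

Calling `build_from_source(prefix="/opt/homebrew/opt/libomp")` from the root of the source tree would be the Apple Silicon variant, assuming libomp was installed with Homebrew in its default location.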
User Guide
Compute local variable importance based on the impurity metric
# Use the Boston house-price dataset as an example.
# Note: load_boston was removed in scikit-learn 1.2, so this example needs an older release.
from sklearn.datasets import load_boston
from sklearn.ensemble import RandomForestRegressor, ExtraTreesRegressor
from forest.ensemble import impurity_LVIG_RFRegressor
from forest.ensemble import impurity_LVIG_EXTRegressor

train_x, train_y = load_boston(return_X_y=True)
# partition_feature could be a column from train_x
partition_feature = train_x[:, 1]
var_names = ["var_" + str(i) for i in range(train_x.shape[1])]

# Use a random forest model to compute local variable importance
rf = RandomForestRegressor(500, max_features=0.3)
rf.fit(train_x, train_y)
lvig_handler = impurity_LVIG_RFRegressor(rf, var_names)
local_variable_importance = lvig_handler.lvig(train_x, train_y, partition_feature=partition_feature)

# Use extra-trees to compute local variable importance
model = ExtraTreesRegressor(500, max_features=0.3)
model.fit(train_x, train_y)
# Pass the fitted extra-trees model to the extra-trees handler
lvig_handler = impurity_LVIG_EXTRegressor(model, var_names)
local_variable_importance = lvig_handler.lvig(train_x, train_y, partition_feature=partition_feature)
Alternatively, compute local variable importance based on the accuracy metric
from forest.ensemble import accuracy_LVIG

model = RandomForestRegressor(500, max_features=0.3)
model.fit(train_x, train_y)
lvig_handler = accuracy_LVIG(model)
# Compute local variable importance
local_variable_importance = lvig_handler.compute_feature_importance(train_x, train_y, partition_feature=partition_feature)

# The accuracy-based LVIG is a model-agnostic method, so other models such as
# XGBoost and gradient boosting decision trees can also be used.
from sklearn.ensemble import GradientBoostingRegressor
import xgboost as xgb

# Based on a gradient boosting decision tree
model = GradientBoostingRegressor(n_estimators=500, max_depth=15, learning_rate=0.05, subsample=0.5, max_features=5)
model.fit(train_x, train_y)
lvig_handler = accuracy_LVIG(model)
data = lvig_handler.compute_feature_importance(train_x, train_y, partition_feature)

# Based on XGBoost ("reg:squarederror" is the current name of the deprecated "reg:linear" objective)
model = xgb.XGBRegressor(n_estimators=500, max_depth=15, subsample=0.5, eval_metric="rmse", objective="reg:squarederror", n_jobs=20, eta=0.05, colsample_bynode=0.33334)
model.fit(train_x, train_y)
lvig_handler = accuracy_LVIG(model)
data = lvig_handler.compute_feature_importance(train_x, train_y, partition_feature)
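To illustrate why an accuracy-based importance can work with any model, the sketch below applies the general permutation-importance idea inside groups of samples: shuffle one feature within a group and measure how much the prediction error grows. This is only an illustration of that general idea in plain Python. It is an assumption that forest-gis works along similar lines, and `grouped_permutation_importance`, its arguments, and the toy model are all hypothetical.

```python
import random
from collections import defaultdict


def mean_squared_error(y_true, y_pred):
    """Mean squared error of two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def grouped_permutation_importance(predict, rows, targets, groups, seed=0):
    """Per-group increase in MSE after shuffling each feature column.

    predict: callable mapping a list of feature rows to a list of predictions,
             so any fitted model can be plugged in.
    groups:  one label per row; importance is computed inside each group.
    """
    rng = random.Random(seed)
    members = defaultdict(list)
    for index, label in enumerate(groups):
        members[label].append(index)

    importance = {}
    for label, indices in members.items():
        group_rows = [list(rows[i]) for i in indices]
        group_targets = [targets[i] for i in indices]
        baseline = mean_squared_error(group_targets, predict(group_rows))
        scores = []
        for column in range(len(group_rows[0])):
            # Shuffle one column within the group, leave the others intact.
            shuffled = [row[column] for row in group_rows]
            rng.shuffle(shuffled)
            permuted = [
                row[:column] + [value] + row[column + 1:]
                for row, value in zip(group_rows, shuffled)
            ]
            error = mean_squared_error(group_targets, predict(permuted))
            scores.append(error - baseline)
        importance[label] = scores
    return importance


# Toy usage: the "model" only looks at the first feature.
demo_rows = [[float(i), float(i % 3)] for i in range(12)]
demo_targets = [2.0 * row[0] for row in demo_rows]
demo_groups = ["west" if i < 6 else "east" for i in range(12)]
print(grouped_permutation_importance(
    lambda rs: [2.0 * r[0] for r in rs], demo_rows, demo_targets, demo_groups
))
```

Because only `predict` is called, the same function would accept the `predict` method of a random forest, a gradient boosting model, or an XGBoost regressor after converting the inputs to the form that model expects.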
Download files
Built Distributions
Hashes for forest_gis-2.0.0-cp38-cp38-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | a8b36241f404783c301be9583390292a54a315904fbdb5c9484010a76c0a53bf
MD5 | a4f0ffc368fdf812ab04ec2d3ba21be3
BLAKE2b-256 | effff113512267219d7539226973b7dd939a1b075c823794217f0c0d75d6c236
Hashes for forest_gis-2.0.0-cp38-cp38-win32.whl
Algorithm | Hash digest
---|---
SHA256 | 4af784333ee03af117d3b142ba849e2bf2f1cc6d7bc6c85b0c4178efb2da909a
MD5 | c6adb619bb9b8bf0050a5537d3cf8500
BLAKE2b-256 | 10f6ddcc270093d8914e22c9154d66ef4a7f6e289636ba4c6f578bbf23b54f55
Hashes for forest_gis-2.0.0-cp37-cp37m-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | 4568f653e2a42554a52480921219e0863b13b9fed4b063615a32a293a77f6b0d
MD5 | 9fb1dc7ced3c71e8be83456383812069
BLAKE2b-256 | 1b6ba9123d5db9cab6a4e698ad7282b074163adbcbb3761f0728183dc8e0a20b
Hashes for forest_gis-2.0.0-cp37-cp37m-win32.whl
Algorithm | Hash digest
---|---
SHA256 | 791348c555fc43d3fd69c3ca2c203a9bc21c358cd87caf1cb459c20d40446bb5
MD5 | b73e067fe7131156259be8bb6d76de39
BLAKE2b-256 | eb6a4309a40398220b1aabf08859131835854c42707b46d1f5e177b801ca2a69
Hashes for forest_gis-2.0.0-cp36-cp36m-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | 9c9259ae19bccef924b5da5abdd3bdf69b8f4d9a826ab2f0c05be2a05ea46e46
MD5 | bc05aed77c2e85821168ab66a3e42197
BLAKE2b-256 | 2a4e3aeee07540395a77b4f1929136861ea6bbe923e66daebcc27576d83cf4b7
Hashes for forest_gis-2.0.0-cp36-cp36m-win32.whl
Algorithm | Hash digest
---|---
SHA256 | 49b58508025eac5ba35fbc6d3b9d7625efc4fce36ae6641bfbbb7b40bd1c7b54
MD5 | 0ff682caef87dd18588c14710e772b4a
BLAKE2b-256 | 79ec4085b006ca1e8f09e21260bc9a8eeff148ad8dfa23a7e7ff2826ee59b89b