Instance hardness package
PyHard
Instance Hardness Python package
Getting Started
Python 3.7 is required. Matlab is also required to run Matilda. As far as we know, only recent versions of Matlab provide an engine for Python 3; we have tested only version R2020a.
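A quick way to confirm the interpreter before installing is sketched below. It assumes that "Python 3.7" means the 3.7.x series (the upload metadata for the 0.3 release lists CPython 3.7.9); `python_version_ok` is a hypothetical helper, not part of pyhard.

```python
import sys

# Assumption: "Python 3.7 is required" refers to the 3.7.x series.
REQUIRED = (3, 7)


def python_version_ok(version_info=sys.version_info):
    """Return True if the given (major, minor, ...) version is 3.7.x."""
    return tuple(version_info[:2]) == REQUIRED


if not python_version_ok():
    # Only a warning: this sketch does not stop anything from running.
    print(
        f"Warning: running Python {sys.version_info.major}.{sys.version_info.minor}; "
        "this README states that Python 3.7 is required.",
        file=sys.stderr,
    )
```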
Installation
- Clone the repository
git clone https://gitlab.com/ita-ml/instance-hardness.git
- Install the package via pip
cd instance-hardness/
pip install -e .
- Install the Matlab engine for Python
Refer to this link, which contains detailed instructions.
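Once the engine is installed, a minimal smoke test can confirm that Python is able to start a Matlab session. The sketch below uses `matlab.engine.start_matlab()` from the MATLAB Engine API for Python; the helper name is hypothetical, and it returns `None` when the `matlab` package cannot be found in the current environment.

```python
import importlib.util


def matlab_engine_smoke_test():
    """Start a Matlab session, return Matlab's version string, then shut it down.

    Returns None if the MATLAB Engine API for Python is not installed.
    """
    if importlib.util.find_spec("matlab") is None:
        return None  # engine package not importable in this environment
    import matlab.engine  # installed with the MATLAB Engine API for Python

    eng = matlab.engine.start_matlab()  # launching Matlab can take a while
    try:
        # Matlab functions are callable as engine methods; version() returns
        # a string that includes the release name, e.g. "(R2020a)".
        return eng.version()
    finally:
        eng.quit()  # always release the Matlab session
```

For example, `print(matlab_engine_smoke_test())` should print the Matlab version if the engine is set up correctly.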
Usage
In the command line (terminal):
cd your/path/instance-hardness
python pyhard
Or run it from elsewhere with:
python -m pyhard
It should generate the metadata.csv file and run the Matilda software.
One can choose which steps to disable (e.g. --no-meta or --no-matilda). To see all command line options, run python pyhard -h.
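The same invocations can be scripted from Python. This sketch only assembles the flags mentioned in this README (`--no-meta`, `--no-matilda`, `--app`, `--demo`, `-h`) and runs `python -m pyhard` in a subprocess; `pyhard_command` and `run_pyhard` are hypothetical helpers, and any other options should be looked up with `-h`.

```python
import subprocess
import sys


def pyhard_command(no_meta=False, no_matilda=False, app=False, demo=False, show_help=False):
    """Build the argument list for `python -m pyhard` using flags named in this README."""
    cmd = [sys.executable, "-m", "pyhard"]  # equivalent to `python -m pyhard`
    if show_help:
        cmd.append("-h")  # show all command line options
        return cmd
    if no_meta:
        cmd.append("--no-meta")  # disable the metadata step
    if no_matilda:
        cmd.append("--no-matilda")  # disable the Matilda step
    if app:
        cmd.append("--app")  # launch the visualization app
    if demo:
        cmd.append("--demo")  # launch the demo app over the data/ folder
    return cmd


def run_pyhard(cwd=None, **flags):
    """Run pyhard in a subprocess and return its exit code."""
    return subprocess.run(pyhard_command(**flags), cwd=cwd).returncode
```

For example, `run_pyhard(no_meta=True, no_matilda=True, app=True)` corresponds to the app-only command shown in the App section.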
Visualization
Demo
The demo visualization app can display any dataset located in your-path/instance-hardness/data/. Each folder within this directory (named after the problem) should contain the following three files:
- data.csv: the dataset itself;
- metadata.csv: the metadata with measures and algorithm performances (feature_ and algo_ columns);
- coordinates.csv: the instance space coordinates generated by Matilda.
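Because the two column groups in metadata.csv are distinguished only by prefix, they can be separated from the header with the standard `csv` module. This is a sketch assuming the prefixes are exactly `feature_` and `algo_`; columns with neither prefix are ignored, and both function names are hypothetical.

```python
import csv


def split_metadata_columns(lines):
    """Return (feature_columns, algo_columns) from CSV text.

    `lines` may be an open file or any iterable of CSV lines; only the
    header row is read.
    """
    header = next(csv.reader(lines))
    features = [c for c in header if c.startswith("feature_")]  # hardness measures
    algos = [c for c in header if c.startswith("algo_")]  # algorithm performances
    return features, algos


def read_metadata_columns(path):
    """Convenience wrapper that opens a metadata.csv file from disk."""
    with open(path, newline="") as fh:
        return split_metadata_columns(fh)
```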
The displayed dataset can be chosen through the app interface. To run the demo, use the command:
python -m pyhard --demo
New problems may be added as new folders in data/. Multidimensional data will be reduced with the chosen dimensionality reduction method.
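Before adding a new problem folder, it may help to check that it contains the three expected files. The helpers below are a hypothetical sketch that only checks whether the files exist, not whether their contents are valid.

```python
from pathlib import Path

# File names each data/<problem> folder is expected to contain.
REQUIRED_FILES = ("data.csv", "metadata.csv", "coordinates.csv")


def missing_files(problem_dir):
    """Return the required files that are absent from one problem folder."""
    problem_dir = Path(problem_dir)
    return [name for name in REQUIRED_FILES if not (problem_dir / name).is_file()]


def list_problems(data_dir):
    """Map each problem folder under data_dir to its missing files (empty list = complete)."""
    return {
        d.name: missing_files(d)
        for d in sorted(Path(data_dir).iterdir())
        if d.is_dir()
    }
```

For example, `list_problems("instance-hardness/data")` reports, per folder, which of the three files still need to be added.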
App
From the command line, it is possible to launch an app that visualizes 2D datasets alongside their instance space. The plots are linked, and options for coloring and hover information are available. To run only the app:
python -m pyhard --no-meta --no-matilda --app
It should open the browser automatically and display the data.
Configuration
See the file config.yaml in /instance-hardness/conf/. It contains options for file paths, the measures to be calculated, and which classifiers to use along with their parameters.
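To inspect the configuration programmatically, the file can be parsed as ordinary YAML. This sketch assumes PyYAML is installed and that the repository was cloned to a known root directory; the helper names are hypothetical, and the actual section names depend on the config.yaml that ships with the repository.

```python
from pathlib import Path


def config_path(repo_root="."):
    """Location of config.yaml inside a clone of the instance-hardness repository."""
    return Path(repo_root) / "conf" / "config.yaml"


def load_config(repo_root="."):
    """Parse config.yaml and return its contents.

    Assumes PyYAML is installed; it is imported lazily so that
    config_path() works without it.
    """
    import yaml  # PyYAML

    with open(config_path(repo_root)) as fh:
        return yaml.safe_load(fh)
```

For example, `list(load_config("instance-hardness"))` should list the top-level sections if the file parses to a mapping.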
File details
Details for the file pyhard-0.3.tar.gz.
File metadata
- Download URL: pyhard-0.3.tar.gz
- Upload date:
- Size: 2.6 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/51.0.0.post20201207 requests-toolbelt/0.9.1 tqdm/4.55.1 CPython/3.7.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 7ff054d15df8c8fe4ec55fb3dc54237d6b408f14f3595bbca17d973add76fe67
MD5 | 32a2ab386168c18e63e263dedfc450aa
BLAKE2b-256 | c7d3a281296cacb74be1bb67d798af11e73cff38ec0e177d54b1e8974c6933c8
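To check a downloaded archive against the digests listed above, the standard `hashlib` module is enough. This sketch reads the file in chunks and computes BLAKE2b-256 as BLAKE2b with a 32-byte digest; the helper names are hypothetical.

```python
import hashlib

# Digests listed above for pyhard-0.3.tar.gz.
EXPECTED = {
    "sha256": "7ff054d15df8c8fe4ec55fb3dc54237d6b408f14f3595bbca17d973add76fe67",
    "md5": "32a2ab386168c18e63e263dedfc450aa",
    "blake2b-256": "c7d3a281296cacb74be1bb67d798af11e73cff38ec0e177d54b1e8974c6933c8",
}


def _new_hash(algorithm):
    # BLAKE2b-256 is BLAKE2b truncated to a 32-byte (256-bit) digest.
    if algorithm == "blake2b-256":
        return hashlib.blake2b(digest_size=32)
    return hashlib.new(algorithm)


def file_digest(path, algorithm="sha256", chunk_size=1 << 16):
    """Hash a file in fixed-size chunks and return the hex digest."""
    h = _new_hash(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, expected=EXPECTED):
    """Return {algorithm: bool} comparing the file against each expected digest."""
    return {alg: file_digest(path, alg) == digest for alg, digest in expected.items()}
```

For example, `verify("pyhard-0.3.tar.gz")` should return `True` for every algorithm if the download is intact.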