Skip to main content

A library used to assess the informative and discriminative properties of subspaces/patterns

Project description

DISA

DISA (Discriminative and Informative Subspace Analysis), a software package in Python (v3.7) capable of assessing patterns with numerical outputs by statistically testing the correlation gain of the subspace against the overall space

Input parameters

data : pandas.Dataframe

patterns : A python list where each position is a dictionary with the pattern properties:

[i] : { 
        "lines" : list (mandatory)
        "columns" : list (mandatory)
        "column_values": list (optional)
        "noise": list (optional)
        "type" : string (optional)
       }

Description of parameters:

  • "lines" refers to the observations of the pattern
  • "columns" refers to the variables of the pattern
  • "column_values" refers to the pattern coherence on columns
  • "noise" refers to the noise allowed in each column
  • "type" refers to the type of coherence (by default we assume constant coherence)

outcome : dict

{
    "values": pandas.Series (mandatory)
    "outcome_value" : int (optional)
    "type": string (optional)
    "method": string (optional)
    "heuristic": boolean (optional)
}

Description of parameters:

  • "values" the outcome variable
  • "outcome_value" : if the outcome variable is categorical the user can force DISA to analyse according to a specific category, we assume the category is represented by a discrete value. (by default it will select the best category per pattern)
  • "type": if the user wishes to analyse a continuous outcome variable this field should take the value "Numerical" (by default it is assumed to be categorical)
  • "method": if in the "type" parameter the user inputted "Numerical" then this field can be filled in with:
    • "min_max" - uses the minimum and the maximum values of each pattern to define the pattern-conditioned outcome intervals
    • "average" - uses the average +- standard deviation of each pattern to define the pattern-conditioned outcome intervals
    • "gaussian" (default) - assumes both the pattern-conditioned outcome and the outcome variable follow a normal distribution to define the pattern-conditioned outcome intervals
    • "empirical" - uses the empirical distribution of the pattern-conditioned outcome and outcome variables to define the pattern-conditioned outcome intervals
  • "heuristic": if in the "method" parameter the user inputted "empirical" then this field can be used to optimize the discriminative properties of each pattern

output_configurations : dict (optional)

{
    "print_table" : boolean 
    "file_path_table" : string
    "show_plots" : boolean
    "file_path_plots" : string
    "print_numeric_intervals" : boolean
    "file_path_numeric_intervals" : string
 }

Description of parameters:

  • "print_table" : if set to True will output a table of results (by default it is set to True)
  • "file_path_table" : path to file to write the output table to (by default it is set to None)
  • "show_plots" : if the "method" parameters in the outcome parameter was set to "gaussian" or "empirical" then this parameter if set to True will plot a figure (by default it is set to False)
  • "file_path_plots" : if the "method" parameters in the outcome parameter was set to "gaussian" or "empirical" then the path to a folder can be set in this parameter to output a PNG of the figures (by default it is set to None)
  • "print_numeric_intervals" : if the "type" parameter in the outcome parameter was set to "Numerical" then this parameter can be set to True to output the interval that the pattern discriminates (by default it is set to False)
  • "file_path_numeric_intervals" : if the "type" parameter in the outcome parameter was set to "Numerical" then the path to a folder can be set in this parameter to output to a file the intervals that each pattern discriminates

Dataset examples

Four examples on how to use DISA are provided in the folder "Example", the Echocardiogram, the Liver Disorders, the Breast Cancer Wisconsin (diagnostic), and Dodecanol datasets. Inside each of the datasets corresponding folder you will find a set of files and a folder. The python and jupyter notebook files provide the code to analyse patterns using DISA. The patterns are contained in the .txt files and the processed datasets in both the .csv and .arff files.

Package dependencies

pandas - 1.4.3

numpy - 1.23.1

scipy - 1.8.1

prettytable - 3.3.0

matplotlib - 3.5.1

Metrics

A list of all the implemented metrics in DISA and the corresponding DOI (some but not all of these metrics are futher explained in https://mhahsler.github.io/arules/docs/measures).

Information Gain: https://doi.org/10.1016/S0306-4379(03)00072-3

Chi-squared: https://doi.org/10.1145/253260.253327

Gini index: https://doi.org/10.1016/S0306-4379(03)00072-3

Difference in Support: 10.1109/TKDE.2010.241

Bigger Support: 10.1109/TKDE.2010.241

Confidence: 10.1145/170036.170072

All-Confidence: 10.1109/TKDE.2003.1161582

Lift: 10.1145/170036.170072

Standardised Lift: https://doi.org/10.1016/j.csda.2008.03.013

Collective Strength: https://dl.acm.org/doi/pdf/10.1145/275487.275490

Cosine: https://doi.org/10.1016/S0306-4379(03)00072-3

Interestingness: arXiv:1202.3215

Comprehensibility: arXiv:1202.3215

Completeness: arXiv:1202.3215

Added Value: https://doi.org/10.1016/S0306-4379(03)00072-3

Casual Confidence: https://doi.org/10.1007/3-540-44673-7_1

Casual Support: https://doi.org/10.1007/3-540-44673-7_1

Certainty Factor: 10.3233/IDA-2002-6303

Conviction: 10.1145/170036.170072

Coverage (Support): 10.1145/170036.170072

Descriptive Confirmed Confidence: https://doi.org/10.1016/S0306-4379(03)00072-3

Difference of Proportions: https://doi.org/10.1007/s001800100075

Example and Counter Example: SEBAG, M.; SCHOENAUER, M. Generation of rules with certainty and confidence factors from incomplete and incoherent learning bases. In: Proc. of EKAW. 1988. p. 28.

Imbalance Ratio: https://doi.org/10.1007/s10618-009-0161-2

Fisher's Exact Test (p-value): 10.3233/IDA-2007-11502

Hyper Confidence: 10.3233/IDA-2007-11502

Hyper Lift: 10.3233/IDA-2007-11502

Laplace Corrected Confidence: https://doi.org/10.1016/S0306-4379(03)00072-3

Importance: https://docs.microsoft.com/en-us/analysis-services/data-mining/microsoft-association-algorithm-technical-reference?view=asallproducts-allversions&viewFallbackFrom=sql-server-ver15

Jaccard Coefficient: https://doi.org/10.1016/S0306-4379(03)00072-3

J-Measure: NII Article ID (NAID) 10011699020

Kappa: https://doi.org/10.1016/S0306-4379(03)00072-3

Klosgen: https://doi.org/10.1016/S0306-4379(03)00072-3

Kulczynski: https://doi.org/10.1007/s10618-009-0161-2

Goodman-Kruskal's Lambda: https://doi.org/10.1016/S0306-4379(03)00072-3

Least Contradiction: (2004) Extraction de pepites de connaissances dans les donnees: Une nouvelle approche et une etude de sensibilite au bruit. In Mesures de Qualite pour la fouille de donnees. Revue des Nouvelles Technologies de l’Information, RNTI author : Aze, J. and Y. Kodratoff

Lerman Similarity: (1981) Classification et analyse ordinale des données. Author : Lerman, Israel-César.

Piatetsky-Shapiro: NII Article ID (NAID) 10000000985

Max Confidence: https://doi.org/10.1016/S0306-4379(03)00072-3

Odds Ratio: https://doi.org/10.1016/S0306-4379(03)00072-3

Phi Correlation Coefficient: https://doi.org/10.1016/S0306-4379(03)00072-3

Ralambondrainy: DIATTA, Jean; RALAMBONDRAINY, Henri; TOTOHASINA, André. Towards a unifying probabilistic implicative normalized quality measure for association rules. In: Quality Measures in Data Mining. Springer, Berlin, Heidelberg, 2007. p. 237-250.

Relative Linkage Disequilibrium: https://doi.org/10.1007/978-3-540-70720-2_15

Relative Risk: https://doi.org/10.1148/radiol.2301031028

Rule Power Factor:https://doi.org/10.1016/j.procs.2016.07.175

Sebag-Schoenauer : SEBAG, M.; SCHOENAUER, M. Generation of rules with certainty and confidence factors from incomplete and incoherent learning bases. In: Proc. of EKAW. 1988. p. 28.

Yule Q: https://doi.org/10.1016/S0306-4379(03)00072-3

Yule Y: https://doi.org/10.1016/S0306-4379(03)00072-3

Weighted Support: https://doi.org/10.1016/j.patcog.2021.107900

Weighted Rule Support: https://doi.org/10.1016/j.patcog.2021.107900

Weighted Confidence: https://doi.org/10.1016/j.patcog.2021.107900

Weighted Lift: https://doi.org/10.1016/j.patcog.2021.107900

Statistical Significance: https://doi.org/10.1007/s10618-017-0521-2

FleBiC Score: https://doi.org/10.1016/j.patcog.2021.107900

Authors

DISA was developed by:

L. Alexandre (leonardoalexandre@tecnico.ulisboa.pt), R.S. Costa (rs.costa@fct.unl.pt) and R. Henriques

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

DISAtool-1.0.1.tar.gz (15.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

DISAtool-1.0.1-py3-none-any.whl (15.9 kB view details)

Uploaded Python 3

File details

Details for the file DISAtool-1.0.1.tar.gz.

File metadata

  • Download URL: DISAtool-1.0.1.tar.gz
  • Upload date:
  • Size: 15.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.8.2

File hashes

Hashes for DISAtool-1.0.1.tar.gz
Algorithm Hash digest
SHA256 341a7eecb376d0d3be0b5aee0a4fb3769c2930e2523c33351275248737a8a311
MD5 6b5f02e6f3bf770ad417bfc370b77503
BLAKE2b-256 41bbb3edb3a400fe92afe5df821b2d0234977f1dffb345361b84313f6c72051d

See more details on using hashes here.

File details

Details for the file DISAtool-1.0.1-py3-none-any.whl.

File metadata

  • Download URL: DISAtool-1.0.1-py3-none-any.whl
  • Upload date:
  • Size: 15.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.8.2

File hashes

Hashes for DISAtool-1.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 44f4d4fb0b527ec1f11c845fd569364b43d88aa7c9dff2c0ec09e1d6b680a26a
MD5 9cbf10940c582b6ffc499ef98b9cea26
BLAKE2b-256 f4cdf7e870bd564102cc38e7c30aa8bbf27240f3eec8196d6963a9a5aba2c581

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page