An unsupervised discretization method, DI2, for variables with arbitrarily skewed distributions.
Project description
DI2 (Distribution Discretizer)
Discretizer
distribution_discretizer(dataset, number_of_bins, statistical_test, cutoff_margin, kolmogorov_opt, normalizer, distributions, single_column_discretization)
distribution_discretizer(pandas.DataFrame, integer, optional:string, optional:float, optional:boolean, optional:string, optional:array) - Discretizes data according to the best fitting distribution.
The distribution_discretizer(pandas.DataFrame, integer, string, float, boolean, string, array) function receives the data (pandas.DataFrame); an integer giving the number of categories for discretization; a string naming the main statistical hypothesis test to apply (options available: "chi2", "ks"); a float between 0 and 0.49 giving the width of the range within which a value is considered a border value; a boolean indicating whether outliers should be removed; a string naming the normalization method to use (options available: "min_max", "mean", "z_score"); and an array of continuous distributions, from https://docs.scipy.org/doc/scipy/reference/stats.html, to be considered by the discretizer.
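The documented option values can be checked before calling the discretizer. The sketch below is only an illustration: `check_di2_arguments` is a hypothetical helper that is not part of DI2, the mapping of each description to a parameter name is assumed from the order of the signature, and the entries of `distributions` are left unchecked because the description does not say whether they are names or SciPy objects.

```python
VALID_TESTS = {"chi2", "ks"}
VALID_NORMALIZERS = {"min_max", "mean", "z_score"}


def check_di2_arguments(*, number_of_bins, statistical_test, cutoff_margin,
                        kolmogorov_opt, normalizer, distributions):
    """Hypothetical helper (not part of DI2): validate the documented options."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if (isinstance(number_of_bins, bool) or not isinstance(number_of_bins, int)
            or number_of_bins < 1):
        raise ValueError("number_of_bins must be a positive integer")
    if statistical_test not in VALID_TESTS:
        raise ValueError(f"statistical_test must be one of {sorted(VALID_TESTS)}")
    if not 0 <= cutoff_margin <= 0.49:
        raise ValueError("cutoff_margin must lie between 0 and 0.49")
    # Assumption: the documented boolean corresponds to kolmogorov_opt (by position).
    if not isinstance(kolmogorov_opt, bool):
        raise ValueError("kolmogorov_opt must be a boolean")
    if normalizer not in VALID_NORMALIZERS:
        raise ValueError(f"normalizer must be one of {sorted(VALID_NORMALIZERS)}")
    return {
        "number_of_bins": number_of_bins,
        "statistical_test": statistical_test,
        "cutoff_margin": cutoff_margin,
        "kolmogorov_opt": kolmogorov_opt,
        "normalizer": normalizer,
        "distributions": list(distributions),
    }


# Example values chosen for illustration only.
kwargs = check_di2_arguments(
    number_of_bins=3,
    statistical_test="chi2",
    cutoff_margin=0.2,
    kolmogorov_opt=True,
    normalizer="min_max",
    distributions=["norm", "expon"],  # assumed form; see the note above
)

# With DI2 installed and the function imported, the call would then look like:
# discretized = distribution_discretizer(dataset, **kwargs)
```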
Data normalization:
Normalizes data by min_max normalization https://en.wikipedia.org/wiki/Feature_scaling#Rescaling
Normalizes data by z-score normalization https://en.wikipedia.org/wiki/Feature_scaling#Standardization
Normalizes data by mean normalization https://en.wikipedia.org/wiki/Feature_scaling#Mean_normalization
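The three normalization options follow the standard feature-scaling formulas from the linked pages. The sketch below applies them to a single pandas column; `normalize` is a hypothetical helper written for illustration, not DI2's own code, and the use of the population standard deviation (`ddof=0`) for the z-score is an assumption.

```python
import pandas as pd


def normalize(column: pd.Series, method: str) -> pd.Series:
    """Hypothetical helper: rescale one numeric column with the named method."""
    value_range = column.max() - column.min()
    if method == "min_max":
        # Rescaling: maps the column onto [0, 1].
        return (column - column.min()) / value_range
    if method == "mean":
        # Mean normalization: centers on the mean, scales by the range.
        return (column - column.mean()) / value_range
    if method == "z_score":
        # Standardization; ddof=0 (population standard deviation) is an assumption.
        return (column - column.mean()) / column.std(ddof=0)
    raise ValueError(f"unknown normalization method: {method!r}")


values = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
min_max_scaled = normalize(values, "min_max")
mean_scaled = normalize(values, "mean")
z_scaled = normalize(values, "z_score")
```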
Goodness of fit test
Pearson's chi-squared goodness of fit test https://en.wikipedia.org/wiki/Pearson%27s_chi-squared_test
Kolmogorov–Smirnov goodness of fit test https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test
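To make the two tests concrete, the sketch below runs both against SciPy distributions fitted to a synthetic sample and picks the candidate with the smallest Kolmogorov–Smirnov statistic. It uses only `scipy.stats` and NumPy and is not DI2's internal procedure; the helper name `best_fit_by_ks`, the candidate list, the sample, and the choice of 10 histogram bins are all assumptions made for illustration.

```python
import numpy as np
from scipy import stats


def best_fit_by_ks(sample, candidates):
    """Hypothetical helper: return (name, statistic) of the candidate with the
    smallest Kolmogorov-Smirnov statistic after fitting each to the sample."""
    results = {}
    for name in candidates:
        dist = getattr(stats, name)      # e.g. stats.norm
        params = dist.fit(sample)        # maximum-likelihood parameter estimates
        statistic, _ = stats.kstest(sample, name, args=params)
        results[name] = statistic
    best = min(results, key=results.get)
    return best, results[best]


# Synthetic data, used only so the example is self-contained.
rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=1.0, size=500)

best, ks_statistic = best_fit_by_ks(sample, ["norm", "expon", "uniform"])

# Pearson's chi-squared test for the fitted normal distribution:
# compare observed histogram counts with the counts the fit predicts per bin.
loc, scale = stats.norm.fit(sample)
observed, edges = np.histogram(sample, bins=10)
probabilities = np.diff(stats.norm.cdf(edges, loc=loc, scale=scale))
# Rescale so expected and observed totals agree, as stats.chisquare requires.
expected = probabilities / probabilities.sum() * observed.sum()
# ddof=2 accounts for the two parameters (loc, scale) estimated from the data.
chi2_statistic, chi2_p_value = stats.chisquare(observed, expected, ddof=2)
```

The chi-squared test depends on how the data are binned, while the Kolmogorov–Smirnov test works on the unbinned sample.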
As an illustrative example we use the dataset available at the UCI machine learning repository https://archive.ics.uci.edu/ml/datasets/Breast+Tissue.
---> DI2 was developed by L. Alexandre (leonardoalexandre@tecnico.ulisboa.pt), R.S. Costa (rs.costa@fct.unl.pt) and R. Henriques <---
File details
Details for the file DI2-1.0.2.tar.gz.
File metadata
- Download URL: DI2-1.0.2.tar.gz
- Upload date:
- Size: 8.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.6.4 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.8.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f7a16b9e2a89e8e62aaf5b17269146520621acadaff843205eee6de5b558bc87 |
| MD5 | 1d822406550c72e61679211f0e52d0b3 |
| BLAKE2b-256 | 4f33ed85a1b3236f0cb734ce543960a7a1fdcd6d9ef139ea56df99e339457e96 |
File details
Details for the file DI2-1.0.2-py3-none-any.whl.
File metadata
- Download URL: DI2-1.0.2-py3-none-any.whl
- Upload date:
- Size: 8.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.6.4 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.8.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 507990f438d58ff55b31893df2cc09f8c5e53e299ea9657604db9f58d35eaada |
| MD5 | 5dfa9d879ada6a8b9f33732f54b97fc1 |
| BLAKE2b-256 | 3c2a0acce8ebce5614d3c19fe9f139562d7796ae0da4a79ff622a235d73e0324 |