Library for experiments on measuring resample exposure similarity in heterogeneous datasets.
Resample Exposure Similarity
Resample exposure similarity is a novel approach to measuring similarity in heterogeneous data, based on the frequency of categorical variables and the density distribution of numerical variables. This library provides a rough implementation of the measure as a class that can compute a similarity matrix. The repository includes additional code for implementing and experimenting with resample exposure similarity and competing measures, including functions for calculating similarity matrices, performing nearest neighbour classification, and clustering using partitioning around medoids.
Installation
To install the library, run the following command:
pip install
Tutorial
import seaborn as sns
from resample_exposure import ResampleExposure
df = sns.load_dataset('penguins')
df_train, df_test = df.iloc[:300], df.iloc[300:]
# Create an instance of the ResampleExposure class.
# categorical_features is optional; if omitted, categorical columns are inferred automatically.
rex = ResampleExposure(
    target_distribution=df_train,
    categorical_features=['species', 'island'],
    unique_threshold=5,
    feature_weights=None,
)
# Single point comparison
similarity = rex.resample_exposure_sim(
    query_point=df_test.iloc[0],
    target_point=df_train.iloc[0],
    normalised=True,
)
# Similarity matrix
similarity_matrix = rex.resample_exposure_matrix(
    query_df=df_test,
    normalised=False,
    reverse_direction=False,
    overwrite_memory=False,
    n_jobs=-1,
)
Called without arguments, resample_exposure_matrix returns the similarity matrix of the target distribution with itself. Setting reverse_direction=True swaps the roles of the two datasets, so that the query distribution is treated as the target distribution and the target distribution as the query distribution. overwrite_memory is an experimental option that overwrites the stored marginal distributions with those of the queries, for cases where the distribution of the target points is assumed to be unknown. n_jobs controls parallel computation; -1 uses all available cores.
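As an illustration of how such a similarity matrix might be consumed downstream, the sketch below performs nearest neighbour classification by majority vote. It does not call the library: `knn_predict` is a hypothetical helper, the toy matrix and labels are made up, and the layout (one row per query point, one column per target point) is an assumption rather than something the library documents here.

```python
import numpy as np

def knn_predict(similarity, target_labels, k=3):
    """Majority vote among the k most similar target points for each query row."""
    similarity = np.asarray(similarity, dtype=float)
    target_labels = np.asarray(target_labels)
    predictions = []
    for row in similarity:
        # Indices of the k largest similarities in this row
        top_k = np.argsort(row)[::-1][:k]
        labels, counts = np.unique(target_labels[top_k], return_counts=True)
        # On a tied vote, the label that sorts first is chosen
        predictions.append(labels[np.argmax(counts)])
    return np.array(predictions)

# Toy data (assumed layout): 2 query rows x 4 target columns
toy_similarity = np.array([
    [0.9, 0.8, 0.1, 0.2],
    [0.1, 0.3, 0.7, 0.6],
])
toy_labels = np.array(["Adelie", "Adelie", "Gentoo", "Gentoo"])

predicted = knn_predict(toy_similarity, toy_labels, k=3)
print(predicted)
```

In practice the labels would come from the training frame used as the target distribution, for example its `species` column, provided the matrix columns follow the same row order.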
Experiment Codebooks
This library is accompanied by a set of codebooks that demonstrate how to use the library and replicate the results shown in the paper "Similarity Based on Resample Exposure". The codebooks are designed to be used in Jupyter notebooks and provide step-by-step instructions for running experiments and generating synthetic data.
The following codebooks are available:
| Link | Description | Fig. refs. |
|---|---|---|
| Codebook 1 | Figures and experiments exploring the behaviour of resample exposure | Fig. 3 |
| Codebook 2 | Experiments and results for nearest neighbours classification | Fig. 4 |
| Codebook 3 | Experiments and results for partitioning around medoids | Fig. 5 |
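For readers unfamiliar with medoid-based clustering, the following is a minimal sketch of clustering on a precomputed matrix. It is not the code from the codebooks, and it uses a simplified alternating k-medoids scheme rather than the full partitioning around medoids swap search. The helper `k_medoids`, the toy matrix, and the conversion `1 - similarity` (which assumes normalised, symmetric similarities with a self-similarity of 1) are all assumptions made for illustration.

```python
import numpy as np

def k_medoids(dissimilarity, k, max_iter=100):
    """Simplified alternating k-medoids on a square dissimilarity matrix."""
    d = np.asarray(dissimilarity, dtype=float)
    # Deterministic start: the most central point, then farthest-first
    medoids = [int(np.argmin(d.sum(axis=0)))]
    while len(medoids) < k:
        nearest = d[:, medoids].min(axis=1)
        medoids.append(int(np.argmax(nearest)))
    medoids = np.array(medoids)

    for _ in range(max_iter):
        # Assign every point to its closest medoid
        labels = np.argmin(d[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue  # keep the previous medoid for an empty cluster
            # Pick the member with the smallest total dissimilarity from its cluster
            within = d[np.ix_(members, members)].sum(axis=0)
            new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids

    labels = np.argmin(d[:, medoids], axis=1)
    return medoids, labels

# Toy symmetric similarity matrix with two obvious groups: {0, 1} and {2, 3}
toy_similarity = np.array([
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
])
medoids, labels = k_medoids(1.0 - toy_similarity, k=2)
print(medoids, labels)
```

Alternating schemes like this one can settle in a local optimum, so results may differ from a full swap-based search on harder data.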
Requirements
File details
Details for the file rex_score-1.0.tar.gz.
File metadata
- Download URL: rex_score-1.0.tar.gz
- Upload date:
- Size: 12.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4cf7f8cc2f9748e55f4ae7c916a182029bcd08b6a4500fa9673a5c4db8cdef79 |
| MD5 | 360dd89c6ec613d26a398b65836dc3f9 |
| BLAKE2b-256 | 90b0738eb686b7728fce019c93fec6982663c17561494c725aca3931f21b4199 |
Provenance
The following attestation bundles were made for rex_score-1.0.tar.gz:
Publisher: release.yml on notna07/resample-exposure-similairty
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: rex_score-1.0.tar.gz
  - Subject digest: 4cf7f8cc2f9748e55f4ae7c916a182029bcd08b6a4500fa9673a5c4db8cdef79
  - Sigstore transparency entry: 281335528
  - Sigstore integration time:
  - Permalink: notna07/resample-exposure-similairty@2eea77f9c00010e5d85694d6ee05b1bf6f42b252
  - Branch / Tag: refs/tags/v1.0
  - Owner: https://github.com/notna07
  - Access: private
  - Token Issuer: https://token.actions.githubusercontent.com
  - Runner Environment: github-hosted
  - Publication workflow: release.yml@2eea77f9c00010e5d85694d6ee05b1bf6f42b252
  - Trigger Event: release
File details
Details for the file rex_score-1.0-py3-none-any.whl.
File metadata
- Download URL: rex_score-1.0-py3-none-any.whl
- Upload date:
- Size: 12.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bd8cc522a533c55434db9fdfccf2f988df6cdad13734f2dc179ecb722389c31f |
| MD5 | 5788c8b5184801ce682e3457bbf42ce3 |
| BLAKE2b-256 | f49f79ed8ca52d855539e61b6fd23152ffa6cab70b8b600c4d2a64d626c80993 |
Provenance
The following attestation bundles were made for rex_score-1.0-py3-none-any.whl:
Publisher: release.yml on notna07/resample-exposure-similairty
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: rex_score-1.0-py3-none-any.whl
  - Subject digest: bd8cc522a533c55434db9fdfccf2f988df6cdad13734f2dc179ecb722389c31f
  - Sigstore transparency entry: 281335552
  - Sigstore integration time:
  - Permalink: notna07/resample-exposure-similairty@2eea77f9c00010e5d85694d6ee05b1bf6f42b252
  - Branch / Tag: refs/tags/v1.0
  - Owner: https://github.com/notna07
  - Access: private
  - Token Issuer: https://token.actions.githubusercontent.com
  - Runner Environment: github-hosted
  - Publication workflow: release.yml@2eea77f9c00010e5d85694d6ee05b1bf6f42b252
  - Trigger Event: release