Library for experiments on measuring resample exposure similarity in heterogeneous datasets.

Resample Exposure Similarity

Resample exposure similarity is a novel approach to measuring similarity in heterogeneous data, based on the frequency of categorical variables and the density distribution of numerical variables. This library provides a rough implementation of the measure as a class that can compute a similarity matrix. The repository includes additional code for implementing and experimenting with resample exposure similarity and competing measures, including functions for calculating similarity matrices, performing nearest neighbour classification, and clustering using partitioning around medoids.

Installation

To install the library, run the following command:

pip install 

Tutorial

import seaborn as sns
from resample_exposure import ResampleExposure

df = sns.load_dataset('penguins')
df_train, df_test = df.iloc[:300], df.iloc[300:]

# Create an instance of the ResampleExposure class.
# categorical_features is optional; when it is omitted, categorical columns are inferred automatically.
rex = ResampleExposure(target_distribution=df_train,
                       categorical_features=['species', 'island'], 
                       unique_threshold=5, 
                       feature_weights=None,
                       )

# single point comparison
similarity = rex.resample_exposure_sim(query_point=df_test.iloc[0], 
                                        target_point=df_train.iloc[0], 
                                        normalised=True
                                        )

# similarity matrix
similarity_matrix = rex.resample_exposure_matrix(query_df=df_test,
                                                  normalised=False,
                                                  reverse_direction=False,
                                                  overwrite_memory=False,
                                                  n_jobs=-1,  # -1 uses all available cores
                                                  )

When resample_exposure_matrix is called with no arguments, it returns the similarity matrix of the target distribution with itself. Setting reverse_direction=True swaps the roles of the two distributions: the query distribution is used as the target and the target distribution as the query. overwrite_memory is an experimental option that replaces the stored marginal distributions with those of the queries, for cases where the distribution of the target points is assumed to be unknown. The n_jobs parameter enables parallel computation; -1 uses all available cores.
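One way a similarity matrix could be used for nearest neighbour classification is sketched below. This is an illustration only, not the code shipped in the repository: the helper name knn_predict is hypothetical, and the sketch assumes that rows of the matrix index query points, columns index target points, and larger values mean more similar. Check the orientation of the matrix returned by the library before relying on it.

```python
from collections import Counter


def knn_predict(similarity, train_labels, k=3):
    """Majority-vote k-NN on a precomputed similarity matrix.

    Assumed layout (not confirmed by the library docs): similarity[i][j] is
    the similarity between query point i and target (training) point j.
    """
    predictions = []
    for row in similarity:
        # Rank training indices from most to least similar to this query point.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        # Count the labels of the k most similar training points.
        votes = Counter(train_labels[j] for j in ranked[:k])
        predictions.append(votes.most_common(1)[0][0])
    return predictions


# Toy data: two query points compared against four labelled training points.
toy_similarity = [
    [0.90, 0.80, 0.10, 0.20],
    [0.10, 0.30, 0.70, 0.95],
]
toy_labels = ["Adelie", "Adelie", "Gentoo", "Gentoo"]

print(knn_predict(toy_similarity, toy_labels, k=3))
```

The function only iterates over rows and indexes into them, so nested lists and NumPy arrays should both work as input.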

Experiment Codebooks

This library is accompanied by a set of codebooks that demonstrate how to use the library and replicate the results shown in the paper "Similarity Based on Resample Exposure". The codebooks are designed to be used in Jupyter notebooks and provide step-by-step instructions for running experiments and generating synthetic data.

The codebooks below can be used to replicate the results shown in the paper.

Link        Description                                                        Fig. refs.
Codebook 1  Figures and experiments exploring behaviour of resample exposure   Fig. 3
Codebook 2  Experiments and results for nearest neighbours classification      Fig. 4
Codebook 3  Experiments and results for the partitioning around medoids        Fig. 5
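For readers unfamiliar with partitioning around medoids (PAM), the following is a minimal sketch of its two phases (greedy BUILD, then best-improvement SWAP) on a precomputed square dissimilarity matrix. It is not the implementation used in the codebooks; the function names pam and total_cost are hypothetical. How best to turn a similarity into a dissimilarity is not specified here; something like 1 - similarity on a normalised matrix is only one assumed option.

```python
def total_cost(dissimilarity, medoids):
    """Sum over all points of the dissimilarity to their nearest medoid."""
    return sum(min(row[m] for m in medoids) for row in dissimilarity)


def pam(dissimilarity, k):
    """Illustrative PAM: returns (medoid indices, cluster label per point)."""
    n = len(dissimilarity)

    # BUILD: greedily add the point that lowers the total cost the most.
    medoids = []
    for _ in range(k):
        candidates = [i for i in range(n) if i not in medoids]
        best = min(candidates,
                   key=lambda i: total_cost(dissimilarity, medoids + [i]))
        medoids.append(best)

    # SWAP: apply the best improving medoid/non-medoid swap until none is left.
    cost = total_cost(dissimilarity, medoids)
    improved = True
    while improved:
        improved = False
        best_cost, best_medoids = cost, medoids
        for pos in range(k):
            for candidate in range(n):
                if candidate in medoids:
                    continue
                trial = medoids[:pos] + [candidate] + medoids[pos + 1:]
                trial_cost = total_cost(dissimilarity, trial)
                if trial_cost < best_cost:
                    best_cost, best_medoids = trial_cost, trial
        if best_cost < cost:
            cost, medoids, improved = best_cost, best_medoids, True

    # Assign each point to the cluster of its nearest medoid.
    labels = [min(range(k), key=lambda c: row[medoids[c]])
              for row in dissimilarity]
    return medoids, labels


# Toy data: six points on a line forming two well-separated groups.
points = [0, 1, 2, 10, 11, 12]
toy_dissimilarity = [[abs(a - b) for b in points] for a in points]

medoids, labels = pam(toy_dissimilarity, k=2)
print(medoids, labels)
```

This version recomputes the full cost for every candidate swap, which keeps it short but makes it slow on large matrices; it is meant only to show the structure of the algorithm.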

Requirements
