Library for experiments on measuring resample exposure similarity in heterogeneous datasets.


Resample Exposure Similarity

Resample exposure similarity is a novel approach to measuring similarity in heterogeneous data, based on the frequencies of categorical variables and the density distributions of numerical variables. This library provides a rough implementation of the measure as a class that can compute a similarity matrix. The repository includes additional code for implementing and experimenting with resample exposure similarity and competing measures, including functions for calculating similarity matrices, performing nearest neighbour classification, and clustering with partitioning around medoids.

Installation

To install the library, run the following command:

pip install rex-score
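
To confirm that the installation succeeded, the installed version can be looked up with the standard library. This is only a minimal sketch: the helper name installed_version is hypothetical and not part of the library, and "rex-score" is the distribution name taken from the install command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution_name):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return version(distribution_name)
    except PackageNotFoundError:
        return None


# Prints the version string if rex-score is installed, otherwise None.
print(installed_version("rex-score"))
```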

Tutorial

import seaborn as sns
from rex_score import ResampleExposure

df = sns.load_dataset('penguins')
df_train, df_test = df.iloc[:300], df.iloc[300:]

# Create an instance of the ResampleExposure class.
# categorical_features is optional; if omitted, categorical features are inferred automatically.
rex = ResampleExposure(target_distribution=df_train.dropna(),
                       categorical_features=['species', 'island'],
                       unique_threshold=5,
                       feature_weights=None,
                       )

# Single point comparison
similarity = rex.resample_exposure_sim(query_point=df_test.iloc[0],
                                        target_point=df_train.iloc[0],
                                        normalised=True,
                                        )

# Similarity matrix
similarity_matrix = rex.resample_exposure_matrix(query_df=df_test,
                                                  normalised=False,
                                                  reverse_direction=False,
                                                  overwrite_memory=False,
                                                  n_jobs=-1,
                                                  )

When computing the similarity matrix, calling the method with no arguments returns the similarity matrix of the target distribution with itself. Setting reverse_direction swaps the roles of the two datasets, so that the query distribution is used as the target distribution and the target distribution as the query distribution. overwrite_memory is an experimental feature that overwrites the stored marginal distributions with those of the queries, for cases where the distribution of the target points is assumed to be unknown. The n_jobs parameter enables parallel computation; -1 uses all available cores.
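
A similarity matrix of this kind can feed a nearest neighbour classifier, one of the uses the repository experiments with. The sketch below is not the repository's implementation. It uses only NumPy and a made-up matrix, and it assumes that rows index query points, columns index labelled target points, and larger values mean more similar. The orientation of the matrix returned by the library should be checked before reusing it. The function name knn_predict and the toy values are hypothetical.

```python
import numpy as np


def knn_predict(similarity, train_labels, k=3):
    """Majority vote among the k most similar target points for each query row."""
    similarity = np.asarray(similarity, dtype=float)
    train_labels = np.asarray(train_labels)
    # Column indices of the k largest similarities in each row.
    top_k = np.argsort(-similarity, axis=1)[:, :k]
    predictions = []
    for neighbours in top_k:
        labels, counts = np.unique(train_labels[neighbours], return_counts=True)
        # Ties are broken in favour of the label that sorts first.
        predictions.append(labels[np.argmax(counts)])
    return np.array(predictions)


# Hypothetical 2 x 4 matrix: 2 query points against 4 labelled target points.
toy_similarity = np.array([
    [0.90, 0.80, 0.10, 0.20],
    [0.20, 0.10, 0.70, 0.95],
])
toy_labels = np.array(["Adelie", "Adelie", "Gentoo", "Gentoo"])

predicted = knn_predict(toy_similarity, toy_labels, k=3)
print(predicted)
```

Working on the precomputed matrix keeps the classifier independent of how the similarities were produced, so the same function can be reused to compare different similarity measures.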

Experiment Codebooks

This library is accompanied by a set of codebooks that demonstrate how to use the library and replicate the results shown in the paper "Similarity Based on Resample Exposure". The codebooks are designed to be used in Jupyter notebooks and provide step-by-step instructions for running experiments and generating synthetic data.

Below are the codebooks that can be used to replicate the results shown in the paper.

Link       | Description                                                       | Fig. refs.
Codebook 1 | Figures and experiments exploring behaviour of resample exposure  | Fig. 3
Codebook 2 | Experiments and results for nearest neighbours classification     | Fig. 4
Codebook 3 | Experiments and results for the partitioning around medoids       | Fig. 5
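
For orientation, clustering with partitioning around medoids works on a precomputed dissimilarity matrix. The sketch below is a simplified swap-based variant with a naive initialisation, not the codebook's implementation and not the full classic algorithm. It assumes a symmetric similarity matrix with values in [0, 1], converted to dissimilarities as 1 - similarity. An asymmetric matrix would first need to be symmetrised, for example by averaging it with its transpose. All names and values here are hypothetical.

```python
import numpy as np


def total_cost(dissimilarity, medoids):
    """Sum of each point's dissimilarity to its closest medoid."""
    return dissimilarity[:, medoids].min(axis=1).sum()


def pam(dissimilarity, k, max_iter=100):
    """Simplified partitioning around medoids using greedy swaps."""
    d = np.asarray(dissimilarity, dtype=float)
    n = d.shape[0]
    medoids = list(range(k))  # naive initialisation: the first k points
    best = total_cost(d, medoids)
    for _ in range(max_iter):
        improved = False
        for i in range(k):
            for candidate in range(n):
                if candidate in medoids:
                    continue
                trial = medoids.copy()
                trial[i] = candidate
                cost = total_cost(d, trial)
                # Accept a swap only if it strictly lowers the total cost.
                if cost < best:
                    medoids, best, improved = trial, cost, True
        if not improved:
            break
    # Assign every point to the position of its closest medoid.
    labels = d[:, medoids].argmin(axis=1)
    return medoids, labels


# Hypothetical symmetric similarity matrix with two obvious groups.
toy_similarity = np.array([
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
])
medoids, labels = pam(1.0 - toy_similarity, k=2)
print(medoids, labels)
```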

Library Requirements

  • pandas
  • numpy
  • joblib

Experiment Requirements

  • seaborn
  • matplotlib
  • scikit-learn
  • scipy
  • gower
  • json (part of the Python standard library)
  • ucimlrepo
