
Toolbox for inferring psychological embeddings.


PsiZ: A Psychological Embedding Package

What's in a name?

The name PsiZ (pronounced like the word size, /sʌɪz/) is meant to serve as shorthand for the term psychological embedding. The Greek letter Psi is often used to represent the field of psychology, and the matrix variable Z is often used in machine learning to denote a latent feature space.

Purpose

PsiZ provides the computational tools to infer a continuous, multivariate stimulus representation using similarity relations. It integrates well-established cognitive theory with contemporary computational methods.

Installation

There is not yet a stable version. All APIs are subject to change and all releases are alpha.

To install the latest development version, clone from GitHub and install the local repo using pip.

  1. Use git to clone the latest version to your local machine: git clone https://github.com/roads/psiz.git
  2. Use pip to install the cloned repo (using editable mode): pip install -e /local/path/to/psiz
     With editable mode, you can update your local copy by running git pull origin master inside your local copy of the repo. You do not have to re-install with pip.

The package can also be obtained by:

  • Manually downloading the latest version at https://github.com/roads/psiz.git
  • Using git to clone a specific release, for example: git clone https://github.com/roads/psiz.git --branch v0.3.0
  • Using PyPI to install older alpha releases: pip install psiz. The versions available through PyPI lag behind the latest GitHub version.

Note: PsiZ also requires TensorFlow. In older versions of TensorFlow, CPU-only versions were targeted separately. For TensorFlow >=2.0, both CPU-only and GPU versions are obtained via the tensorflow package. The current setup.py file fulfills this dependency by installing the tensorflow package using pip.
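
After installing, it can help to confirm which versions ended up in your environment. The following is a minimal sketch that uses only the Python standard library. The helper names installed_version and major_version are illustrative and are not part of PsiZ.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def major_version(version):
    """Extract the leading major version number, e.g. '2.4.1' -> 2."""
    return int(version.split(".")[0])


# Report what is installed for the two packages discussed above.
for name in ("psiz", "tensorflow"):
    version = installed_version(name)
    print(name, version if version is not None else "not installed")

# The note above describes TensorFlow >= 2.0, so flag anything older.
tf_version = installed_version("tensorflow")
if tf_version is not None and major_version(tf_version) < 2:
    print("Found TensorFlow < 2.0; the note above assumes >= 2.0.")
```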

Quick Start

Similarity relations can be collected using a variety of paradigms, and you will need to use the model that matches your data. In addition to choosing a model, you need to provide two pieces of information:

  1. The observed similarity relations (referred to as observations or obs).
  2. The number of unique stimuli that will be in your embedding (n_stimuli).

The following minimalist example uses a Rank psychological embedding to model a predefined set of ordinal similarity relations.

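import tensorflow as tf
import psiz

# Load observations from a predefined dataset.
(obs, catalog) = psiz.datasets.load('birds-16')
# Create a TensorFlow embedding layer for the stimuli.
# NOTE: Since we will use masking, we increment n_stimuli by one.
# NOTE: Keras requires an output dimensionality; n_dim = 2 is an
# illustrative choice, not a recommendation.
n_dim = 2
stimuli = tf.keras.layers.Embedding(
    catalog.n_stimuli+1, n_dim, mask_zero=True
)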
# Use a default kernel (exponential with p-norm).
kernel = psiz.keras.layers.Kernel()
# Create a Rank model that subclasses TensorFlow Keras Model.
model = psiz.models.Rank(stimuli=stimuli, kernel=kernel)
# Wrap the model in a convenient proxy class.
emb = psiz.models.Proxy(model)
# Compile the model.
emb.compile()
# Fit the psychological embedding using observations.
emb.fit(obs)
# Optionally save the fitted model.
emb.save('my_embedding')

Check out the examples directory to explore examples that take advantage of the various features that PsiZ offers.

Trials and Observations

Inference is performed by fitting a model to a set of observations. In this package, a single observation consists of a trial in which multiple stimuli have been judged by an agent (human or machine) based on their similarity. There are currently three different types of trials: rank, rate and sort.

Rank

In the simplest case, an observation is obtained from a trial consisting of three stimuli: a query stimulus (q) and two reference stimuli (a and b). An agent selects the reference stimulus that they believe is more similar to the query stimulus. For this simple trial, there are two possible outcomes. If the agent selected reference a, then the observation for the ith trial would be recorded as the vector:

Di = [q a b]

Alternatively, if the agent had selected reference b, the observation would be recorded as:

Di = [q b a]
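
The two outcomes above can be sketched in a few lines of plain Python. The function record_triplet is a hypothetical helper written for illustration. It is not part of PsiZ and only demonstrates the ordering convention (query first, selected reference second, unselected reference last).

```python
def record_triplet(query, ref_a, ref_b, chosen):
    """Return [query, selected, unselected] for a two-reference rank trial."""
    if chosen not in (ref_a, ref_b):
        raise ValueError("chosen must be one of the two references")
    unselected = ref_b if chosen == ref_a else ref_a
    return [query, chosen, unselected]


# Stimulus indices standing in for q, a and b.
q, a, b = 0, 1, 2
print(record_triplet(q, a, b, chosen=a))  # [0, 1, 2]
print(record_triplet(q, a, b, chosen=b))  # [0, 2, 1]
```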

In addition to a simple triplet rank trial, this package is designed to handle a number of different rank trial configurations. A trial may have 2-8 reference stimuli and an agent may be required to select and rank more than one reference stimulus. A companion Open Access article dealing with rank trials is available at https://link.springer.com/article/10.3758/s13428-019-01285-3.
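
To get a feel for how the number of possible outcomes grows with the trial configuration, the sketch below enumerates every way a trial could be recorded. It assumes that only the order of the selected references matters, so unselected references are kept in their presentation order. The function possible_outcomes is illustrative and not part of PsiZ.

```python
from itertools import permutations


def possible_outcomes(query, references, n_select):
    """List every recordable outcome: query, ordered picks, then the rest."""
    outcomes = []
    for picks in permutations(references, n_select):
        # Unselected references carry no ordering information, so keep
        # them in their original presentation order.
        rest = [r for r in references if r not in picks]
        outcomes.append([query, *picks, *rest])
    return outcomes


# Two references, select one: the simple triplet case.
two_ref = possible_outcomes(0, [1, 2], n_select=1)
print(len(two_ref))  # 2

# Eight references, select and rank two.
eight_ref = possible_outcomes(0, list(range(1, 9)), n_select=2)
print(len(eight_ref))  # 56
```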

Rate

In the simplest case, an observation is obtained from a trial consisting of two stimuli. An agent provides a numerical rating regarding the similarity between the stimuli. This functionality is not currently available and is under development.

Sort

In the simplest case, an observation is obtained from a trial consisting of three stimuli. An agent sorts the stimuli into two groups based on similarity. This functionality is not currently available and is under development.

Using Your Own Data

To use your own data, you should place your data in an appropriate subclass of psiz.trials.Observations. Once the Observations object has been created, you can save it to disk by calling its save method. It can be loaded later using the function psiz.trials.load_trials(filepath). Consider the following example that creates random rank observations:

import numpy as np
import psiz

# Let's assume that we have 10 unique stimuli.
stimuli_list = np.arange(0, 10, dtype=int)

# Let's create 100 trials, where each trial is composed of a query and
# four references. We will also assume that participants selected two
# references (in order of their similarity to the query.)
n_trial = 100
n_reference = 4
stimulus_set = np.empty([n_trial, n_reference + 1], dtype=int)
n_select = 2 * np.ones((n_trial), dtype=int)
for i_trial in range(n_trial):
    # Randomly select stimuli and simulate random behavior for each
    # trial (one query, four references).
    stimulus_set[i_trial, :] = np.random.choice(
        stimuli_list, n_reference + 1, replace=False
    )

# Create the observations object and save it to disk.
obs = psiz.trials.RankObservations(stimulus_set, n_select=n_select)
obs.save('path/to/obs.hdf5')

# Load the observations from disk.
obs = psiz.trials.load_trials('path/to/obs.hdf5')

Note that the values in stimulus_set are assumed to be contiguous integers in the half-open interval [0, N), where N is the number of unique stimuli. The column order is also important. The query is listed in the first column, the agent's selected references are listed next (in order of selection if more than one was selected), and any remaining unselected references are listed last (in any order).
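
When converting raw responses into this layout, a small amount of bookkeeping helps avoid ordering mistakes. The sketch below uses plain Python lists. The helpers build_row and check_rows are hypothetical and not part of PsiZ, and the uniqueness check assumes that a stimulus appears at most once per trial (as in the example above, which samples without replacement).

```python
def build_row(query, references, selected):
    """Order one trial as: query, selected references, unselected references."""
    if any(s not in references for s in selected):
        raise ValueError("every selected stimulus must be a reference")
    unselected = [r for r in references if r not in selected]
    return [query, *selected, *unselected]


def check_rows(rows, n_stimuli):
    """Raise ValueError if a row breaks the index or uniqueness assumptions."""
    for i, row in enumerate(rows):
        if any(not 0 <= idx < n_stimuli for idx in row):
            raise ValueError(f"row {i}: index outside [0, {n_stimuli})")
        if len(set(row)) != len(row):
            raise ValueError(f"row {i}: a stimulus appears more than once")


# Two trials with four references each; the agent selected two references
# per trial, listed in the order they were selected.
rows = [
    build_row(query=3, references=[7, 1, 9, 4], selected=[9, 1]),
    build_row(query=0, references=[2, 5, 8, 6], selected=[5, 2]),
]
check_rows(rows, n_stimuli=10)
print(rows[0])  # [3, 9, 1, 7, 4]
```

The resulting rows could then be converted to an integer array and passed as the stimulus_set argument shown in the example above.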

Design Philosophy

PsiZ is built around the TensorFlow ecosystem and strives to follow TensorFlow idioms as closely as possible. See CONTRIBUTING for additional guidance.

Model, Layer, Variable

Package-defined models are built by sub-classing tf.keras.Model. Components of a model are built using the tf.keras.layers.Layer API. A free parameter is implemented as a tf.Variable.

In PsiZ, a model can be thought of as having two major components. The first component is a psychological embedding, which describes how the agent of interest perceives similarity between a set of stimuli. This component includes a conventional embedding (representing the stimuli in psychological space) and a kernel that defines similarities between embedding points. The second component describes how similarities are converted into an observed behavior, such as rankings or ratings.
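
As a rough illustration of the second component, the sketch below converts query-reference similarities into selection probabilities using a ratio rule (Luce's choice rule). This rule is an assumption made for illustration; it is not taken from the PsiZ source code, and the similarity values are made up.

```python
def choice_probabilities(similarities):
    """Normalize query-reference similarities into selection probabilities."""
    total = sum(similarities)
    return [s / total for s in similarities]


# Hypothetical similarities between a query and references a and b.
s_qa, s_qb = 0.6, 0.2
p_a, p_b = choice_probabilities([s_qa, s_qb])
# The more similar reference is more likely to be selected.
print(p_a, p_b)
```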

PsiZ includes a number of predefined layers to facilitate the construction of arbitrary models. For example, there are four predefined similarity functions (implemented as subclasses of tf.keras.layers.Layer) which can be used to create a kernel:

  1. psiz.keras.layers.InverseSimilarity
  2. psiz.keras.layers.ExponentialSimilarity
  3. psiz.keras.layers.HeavyTailedSimilarity
  4. psiz.keras.layers.StudentsTSimilarity

Each similarity function has its own set of parameters (i.e., tf.Variables). The ExponentialSimilarity, which is widely used in psychology, has four variables. Users can also implement their own similarity functions by sub-classing tf.keras.layers.Layer.
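
To make the exponential form concrete, here is a plain-Python sketch that assumes the four variables are rho, tau, beta and gamma, combined as s = exp(-beta * d^tau) + gamma, where d is a Minkowski distance with exponent rho. The variable names, default values and functional form are assumptions based on the companion article's description, not a copy of the PsiZ layer.

```python
import math


def minkowski_distance(z_i, z_j, rho=2.0):
    """Minkowski (p-norm) distance between two embedding points."""
    return sum(abs(x - y) ** rho for x, y in zip(z_i, z_j)) ** (1.0 / rho)


def exponential_similarity(z_i, z_j, rho=2.0, tau=1.0, beta=10.0, gamma=0.0):
    """Similarity that decays exponentially with distance (assumed form)."""
    d = minkowski_distance(z_i, z_j, rho)
    return math.exp(-beta * d ** tau) + gamma


# Identical points have the maximum similarity, 1 + gamma.
print(exponential_similarity([0.0, 0.0], [0.0, 0.0]))  # 1.0
# Similarity decreases as points move apart.
print(exponential_similarity([0.0, 0.0], [0.3, 0.4], beta=1.0))
```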

Deviations from TensorFlow

The models in PsiZ are susceptible to local optima. While the usual tricks help, such as stochastic gradient descent, we typically require multiple restarts with different initializations to be confident in the solution. In an effort to shield users from the burden of writing restart logic, PsiZ includes a restart module that is employed by the fit method of the Proxy class. The state of most TensorFlow objects can be reset using a serialization/deserialization strategy. However, tf.keras.callbacks do not permit this strategy. To fix this problem, PsiZ implements a subclass of tf.keras.callbacks.Callback, which adds a reset method. This solution is unattractive and is likely to change when a more elegant solution is found.
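
The general idea behind restarts can be sketched without TensorFlow. The toy example below is not the PsiZ restart module. It runs gradient descent on a one-dimensional non-convex loss from several random initializations and keeps the run with the lowest final loss. All names and settings are illustrative.

```python
import random


def loss(x):
    """Toy non-convex loss: global minimum near x = -1.3, local near x = 1.13."""
    return x ** 4 - 3 * x ** 2 + x


def grad(x):
    """Analytic derivative of the toy loss."""
    return 4 * x ** 3 - 6 * x + 1


def fit_once(rng, n_step=500, lr=0.01):
    """Run plain gradient descent from one random initialization."""
    x = rng.uniform(-2.0, 2.0)
    for _ in range(n_step):
        x -= lr * grad(x)
    return x, loss(x)


def fit_with_restarts(n_restart=10, seed=0):
    """Fit several times and keep the restart with the lowest final loss."""
    rng = random.Random(seed)
    runs = [fit_once(rng) for _ in range(n_restart)]
    return min(runs, key=lambda run: run[1])


best_x, best_loss = fit_with_restarts()
print(best_x, best_loss)
```

A single run that starts on the wrong side of the loss surface settles in the local minimum; comparing several restarts makes it much more likely that the reported solution is the better one.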

Modules

  • agents - Simulate an agent making similarity judgments.
  • catalog - Class for storing stimulus information.
  • datasets - Functions for loading some pre-defined catalogs and observations.
  • dimensionality - Routine for selecting the dimensionality of the embedding.
  • generators - Generate new trials randomly or using active selection.
  • keras - A module containing Keras related classes.
  • models - A set of pre-defined psychological embedding models.
  • preprocess - Functions for preprocessing observations.
  • restart - Classes and functionality for performing model restarts.
  • trials - Classes and functions for creating and managing observations.
  • utils - Utility functions.
  • visualize - Functions for visualizing embeddings.

Authors

  • Brett D. Roads
  • Michael C. Mozer
  • See also the list of contributors who participated in this project.

Licence

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Code of Conduct

This project uses a Code of Conduct (see CODE) adapted from the Contributor Covenant, version 2.0, available at https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.

References

  • van der Maaten, L., & Weinberger, K. (2012, September). Stochastic triplet embedding. In 2012 IEEE International Workshop on Machine Learning for Signal Processing (MLSP) (pp. 1-6). doi:10.1109/MLSP.2012.6349720
  • Roads, B. D., & Mozer, M. C. (2019). Obtaining psychological embeddings through joint kernel and metric learning. Behavior Research Methods, 51(5), 2180-2193. doi:10.3758/s13428-019-01285-3
  • Wah, C., Branson, S., Welinder, P., Perona, P., & Belongie, S. (2011). The Caltech-UCSD Birds-200-2011 Dataset (Tech. Rep. No. CNS-TR-2011-001). California Institute of Technology.

