Functions to compute probabilistic relevance scores from PHOC embeddings

prob-phoc

PyTorch functions to compute meaningful probabilistic relevance scores from PHOC (Pyramid of Histograms of Characters) embeddings. Despite the name, a PHOC is in practice a pyramid of bags of characters: each level records which characters occur in each region of the word, not how often they occur. In the end, each word is represented by a high-dimensional binary vector.
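To make the "pyramid of bags of characters" idea concrete, here is a minimal sketch of how such a binary vector could be built. The function name `build_phoc`, the default alphabet, and the choice of levels are illustrative assumptions and are not part of prob-phoc; the rule that a character belongs to a region when at least half of its span falls inside it follows the usual PHOC description.

```python
import string


def build_phoc(word, alphabet=string.ascii_lowercase, levels=(1, 2, 3)):
    """Return a binary PHOC-style vector for `word` (illustrative sketch only)."""
    n = len(word)
    phoc = []
    for level in levels:
        for region in range(level):
            present = set()
            for k, ch in enumerate(word):
                # Work in integer units of 1 / (n * level) to avoid float rounding:
                # character k spans [k * level, (k + 1) * level],
                # the region spans [region * n, (region + 1) * n].
                overlap = min((k + 1) * level, (region + 1) * n) - max(k * level, region * n)
                # Assign the character if at least half of its span is in the region.
                if 2 * overlap >= level:
                    present.add(ch)
            # One bit per alphabet symbol: presence, not count.
            phoc.extend(1 if ch in present else 0 for ch in alphabet)
    return phoc


vec = build_phoc("ab", alphabet="ab", levels=(1, 2))
print(vec)
```

With the two-letter alphabet above, the level-1 block marks both characters as present, while the level-2 blocks mark `a` only in the first half and `b` only in the second half. Characters outside the alphabet are silently ignored in this sketch.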

See the wiki for additional details.

Usage

The library provides two functions, cphoc and pphoc, which are similar to SciPy's cdist and pdist.

Both functions can operate on PHOC embeddings in the probability space (where each dimension is a real number in the range [0, 1]) or in the log-probability space (where each dimension is the logarithm of a probability). These are also sometimes referred to as the Real and Log semirings.
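The relationship between the two spaces can be sketched in plain Python. The scoring rule used here is an assumption inferred from the "sum_prod" method names, not a statement of the library's implementation: each dimension is treated as an independent Bernoulli variable, and the score is the probability that both vectors agree in every dimension. The helper names (`relevance_real`, `relevance_log`) are hypothetical; consult the wiki for the actual definition.

```python
import math


def relevance_real(x, y):
    """Score in the Real semiring: product over dimensions of p*q + (1-p)*(1-q)."""
    score = 1.0
    for p, q in zip(x, y):
        score *= p * q + (1.0 - p) * (1.0 - q)
    return score


def _log1mexp(a):
    """Numerically stable log(1 - exp(a)) for a <= 0."""
    if a == 0.0:
        return -math.inf
    if a > -math.log(2.0):
        return math.log(-math.expm1(a))
    return math.log1p(-math.exp(a))


def _logaddexp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == -math.inf and b == -math.inf:
        return -math.inf
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def relevance_log(lx, ly):
    """Same score in the Log semiring: inputs and output are log-probabilities."""
    total = 0.0
    for lp, lq in zip(lx, ly):
        both_one = lp + lq                          # log(p * q)
        both_zero = _log1mexp(lp) + _log1mexp(lq)   # log((1 - p) * (1 - q))
        total += _logaddexp(both_one, both_zero)
    return total


x = [0.9, 0.2, 0.5]
y = [0.8, 0.1, 0.5]
prob = relevance_real(x, y)
logprob = relevance_log([math.log(p) for p in x], [math.log(q) for q in y])
print(prob, math.exp(logprob))
```

Under this assumption the two versions compute the same quantity; the log version simply replaces products with sums and sums with log-sum-exp, which avoids underflow when the embeddings have many dimensions.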

import torch
from prob_phoc import cphoc, pphoc

x = torch.Tensor(...)
y = torch.Tensor(...)

# Compute the log-relevance scores between all pairs of rows in x, y.
# Note: x and y must have the PHOC log-probabilities.
logprob = cphoc(x, y)

# This is equivalent to:
logprob = cphoc(x, y, method="sum_prod_log")

# If your matrices have probabilities instead of log-probabilities, use:
prob = cphoc(x, y, method="sum_prod_real")

# Compute the log-relevance scores between all pairs of distinct rows in x.
# Note: The output is a vector with N * (N - 1) / 2 elements.
logprob = pphoc(x)
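Since pphoc returns a flat vector rather than a square matrix, it helps to know which pair each element refers to. The sketch below assumes the same ordering as SciPy's pdist (pairs (i, j) with i < j, enumerated row by row); the source only states that the functions are similar and that the output has N * (N - 1) / 2 elements, so verify the ordering before relying on it. The helper names are hypothetical.

```python
def condensed_pairs(n):
    """List the (i, j) pairs, i < j, in pdist-style row-major order."""
    return [(i, j) for i in range(n) for j in range(i + 1, n)]


def condensed_index(i, j, n):
    """Position of the pair (i, j) in a pdist-style condensed vector of n rows."""
    if i == j:
        raise ValueError("the condensed form has no entry for i == j")
    if i > j:
        i, j = j, i  # the pair is unordered
    return n * i - i * (i + 1) // 2 + (j - i - 1)


n = 4
pairs = condensed_pairs(n)
print(len(pairs))                 # number of distinct pairs
print(condensed_index(1, 3, n))   # position of the pair (1, 3)
```

Under that assumption, the score for rows i and j of x would be found at `logprob[condensed_index(i, j, x.size(0))]`.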

Installation

The easiest way is to install the package from PyPI:

pip install prob-phoc

If you want to install the latest version from the repository, clone it and use the setup.py script to compile and install the library.

python setup.py install

You will need a C++11 compiler (tested with GCC 4.9). If you want to compile with CUDA support, you will also need to install the CUDA Toolkit (tested with versions 8.0, 9.0 and 10.0).

Tests and benchmarks

After the installation, you can run the tests to ensure that everything is working fine.

python -m prob_phoc.test

I also have some benchmarks to compare CPU vs. CUDA for different matrix sizes and float precisions. These take quite a long time to run, so don't hold your breath.

python -m prob_phoc.benchmark

