Functions to compute probabilistic relevance scores from PHOC embeddings
prob-phoc
PyTorch functions to compute meaningful probabilistic relevance scores from PHOC (Pyramid of Histograms of Characters) embeddings. Despite the name, in practice a PHOC is a pyramid of bags of characters rather than histograms: in the end, each word is represented by a high-dimensional binary vector.
See the wiki for additional details.
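To make the "pyramid of bags of characters" idea concrete, here is a minimal pure-Python sketch of how such a binary vector can be built. It is not part of this library: the function name build_phoc, the default alphabet and levels, and the rule that a character belongs to a region when at least half of it falls inside that region are all assumptions made for illustration.

```python
def build_phoc(word, alphabet="abcdefghijklmnopqrstuvwxyz", levels=(1, 2, 3)):
    """Return a binary PHOC-style vector for `word` (illustrative sketch only)."""
    n = len(word)
    phoc = []
    for level in levels:
        for region in range(level):
            # Spans are scaled by n * level so that all arithmetic stays in integers.
            r_start, r_end = region * n, (region + 1) * n
            bag = [0] * len(alphabet)
            for k, ch in enumerate(word):
                if ch not in alphabet:
                    continue
                # Span occupied by the k-th character (its width is `level`).
                c_start, c_end = k * level, (k + 1) * level
                overlap = min(r_end, c_end) - max(r_start, c_start)
                # Assumed rule: at least half of the character lies in the region.
                if 2 * overlap >= level:
                    bag[alphabet.index(ch)] = 1
            phoc.extend(bag)
    return phoc


# Toy alphabet "ab" with pyramid levels 1 and 2: one bag for the whole word,
# then one bag for each half of the word.
vector = build_phoc("ab", alphabet="ab", levels=(1, 2))
print(vector)
```

Each region contributes one bit per alphabet symbol, so the vector length is len(alphabet) times the total number of regions across all levels.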
Usage
The library provides two functions, cphoc and pphoc, which are similar to SciPy's cdist and pdist.
Both functions can operate with PHOC embeddings in the probability space (where each dimension is a real number in the range [0, 1]), or in the log-probability space (where each dimension is the logarithm of a probability). These are also sometimes referred to as the Real and Log semirings.
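One practical reason to work in the log-probability space is numerical range: a product over many dimensions can underflow in floating point, while the corresponding sum of logarithms stays finite. The sketch below uses only the standard library and made-up values (2000 dimensions, each with probability 0.5); it does not call this package.

```python
import math

# Hypothetical example: 2000 dimensions, each with probability 0.5.
probs = [0.5] * 2000

# Probability space (Real semiring): multiply the per-dimension values.
prob = 1.0
for p in probs:
    prob *= p

# Log-probability space (Log semiring): add the per-dimension logarithms.
logprob = sum(math.log(p) for p in probs)

print(prob)     # underflows to 0.0 in double precision
print(logprob)  # stays finite, roughly 2000 * log(0.5)
```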
import torch
from prob_phoc import cphoc, pphoc
x = torch.Tensor(...)
y = torch.Tensor(...)
# Compute the log-relevance scores between all pairs of rows in x, y.
# Note: x and y must have the PHOC log-probabilities.
logprob = cphoc(x, y)
# This is equivalent to:
logprob = cphoc(x, y, method="sum_prod_log")
# If your matrices have probabilities instead of log-probabilities, use:
prob = cphoc(x, y, method="sum_prod_real")
# Compute the log-relevance scores between all pairs of distinct rows in x.
# Note: The output is a vector with N * (N - 1) / 2 elements.
logprob = pphoc(x)
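The two output layouts can be illustrated without the compiled extension. The sketch below is a pure-Python stand-in, not the library's implementation: the names relevance_log, cphoc_like and pphoc_like are hypothetical, and the score itself (the log-probability that two independent binary vectors agree in every dimension) is an assumption about what a sum-product relevance score computes. The pair ordering follows SciPy's pdist convention (all pairs i < j, row by row); whether pphoc uses exactly this ordering is also assumed here.

```python
import math
from itertools import combinations


def relevance_log(x, y):
    """Assumed log-relevance score between two log-probability vectors."""
    total = 0.0
    for lx, ly in zip(x, y):
        px, py = math.exp(lx), math.exp(ly)
        # Probability that both bits are 1, plus probability that both are 0.
        agree = px * py + (1.0 - px) * (1.0 - py)
        total += math.log(agree) if agree > 0.0 else -math.inf
    return total


def cphoc_like(xs, ys):
    """cdist-style layout: an N x M table, one entry per (row of xs, row of ys)."""
    return [[relevance_log(x, y) for y in ys] for x in xs]


def pphoc_like(xs):
    """pdist-style layout: N * (N - 1) / 2 entries, one per pair i < j."""
    return [relevance_log(xs[i], xs[j]) for i, j in combinations(range(len(xs)), 2)]


# Three toy 2-dimensional embeddings, given as log-probabilities.
rows = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
xs = [[math.log(p) for p in row] for row in rows]

table = cphoc_like(xs, xs)  # 3 x 3 entries
pairs = pphoc_like(xs)      # 3 entries, for the pairs (0, 1), (0, 2), (1, 2)
```

The condensed vector holds the strict upper triangle of the full table, so pairs[0] equals table[0][1], pairs[1] equals table[0][2], and pairs[2] equals table[1][2].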
Installation
The easiest way is to install the package from PyPI:
pip install prob-phoc
If you want to install the latest version from the repository, clone it and use the setup.py script to compile and install the library.
python setup.py install
You will need a C++11 compiler (tested with GCC 4.9). If you want to compile with CUDA support, you will also need to install the CUDA Toolkit (tested with versions 8.0, 9.0 and 10.0).
Tests and benchmarks
After the installation, you can run the tests to ensure that everything is working fine.
python -m prob_phoc.test
I have also included some benchmarks to compare CPU vs. CUDA, for different matrix sizes and float precisions. These take quite a long time to run, so don't hold your breath.
python -m prob_phoc.benchmark