Pointwise Hilbert–Schmidt Independence Criterion (PHSIC)
PHSIC computes a co-occurrence score for a pair of objects, using kernel similarities between objects.
For example, given the following consistent sentence pairs:
| X | Y |
|---|---|
| They had breakfast at the hotel. | They are full now. |
| They had breakfast at ten. | I'm full. |
| She had breakfast with her friends. | She felt happy. |
| They had breakfast with their friends at the Japanese restaurant. | They felt happy. |
| He have trouble with his homework. | He cries. |
| I have trouble associating with others. | I cry. |
PHSIC assigns higher scores to pairs that are consistent with the given pairs:
| X | Y | score |
|---|---|---|
| They had breakfast at the hotel. | They are full now. | 0.1134 |
| They had breakfast at an Italian restaurant. | They are stuffed now. | 0.0023 |
| I have dinner. | I have dinner again. | 0.0023 |
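To give some intuition for where such scores come from, here is a minimal pure-Python sketch. It assumes the score is the empirical cross-covariance of centered kernel values, i.e. the average over training pairs of k̃(x, x_i) · l̃(y, y_i). The function names (`gaussian_kernel`, `centered`, `phsic_sketch`), the kernel parameterization, and the toy vectors are illustrative assumptions, not the phsic package's API; the package may compute scores differently (for example, with a low-rank approximation).

```python
import math


def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian kernel exp(-||a - b||^2 / (2 sigma^2)); this parameterization is an assumption."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def centered(kernel, point, anchor, data):
    """Kernel value between point and anchor after centering feature maps on data."""
    n = len(data)
    mean_point = sum(kernel(point, d) for d in data) / n
    mean_anchor = sum(kernel(anchor, d) for d in data) / n
    grand_mean = sum(kernel(a, b) for a in data for b in data) / (n * n)
    return kernel(point, anchor) - mean_point - mean_anchor + grand_mean


def phsic_sketch(x, y, xs, ys, k=gaussian_kernel, l=gaussian_kernel):
    """Average of centered-kernel products over the training pairs (xs[i], ys[i])."""
    n = len(xs)
    total = sum(
        centered(k, x, xi, xs) * centered(l, y, yi, ys)
        for xi, yi in zip(xs, ys)
    )
    return total / n


# Toy stand-ins for sentence vectors: two clusters in X, each paired with a cluster in Y.
xs = [(0.0, 0.0), (0.0, 0.1), (3.0, 3.0), (3.0, 3.1)]
ys = [(1.0,), (1.1,), (-1.0,), (-1.1,)]

match = phsic_sketch((0.0, 0.0), (1.0,), xs, ys)      # pair follows the training pattern
mismatch = phsic_sketch((0.0, 0.0), (-1.0,), xs, ys)  # pair contradicts it
print(match, mismatch)  # match is positive, mismatch is negative on this toy data
```

This direct form costs O(n) kernel evaluations per centering term and recomputes the means on every call, so it is only suitable for reading, not for real datasets.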
Installation
$ pip install phsic
This installs the phsic command into your environment:
$ phsic --help
Basic Usage
Download pre-trained word vectors (e.g., fastText):
$ wget https://s3-us-west-1.amazonaws.com/fasttext-vectors/crawl-300d-2M.vec.zip
$ unzip crawl-300d-2M.vec.zip
Prepare dataset:
$ TAB="$(printf '\t')"
$ cat << EOF > train.txt
They had breakfast at the hotel.${TAB}They are full now.
They had breakfast at ten.${TAB}I'm full.
She had breakfast with her friends.${TAB}She felt happy.
They had breakfast with their friends at the Japanese restaurant.${TAB}They felt happy.
He have trouble with his homework.${TAB}He cries.
I have trouble associating with others.${TAB}I cry.
EOF
$ cut -f 1 train.txt > train_X.txt
$ cut -f 2 train.txt > train_Y.txt
$ cat << EOF > test.txt
They had breakfast at the hotel.${TAB}They are full now.
They had breakfast at an Italian restaurant.${TAB}They are stuffed now.
I have dinner.${TAB}I have dinner again.
EOF
$ cut -f 1 test.txt > test_X.txt
$ cut -f 2 test.txt > test_Y.txt
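Because the X and Y files are produced by cutting columns from the same tab-separated file, line i of one file is presumably paired with line i of the other. Under that assumption, a small check like the following can catch misaligned or empty lines before training. The helper name `check_aligned` is hypothetical and not part of the phsic package.

```python
def check_aligned(x_path, y_path):
    """Return the number of pairs, or raise ValueError if the files do not line up."""
    with open(x_path, encoding="utf-8") as fx:
        xs = fx.read().splitlines()
    with open(y_path, encoding="utf-8") as fy:
        ys = fy.read().splitlines()

    if len(xs) != len(ys):
        raise ValueError(
            f"{x_path} has {len(xs)} lines but {y_path} has {len(ys)}"
        )

    # An empty side usually means a missing tab in the source file.
    for lineno, (x, y) in enumerate(zip(xs, ys), start=1):
        if not x.strip() or not y.strip():
            raise ValueError(f"empty sentence at line {lineno}")

    return len(xs)


# Example call (assumes the files created above exist):
# check_aligned("train_X.txt", "train_Y.txt")
```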
Then, train and predict:
$ phsic train_X.txt train_Y.txt \
    --kernel1 Gaussian 1.0 --encoder1 SumBov FasttextEn --emb1 crawl-300d-2M.vec \
    --kernel2 Gaussian 1.0 --encoder2 SumBov FasttextEn --emb2 crawl-300d-2M.vec \
    --limit_words1 10000 --limit_words2 10000 \
    --dim1 3 --dim2 3 \
    --out_prefix toy --out_dir out \
    --X_test test_X.txt --Y_test test_Y.txt
$ cat toy.Gaussian-1.0-SumBov-FasttextEn.Gaussian-1.0-SumBov-FasttextEn.3.3.phsic
1.134489336180434238e-01
2.320408776101631244e-03
2.321869174772554344e-03
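The output holds one score per line, apparently in the same order as the test pairs. Assuming that ordering, the scores can be joined back to the sentences and ranked with a short script. The helper name `rank_pairs` is hypothetical; the file paths are the ones used in the commands above and are passed in as arguments.

```python
def rank_pairs(x_path, y_path, score_path):
    """Join test sentences with their scores and sort from highest to lowest score."""
    with open(x_path, encoding="utf-8") as fx:
        xs = fx.read().splitlines()
    with open(y_path, encoding="utf-8") as fy:
        ys = fy.read().splitlines()
    with open(score_path, encoding="utf-8") as fs:
        scores = [float(line) for line in fs if line.strip()]

    if not (len(xs) == len(ys) == len(scores)):
        raise ValueError("sentence files and score file differ in length")

    rows = list(zip(scores, xs, ys))
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows


# Example call (assumes the files from the steps above exist):
# for score, x, y in rank_pairs(
#     "test_X.txt",
#     "test_Y.txt",
#     "toy.Gaussian-1.0-SumBov-FasttextEn.Gaussian-1.0-SumBov-FasttextEn.3.3.phsic",
# ):
#     print(f"{score:.4f}\t{x}\t{y}")
```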
Citation
@InProceedings{D18-1203,
author = "Yokoi, Sho
and Kobayashi, Sosuke
and Fukumizu, Kenji
and Suzuki, Jun
and Inui, Kentaro",
title = "Pointwise HSIC: A Linear-Time Kernelized Co-occurrence Norm for Sparse Linguistic Expressions",
booktitle = "Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing",
year = "2018",
publisher = "Association for Computational Linguistics",
pages = "1763--1775",
location = "Brussels, Belgium",
url = "http://aclweb.org/anthology/D18-1203"
}