Pointwise Hilbert–Schmidt Independence Criterion (PHSIC)

PHSIC scores the co-occurrence of two objects, such as a pair of sentences, using similarity functions (kernels) over each side.

For example, given consistent sentence pairs:

X | Y
They had breakfast at the hotel. | They are full now.
They had breakfast at ten. | I'm full.
She had breakfast with her friends. | She felt happy.
They had breakfast with their friends at the Japanese restaurant. | They felt happy.
He have trouble with his homework. | He cries.
I have trouble associating with others. | I cry.

PHSIC assigns high scores to new pairs that are consistent with the given pairs:

X | Y | score
They had breakfast at the hotel. | They are full now. | 0.1134
They had breakfast at an Italian restaurant. | They are stuffed now. | 0.0023
I have dinner. | I have dinner again. | 0.0023
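
As a rough sketch of what the score measures: the formula below follows the definition as we understand it from the paper cited under Citation, and is not taken from the package's source code. The symbols are notation introduced here for illustration: \phi and \psi are feature maps for X and Y induced by the two kernels, (x_i, y_i) are the n given pairs, and \bar{\phi}, \bar{\psi} are the feature means over those pairs.

```latex
\[
\widehat{\mathrm{PHSIC}}(x, y)
  = \bigl(\phi(x) - \bar{\phi}\bigr)^{\top}\, \hat{C}_{XY}\, \bigl(\psi(y) - \bar{\psi}\bigr),
\qquad
\hat{C}_{XY} = \frac{1}{n} \sum_{i=1}^{n}
  \bigl(\phi(x_i) - \bar{\phi}\bigr)\bigl(\psi(y_i) - \bar{\psi}\bigr)^{\top}
\]
% Expanding the product gives an equivalent form in terms of centered kernels:
\[
\widehat{\mathrm{PHSIC}}(x, y)
  = \frac{1}{n} \sum_{i=1}^{n} \tilde{k}(x, x_i)\, \tilde{\ell}(y, y_i),
\qquad
\tilde{k}(x, x') = \bigl\langle \phi(x) - \bar{\phi},\; \phi(x') - \bar{\phi} \bigr\rangle,
\quad
\tilde{\ell}(y, y') = \bigl\langle \psi(y) - \bar{\psi},\; \psi(y') - \bar{\psi} \bigr\rangle
\]
```

Read this way, a new pair (x, y) scores high when x resembles those x_i whose partners y_i in turn resemble y, which matches the behaviour in the example above.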

Installation

$ pip install phsic-cli

This installs the phsic command in your environment:

$ phsic --help

Basic Usage

Download pre-trained word vectors (e.g., fastText):

$ wget https://s3-us-west-1.amazonaws.com/fasttext-vectors/crawl-300d-2M.vec.zip
$ unzip crawl-300d-2M.vec.zip
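
After unzipping, it can be worth checking that the file looks like a fastText text-format .vec file, whose first line holds the vocabulary size and the vector dimension. The following is a minimal sketch that runs on a tiny stand-in file (toy.vec, a hypothetical name) rather than the real download; point it at crawl-300d-2M.vec to check the actual file.

```shell
# Work in a scratch directory so nothing existing is touched.
workdir="$(mktemp -d)"
cd "$workdir"

# Toy stand-in for a .vec file: header "<vocab size> <dimension>",
# then one line per word: the word followed by its vector components.
printf '%s\n' \
  '2 3' \
  'breakfast 0.1 0.2 0.3' \
  'hotel 0.4 0.5 0.6' > toy.vec

# Read the two header fields from the first line only.
read -r vocab_size dim < toy.vec
echo "vocab_size=$vocab_size dim=$dim"

# The first data row should have 1 word + dim numbers.
n_fields="$(awk 'NR == 2 { print NF; exit }' toy.vec)"
[ "$n_fields" -eq $((dim + 1)) ] && echo "row width matches header"
```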

Prepare dataset:

$ TAB="$(printf '\t')"
$ cat << EOF > train.txt
They had breakfast at the hotel.${TAB}They are full now.
They had breakfast at ten.${TAB}I'm full.
She had breakfast with her friends.${TAB}She felt happy.
They had breakfast with their friends at the Japanese restaurant.${TAB}They felt happy.
He have trouble with his homework.${TAB}He cries.
I have trouble associating with others.${TAB}I cry.
EOF
$ cut -f 1 train.txt > train_X.txt
$ cut -f 2 train.txt > train_Y.txt
$ cat << EOF > test.txt
They had breakfast at the hotel.${TAB}They are full now.
They had breakfast at an Italian restaurant.${TAB}They are stuffed now.
I have dinner.${TAB}I have dinner again.
EOF
$ cut -f 1 test.txt > test_X.txt
$ cut -f 2 test.txt > test_Y.txt
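
The cut-based split suggests that phsic pairs line i of the X file with line i of the Y file (an assumption; it is not stated explicitly here). Under that assumption, a quick sanity check before training might look like the sketch below, which uses a two-row stand-in for train.txt so that it runs on its own.

```shell
# Work in a scratch directory so existing files are left alone.
workdir="$(mktemp -d)"
cd "$workdir"
TAB="$(printf '\t')"

# Two-row stand-in for train.txt (rows copied from the dataset above).
printf '%s\n' \
  "They had breakfast at ten.${TAB}I'm full." \
  "He have trouble with his homework.${TAB}He cries." > train.txt
cut -f 1 train.txt > train_X.txt
cut -f 2 train.txt > train_Y.txt

# Count rows that do not have exactly two tab-separated fields.
bad_rows="$(awk -F '\t' 'NF != 2 { bad++ } END { print bad + 0 }' train.txt)"

# Line counts of the two halves; they must match for pairs to stay aligned.
n_x="$(wc -l < train_X.txt | tr -d ' ')"
n_y="$(wc -l < train_Y.txt | tr -d ' ')"

if [ "$bad_rows" -eq 0 ] && [ "$n_x" -eq "$n_y" ]; then
  echo "ok: $n_x aligned pairs"
else
  echo "mismatch: bad_rows=$bad_rows X=$n_x Y=$n_y" >&2
fi
```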

Then, train and predict:

$ phsic train_X.txt train_Y.txt \
    --kernel1 Gaussian 1.0 --encoder1 SumBov FasttextEn --emb1 crawl-300d-2M.vec \
    --kernel2 Gaussian 1.0 --encoder2 SumBov FasttextEn --emb2 crawl-300d-2M.vec \
    --limit_words1 10000 --limit_words2 10000 \
    --dim1 3 --dim2 3 \
    --out_prefix toy --out_dir out \
    --X_test test_X.txt --Y_test test_Y.txt
$ cat toy.Gaussian-1.0-SumBov-FasttextEn.Gaussian-1.0-SumBov-FasttextEn.3.3.phsic
1.134489336180434238e-01
2.320408776101631244e-03
2.321869174772554344e-03
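
The output holds one score per line, in the same order as the test pairs. To read the scores next to their sentences, they can be joined with the test files and sorted. The sketch below recreates the three test pairs and the scores printed above in a scratch directory so that it runs on its own; scores.phsic is a hypothetical stand-in name for the real output file. It relies on sort -g, which is available in GNU and BSD sort but is not part of POSIX.

```shell
# Work in a scratch directory so real outputs are not touched.
workdir="$(mktemp -d)"
cd "$workdir"
TAB="$(printf '\t')"

# Test pairs and scores copied from the example above.
printf '%s\n' \
  'They had breakfast at the hotel.' \
  'They had breakfast at an Italian restaurant.' \
  'I have dinner.' > test_X.txt
printf '%s\n' \
  'They are full now.' \
  'They are stuffed now.' \
  'I have dinner again.' > test_Y.txt
printf '%s\n' \
  '1.134489336180434238e-01' \
  '2.320408776101631244e-03' \
  '2.321869174772554344e-03' > scores.phsic

# Join score, X and Y line by line (tab-separated), then sort on the
# score column, highest first. -g parses exponent notation such as 1.1e-01.
ranked="$(paste scores.phsic test_X.txt test_Y.txt \
  | LC_ALL=C sort -t "$TAB" -k 1,1gr)"
printf '%s\n' "$ranked"
```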

Citation

@InProceedings{D18-1203,
  author = 	"Yokoi, Sho
        and Kobayashi, Sosuke
        and Fukumizu, Kenji
        and Suzuki, Jun
        and Inui, Kentaro",
  title = 	"Pointwise HSIC: A Linear-Time Kernelized Co-occurrence Norm for Sparse Linguistic Expressions",
  booktitle = 	"Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing",
  year = 	"2018",
  publisher = 	"Association for Computational Linguistics",
  pages = 	"1763--1775",
  location = 	"Brussels, Belgium",
  url = 	"http://aclweb.org/anthology/D18-1203"
}
