Generating count-based Distributional Semantic Models
counterix
A small toolkit for generating count-based, PPMI-weighted, SVD-reduced Distributional Semantic Models.
Install
pip install counterix
or, after a git clone:
python3 setup.py install
Use
Generate
To generate a raw count matrix from a tokenized corpus, run:
counterix generate \
--corpus /abs/path/to/corpus/txt/file \
--min-count frequency_threshold \
--win-size window_size
If the --output parameter is not set, the output files are saved to the corpus directory.
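To make the roles of --min-count and --win-size concrete, here is a minimal pure-Python sketch of windowed co-occurrence counting. It is an illustration under stated assumptions, not counterix's actual implementation: the function name count_cooccurrences is hypothetical, the window is assumed to be symmetric, and rare tokens are assumed to be dropped before the window is applied.

```python
from collections import Counter


def count_cooccurrences(sentences, min_count=1, win_size=2):
    """Count (word, context) co-occurrences in a tokenized corpus.

    sentences: iterable of token lists.
    min_count: tokens seen fewer times than this are discarded (assumption:
               filtering happens before windowing).
    win_size:  number of context tokens taken on each side (assumption:
               symmetric window).
    """
    sentences = [list(sent) for sent in sentences]
    freqs = Counter(tok for sent in sentences for tok in sent)
    vocab = {tok for tok, n in freqs.items() if n >= min_count}
    counts = Counter()
    for sent in sentences:
        kept = [tok for tok in sent if tok in vocab]
        for i, target in enumerate(kept):
            start = max(0, i - win_size)
            end = min(len(kept), i + win_size + 1)
            for j in range(start, end):
                if j != i:
                    counts[(target, kept[j])] += 1
    return counts


corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
]
counts = count_cooccurrences(corpus, min_count=2, win_size=1)
```

With min_count=2, only "the", "sat" and "on" survive the frequency filter, so words such as "cat" never appear as rows or columns of the resulting count matrix.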
Weigh
To weight a raw count model with positive pointwise mutual information (PPMI), run:
counterix weigh --model /abs/path/to/raw/count/npz/model
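As a rough illustration of what PPMI weighting computes, the sketch below applies the standard definition, max(0, log(P(w, c) / (P(w) P(c)))), to a small dictionary of counts. The function name ppmi, the dictionary representation and the base-2 logarithm are assumptions made for the example; counterix itself works on a sparse matrix stored as npz and may differ in these details.

```python
import math
from collections import Counter


def ppmi(counts):
    """Map {(word, context): count} to {(word, context): PPMI value}.

    Only strictly positive values are kept, which mirrors how a sparse
    PPMI matrix stores nothing for zero entries.
    """
    total = sum(counts.values())
    row_sums = Counter()
    col_sums = Counter()
    for (word, context), n in counts.items():
        row_sums[word] += n
        col_sums[context] += n
    weighted = {}
    for (word, context), n in counts.items():
        # PMI = log2( P(w, c) / (P(w) * P(c)) ), written with raw counts.
        pmi = math.log2(n * total / (row_sums[word] * col_sums[context]))
        if pmi > 0:
            weighted[(word, context)] = pmi
    return weighted


toy_counts = {("a", "x"): 4, ("a", "y"): 1, ("b", "x"): 1, ("b", "y"): 4}
weighted = ppmi(toy_counts)
```

In this toy example the pairs ("a", "y") and ("b", "x") co-occur less often than chance would predict, so their PMI is negative and they are dropped.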
SVD
To apply SVD to a PPMI-weighted model, run the following, setting --dim to the number k of singular vectors to keep (for example, k=10000):
counterix svd \
--model /abs/path/to/ppmi/npz/model \
--dim singular_vectors_final_dim
To control the number of threads used during SVD, set the OMP_NUM_THREADS environment variable when running counterix, for example OMP_NUM_THREADS=1 for a single thread.
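For reference, the dimensionality reduction in the SVD step can be written as a rank-k truncated singular value decomposition. The symbols below (M for the PPMI matrix, n words, m contexts, k for the value passed to --dim) are notation introduced for this sketch. Whether counterix keeps U_k alone or U_k scaled by the singular values as word vectors is not stated here, so the last line is only one common convention.

```latex
% Rank-k truncated SVD of the PPMI matrix M (n words x m contexts)
M \approx U_k \Sigma_k V_k^{\top},
\qquad U_k \in \mathbb{R}^{n \times k},\;
\Sigma_k = \operatorname{diag}(\sigma_1, \dots, \sigma_k),\;
V_k \in \mathbb{R}^{m \times k},
\qquad \sigma_1 \ge \dots \ge \sigma_k \ge 0

% One common choice of k-dimensional word vectors (assumption):
W = U_k \Sigma_k
```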
Source distribution: counterix-1.1.1.tar.gz (6.4 kB)