Generating count-based Distributional Semantic Models
counterix
A small toolkit to generate count-based Distributional Semantic Models, weighted with PPMI and reduced with SVD.
Install
pip install counterix
or, after a git clone:
python3 setup.py install
Use
Generate
To generate a raw count matrix from a tokenized corpus, run:
counterix generate \
--corpus /abs/path/to/corpus/txt/file \
--min-count frequency_threshold \
--win-size window_size
If the --output parameter is not set, the output files will be saved to the corpus directory.
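As a rough illustration of what a raw count model contains, the sketch below counts symmetric co-occurrences within a fixed window over a tokenized corpus. It is not counterix's implementation: the function name `build_counts`, the in-memory list-of-sentences input, the alphabetical vocabulary order, and the choice to drop rare words before windowing are all assumptions made for the example.

```python
from collections import Counter


def build_counts(sentences, min_count=1, win_size=2):
    """Return (counts, vocab) for a tokenized corpus.

    counts maps (target_id, context_id) -> co-occurrence count;
    vocab maps each retained word -> integer id.
    """
    # Keep only words that reach the frequency threshold.
    freqs = Counter(tok for sent in sentences for tok in sent)
    kept = sorted(w for w, c in freqs.items() if c >= min_count)
    vocab = {w: i for i, w in enumerate(kept)}

    counts = Counter()
    for sent in sentences:
        # Assumption: rare words are removed before the window is applied.
        ids = [vocab[t] for t in sent if t in vocab]
        for pos, target in enumerate(ids):
            start = max(0, pos - win_size)
            end = min(len(ids), pos + win_size + 1)
            for ctx_pos in range(start, end):
                if ctx_pos != pos:
                    counts[(target, ids[ctx_pos])] += 1
    return counts, vocab


if __name__ == "__main__":
    corpus = [["a", "b", "c"], ["a", "b"]]
    counts, vocab = build_counts(corpus, min_count=1, win_size=1)
    print(vocab)
    print(dict(counts))
```

The dictionary of (row, column) pairs is a sparse representation: only pairs that actually co-occur are stored, which matters because most word pairs in a real corpus never do.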
Weigh
To apply PPMI weighting to a raw count model, run:
counterix weigh --model /abs/path/to/raw/count/npz/model
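For readers unfamiliar with PPMI (positive pointwise mutual information), here is a minimal sketch of the weighting on a sparse dictionary of counts. It assumes a base-2 logarithm and no smoothing. Whether counterix makes the same choices is not stated here, and the function name `ppmi` is hypothetical.

```python
import math
from collections import defaultdict


def ppmi(counts):
    """Map (row, col) -> count to (row, col) -> PPMI weight.

    PMI(r, c) = log2( P(r, c) / (P(r) * P(c)) ); cells whose PMI is not
    strictly positive are dropped, i.e. treated as zero.
    """
    total = sum(counts.values())
    row_sums = defaultdict(float)
    col_sums = defaultdict(float)
    for (r, c), n in counts.items():
        row_sums[r] += n
        col_sums[c] += n

    weighted = {}
    for (r, c), n in counts.items():
        # P(r, c) / (P(r) * P(c)) simplifies to n * total / (row_sum * col_sum).
        pmi = math.log2(n * total / (row_sums[r] * col_sums[c]))
        if pmi > 0:
            weighted[(r, c)] = pmi
    return weighted


if __name__ == "__main__":
    raw = {(0, 0): 4, (0, 1): 1, (1, 0): 1, (1, 1): 4}
    print(ppmi(raw))
```

In this toy input the off-diagonal pairs co-occur less often than chance would predict, so their PMI is negative and they are left out of the result.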
SVD
To apply SVD to a PPMI-weighted model, with k=10000, run:
counterix svd \
--model /abs/path/to/ppmi/npz/model \
--dim singular_vectors_final_dim
To control the number of threads used during SVD, set the OMP_NUM_THREADS environment variable when running counterix, for example OMP_NUM_THREADS=1.
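The dimensionality reduction step can be sketched as a truncated SVD that keeps only the leading singular vectors. The example below uses a dense NumPy array and `numpy.linalg.svd` purely for illustration. For large sparse models a sparse solver such as `scipy.sparse.linalg.svds` is the more usual choice, and nothing here should be read as describing which solver counterix calls. The function name `truncated_svd` is hypothetical.

```python
import numpy as np


def truncated_svd(matrix, dim):
    """Return the top-`dim` left singular vectors and singular values."""
    # full_matrices=False computes the reduced ("economy") decomposition.
    # NumPy returns singular values sorted in descending order.
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :dim], s[:dim]


if __name__ == "__main__":
    ppmi_matrix = np.array([[3.0, 0.0],
                            [0.0, 1.0],
                            [0.0, 0.0]])
    u, s = truncated_svd(ppmi_matrix, dim=1)
    # One common way to form word vectors is to scale each kept
    # singular vector by its singular value.
    embeddings = u * s
    print(embeddings.shape)
```

Each row of `embeddings` is then a dense vector for the corresponding word, with `dim` components instead of one per vocabulary item.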