Reproduce the database of use cases for word embedding models (WEMs).
Towards a Taxonomy of Word Embedding Models: The Database
This repository contains the code we used to collect relevant publications and store them in a Postgres database. You are welcome to reproduce our database! Note, however, that Google Scholar search results may have changed since our publication, so your results may differ slightly from ours.
As a prerequisite, you must have set up a Postgres database with the proper database schema. To do this, you can execute our Postgres schema dump, provided in data/wem_taxonomy_schema.sql, with the psql shell:
psql dbname < data/wem_taxonomy_schema.sql
where dbname is the name of an empty database that you have already created for this purpose.
You must also create a user called 'taxonomist', who will own the created tables. If you intend to use the database with our Use Case Collector (UCC) tool later on, execute the above schema dump as a Postgres user with superuser (root) privileges. This is needed to install the Postgres trigram extension (pg_trgm).
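The prerequisites above can also be scripted. The following is a minimal sketch, not part of this package: it assumes the standard Postgres client tools createuser, createdb and psql are on your PATH and that you run it as a Postgres superuser. The helper name prepare_database is hypothetical, and creating the 'taxonomist' role before loading the dump is our own assumption about a sensible order. With dry_run=True it only returns the commands it would run.

```python
import subprocess
from pathlib import Path
from typing import List

# Path of the schema dump, relative to the repository root.
SCHEMA_DUMP = Path("data/wem_taxonomy_schema.sql")


def prepare_database(dbname: str, dry_run: bool = True) -> List[List[str]]:
    """Hypothetical helper: create the 'taxonomist' role and an empty
    database, then load the schema dump into that database.

    Assumes the Postgres client tools are on PATH and that the current
    Postgres user has superuser privileges (needed for pg_trgm).
    With dry_run=True, nothing is executed; the commands are only returned.
    """
    create_role = ["createuser", "taxonomist"]  # owner of the created tables
    create_db = ["createdb", dbname]            # the empty target database
    load_schema = ["psql", dbname]              # the dump is fed via stdin
    commands = [create_role, create_db, load_schema]
    if dry_run:
        return commands
    # check=True raises CalledProcessError if a command exits with a
    # non-zero status (for example, createuser fails if the role exists).
    subprocess.run(create_role, check=True)
    subprocess.run(create_db, check=True)
    with SCHEMA_DUMP.open("rb") as schema:
        # Equivalent to: psql dbname < data/wem_taxonomy_schema.sql
        subprocess.run(load_schema, stdin=schema, check=True)
    return commands
```

Run from the repository root, prepare_database("wem_taxonomy", dry_run=False) would perform all three steps; "wem_taxonomy" is only a placeholder database name.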
Next, you should clone this repository. Then, from within the repository root directory, pull in the pubfisher submodule:
git submodule update --init --remote lib/pubfisher
Now, install this module from source using pip:
python3 -m pip install -e lib/pubfisher
After that, you can install the reproduce_wem_taxonomy package as well:
python3 -m pip install -e .
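To confirm that both editable installs succeeded, you can check that Python can locate the packages. This is a minimal sketch using importlib.util.find_spec from the standard library; missing_modules is a hypothetical helper, and the import name pubfisher is our assumption based on the submodule directory name.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "pubfisher" as the import name is an assumption based on lib/pubfisher.
print(missing_modules(["pubfisher", "reproduce_wem_taxonomy"]))
```

An empty list means both packages are importable from the current interpreter.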
Finally, to collect the publications, execute the fish_wem_taxonomy module:
python3 -m reproduce_wem_taxonomy.fish_wem_taxonomy
The publications are now stored in the database table 'publications' of your previously created Postgres database.
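To check that the run actually stored something, you can count the rows in the publications table. The sketch below relies only on the table name given above, not on any column names, and works with any DB-API 2.0 connection; count_publications is a hypothetical helper, not part of this package.

```python
from typing import Any


def count_publications(conn: Any) -> int:
    """Return the number of rows in the 'publications' table.

    `conn` is any DB-API 2.0 connection, e.g. one returned by psycopg2.connect().
    """
    cur = conn.cursor()
    try:
        cur.execute("SELECT COUNT(*) FROM publications")
        (count,) = cur.fetchone()
    finally:
        cur.close()  # release the cursor even if the query fails
    return count
```

With psycopg2 installed, you could pass it a connection such as psycopg2.connect(dbname="wem_taxonomy", user="taxonomist"); the database name and credentials here are placeholders for your own setup.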
Hashes for the source distribution, reproduce_wem_taxonomy-2020.1.14.tar.gz:

| Algorithm | Hash digest |
|---|---|
| SHA256 | 00aded21b541615c50aa84c779b73fbf95d92480111ab251e4d12ecc9934af53 |
| MD5 | 1cb8f1a6e906217c48cf48c709bf6cf3 |
| BLAKE2b-256 | 03fed642ceb472876b36575d255de89942d8c48edbb7bbff6bf92f8fe6119c2a |
Hashes for the built distribution, reproduce_wem_taxonomy-2020.1.14-py3-none-any.whl:

| Algorithm | Hash digest |
|---|---|
| SHA256 | e0749dd5620d5a46ab53c9ba5fad4bbe6333156a58f175c8b22474b7f8d509d0 |
| MD5 | 94fe56ccc4d90a6d834f9a695fd125fc |
| BLAKE2b-256 | 0bd5f2cfc734abd441f5a7fad44b2c851ae3914334be36c8d6af456306c4fb89 |
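If you download one of the distribution files by hand, you can compare it against the digests listed above. This is a minimal sketch using only Python's hashlib; we compute BLAKE2b-256 as hashlib.blake2b with digest_size=32, and the helper names file_digests and verify_sha256 are our own.

```python
import hashlib
from pathlib import Path
from typing import Dict

# SHA256 digest of reproduce_wem_taxonomy-2020.1.14.tar.gz, copied from the table.
EXPECTED_SDIST_SHA256 = "00aded21b541615c50aa84c779b73fbf95d92480111ab251e4d12ecc9934af53"


def file_digests(path: Path, chunk_size: int = 1 << 16) -> Dict[str, str]:
    """Compute the SHA256, MD5 and BLAKE2b-256 hex digests of a file."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),  # 32 bytes = 256 bits
    }
    with path.open("rb") as f:
        # Read in chunks so large files need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


def verify_sha256(path: Path, expected: str) -> bool:
    """Return True if the file's SHA256 digest matches `expected` (case-insensitive)."""
    return file_digests(path)["SHA256"] == expected.lower()
```

For an unmodified download, verify_sha256(Path("reproduce_wem_taxonomy-2020.1.14.tar.gz"), EXPECTED_SDIST_SHA256) should return True.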