Skip to main content

Reproduce the database of use cases for WEMs.

Project description

Towards a Taxonomy of Word Embedding Models: The Database

This repository contains the code that was used to collect relevant publications and store them in a Postgres Database. You are welcome to reproduce our database! Note however that the search results on Google Scholar may have been updated since our publication, leading to slightly different results.

As a prerequisite, you must have set up a Postgres database with the proper database schema. To do this, you can execute our Postgres schema dump provided in data/wem_taxonomy_schema.sql with the psql shell:

psql dbname < data/wem_taxonomy_schema.sql

where dbname is the name of an empty database that you have already created for this purpose. You must also create a user called 'taxonomist' who will own the created tables. If you intend to use the database with our Use Case Collector (UCC) tool later on, you should execute the above schema dump with a Postgres user who has root privileges. This is needed to install the Postgres trigram extension (pg_trgm).

Next, you should clone this repository. Then, from within the repository root directory, pull in the pubfisher submodule:

git submodule update --remote lib/pubfisher

Now, install this module from source using pip:

python3 -m pip install -e lib/pubfisher

After that, you can install the reproduce_wem_taxonomy package as well:

python3 -m pip install -e .

In order to finally collect the publications, simply execute the module fish_wem_taxonomy:

python3 -m reproduce_wem_taxonomy.fish_wem_taxonomy

The publications are now stored in the database table 'publications' of your previously created Postgres database.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

reproduce_wem_taxonomy-2020.1.14.tar.gz (5.4 kB view hashes)

Uploaded Source

Built Distribution

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page