Reproduce the database of use cases for WEMs.

Towards a Taxonomy of Word Embedding Models: The Database

This repository contains the code that was used to collect relevant publications and store them in a Postgres database. You are welcome to reproduce our database! Note, however, that the search results on Google Scholar may have changed since our publication, so your results may differ slightly from ours.

As a prerequisite, you must have set up a Postgres database with the proper database schema. To do this, you can execute our Postgres schema dump provided in data/wem_taxonomy_schema.sql with the psql shell:

psql dbname < data/wem_taxonomy_schema.sql

where dbname is the name of an empty database that you have already created for this purpose. You must also create a Postgres user called 'taxonomist', who will own the created tables. If you intend to use the database with our Use Case Collector (UCC) tool later on, you should execute the above schema dump as a Postgres user with superuser privileges. Superuser privileges are needed to install the Postgres trigram extension (pg_trgm).
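The preparation steps above can be sketched as a small Python helper that lists the commands in the order they would run. This is only an illustration under stated assumptions: the database name wem_taxonomy is hypothetical, the role and database are created with the standard PostgreSQL client programs createuser and createdb, and the helper prints the commands without executing them unless dry_run is set to False.

```python
import shlex
import subprocess


def schema_setup_commands(dbname, owner="taxonomist",
                          schema_path="data/wem_taxonomy_schema.sql"):
    """Return the setup commands in execution order."""
    return [
        ["createuser", owner],   # role that will own the created tables
        ["createdb", dbname],    # empty database that receives the schema
        ["psql", "--dbname", dbname, "--file", schema_path],  # load the dump
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # "wem_taxonomy" is a hypothetical database name chosen for this sketch.
    run_all(schema_setup_commands("wem_taxonomy"))
```

If the pg_trgm extension is needed, the psql step has to connect as a superuser, for example by adding psql's --username option to the last command.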

Step 1a): Installing from PyPI

Simply execute

python3 -m pip install reproduce_wem_taxonomy

to install the needed packages.

Step 1b): Installing from source

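First, clone this repository. Then, from within the repository root directory, pull in the pubfisher submodule (the --init flag is needed on a fresh clone, where the submodule has not been initialized yet):

git submodule update --init --remote lib/pubfisher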

Now, install this module from source using pip:

python3 -m pip install -e lib/pubfisher

After that, you can install the reproduce_wem_taxonomy package as well:

python3 -m pip install -e .
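To confirm that the installation worked, a quick check along the following lines may help. It is a minimal sketch: the import name reproduce_wem_taxonomy matches the module invoked in Step 2, while the import name pubfisher is an assumption based on the submodule's directory name.

```python
import importlib.util


def missing_packages(names=("pubfisher", "reproduce_wem_taxonomy")):
    """Return the top-level package names that cannot be found."""
    # find_spec returns None for a top-level package that is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All packages found.")
```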

Step 2: Collecting the publications from Google Scholar

To collect the publications, execute the module collect_relevant_publications:

python3 -m reproduce_wem_taxonomy.collect_relevant_publications

The publications are now stored in the database table 'publications' of your previously created Postgres database.
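One way to check that the collection succeeded is to count the rows in the 'publications' table. The sketch below shells out to psql and assumes that psql is on the PATH and can connect to the database without further options; the database name wem_taxonomy is hypothetical. It only counts rows, since the column layout is defined by the schema dump and not repeated here.

```python
import subprocess


def count_publications_command(dbname):
    """Build a psql call that prints only the row count."""
    return [
        "psql", "--dbname", dbname,
        "--tuples-only", "--no-align",  # suppress headers and alignment
        "--command", "SELECT count(*) FROM publications;",
    ]


def count_publications(dbname):
    """Run the query and return the number of stored publications."""
    result = subprocess.run(count_publications_command(dbname),
                            check=True, capture_output=True, text=True)
    return int(result.stdout.strip())


if __name__ == "__main__":
    # "wem_taxonomy" is a hypothetical database name chosen for this sketch.
    print(count_publications("wem_taxonomy"))
```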

