
Reproduce the database of use cases for word embedding models (WEMs).


Towards a Taxonomy of Word Embedding Models: The Database

This repository contains the code we used to collect relevant publications and store them in a Postgres database. You are welcome to reproduce our database! Note, however, that Google Scholar's search results may have changed since our publication, so your results may differ slightly from ours.

As a prerequisite, you must have set up a Postgres database with the proper database schema. To do this, you can execute our Postgres schema dump provided in data/wem_taxonomy_schema.sql with the psql shell:

psql dbname < data/wem_taxonomy_schema.sql

where dbname is the name of an empty database that you have already created for this purpose. Before running the dump, you must also create a Postgres user called 'taxonomist', who will own the created tables. If you intend to use the database with our Use Case Collector (UCC) tool later on, run the schema dump as a Postgres superuser: superuser privileges are needed to install the Postgres trigram extension (pg_trgm).
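If you prefer to script this prerequisite, the following is a minimal sketch in Python; it is not part of the repository. It assumes the standard Postgres client tools createuser, createdb and psql are on your PATH and that you connect as a superuser. The default superuser name "postgres" and the helper names setup_commands and run_setup are hypothetical.

```python
import subprocess
from typing import List

# Path of the schema dump shipped with the repository.
SCHEMA_DUMP = "data/wem_taxonomy_schema.sql"


def setup_commands(dbname: str, superuser: str = "postgres",
                   schema_dump: str = SCHEMA_DUMP) -> List[List[str]]:
    """Return the commands that prepare an empty WEM taxonomy database.

    Assumption: `superuser` is a Postgres superuser (needed for pg_trgm);
    the default name "postgres" is only a common convention.
    """
    return [
        # 1. The role that will own the tables created by the dump.
        ["createuser", "-U", superuser, "taxonomist"],
        # 2. An empty database to load the schema into.
        ["createdb", "-U", superuser, dbname],
        # 3. Load the schema, aborting at the first failing statement.
        ["psql", "-U", superuser, "-v", "ON_ERROR_STOP=1",
         "-d", dbname, "-f", schema_dump],
    ]


def run_setup(dbname: str, superuser: str = "postgres") -> None:
    """Run the setup commands in order, raising on the first failure."""
    for cmd in setup_commands(dbname, superuser):
        subprocess.run(cmd, check=True)
```

For example, run_setup("wem_taxonomy") would create the role, an empty database named wem_taxonomy (a hypothetical name) and load the schema into it. Passing ON_ERROR_STOP=1 makes psql exit with an error status on the first failing statement, so a missing 'taxonomist' role or a permission error for pg_trgm raises an exception instead of scrolling past unnoticed.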

Step 1a: Installing from PyPI

Simply execute

python3 -m pip install reproduce_wem_taxonomy

to install the needed packages.
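To confirm that the installation worked, you can check that the package is importable. This is a minimal sketch using only the standard library; missing_modules is a hypothetical helper, and it only handles top-level module names.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

After a successful install, missing_modules(["reproduce_wem_taxonomy"]) should return an empty list. The check uses find_spec rather than importing the package, so it does not execute any of the package's code.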

Step 1b: Installing from source

First, clone this repository. Then, from within the repository root directory, pull in the pubfisher submodule:

git submodule update --init --remote lib/pubfisher

Now, install this module from source using pip:

python3 -m pip install -e lib/pubfisher

After that, you can install the reproduce_wem_taxonomy package as well:

python3 -m pip install -e .
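The Step 1b commands can also be run from a script, run from the repository root after cloning. The sketch below is an assumption rather than part of the repository. It invokes pip through sys.executable so that both packages are installed into the same interpreter that runs the script. It also passes --init to the submodule command, because on a fresh clone the submodule has not been initialized yet and git would otherwise skip it. The names source_install_commands and install_from_source are hypothetical.

```python
import subprocess
import sys
from typing import List


def source_install_commands() -> List[List[str]]:
    """Return the Step 1b commands, to be run from the repository root."""
    pip = [sys.executable, "-m", "pip", "install", "-e"]
    return [
        # --init initializes the submodule on a fresh clone;
        # --remote checks out the latest commit of its tracked branch.
        ["git", "submodule", "update", "--init", "--remote", "lib/pubfisher"],
        # Install the pubfisher dependency first, then this package itself.
        pip + ["lib/pubfisher"],
        pip + ["."],
    ]


def install_from_source(repo_root: str = ".") -> None:
    """Run the commands in order inside repo_root, stopping on failure."""
    for cmd in source_install_commands():
        subprocess.run(cmd, check=True, cwd=repo_root)
```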

Step 2: Collecting the publications from Google Scholar

To collect the publications, execute the module reproduce_wem_taxonomy.collect_relevant_publications:

python3 -m reproduce_wem_taxonomy.collect_relevant_publications

The publications are now stored in the database table 'publications' of your previously created Postgres database.
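To check that the collection step actually wrote data, you can count the rows in the 'publications' table. The helper below is a hypothetical sketch that accepts any DB-API 2.0 (PEP 249) connection. Against your Postgres database you might obtain one with psycopg2.connect(dbname=..., user=...), which assumes the psycopg2 driver is installed and that you supply your own connection parameters. The demo uses an in-memory SQLite table purely as a stand-in; its single column is a placeholder, not the real schema.

```python
import sqlite3  # only used for the self-contained demo below


def count_publications(conn) -> int:
    """Return the number of rows in the 'publications' table.

    `conn` may be any DB-API 2.0 connection (e.g. from psycopg2 or sqlite3).
    """
    cur = conn.cursor()
    try:
        cur.execute("SELECT COUNT(*) FROM publications")
        (count,) = cur.fetchone()
        return count
    finally:
        cur.close()


# Demo against an in-memory SQLite stand-in with a placeholder column.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE publications (id INTEGER)")
demo.executemany("INSERT INTO publications VALUES (?)", [(1,), (2,)])
print(count_publications(demo))  # prints 2
```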


Download files


Files for reproduce-wem-taxonomy, version 2020.1.29
reproduce_wem_taxonomy-2020.1.29-py3-none-any.whl: Wheel, 10.8 kB, Python version py3
reproduce_wem_taxonomy-2020.1.29.tar.gz: Source, 5.8 kB, no Python version specified
