
Reproduce the database of use cases for WEMs.


Towards a Taxonomy of Word Embedding Models: The Database

This repository contains the code that was used to collect relevant publications and store them in a Postgres database. You are welcome to reproduce our database! Note, however, that the search results on Google Scholar may have changed since our publication, so your results may differ slightly from ours.

As a prerequisite, you need a Postgres database with the proper schema. To set it up, load our schema dump, provided in data/wem_taxonomy_schema.sql, with the psql shell:

psql dbname < data/wem_taxonomy_schema.sql

where dbname is the name of an empty database that you have already created for this purpose. You must also create a Postgres user called 'taxonomist', who will own the created tables. If you intend to use the database with our Use Case Collector (UCC) tool later on, execute the schema dump above as a Postgres user with superuser privileges. These privileges are needed to install the Postgres trigram extension (pg_trgm).
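After loading the schema, it can be worth confirming that the prerequisites described above are in place. The following is a minimal sketch and not part of this repository: the helper name check_prerequisites is hypothetical, and it assumes you pass in an open DB-API connection to your database (for example one created with psycopg2, which is an assumption about your setup). It only reads the standard Postgres catalogs pg_roles and pg_extension.

```python
def check_prerequisites(conn):
    """Report whether the 'taxonomist' role and the pg_trgm extension exist.

    `conn` is any open DB-API connection to the target database.
    This helper is illustrative and not shipped with reproduce_wem_taxonomy.
    """
    checks = {
        "taxonomist_role":
            "SELECT count(*) FROM pg_roles WHERE rolname = 'taxonomist'",
        "pg_trgm_extension":
            "SELECT count(*) FROM pg_extension WHERE extname = 'pg_trgm'",
    }
    results = {}
    cur = conn.cursor()
    try:
        for name, query in checks.items():
            cur.execute(query)
            # count(*) always returns exactly one row with one column
            results[name] = cur.fetchone()[0] > 0
    finally:
        cur.close()
    return results


# Example usage (assumes psycopg2 is installed and your database is "dbname"):
#   import psycopg2
#   conn = psycopg2.connect(dbname="dbname")
#   print(check_prerequisites(conn))
```

If pg_trgm_extension comes back False, the schema dump was probably loaded without superuser privileges.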

Step 1a): Installing from PyPI

Simply execute

python3 -m pip install reproduce_wem_taxonomy

to install the needed packages.

Step 1b): Installing from source

First, clone this repository. Then, from within the repository root directory, pull in the pubfisher submodule:

git submodule update --init --remote lib/pubfisher

Now, install this module from source using pip:

python3 -m pip install -e lib/pubfisher

After that, you can install the reproduce_wem_taxonomy package as well:

python3 -m pip install -e .
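Whichever installation route you chose, a quick way to confirm that the package is visible to your Python interpreter is to ask the standard library for its installed version. This is only a sketch: the helper name installed_version is hypothetical, and it assumes Python 3.8 or newer, where importlib.metadata is available.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="reproduce_wem_taxonomy"):
    """Return the installed version of dist_name, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the version string if the package is installed, otherwise None.
print(installed_version())
```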

Step 2: Collecting the publications from Google Scholar

To collect the publications, execute the module collect_relevant_publications of the installed package:

python3 -m reproduce_wem_taxonomy.collect_relevant_publications

The publications are now stored in the database table 'publications' of your previously created Postgres database.
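To check that the collection actually wrote something, you can count the rows in the 'publications' table. The sketch below is not part of this repository: the helper name count_publications is hypothetical, and it assumes you supply an open DB-API connection to your database (for example via psycopg2, which is an assumption about your setup).

```python
def count_publications(conn):
    """Return the number of rows in the 'publications' table.

    `conn` is any open DB-API connection to the database that was
    populated by the collection step.
    """
    cur = conn.cursor()
    try:
        cur.execute("SELECT count(*) FROM publications")
        return cur.fetchone()[0]
    finally:
        cur.close()


# Example usage (assumes psycopg2 is installed and your database is "dbname"):
#   import psycopg2
#   conn = psycopg2.connect(dbname="dbname", user="taxonomist")
#   print(count_publications(conn))
```

Because Google Scholar results change over time, the count you see may differ from the one in our publication.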
