A tool for learning vector representations of words and entities from Wikipedia

Project description

Wikipedia2Vec

Wikipedia2Vec is a tool used for obtaining embeddings (vector representations) of words and entities from Wikipedia. It is developed and maintained by Studio Ousia.

This tool learns embeddings of words and entities simultaneously and places similar words and entities close to one another in a continuous vector space. Embeddings can be trained with a single command, using a publicly available Wikipedia dump as input. The tool has been used in several state-of-the-art NLP models for tasks such as entity linking, named entity recognition, entity relatedness, and question answering.
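
To make "close to one another in a continuous vector space" concrete, here is a minimal sketch that ranks items by cosine similarity. It does not call Wikipedia2Vec itself: the three-dimensional vectors, the `word:`/`entity:` key prefixes, and the helper names `cosine_similarity` and `most_similar` are all assumptions invented for this illustration. Vectors from a trained model would have far more dimensions.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
# They are NOT output from a trained Wikipedia2Vec model.
toy_embeddings = {
    "word:guitar": [0.9, 0.1, 0.0],
    "entity:Electric guitar": [0.8, 0.2, 0.1],
    "word:senate": [0.0, 0.1, 0.9],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(key, embeddings):
    """Rank every other item by cosine similarity to `key`, best first."""
    query = embeddings[key]
    scores = [
        (other, cosine_similarity(query, vector))
        for other, vector in embeddings.items()
        if other != key
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


ranking = most_similar("word:guitar", toy_embeddings)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Because words and entities share one vector space, a word can be compared directly with an entity, which is why the toy entity ranks above the unrelated toy word here.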

Documentation and pretrained embeddings are available online at http://wikipedia2vec.github.io/.

Reference

If you use Wikipedia2Vec in a scientific publication, please cite the following paper:

Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, Yoshiyasu Takefuji, Wikipedia2Vec: An Optimized Implementation for Learning Embeddings from Wikipedia, arXiv preprint 1812.06280, 2018.

@article{yamada2018wikipedia2vec,
  title={Wikipedia2Vec: An Optimized Implementation for Learning Embeddings from Wikipedia},
  author={Yamada, Ikuya and Asai, Akari and Shindo, Hiroyuki and Takeda, Hideaki and Takefuji, Yoshiyasu},
  journal={arXiv preprint 1812.06280},
  year={2018}
}

License

Apache License 2.0

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

wikipedia2vec-1.0.1.tar.gz (1.2 MB)

Uploaded Source

File details

Details for the file wikipedia2vec-1.0.1.tar.gz.

File metadata

  • Download URL: wikipedia2vec-1.0.1.tar.gz
  • Upload date:
  • Size: 1.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.18.4 setuptools/40.6.3 requests-toolbelt/0.8.0 tqdm/4.19.8 CPython/3.6.0

File hashes

Hashes for wikipedia2vec-1.0.1.tar.gz
Algorithm    Hash digest
SHA256       b3ac713b6d5fc095d22b455b8ffbae0039d11e219836588f150782e2b8d60b04
MD5          22cff92e64bf5740f6713ceade25c9a5
BLAKE2b-256  cc8c17fdfd04a15dac72ca57a4a35ab29d82fb68fdf3a2dd4f2c36031b8d05cd
