A tool for learning vector representations of words and entities from Wikipedia

Project description

Wikipedia2Vec

Wikipedia2Vec is a tool used for obtaining embeddings (or vector representations) of words and entities (i.e., concepts that have corresponding pages in Wikipedia) from Wikipedia. It is developed and maintained by Studio Ousia.

This tool enables you to learn embeddings of words and entities simultaneously, and places similar words and entities close to one another in a continuous vector space. Embeddings can be easily trained by a single command with a publicly available Wikipedia dump as input.
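As a rough illustration of what "close to one another" means, closeness between two embeddings is commonly measured with cosine similarity. The sketch below uses hypothetical three-dimensional vectors invented for this example (real embeddings have many more dimensions and come from a trained model), and the `word:` / `entity:` key names are illustrative only.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings, invented for illustration only.
toy_embeddings = {
    "word:tokyo": [0.9, 0.1, 0.0],
    "entity:Tokyo": [0.8, 0.2, 0.1],
    "word:banana": [0.0, 0.1, 0.9],
}


def rank_neighbors(query, embeddings):
    """Return all other keys, most similar to the query first."""
    q = embeddings[query]
    others = [key for key in embeddings if key != query]
    return sorted(
        others,
        key=lambda key: cosine_similarity(q, embeddings[key]),
        reverse=True,
    )


print(rank_neighbors("word:tokyo", toy_embeddings))
```

Because words and entities share one vector space, the same similarity function can compare a word with an entity, which is what the toy ranking above does.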

This tool implements the conventional skip-gram model to learn the embeddings of words, and its extension proposed in Yamada et al. (2016) to learn the embeddings of entities.
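To make the training signal more concrete, here is a minimal sketch of how skip-gram style (target, context) pairs could be generated. It is not the tool's actual implementation. The first function follows the standard skip-gram windowing; the second is a simplified, assumed reading of one part of the entity extension, in which an entity linked from a span of text is paired with the words around that span. The function names, the `anchors` tuple layout, and the window sizes are all hypothetical.

```python
def skipgram_pairs(tokens, window=2):
    """Pair every token with the tokens at most `window` positions away."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs


def anchor_context_pairs(tokens, anchors, window=2):
    """Pair each linked entity with the words surrounding its anchor span.

    `anchors` is a list of (start, end, entity_title) tuples, where
    tokens[start:end] is the linked text. This layout is an assumption
    made for this sketch.
    """
    pairs = []
    for start, end, entity in anchors:
        left = range(max(0, start - window), start)
        right = range(end, min(len(tokens), end + window))
        for j in list(left) + list(right):
            pairs.append((entity, tokens[j]))
    return pairs


tokens = ["tokyo", "is", "the", "capital"]
print(skipgram_pairs(tokens, window=1))
print(anchor_context_pairs(tokens, [(0, 1, "Tokyo")], window=2))
```

In a real model each pair would feed a prediction objective (for example with negative sampling); the sketch stops at pair generation to keep the idea visible.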

An empirical comparison between Wikipedia2Vec and existing embedding tools (i.e., FastText, Gensim, RDF2Vec, and Wiki2vec) is available here.

Documentation is available online at http://wikipedia2vec.github.io/.

Basic Usage

Wikipedia2Vec can be installed via PyPI:

% pip install wikipedia2vec

With this tool, embeddings can be learned by running a train command with a Wikipedia dump as input. For example, the following commands download the latest English Wikipedia dump and learn embeddings from this dump:

% wget https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2
% wikipedia2vec train enwiki-latest-pages-articles.xml.bz2 MODEL_FILE

Then, the learned embeddings are written to MODEL_FILE. Note that this command can take many optional parameters. Please refer to our documentation for further details.
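Once MODEL_FILE exists, the embeddings can be read back from Python. The sketch below assumes the `Wikipedia2Vec.load`, `get_word_vector`, and `get_entity_vector` methods described in the project documentation; this page does not show them, so check the documentation for the installed version. The helper name `lookup_vectors` is hypothetical.

```python
def lookup_vectors(model_file, word, entity_title):
    """Return (word_vector, entity_vector) from a trained model file.

    Assumes the Python API from the project documentation; verify the
    method names against the version you have installed.
    """
    # Imported lazily so this module can be read without the package installed.
    from wikipedia2vec import Wikipedia2Vec

    model = Wikipedia2Vec.load(model_file)
    word_vector = model.get_word_vector(word)
    entity_vector = model.get_entity_vector(entity_title)
    return word_vector, entity_vector
```

A call such as `lookup_vectors("MODEL_FILE", "tokyo", "Tokyo")` would then return one vector for the word and one for the Wikipedia entity, provided both are present in the model's vocabulary.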

Pretrained Embeddings

Pretrained embeddings for 12 languages (i.e., English, Arabic, Chinese, Dutch, French, German, Italian, Japanese, Polish, Portuguese, Russian, and Spanish) can be downloaded from this page.
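If a pretrained file is distributed in word2vec-style text format, it can be split into word and entity vectors with a few lines of code. The sketch below rests on two assumptions that are not stated on this page and should be checked against the download page: that the first line holds the vocabulary size and the dimension, and that entity rows are marked with an `ENTITY/` prefix with spaces in titles written as underscores. The function name and the sample data are invented for illustration.

```python
import io

ENTITY_PREFIX = "ENTITY/"  # assumed marker for entity rows


def read_text_embeddings(lines):
    """Split word2vec-style text rows into (words, entities) dictionaries."""
    words, entities = {}, {}
    rows = iter(lines)
    header = next(rows).split()  # assumed layout: "<vocab_size> <dimension>"
    dim = int(header[1])
    for line in rows:
        parts = line.split()
        if len(parts) != dim + 1:
            continue  # skip blank or malformed rows
        token = parts[0]
        vector = [float(x) for x in parts[1:]]
        if token.startswith(ENTITY_PREFIX):
            title = token[len(ENTITY_PREFIX):].replace("_", " ")
            entities[title] = vector
        else:
            words[token] = vector
    return words, entities


# Invented sample in the assumed format, not real pretrained data.
sample = io.StringIO(
    "2 3\n"
    "tokyo 0.1 0.2 0.3\n"
    "ENTITY/Tokyo_Tower 0.4 0.5 0.6\n"
)
words, entities = read_text_embeddings(sample)
print(sorted(words), sorted(entities))
```

For a real download, pass an open file object instead of the in-memory sample; large files are read row by row, so only the resulting dictionaries are held in memory.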

Use Cases

Wikipedia2Vec has been applied to the following tasks:

Entity linking: Yamada et al., 2016, Eshel et al., 2017, Chen et al., 2019, Poerner et al., 2020, van Hulst et al., 2020.
Named entity recognition: Sato et al., 2017, Lara-Clares and Garcia-Serrano, 2019.
Question answering: Yamada et al., 2017, Poerner et al., 2020.
Entity typing: Yamada et al., 2018.
Text classification: Yamada et al., 2018, Yamada and Shindo, 2019, Alam et al., 2020.
Relation classification: Poerner et al., 2020.
Paraphrase detection: Duong et al., 2018.
Knowledge graph completion: Shah et al., 2019, Shah et al., 2020.
Fake news detection: Singh et al., 2019, Ghosal et al., 2020.
Plot analysis of movies: Papalampidi et al., 2019.
Novel entity discovery: Zhang et al., 2020.
Entity retrieval: Gerritse et al., 2020.
Deepfake detection: Zhong et al., 2020.
Conversational information seeking: Rodriguez et al., 2020.
Query expansion: Rosin et al., 2020.

References

If you use Wikipedia2Vec in a scientific publication, please cite the following paper:

Ikuya Yamada, Akari Asai, Jin Sakuma, Hiroyuki Shindo, Hideaki Takeda, Yoshiyasu Takefuji, Yuji Matsumoto, Wikipedia2Vec: An Efficient Toolkit for Learning and Visualizing the Embeddings of Words and Entities from Wikipedia.

@inproceedings{yamada2020wikipedia2vec,
  title = "{W}ikipedia2{V}ec: An Efficient Toolkit for Learning and Visualizing the Embeddings of Words and Entities from {W}ikipedia",
  author={Yamada, Ikuya and Asai, Akari and Sakuma, Jin and Shindo, Hiroyuki and Takeda, Hideaki and Takefuji, Yoshiyasu and Matsumoto, Yuji},
  booktitle = {Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations},
  year = {2020},
  publisher = {Association for Computational Linguistics},
  pages = {23--30}
}

The embedding model was originally proposed in the following paper:

Ikuya Yamada, Hiroyuki Shindo, Hideaki Takeda, Yoshiyasu Takefuji, Joint Learning of the Embedding of Words and Entities for Named Entity Disambiguation.

@inproceedings{yamada2016joint,
  title={Joint Learning of the Embedding of Words and Entities for Named Entity Disambiguation},
  author={Yamada, Ikuya and Shindo, Hiroyuki and Takeda, Hideaki and Takefuji, Yoshiyasu},
  booktitle={Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning},
  year={2016},
  publisher={Association for Computational Linguistics},
  pages={250--259}
}

The text classification model implemented in this example was proposed in the following paper:

Ikuya Yamada, Hiroyuki Shindo, Neural Attentive Bag-of-Entities Model for Text Classification.

@inproceedings{yamada2019neural,
  title={Neural Attentive Bag-of-Entities Model for Text Classification},
  author={Yamada, Ikuya and Shindo, Hiroyuki},
  booktitle={Proceedings of The 23rd SIGNLL Conference on Computational Natural Language Learning},
  year={2019},
  publisher={Association for Computational Linguistics},
  pages = {563--573}
}

License

Apache License 2.0

Release history

2.0.0 (this version), Jan 11, 2024
2.0.0b1 (pre-release), Jan 11, 2024
1.0.5, Apr 3, 2021
1.0.4, Sep 7, 2019
1.0.3, Mar 9, 2019
1.0.2, Feb 14, 2019
1.0.1, Dec 18, 2018
1.0.0, Nov 27, 2018
0.2.8, Nov 22, 2018
0.2.7, Oct 19, 2018
0.2.6, Oct 12, 2018
0.2.5, Sep 1, 2018
0.2.4, May 25, 2018
0.2.3, May 14, 2018
0.2.2, May 10, 2018
0.2.1, May 8, 2018
0.2, May 7, 2018
0.1.15, May 3, 2018
0.1.14, Apr 23, 2018
0.1.13, Apr 21, 2018
0.1.12, Apr 15, 2018
0.1.11, Apr 10, 2018
0.1.10, Apr 9, 2018
0.1.9, Mar 30, 2018
0.1.8, Mar 22, 2018
0.1.7, Mar 20, 2018
0.1.6, Mar 18, 2018

Download files

Source Distribution

wikipedia2vec-2.0.0.tar.gz (970.0 kB), uploaded Jan 11, 2024, Source

Built Distributions

wikipedia2vec-2.0.0-cp312-cp312-win_amd64.whl (1.5 MB), uploaded Jan 11, 2024, CPython 3.12, Windows x86-64
wikipedia2vec-2.0.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (5.1 MB), uploaded Jan 11, 2024, CPython 3.12, manylinux: glibc 2.17+ x86-64
wikipedia2vec-2.0.0-cp312-cp312-macosx_11_0_arm64.whl (1.6 MB), uploaded Jan 11, 2024, CPython 3.12, macOS 11.0+ ARM64
wikipedia2vec-2.0.0-cp312-cp312-macosx_10_9_x86_64.whl (1.7 MB), uploaded Jan 11, 2024, CPython 3.12, macOS 10.9+ x86-64
wikipedia2vec-2.0.0-cp311-cp311-win_amd64.whl (1.5 MB), uploaded Jan 11, 2024, CPython 3.11, Windows x86-64
wikipedia2vec-2.0.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (5.1 MB), uploaded Jan 11, 2024, CPython 3.11, manylinux: glibc 2.17+ x86-64
wikipedia2vec-2.0.0-cp311-cp311-macosx_11_0_arm64.whl (1.6 MB), uploaded Jan 11, 2024, CPython 3.11, macOS 11.0+ ARM64
wikipedia2vec-2.0.0-cp311-cp311-macosx_10_9_x86_64.whl (1.7 MB), uploaded Jan 11, 2024, CPython 3.11, macOS 10.9+ x86-64
wikipedia2vec-2.0.0-cp310-cp310-win_amd64.whl (1.5 MB), uploaded Jan 11, 2024, CPython 3.10, Windows x86-64
wikipedia2vec-2.0.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.7 MB), uploaded Jan 11, 2024, CPython 3.10, manylinux: glibc 2.17+ x86-64
wikipedia2vec-2.0.0-cp310-cp310-macosx_11_0_arm64.whl (1.6 MB), uploaded Jan 11, 2024, CPython 3.10, macOS 11.0+ ARM64
wikipedia2vec-2.0.0-cp310-cp310-macosx_10_9_x86_64.whl (1.7 MB), uploaded Jan 11, 2024, CPython 3.10, macOS 10.9+ x86-64
wikipedia2vec-2.0.0-cp39-cp39-win_amd64.whl (1.5 MB), uploaded Jan 11, 2024, CPython 3.9, Windows x86-64
wikipedia2vec-2.0.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.8 MB), uploaded Jan 11, 2024, CPython 3.9, manylinux: glibc 2.17+ x86-64
wikipedia2vec-2.0.0-cp39-cp39-macosx_11_0_arm64.whl (1.6 MB), uploaded Jan 11, 2024, CPython 3.9, macOS 11.0+ ARM64
wikipedia2vec-2.0.0-cp39-cp39-macosx_10_9_x86_64.whl (1.7 MB), uploaded Jan 11, 2024, CPython 3.9, macOS 10.9+ x86-64
wikipedia2vec-2.0.0-cp38-cp38-win_amd64.whl (1.6 MB), uploaded Jan 11, 2024, CPython 3.8, Windows x86-64
wikipedia2vec-2.0.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.9 MB), uploaded Jan 11, 2024, CPython 3.8, manylinux: glibc 2.17+ x86-64
wikipedia2vec-2.0.0-cp38-cp38-macosx_11_0_arm64.whl (1.6 MB), uploaded Jan 11, 2024, CPython 3.8, macOS 11.0+ ARM64
wikipedia2vec-2.0.0-cp38-cp38-macosx_10_9_x86_64.whl (1.7 MB), uploaded Jan 11, 2024, CPython 3.8, macOS 10.9+ x86-64

File details

Details for the file wikipedia2vec-2.0.0.tar.gz.

File metadata

Download URL: wikipedia2vec-2.0.0.tar.gz
Upload date: Jan 11, 2024
Size: 970.0 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/4.0.2 CPython/3.11.7

File hashes

Hashes for wikipedia2vec-2.0.0.tar.gz
Algorithm	Hash digest
SHA256	`191b9a80fb16653315385fc5ff4b26c92bf02dca5ffcd038d184ed6e61b6c350`
MD5	`780d2c582120aa557b0d81c977b951d2`
BLAKE2b-256	`cb1c07887bf23c3fa3dfc01bf0b6e3c02048a02f5d1c29e75b5569f1bd92e78a`
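
A downloaded file can be checked against the SHA256 digest listed above using only the Python standard library. This is a minimal sketch: the function names are hypothetical, and the path passed in is wherever the download was saved.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Return True if the file's SHA256 digest equals the published one."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For the source distribution, `matches_expected("wikipedia2vec-2.0.0.tar.gz", "191b9a80fb16653315385fc5ff4b26c92bf02dca5ffcd038d184ed6e61b6c350")` should return True for an intact download. Reading in chunks keeps memory use flat regardless of file size.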

Algorithm	Hash digest
SHA256	`ddcc94fe9fa1132106f5c08da417e724400f38ace5f8d657352e6fd2831ab580`
MD5	`41a91e24bde4e4d01c82b24e7af5e96f`
BLAKE2b-256	`205eda15e9166f44c452fb1f3142b87cd20a9dc3c51d1c38482c44a1eb3454a6`

Algorithm	Hash digest
SHA256	`4cd6b33d84fc7faaec117e1312869b07906ab0fd03de82bf51e93aa81f5efa5b`
MD5	`6c3a258ef1d37e7f83a1b32767a6bd1e`
BLAKE2b-256	`4e7f71ceffdfb1e26a1302d56ee9e4568022df70aed93b7cabdd87157b7971eb`

Algorithm	Hash digest
SHA256	`907a7912c0a982fe96c97918b46f3a22562443b236b666c03f88be2a1e7c6900`
MD5	`7e4a2636f1f2e00ee476a427451c3e54`
BLAKE2b-256	`01c2d3621f0ca49d89e79b67855431ac722678389df55c6e0b4b03bf129a6a21`

Algorithm	Hash digest
SHA256	`ff717fff738c4aab92c640f443515505816fcf4767ecb343a8e61fc1aaca66c2`
MD5	`b084a1d031ebb9f935070307e4d50a9a`
BLAKE2b-256	`e6ad7b9d54c2ed5647b7b15e0781100a582ee6a6f3c8a04223d27550bd0c35bb`

Algorithm	Hash digest
SHA256	`e17b1db1b858455753febb05334b65c2676dbcea1f4e66e7b1adaef5c4d52e6e`
MD5	`e937b5adabaccfb94e624d8c094cb7b1`
BLAKE2b-256	`9f7719d09f5b543ed3cacf0d7ded0cafdee61baf0f610ff5b3c296043cba86c0`

Algorithm	Hash digest
SHA256	`f554cefde337725866de9888bc567e8a1bc2b024f0b1f025ce9d8273bd229cff`
MD5	`e28a5e2f700c20129bf423e36eaaef99`
BLAKE2b-256	`8f4341c122f4b2a67d94fa154da796c8f9d41dc189e398637210e56802587d01`

Algorithm	Hash digest
SHA256	`dc327989c3ac31b864468cb3882e2ddc6abb850efb2ae5605a316111182b144d`
MD5	`e3f1aedf94b356348d38a3207585cfe7`
BLAKE2b-256	`5a7ac40eebd9dad0230a16539cb7fd84779c19d39d207473c412b408074e3fcd`

Algorithm	Hash digest
SHA256	`3ec0da92623093e0ccb59249cb70fdbed9d679462135a618bff7e39f235d2fdb`
MD5	`dfcfe83f2d9e76b8c4e2175e653600a8`
BLAKE2b-256	`f967b011227824a7210760eaab8e4dc16bd8aeb5797db29ab5cb27052b8a7204`

Algorithm	Hash digest
SHA256	`eda8c27a31538bf471a6ef44e9e351f5b4eba3852fb1dff49b524f0fdbd678c2`
MD5	`cff6f982a0a29bb3c8a43204b299e904`
BLAKE2b-256	`386ffc4cb6472d569b0349644d0776d5b638b2a9fa39e24ae6281932d51338bc`

Algorithm	Hash digest
SHA256	`1cb267258ec8ea59e68c0625a613700eea4a4a99ea77948f654005eca72d9441`
MD5	`7bfbf15d305d0b6dc67fe62d89a9a482`
BLAKE2b-256	`4f95583ff157469e665b892126d6168a6df7442c987a22049036910415211d67`

Algorithm	Hash digest
SHA256	`a2b70318e8e9ed7bf44fb321b272692de5e875314234d19e51d348133c74a97e`
MD5	`afc729cfb3001aaff067f309de48e747`
BLAKE2b-256	`fd0a04cdd6eb5c088fb4ceb5a2c0251f7ee659ed5bb2a8cb2ab38408f13c6e89`

Algorithm	Hash digest
SHA256	`df3c47db08266fbc73112f7600ebd8ac4be6d2e474d632967591a666a5d3102a`
MD5	`e898dd4bf26f4e7ae77a60fcb29cf8ed`
BLAKE2b-256	`1d4ff1b94048afd56d643e2b684162d8f9e6ebeedbad964bea720578f656cbdd`

Algorithm	Hash digest
SHA256	`0ea17fd852158765c6328156f86d87507f9fac52b6a9121a99834d5c7aac13e1`
MD5	`30c7db1e7542744770455452d774d4c0`
BLAKE2b-256	`7d45e34d41f420d78c15d8d0c7ca8520664fc42aa229451c86a1ea1806fe1edb`

Algorithm	Hash digest
SHA256	`0e874978e0333788b0a92cb2039eabd629d01f1a912fd9e2e8872adc7ee8d22c`
MD5	`fda70e0afc10d6f6747a6ae3fdbfd522`
BLAKE2b-256	`f6a38124139e51b441682a4182173af7e77e739b60a30de709cffa34e9fdb5ab`

Algorithm	Hash digest
SHA256	`8f0658b2704c4c0b3c107d00b9e65b136e16bc27a835d10e5a90c69efe594a32`
MD5	`dc1277301e149e5bcad59d74ce253987`
BLAKE2b-256	`a16349fb110fc83ae8f5127ae5dbd2f19268a34aaa1f9458a77d20b964fedaac`

Algorithm	Hash digest
SHA256	`3f75bbfa99692215545f952dc71c8ede42c23def895d10d4b0cd3b686e74d6a7`
MD5	`ca3e88efca2e863e2784feb6db0b3753`
BLAKE2b-256	`055205757adfa27cbe1d9b44009be6cff58bf0877de1a0b700e7eb35594cd26d`

Algorithm	Hash digest
SHA256	`1cfbfb90478ec515b8635bb286ea01b9c64a0c1a50ef7e251ed15aa2e7fa56ee`
MD5	`ad8ae2e364e6be692300524c44cac401`
BLAKE2b-256	`6f229258a3b5cca042e461c07c2d94fc7c5f97a54557b1a2c0d367651d9824f3`

Algorithm	Hash digest
SHA256	`747dcaeaba220f1b246881d2d6edbb4da5e344cba8733208ee37e72fe2dad2cb`
MD5	`40464083feae9e6d4d9da71fe3c429ad`
BLAKE2b-256	`f13c6972eb074133c6f2ef0810d7b56fafbd45c4ce0511b03048fb612c4d7e1f`

Algorithm	Hash digest
SHA256	`23a3f8b080693683024ccbcf3534518fd8ba5e7d1a63241a9eecee2132afb8e3`
MD5	`c120886835c51d1a8ed711bcdede6f99`
BLAKE2b-256	`bf7e057d7e0ef6a34cc1cb1242a4d146071ffa5808c118b8636cce8538e08c12`

Algorithm	Hash digest
SHA256	`7bbbb8c69cd98a2fbc15d44bffc2c9b16228b060b4ed550fa960973e0b8c01fd`
MD5	`3cbe4cfbcffb83dcdf55cd8f34fb0e52`
BLAKE2b-256	`2661ed685e7c23ca454da813d380489e0bdb519606b0eb709f6c7116bec1616f`
